
C-mii Performance

Benchmarking (318 mature miRNA sequences of Arabidopsis thaliana)

Table 1 shows the number of plus-strand input sequences remaining after each step of miRNA and target identification on the four datasets. Of the 33,602 TAIR10 cDNAs, 30,707 were loaded for miRNA identification and 33,597 for target identification, because miRNA identification and target identification accept different maximum sequence lengths.

Table 1 Number of remaining input sequences from each step of miRNA and target identifications on the four datasets (Only 318 mature miRNA sequences of Arabidopsis thaliana from miRBase were used as source sequences for homolog search and target scanning)

Step	TAIR10 (all cDNAs)	TAIR10 (miRNAs)	miRBase 18 (Arabidopsis only)	Rfam 10 (all plant RNAs except miRNAs)
miRNA identification: input sequences	33,602	176	291	16,219
Sequence loading	30,707	176	291	15,822
Homolog search	4,046	175	291	26
Primary miRNA folding	2,052	174	287	0
Precursor miRNA folding	483	165	270	0
Target identification: input sequences	33,602	176	291	16,219
Sequence loading	33,597	176	291	15,822
Target scanning	1,884	108	183	70
miRNA-target folding	1,884	108	183	70
Target annotation	1,005	0	0	0
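Because each step only passes on the sequences that survived the previous one, a column of Table 1 reads as a funnel. As an illustration only (not part of C-mii), the Python sketch below takes the miRNA identification counts from the TAIR10 (all cDNAs) column and reports the fraction kept relative to the previous step and to the original input; the constant and function names are hypothetical.

```python
# miRNA identification counts for the TAIR10 (all cDNAs) column of Table 1.
TAIR10_CDNA_MIRNA_STEPS: list[tuple[str, int]] = [
    ("Input sequences", 33_602),
    ("Sequence loading", 30_707),
    ("Homolog search", 4_046),
    ("Primary miRNA folding", 2_052),
    ("Precursor miRNA folding", 483),
]


def retention(steps: list[tuple[str, int]]) -> list[tuple[str, int, float, float]]:
    """Return (step, count, fraction of previous step, fraction of input) for each step."""
    first = steps[0][1]
    prev = first
    rows = []
    for name, count in steps:
        rows.append((name, count, count / prev, count / first))
        prev = count  # the next step's input is this step's output
    return rows


for name, count, of_prev, of_input in retention(TAIR10_CDNA_MIRNA_STEPS):
    print(f"{name:<25} {count:>7,} {of_prev:>8.2%} {of_input:>8.2%}")
```

Read this way, homolog search is by far the strongest filter for this dataset, keeping about 13% of the loaded cDNAs.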

Table 2 shows the numbers of true and false miRNA predictions made by C-mii on these datasets, excluding TAIR10 miRNAs, which is a subset of TAIR10 cDNAs.

Table 2 Number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) in miRNA identification on the three datasets

Dataset	Number of sequences	Number of identified miRNAs	TP	FP	FN	TN
TAIR10 cDNAs (all sequences)	30,707	483	165	318	11	30,213
miRBase 18 (Arabidopsis only)	291	270	270	0	21	0
Rfam 10 (all plant RNAs except miRNAs)	15,822	0	0	0	0	15,822
Total	46,820	753	435	318	32	46,035
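Table 2 uses the standard confusion-matrix categories, and its Total row is the column-wise sum of the three dataset rows. The Python sketch below illustrates both, assuming every loaded sequence has an ID and is labelled either as a known miRNA or not; `Confusion` and `confusion_from_sets` are hypothetical names for illustration, not part of C-mii.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    def __add__(self, other: "Confusion") -> "Confusion":
        # Element-wise sum, used to build the Total row.
        return Confusion(self.tp + other.tp, self.fp + other.fp,
                         self.fn + other.fn, self.tn + other.tn)


def confusion_from_sets(all_ids: set[str], known: set[str], predicted: set[str]) -> Confusion:
    """Hypothetical helper: classify every loaded sequence ID as TP, FP, FN or TN."""
    return Confusion(
        tp=len(predicted & known),            # predicted and a known miRNA
        fp=len(predicted - known),            # predicted but not a known miRNA
        fn=len(known - predicted),            # known miRNA that was missed
        tn=len(all_ids - known - predicted),  # neither known nor predicted
    )


# Toy example with made-up IDs.
print(confusion_from_sets({"a", "b", "c", "d"}, known={"a", "b"}, predicted={"a", "c"}))
# Confusion(tp=1, fp=1, fn=1, tn=1)

# Per-dataset rows of Table 2; the Total row is their element-wise sum.
rows = {
    "TAIR10 cDNAs": Confusion(tp=165, fp=318, fn=11, tn=30_213),
    "miRBase 18 (Arabidopsis only)": Confusion(tp=270, fp=0, fn=21, tn=0),
    "Rfam 10 (all plant RNAs except miRNAs)": Confusion(tp=0, fp=0, fn=0, tn=15_822),
}
total = sum(rows.values(), Confusion(0, 0, 0, 0))
print(total)  # Confusion(tp=435, fp=318, fn=32, tn=46035)
```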

Positive predictive value (PPV) on the TAIR10 cDNAs dataset = TP/(TP + FP) = 165/(165 + 318) = 34.16%
Negative predictive value (NPV) on the TAIR10 cDNAs dataset = TN/(FN + TN) = 30,213/(11 + 30,213) = 99.96%
Sensitivity on the TAIR10 cDNAs dataset = TP/(TP + FN) = 165/(165 + 11) = 93.75%
Specificity on the TAIR10 cDNAs dataset = TN/(FP + TN) = 30,213/(318 + 30,213) = 98.96%
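The four formulas above can be checked with a few lines of Python. This is a minimal sketch that plugs in the TAIR10 cDNAs row of Table 2; `predictive_metrics` is a hypothetical helper, and percentages are rounded to two decimal places.

```python
def predictive_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Confusion-matrix metrics using the same formulas as the text."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (fn + tn),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (fp + tn),
    }


# TAIR10 cDNAs row of Table 2.
for name, value in predictive_metrics(tp=165, fp=318, fn=11, tn=30_213).items():
    print(f"{name}: {value:.2%}")
# PPV: 34.16%
# NPV: 99.96%
# Sensitivity: 93.75%
# Specificity: 98.96%
```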

To assess target identification, 434 previously reported miRNA target sequences of Arabidopsis (from 183 distinct TAIR10 gene loci and 49 Arabidopsis miRNA families) were used as a benchmark (see Additional file 7 for these sequences). Table 3 shows the number of known target sequences remaining after each step of target identification. With the UniProtKB/Swiss-Prot protein database, the sensitivity of target identification, calculated as TP/(TP + FN), was 0.922 with default parameter settings and 0.933 with customized settings.

Table 3 Number of remaining input sequences from each step of target identification on the previously reported miRNA-associated target sequences of Arabidopsis

Step	Database	Default parameters	Customized parameters
Input sequences		434	434
1. Sequence loading		434	434
2. Target scanning*		420	431
3. miRNA-target folding		420	431
4. Target annotation**	UniProtKB/Swiss-Prot (plant only)	404	410
4. Target annotation**	UniProtKB/TrEMBL (plant only)	416	428

* The default and customized binding-score cutoffs for target scanning are ≤ 4 and ≤ 6, respectively.

** The default and customized BLASTX E-value cutoffs for target annotation are 1e-20 and 1e-5, respectively.
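The two footnotes amount to two cutoffs applied in sequence: a binding-score cutoff at target scanning and a BLASTX E-value cutoff at target annotation. The Python sketch below shows how the default and customized settings differ, assuming that a lower binding score means a better miRNA-target match and that a target counts as annotated when its best BLASTX hit meets the E-value cutoff. The record fields, target IDs, and function names are hypothetical and are not C-mii's internal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    max_binding_score: float  # target scanning: keep candidates scoring <= this
    max_evalue: float         # target annotation: keep BLASTX hits with E-value <= this


# Cutoff values from the footnotes of Table 3.
DEFAULT = Thresholds(max_binding_score=4, max_evalue=1e-20)
CUSTOMIZED = Thresholds(max_binding_score=6, max_evalue=1e-5)


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; C-mii's internal data structures are not shown here.
    target_id: str
    binding_score: float                 # assumed: lower = better complementarity
    best_blastx_evalue: Optional[float]  # None when BLASTX reports no hit


def run(candidates: list[Candidate], t: Thresholds) -> tuple[list[str], list[str]]:
    """Return (IDs passing target scanning, IDs among those that are annotated)."""
    scanned = [c for c in candidates if c.binding_score <= t.max_binding_score]
    annotated = [
        c for c in scanned
        if c.best_blastx_evalue is not None and c.best_blastx_evalue <= t.max_evalue
    ]
    return [c.target_id for c in scanned], [c.target_id for c in annotated]


candidates = [
    Candidate("target_A", 3.0, 1e-40),
    Candidate("target_B", 5.0, 1e-8),
    Candidate("target_C", 3.5, None),
]
print(run(candidates, DEFAULT))     # (['target_A', 'target_C'], ['target_A'])
print(run(candidates, CUSTOMIZED))  # (['target_A', 'target_B', 'target_C'], ['target_A', 'target_B'])
```

Because both filters keep only values at or below a cutoff, loosening the cutoffs can only keep the same or more candidates, which is consistent with the higher counts in the Customized parameters column of Table 3.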

Datasets used for system benchmarking

Adding a new database to C-mii

C-mii Running Example

Project Management
  New Project
  Load Project
miRNA Prediction
  Data Loading
  Homolog Search
  Primary-miRNA Folding
  Precursor-miRNA Folding
Target Prediction
  Data Loading
  Target Scanning
  Target Folding
  Target Annotation
Helping features
  Show database
  Check for update
  Recovery