
 

C-mii Performance

Benchmarking [223 mature miRNA sequences of Arabidopsis thaliana] [318 mature miRNA sequences of Arabidopsis thaliana]

Table 1 shows the number of input sequences, filtered for the plus strand, that remain after each step of miRNA identification on the four datasets. Of the 33,602 TAIR10 cDNAs, 30,707 were loaded for miRNA identification and 33,597 for target identification, because the two procedures accept different maximum sequence lengths.

Table 1 Number of input sequences remaining after each step of miRNA and target identification on the four datasets (only the 318 mature miRNA sequences of Arabidopsis thaliana from miRBase were used as source sequences for homolog search and target scanning)

 

 

Step                       TAIR10         TAIR10      miRBase 18            Rfam 10
                           (all cDNAs)    (miRNAs)    (Arabidopsis only)    (all plant RNAs except miRNAs)
MiRNA identification       33,602         176         291                   16,219
  Sequence loading         30,707         176         291                   15,822
  Homolog search           4,046          175         291                   26
  Primary miRNA folding    2,052          174         287                   0
  Precursor miRNA folding  483            165         270                   0
Target identification      33,602         176         291                   16,219
  Sequence loading         33,597         176         291                   15,822
  Target scanning          1,884          108         183                   70
  MiRNA-target folding     1,884          108         183                   70
  Target annotation        1,005          0           0                     0

Table 2 shows the number of true and false miRNAs identified by C-mii for the above datasets, except TAIR10 miRNAs, which is a subset of TAIR10 cDNAs.

Table 2 Number of TP, FP, FN, and TN of miRNA identification on the three datasets

 

 

Dataset                                   Number of    Number of            TP     FP     FN    TN
                                          sequences    identified miRNAs
TAIR10 cDNAs (all sequences)              30,707       483                  165    318    11    30,213
miRBase 18 (Arabidopsis only)             291          270                  270    0      21    0
Rfam 10 (all plant RNAs except miRNAs)    15,822       0                    0      0      0     15,822
Total                                     46,820       753                  435    318    32    46,035

The positive predictive value (PPV) on TAIR10 cDNAs dataset = TP/(TP + FP) = 165/(165 + 318) = 34.16%
The negative predictive value (NPV) on TAIR10 cDNAs dataset = TN/(FN + TN) = 30,213/(11 + 30,213) = 99.96%
Sensitivity on TAIR10 cDNAs dataset = TP / (TP + FN) = 165/(165 + 11) = 93.75%
Specificity on TAIR10 cDNAs dataset = TN/(FP + TN) = 30,213/(318 + 30,213) = 98.96%
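
The four measures above follow directly from the TAIR10 cDNAs row of Table 2. As a minimal sketch, they can be recomputed in Python; the class and variable names below are illustrative assumptions and are not part of C-mii:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from one row of Table 2 (hypothetical helper, not a C-mii API)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def ppv(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: TN / (FN + TN)
        return self.tn / (self.fn + self.tn)

    def sensitivity(self) -> float:
        # Sensitivity: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Specificity: TN / (FP + TN)
        return self.tn / (self.fp + self.tn)


# Counts taken from the TAIR10 cDNAs row of Table 2.
tair10 = ConfusionCounts(tp=165, fp=318, fn=11, tn=30_213)

metrics = {
    "PPV": tair10.ppv(),
    "NPV": tair10.npv(),
    "Sensitivity": tair10.sensitivity(),
    "Specificity": tair10.specificity(),
}

# Print each measure as a percentage with two decimal places.
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The same helper can be applied to the Total row of Table 2. The Rfam 10 row needs care: it has no positives, so PPV and sensitivity are undefined there and the sketch would raise a ZeroDivisionError.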


To assess the efficiency of miRNA target identification, 434 previously reported miRNA-associated target sequences of Arabidopsis (from 183 distinct TAIR10 gene loci and 49 Arabidopsis miRNA families) were used as a benchmark (see Additional file 7 for these sequences). Table 3 shows the number of known target sequences remaining after each step of target identification. Using the UniProtKB/Swiss-Prot protein database, the sensitivity of the identification, calculated as TP / (TP + FN), was 0.922 with the default parameter settings and 0.933 with the customized settings.

 

Table 3 Number of remaining input sequences from each step of target identification on the previously reported miRNA-associated target sequences of Arabidopsis

Step/Number of remaining sequences          Default parameters    Customized parameters
Number of input sequences                   434                   434
1. Sequence loading                         434                   434
2. Target scanning *                        420                   431
3. miRNA-target folding                     420                   431
4. Target annotation **
   UniProtKB/Swiss-Prot (plant only)        404                   410
   UniProtKB/TrEMBL (plant only)            416                   428
* The default and customized binding scores for target scanning are <= 4 and <= 6, respectively.
** The default and customized BLASTX E-values for target annotation are 1e-20 and 1e-5, respectively.

Datasets used for system benchmarking

  1. Additional file 1: [TAIR10 cDNAs (all 33,602 sequences)]
  2. Additional file 2: [TAIR10 cDNAs (only 176 miRNAs)]
  3. Additional file 3: [miRBase release 18 (only Arabidopsis precursor miRNAs)]
  4. Additional file 4: [Rfam release 10 (all plant RNAs except miRNAs)]
  5. Additional file 5: [List of true positives (TP), false positives (FP), and false negatives (FN) of miRNA identification on TAIR10 cDNAs dataset]
  6. Additional file 6: [List of false negatives (FN) of miRNA identification on Arabidopsis precursor miRNAs from miRBase 18 dataset]
  7. Additional file 7: [Arabidopsis known miRNA target]


Add a new database into C-mii

  1. [HowTo]

 

C-mii Running Example

 

   
   


Copyright 2011-2012. Information Systems Laboratory (ISL), Bioresources Technology Unit (BTU), National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand