C-mii Performance

Benchmarking [223 mature miRNA sequences of Arabidopsis thaliana] [318 mature miRNA sequences of Arabidopsis thaliana]

Table 1 shows the number of input sequences (filtered for the plus strand) remaining after each step of miRNA and target identification on the four datasets. Of the 33,602 TAIR10 cDNAs, 30,707 were loaded for miRNA identification and 33,597 for target identification, because the two identifications accept different maximum sequence lengths.

Table 1 Number of remaining input sequences from each step of miRNA and target identifications on the four datasets (Only 233 mature miRNA sequences of Arabidopsis thaliana from miRBase were used as source sequences for homolog search and target scanning)

 

 

Step                       TAIR10        TAIR10      miRBase 16           Rfam 10
                           (all cDNAs)   (miRNAs)    (Arabidopsis only)   (all plant RNAs except miRNAs)
MiRNA identification       33,602        176         213                  16,219
  Sequence loading         30,707        176         213                  15,822
  Homolog search           1,286         175         213                  31
  Primary miRNA folding    567           173         209                  0
  Precursor miRNA folding  223           164         195                  0
Target identification      33,602        176         213                  16,219
  Sequence loading         33,597        176         213                  15,822
  Target scanning          1,126         101         122                  56
  MiRNA-target folding     1,126         101         122                  56
  Target annotation        546           0           0                    0

Table 2 shows the number of true and false miRNAs identified by C-mii for the above datasets, except TAIR10 miRNAs, which is a subset of TAIR10 cDNAs.

Table 2 Number of TP, FP, FN, and TN of miRNA identification on the three datasets

 

 

Dataset                                 Number of sequences  Number of identified miRNAs  TP    FP    FN    TN
TAIR10 cDNAs (all sequences)            30,707               223                          164   59    12    30,472
miRBase 16 (Arabidopsis only)           213                  195                          195   0     18    0
Rfam 10 (all plant RNAs except miRNAs)  15,822               0                            0     0     0     15,822
Total                                   46,742               418                          359   59    30    46,294

The positive predictive value (PPV) on TAIR10 cDNAs dataset = TP/(TP + FP) = 164/(164 + 59) = 73.54%
The negative predictive value (NPV) on TAIR10 cDNAs dataset = TN/(FN + TN) = 30,472/(12 + 30,472) = 99.96%
Sensitivity on TAIR10 cDNAs dataset = TP/(TP + FN) = 164/(164 + 12) = 93.18%
Specificity on TAIR10 cDNAs dataset = TN/(FP + TN) = 30,472/(59 + 30,472) = 99.81%
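
The four values above can be reproduced from the TAIR10 cDNAs row of Table 2. The following is a minimal sketch in Python; the function name `classification_metrics` and the variable `tair10` are illustrative and not part of C-mii.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return PPV, NPV, sensitivity and specificity as fractions."""
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (fn + tn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (fp + tn),  # true negative rate
    }


# Counts taken from the TAIR10 cDNAs row of Table 2.
tair10 = classification_metrics(tp=164, fp=59, fn=12, tn=30472)

for name, value in tair10.items():
    print(f"{name}: {value:.2%}")
# ppv: 73.54%
# npv: 99.96%
# sensitivity: 93.18%
# specificity: 99.81%
```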


To assess the efficiency of miRNA target identification, 434 previously reported miRNA-associated target sequences of Arabidopsis (from 183 distinct TAIR10 gene loci and 49 Arabidopsis miRNA families) were used as a benchmark (see Additional file 7 for these sequences). Table 3 shows the number of known target sequences remaining after each step of target identification. With the UniProtKB/Swiss-Prot protein database, the sensitivity of the identification, calculated as TP / (TP + FN), was 0.922 with default and 0.933 with customized parameter settings.

 

Table 3 Number of remaining input sequences from each step of target identification on the previously reported miRNA-associated target sequences of Arabidopsis

Step                                   Default parameters   Customized parameters
Number of input sequences              434                  434
1. Sequence loading                    434                  434
2. Target scanning *                   410                  430
3. miRNA-target folding                410                  430
4. Target annotation **
   UniProtKB/Swiss-Prot (plant only)   400                  405
   UniProtKB/TrEMBL (plant only)       406                  427
* The default and customized binding scores for target scanning are <= 4 and <= 6, respectively.
** The default and customized BLASTX E-values for target annotation are 1e-20 and 1e-5, respectively.
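
The reported sensitivities follow from the final row counts of Table 3: every one of the 434 benchmark sequences is a known target, so the sequences that survive annotation are the true positives and the rest are false negatives. The sketch below assumes exactly that reading; the names `KNOWN_TARGETS`, `annotated` and `sensitivity` are illustrative. The TrEMBL values are computed the same way and are not figures reported in the text.

```python
KNOWN_TARGETS = 434  # benchmark size, so TP + FN = 434

# Sequences remaining after target annotation (Table 3), keyed by
# (protein database, parameter setting).
annotated = {
    ("UniProtKB/Swiss-Prot", "default"): 400,
    ("UniProtKB/Swiss-Prot", "customized"): 405,
    ("UniProtKB/TrEMBL", "default"): 406,
    ("UniProtKB/TrEMBL", "customized"): 427,
}

# Sensitivity = TP / (TP + FN), with TP = sequences still present at the end.
sensitivity = {key: tp / KNOWN_TARGETS for key, tp in annotated.items()}

for (database, setting), value in sensitivity.items():
    print(f"{database:22s} {setting:11s} {value:.3f}")
```

For Swiss-Prot this gives 0.922 (default) and 0.933 (customized), matching the values quoted above.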


We measured the efficiency of multi-thread management on an Ubuntu 9.10 (Karmic) machine with four Intel(R) Core(TM)2 Quad CPU Q6600 at 2.4 GHz and 8 GB RAM. With default parameter settings, the total running time of the miRNA and target identifications on TAIR10 cDNAs (33,602 sequences) was about 30% shorter with two threads and 46% shorter with four threads than with a single thread (see Table 4).

 

Table 4 Time usage for each step of miRNA and target identifications on TAIR10 cDNAs dataset with varied number of threads running

Step                        1 thread    2 threads   4 threads
MiRNA identification
  Homolog search            2:40:31     1:56:20     1:34:31
  Primary miRNA folding     1:39:08     0:55:15     0:34:31
  Precursor miRNA folding   0:18:24     0:22:10     0:17:40
Target identification
  Target scanning           0:22:23     0:20:10     0:19:22
  miRNA-target folding      0:22:24     0:14:15     0:14:39
  Target annotation         0:29:19     0:16:09     0:10:15
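
The 30% and 46% figures are consistent with summing the six step times in Table 4 for each thread count and comparing the totals. The sketch below assumes the times are in h:mm:ss and that the improvement is measured as the reduction in total running time; the names `to_seconds`, `TIMES`, `totals` and `reduction` are illustrative.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


THREADS = (1, 2, 4)

# Step times from Table 4, in the order of THREADS.
TIMES = {
    "Homolog search": ("2:40:31", "1:56:20", "1:34:31"),
    "Primary miRNA folding": ("1:39:08", "0:55:15", "0:34:31"),
    "Precursor miRNA folding": ("0:18:24", "0:22:10", "0:17:40"),
    "Target scanning": ("0:22:23", "0:20:10", "0:19:22"),
    "miRNA-target folding": ("0:22:24", "0:14:15", "0:14:39"),
    "Target annotation": ("0:29:19", "0:16:09", "0:10:15"),
}

# Total running time in seconds for each thread count.
totals = {
    threads: sum(to_seconds(row[i]) for row in TIMES.values())
    for i, threads in enumerate(THREADS)
}

# Fractional reduction in total time relative to a single thread.
reduction = {t: 1 - totals[t] / totals[1] for t in THREADS[1:]}

for threads, value in reduction.items():
    print(f"{threads} threads: {value:.1%} less time than 1 thread")
# 2 threads: 30.6% less time than 1 thread
# 4 threads: 45.8% less time than 1 thread
```

The gain is not uniform across steps: in Table 4, precursor miRNA folding took longer with two threads than with one, and miRNA-target folding was slightly slower with four threads than with two.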

Datasets used for system benchmarking

  1. Additional file 1: [TAIR10 cDNAs (all 33,602 sequences)]
  2. Additional file 2: [TAIR10 cDNAs (only 176 miRNAs)]
  3. Additional file 3: [miRBase release 16 (only Arabidopsis precursor miRNAs)]
  4. Additional file 4: [Rfam release 10 (all plant RNAs except miRNAs)]
  5. Additional file 5: [List of true positives (TP), false positives (FP), and false negatives (FN) of miRNA identification on TAIR10 cDNAs dataset]
  6. Additional file 6: [List of false negatives (FN) of miRNA identification on Arabidopsis precursor miRNAs from miRBase 16 dataset]
  7. Additional file 7: [Arabidopsis known miRNA target]


Add a new database into C-mii

  1. [HowTo]

 

C-mii Running Example

 

   
   


Copyright 2011-2012. Information Systems Laboratory (ISL), Bioresources Technology Unit (BTU), National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand