|
C-mii Performance
Benchmarking [223 mature miRNA sequences of Arabidopsis thaliana] [318 mature miRNA sequences of Arabidopsis thaliana]
Table 1 shows the number of input sequences, filtered for the plus strand, that remain after each step of miRNA identification on the four datasets. Of the 33,602 TAIR10 cDNAs, 30,707 were loaded for miRNA identification and 33,597 for target identification, because the two identifications accept different maximum sequence lengths.
Table 1 Number of remaining input sequences from each step of miRNA and target identifications on the four datasets (Only 233 mature miRNA sequences of Arabidopsis thaliana from miRBase were used as source sequences for homolog search and target scanning)
| | TAIR10 (all cDNAs) | TAIR10 (miRNAs) | miRBase 16 (Arabidopsis only) | Rfam 10 (all plant RNAs except miRNAs) |
|---|---|---|---|---|
| MiRNA identification | 33,602 | 176 | 213 | 16,219 |
| Sequence loading | 30,707 | 176 | 213 | 15,822 |
| Homolog search | 1,286 | 175 | 213 | 31 |
| Primary miRNA folding | 567 | 173 | 209 | 0 |
| Precursor miRNA folding | 223 | 164 | 195 | 0 |
| Target identification | 33,602 | 176 | 213 | 16,219 |
| Sequence loading | 33,597 | 176 | 213 | 15,822 |
| Target scanning | 1,126 | 101 | 122 | 56 |
| MiRNA-target folding | 1,126 | 101 | 122 | 56 |
| Target annotation | 546 | 0 | 0 | 0 |
Table 2 shows the number of true and false miRNAs identified by C-mii for the above datasets, except TAIR10 miRNAs, which is a subset of TAIR10 cDNAs.
Table 2 Number of TP, FP, FN, and TN of miRNA identification on the three datasets
| Dataset | Number of sequences | Number of identified miRNAs | TP | FP | FN | TN |
|---|---|---|---|---|---|---|
| TAIR10 cDNAs (all sequences) | 30,707 | 223 | 164 | 59 | 12 | 30,472 |
| miRBase 16 (Arabidopsis only) | 213 | 195 | 195 | 0 | 18 | 0 |
| Rfam 10 (all plant RNAs except miRNAs) | 15,822 | 0 | 0 | 0 | 0 | 15,822 |
| Total | 46,742 | 418 | 359 | 59 | 30 | 46,294 |
The positive predictive value (PPV) on the TAIR10 cDNAs dataset = TP/(TP + FP) = 164/(164 + 59) = 73.54%
The negative predictive value (NPV) on the TAIR10 cDNAs dataset = TN/(FN + TN) = 30,472/(12 + 30,472) = 99.96%
Sensitivity on the TAIR10 cDNAs dataset = TP/(TP + FN) = 164/(164 + 12) = 93.18%
Specificity on the TAIR10 cDNAs dataset = TN/(FP + TN) = 30,472/(59 + 30,472) = 99.81%
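The four metrics above can be reproduced from the TAIR10 cDNAs row of Table 2. The following is a minimal sketch, not part of C-mii itself; the class name `ConfusionCounts` and the variable `tair10` are illustrative names chosen here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives (illustrative helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def ppv(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: TN / (FN + TN)
        return self.tn / (self.fn + self.tn)

    def sensitivity(self) -> float:
        # Sensitivity: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Specificity: TN / (FP + TN)
        return self.tn / (self.fp + self.tn)


# Counts taken from the TAIR10 cDNAs row of Table 2.
tair10 = ConfusionCounts(tp=164, fp=59, fn=12, tn=30_472)

for name, value in [
    ("PPV", tair10.ppv()),
    ("NPV", tair10.npv()),
    ("Sensitivity", tair10.sensitivity()),
    ("Specificity", tair10.specificity()),
]:
    print(f"{name}: {value:.2%}")
```

Swapping in the counts from another row of Table 2 gives the corresponding metrics for that dataset, provided the denominators are non-zero.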
To assess the efficiency of miRNA target identification, 434 previously reported sequences of miRNA-associated targets of Arabidopsis (from 183 distinct TAIR10 gene loci and 49 Arabidopsis miRNA families) were used as a benchmark (see Additional file 7 for these sequences). Table 3 shows the number of known target sequences remaining after each step of target identification. Using the UniProtKB/Swiss-Prot protein database, the sensitivity of the identification, calculated as TP / (TP + FN), was 0.922 with default and 0.933 with customized parameter settings.
Table 3 Number of remaining input sequences from each step of target identification on the previously reported miRNA-associated target sequences of Arabidopsis
| Step/Number of remaining sequences | Default parameters | Customized parameters |
|---|---|---|
| Number of input sequences | 434 | 434 |
| 1. Sequence loading | 434 | 434 |
| 2. Target scanning * | 410 | 430 |
| 3. miRNA-target folding | 410 | 430 |
| 4. Target annotation ** | | |
| UniProtKB/Swiss-Prot (plant only) | 400 | 405 |
| UniProtKB/TrEMBL (plant only) | 406 | 427 |

* The default and customized binding scores for target scanning are <= 4 and <= 6, respectively.
** The default and customized BLASTX E-values for target annotation are 1e-20 and 1e-5, respectively.
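As a worked check of the sensitivity figures quoted for UniProtKB/Swiss-Prot, the arithmetic can be written out as follows. This assumes that the benchmark sequences surviving target annotation are counted as TP and the remaining sequences out of the 434 inputs as FN, which is an interpretation of Table 3 rather than something stated explicitly.

```latex
\begin{align*}
\text{Sensitivity}_{\text{default}}
  &= \frac{TP}{TP + FN} = \frac{400}{400 + 34} = \frac{400}{434} \approx 0.922 \\
\text{Sensitivity}_{\text{customized}}
  &= \frac{TP}{TP + FN} = \frac{405}{405 + 29} = \frac{405}{434} \approx 0.933
\end{align*}
```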
We measured the efficiency of multi-thread management on an Ubuntu 9.10 (karmic) machine with four Intel(R) Core(TM)2 Quad CPU Q6600 at 2.4 GHz and 8 GB RAM. With default parameter settings, the average speed of the miRNA and target identifications on TAIR10 cDNAs (33,602 sequences) improved by 30% with two threads and by 46% with four threads, relative to a single thread (see Table 4).
Table 4 Time usage for each step of miRNA and target identifications on TAIR10 cDNAs dataset with varied number of threads running
| Step | 1 thread | 2 threads | 4 threads |
|---|---|---|---|
| MiRNA identification | | | |
| Homolog search | 2:40:31 | 1:56:20 | 1:34:31 |
| Primary miRNA folding | 1:39:08 | 0:55:15 | 0:34:31 |
| Precursor miRNA folding | 0:18:24 | 0:22:10 | 0:17:40 |
| Target identification | | | |
| Target scanning | 0:22:23 | 0:20:10 | 0:19:22 |
| miRNA-target folding | 0:22:24 | 0:14:15 | 0:14:39 |
| Target annotation | 0:29:19 | 0:16:09 | 0:10:15 |
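One plausible way to relate the per-step timings in Table 4 to the overall improvement is to sum the step times for each thread count and compare the totals against the single-thread run. The sketch below does this, assuming the third timing column is the four-thread run (as the text describes) and that times are given as h:mm:ss; the helper `to_seconds` and the names `TIMES`, `totals`, and `reductions` are illustrative. How the reported percentages were actually derived is not stated in the source.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Thread counts for the three timing columns (the value 4 is an assumption).
THREADS = (1, 2, 4)

# Step timings copied from Table 4, one tuple entry per thread count.
TIMES = {
    "Homolog search": ("2:40:31", "1:56:20", "1:34:31"),
    "Primary miRNA folding": ("1:39:08", "0:55:15", "0:34:31"),
    "Precursor miRNA folding": ("0:18:24", "0:22:10", "0:17:40"),
    "Target scanning": ("0:22:23", "0:20:10", "0:19:22"),
    "miRNA-target folding": ("0:22:24", "0:14:15", "0:14:39"),
    "Target annotation": ("0:29:19", "0:16:09", "0:10:15"),
}

# Total run time in seconds for each thread count.
totals = [
    sum(to_seconds(row[i]) for row in TIMES.values())
    for i in range(len(THREADS))
]

# Fractional reduction in total time relative to the single-thread run.
reductions = {
    threads: 1 - total / totals[0]
    for threads, total in zip(THREADS, totals)
}

for threads, reduction in reductions.items():
    print(f"{threads} thread(s): {reduction:.1%} less total time")
```

Under this reading, the totals give a reduction of roughly 31% with two threads and 46% with four, close to the reported 30% and 46%. Note that not every step speeds up: precursor miRNA folding is slower with two threads than with one.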
Datasets used for system benchmarking
- Additional file 1: [TAIR10 cDNAs (all 33,602 sequences)]
- Additional file 2: [TAIR10 cDNAs (only 176 miRNAs)]
- Additional file 3: [miRBase release 16 (only Arabidopsis precursor miRNAs)]
- Additional file 4: [Rfam release 10 (all plant RNAs except miRNAs)]
- Additional file 5: [List of true positives (TP), false positives (FP), and false negatives (FN) of miRNA identification on TAIR10 cDNAs dataset]
- Additional file 6: [List of false negatives (FN) of miRNA identification on Arabidopsis precursor miRNAs from miRBase 16 dataset]
- [Arabidopsis known miRNA target]
Add a new database into C-mii
- [HowTo]
|