
C-mii Performance

Benchmarking [223 mature miRNA sequences of Arabidopsis thaliana] [318 mature miRNA sequences of Arabidopsis thaliana]

Table 1 shows the number of input sequences (filtered for the plus strand) remaining after each step of miRNA identification on the four datasets. Of the 33,602 TAIR10 cDNAs, 30,707 were loaded for miRNA identification and 33,597 for target identification, because the two identifications accept different maximum sequence lengths.

Table 1 Number of remaining input sequences from each step of miRNA and target identifications on the four datasets (Only 233 mature miRNA sequences of Arabidopsis thaliana from miRBase were used as source sequences for homolog search and target scanning)

	TAIR10 (all cDNAs)	TAIR10 (miRNAs)	miRBase 16 (Arabidopsis only)	Rfam 10 (all plant RNAs except miRNAs)
MiRNA identification	33,602	176	213	16,219
Sequence loading	30,707	176	213	15,822
Homolog search	1,286	175	213	31
Primary miRNA folding	567	173	209	0
Precursor miRNA folding	223	164	195	0
Target identification	33,602	176	213	16,219
Sequence loading	33,597	176	213	15,822
Target scanning	1,126	101	122	56
MiRNA-target folding	1,126	101	122	56
Target annotation	546	0	0	0

Table 2 shows the number of true and false miRNAs identified by C-mii for the above datasets, except TAIR10 miRNAs, which is a subset of TAIR10 cDNAs.

Table 2 Number of TP, FP, FN, and TN of miRNA identification on the three datasets

	Number of sequences	Number of identified miRNAs	TP	FP	FN	TN
TAIR10 cDNAs (all sequences)	30,707	223	164	59	12	30,472
miRBase 16 (Arabidopsis only)	213	195	195	0	18	0
Rfam 10 (all plant RNAs except miRNAs)	15,822	0	0	0	0	15,822
Total	46,742	418	359	59	30	46,294

The positive predictive value (PPV) on the TAIR10 cDNAs dataset = TP/(TP + FP) = 164/(164 + 59) = 73.54%
The negative predictive value (NPV) on the TAIR10 cDNAs dataset = TN/(FN + TN) = 30,472/(12 + 30,472) = 99.96%
Sensitivity on the TAIR10 cDNAs dataset = TP/(TP + FN) = 164/(164 + 12) = 93.18%
Specificity on the TAIR10 cDNAs dataset = TN/(FP + TN) = 30,472/(59 + 30,472) = 99.81%
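
The four figures above follow directly from the TAIR10 cDNAs row of Table 2. As a minimal sketch, the arithmetic can be reproduced as shown here; the function name `confusion_metrics` is chosen for illustration only and is not part of C-mii.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return PPV, NPV, sensitivity and specificity as fractions."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
    }


# Counts taken from the "TAIR10 cDNAs (all sequences)" row of Table 2.
tair10 = confusion_metrics(tp=164, fp=59, fn=12, tn=30472)

for name, value in tair10.items():
    print(f"{name}: {value:.2%}")
```

The same function can be applied to the other rows of Table 2, provided the denominator of the requested metric is non-zero (for example, PPV is undefined for the Rfam 10 row, where no miRNAs were identified).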

To assess the efficiency of miRNA target identification, 434 previously reported sequences of miRNA-associated targets of Arabidopsis (from 183 distinct TAIR10 gene loci and 49 Arabidopsis miRNA families) were used as a benchmark (see Additional file 7 for these sequences). Table 3 shows the number of known target sequences remaining after each step of target identification. With the UniProtKB/Swiss-Prot protein database, the sensitivity of the identification, calculated as TP/(TP + FN), was 0.922 with the default parameter settings and 0.933 with the customized settings.

Table 3 Number of remaining input sequences from each step of target identification on the previously reported miRNA-associated target sequences of Arabidopsis

Step/Number of remaining sequences		Default parameters	Customized parameters
Number of input sequences		434	434
1. Sequence loading		434	434
2. Target scanning *		410	430
3. miRNA-target folding		410	430
4. Target annotation **	UniProtKB/Swiss-Prot (plant only)	400	405
4. Target annotation **	UniProtKB/TrEMBL (plant only)	406	427

* The default and customized binding scores for target scanning are <= 4 and <= 6, respectively.

** The default and customized BLASTX E-values for target annotation are 1e-20 and 1e-5, respectively.
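
The sensitivities quoted for UniProtKB/Swiss-Prot can be recovered from the last rows of Table 3. The sketch below assumes that every one of the 434 benchmark sequences is a true target, so that sequences surviving target annotation count as TP and all others as FN. The variable and function names are illustrative, and the UniProtKB/TrEMBL values it prints are derived the same way but are not reported in the text above.

```python
N_KNOWN_TARGETS = 434  # benchmark sequences, all assumed to be true targets

# Sequences remaining after target annotation (Table 3).
annotated = {
    ("UniProtKB/Swiss-Prot", "default"): 400,
    ("UniProtKB/Swiss-Prot", "customized"): 405,
    ("UniProtKB/TrEMBL", "default"): 406,
    ("UniProtKB/TrEMBL", "customized"): 427,
}


def sensitivity(tp, total=N_KNOWN_TARGETS):
    """Sensitivity = TP / (TP + FN), with FN = total - TP."""
    fn = total - tp
    return tp / (tp + fn)


for (database, setting), tp in annotated.items():
    print(f"{database:<22}{setting:<12}{sensitivity(tp):.3f}")
```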

We measured the efficiency of multi-thread management on an Ubuntu 9.10 (karmic) machine with four Intel(R) Core(TM)2 Quad CPU Q6600 at 2.4 GHz and 8 GB RAM. With default parameter settings, the average speed of the miRNA and target identifications on TAIR10 cDNAs (33,602 sequences) improved by 30% and 46% when moving from a single thread to two and four threads, respectively (see Table 4).

Table 4 Time usage (h:mm:ss) for each step of miRNA and target identifications on the TAIR10 cDNAs dataset with a varied number of running threads

Number of threads	1	2	4
MiRNA identification
Homolog search	2:40:31	1:56:20	1:34:31
Primary miRNA folding	1:39:08	0:55:15	0:34:31
Precursor miRNA folding	0:18:24	0:22:10	0:17:40
Target identification
Target scanning	0:22:23	0:20:10	0:19:22
miRNA-target folding	0:22:24	0:14:15	0:14:39
Target annotation	0:29:19	0:16:09	0:10:15
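
One way to relate Table 4 to the quoted 30% and 46% improvements is to sum the step times per thread count and compare the totals. This is only one possible reading of the figures, assumed here for illustration; the helper `to_seconds` and the other names are not part of C-mii.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Table 4: step -> running time with 1, 2 and 4 threads.
timings = {
    "Homolog search": ("2:40:31", "1:56:20", "1:34:31"),
    "Primary miRNA folding": ("1:39:08", "0:55:15", "0:34:31"),
    "Precursor miRNA folding": ("0:18:24", "0:22:10", "0:17:40"),
    "Target scanning": ("0:22:23", "0:20:10", "0:19:22"),
    "miRNA-target folding": ("0:22:24", "0:14:15", "0:14:39"),
    "Target annotation": ("0:29:19", "0:16:09", "0:10:15"),
}

thread_counts = (1, 2, 4)

# Total time in seconds for each thread count.
totals = [
    sum(to_seconds(row[i]) for row in timings.values())
    for i in range(len(thread_counts))
]

# Fractional reduction in total time relative to the single-thread run.
reduction = [1 - total / totals[0] for total in totals]

for threads, total, saved in zip(thread_counts, totals, reduction):
    print(f"{threads} thread(s): {total} s total, {saved:.1%} less than 1 thread")
```

Under this reading the totals shrink by about 30.6% and 45.8%, close to the figures quoted above. Individual steps do not all scale equally: precursor miRNA folding and target scanning change little with more threads.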

Datasets used for system benchmarking

Add a new database into C-mii

[HowTo]

C-mii Running Example

Project Management
New Project
Load Project
MiRNA Prediction
Data Loading
Homolog Search
Primary-miRNA Folding
Precursor-miRNA Folding
Target Prediction
Data Loading
Target Scanning
Target Folding
Target Annotation
Helping features
Show database
Check for update
Recovery