
ZOOM Benchmarking

Benchmark for Illumina/Solexa data

Experiment I. Illumina/Solexa BAC Data

The dataset used in this experiment was generated using a Solexa 1G sequencer at the CSHL genome center. The sample consists of two BACs provided for sequencer validation, which together cover 162kb of chromosome 6. Approximately 3.4 million reads were produced, each 36bp long. If all 36 bases of each read are considered reliable, this amounts to roughly 700X coverage of the 162kb region. All computation was performed on a single AMD Opteron 275 CPU (2.2GHz), using only one core. The time required to map all 3.4 million reads to the 162kb region, to chromosome 6, and to the whole human genome is given in the following table, with a memory consumption of 1.1G.
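The coverage figure follows from simple arithmetic: total sequenced bases divided by the length of the target region. A minimal sketch of that calculation, using the approximate numbers quoted above (the function name `coverage` is only illustrative):

```python
def coverage(num_reads: int, read_length: int, region_length: int) -> float:
    """Average sequencing depth: total sequenced bases divided by region size."""
    return num_reads * read_length / region_length


# Figures quoted in the text; the read count is approximate.
depth = coverage(num_reads=3_400_000, read_length=36, region_length=162_000)
print(f"{depth:.0f}X")  # prints 756X, in line with the rounded 700X figure
```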

A speed comparison was performed between ZOOM and ELAND, the most efficient mapping software known to these researchers. For this comparison, each read in the BAC data was cut to a fixed length between 15bp and 32bp, the length limits of ELAND, and the resulting data sets were mapped to human chromosome 6. ZOOM's efficiency relative to ELAND is illustrated in the following graph.

To examine the sensitivity of ungapped ZOOM on reads with insertions/deletions, researchers used the SSearch program (Smith-Waterman algorithm) to align each read back to the 162kb region of chromosome 6, and took the alignment with the highest score for each read as a control set. They then determined the percentage of those alignments that ZOOM also identified. The following table shows this percentage (sensitivity) when the edit distance of the alignment (allowing insertions and deletions) equals a given value.
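The sensitivity measure described above can be sketched as follows. This is an illustration only, not the evaluation script used for the benchmark: `edit_distance` is a standard Levenshtein implementation, and `sensitivity_by_distance`, its input layout, and the idea of matching alignments by read id are assumptions made for the example.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def sensitivity_by_distance(control, found_ids):
    """control: iterable of (read_id, read_seq, ref_seq) from the control set.
    found_ids: set of read ids whose control alignment the mapper also found.
    Returns {edit distance: fraction of control alignments recovered}."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for read_id, read_seq, ref_seq in control:
        d = edit_distance(read_seq, ref_seq)
        total[d] += 1
        hit[d] += read_id in found_ids
    return {d: hit[d] / total[d] for d in sorted(total)}


# Toy control set (hypothetical data, far shorter than real 36bp reads).
control = [("r1", "ACGT", "ACGT"), ("r2", "ACGT", "AGGT"),
           ("r3", "ACGT", "ACT"), ("r4", "ACGA", "ACGT")]
result = sensitivity_by_distance(control, found_ids={"r1", "r2"})
```

In the toy data, one alignment has edit distance 0 and is recovered, while three have edit distance 1 and only one of them is recovered, so the per-distance sensitivities are 1.0 and 1/3.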

When higher sensitivity is needed, ZOOM offers a higher-sensitivity setting, which gives the following results:

Experiment II. Illumina/Solexa ChIP-Seq Data

ChIP-Seq data is another important output of next-generation sequencing technology. ZOOM was compared with ELAND (0.2.2.5) on two ChIP-Seq data sets of 17bp reads from Robertson et al. (2007) [3]. Both ChIP-Seq data sets are too large for ELAND, so researchers split each data set into two parts and used ELAND to map the parts separately. ZOOM can handle both complete ChIP-Seq data sets. Time (hh:mm:ss) and memory usage (G) are shown below. Results of ZOOM on the unsplit data sets are provided as well.

Experiment III. Simulated Solexa Data

In this experiment, human chromosome 6 was randomly broken into segments of 36bp. In each segment, two randomly chosen bases were mutated to simulate sequencing errors. Approximately 24 million reads were produced, providing 5X coverage of chromosome 6. With the same computing power as in Experiment I, it took only 17 min 17 sec to map all reads back to human chromosome 6, using 6.5G of RAM. Given that chromosome 6 is an above-average-sized human chromosome, mapping the reads to all human chromosomes separately should take no more than one day on a single-CPU computer.
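The read simulation described above can be approximated with a short script. This is a sketch under stated assumptions rather than the generator actually used: it assumes reads start at uniformly random positions, that the two mutated positions are distinct, and that each mutation replaces the base with a different one. The name `simulate_reads` and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_reads(reference: str, num_reads: int, read_length: int = 36,
                   num_errors: int = 2, seed: int = 0):
    """Sample reads from random positions and mutate num_errors bases in each.

    Returns a list of (start_position, read_sequence) tuples."""
    rng = random.Random(seed)
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        read = list(reference[start:start + read_length])
        # Pick distinct positions and substitute a different base at each.
        for pos in rng.sample(range(read_length), num_errors):
            read[pos] = rng.choice([b for b in BASES if b != read[pos]])
        reads.append((start, "".join(read)))
    return reads


# Toy reference (hypothetical); the benchmark used human chromosome 6.
_rng = random.Random(1)
reference = "".join(_rng.choice(BASES) for _ in range(500))
reads = simulate_reads(reference, num_reads=50)
```

Keeping the start position with each read makes it possible to check afterwards whether a mapper placed the read at its true origin.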

Benchmark for ABI SOLiD data




coming soon...

Reference:
[3] Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., Thiessen, N., Griffith, O.L., He, A., Marra, M., Snyder, M., and Jones, S. (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 4(8), 651-657.