All Research

Here is a collection of research conducted using ZOOM by BSI and by the user community.



Our Research

Here is a collection of the main research conducted during the development of the ZOOM software.

BSI Published Research Zhang Z, Lin H, Ma B. ZOOM Lite: next-generation sequencing data mapping and visualization software. Nucleic Acids Res. 2010 Jun 8.
High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software for efficient reads mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates reads mapping and result visualization into an easy-to-use pipeline on a desktop PC. The software handles both single-end and paired-end reads, and can output either the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats.

BSI Published Research Lin H, Zhang Z, Zhang MQ, Ma B, Li M. ZOOM! Zillions of oligos mapped. Bioinformatics. 2008 Nov 1;24(21):2431-7.
MOTIVATION: The next generation sequencing technologies are generating billions of short reads daily. Resequencing and personalized medicine need much faster software to map these deep sequencing reads to a reference genome, to identify SNPs or rare transcripts. RESULTS: We present a framework for how full sensitivity mapping can be done in the most efficient way, via spaced seeds. Using the framework, we have developed software called ZOOM, which is able to map the Illumina/Solexa reads of 15x coverage of a human genome to the reference human genome in one CPU-day, allowing two mismatches, at full sensitivity.

 
User Research

Here is a collection of research, written by users and other third parties, that cites the use of the ZOOM software or its methods.

User Published Research Mercer T, Wilhelm D, Dinger ME, Solda G, Korbie DJ, Glazov EA, Truong V, Schwenke M, Simons C, Matthaei C, Saint R, Koopman P, Mattick JS. Expression of distinct RNAs from 3′ untranslated regions. Nucleic Acids Research, 2010, 1–11.
The 3′ untranslated regions (3′UTRs) of eukaryotic genes regulate mRNA stability, localization and translation. Here, we present evidence that large numbers of 3′UTRs in human, mouse and fly are also expressed separately from the associated protein-coding sequences to which they are normally linked, likely by post-transcriptional cleavage. Analysis of CAGE (capped analysis of gene expression), SAGE (serial analysis of gene expression) and cDNA libraries, as well as microarray expression profiles, demonstrate that the independent expression of 3′UTRs is a regulated and conserved genome-wide phenomenon. We characterize the expression of several 3′UTR-derived RNAs (uaRNAs) in detail in mouse embryos, showing by in situ hybridization that these transcripts are expressed in a cell- and subcellular-specific manner. Our results suggest that 3′UTR sequences can function not only in cis to regulate protein expression, but also intrinsically and independently in trans, likely as noncoding RNAs, a conclusion supported by a number of previous genetic studies. Our findings suggest novel functions for 3′UTRs, as well as caution in the use of 3′UTR sequence probes to analyze gene expression.

User Published Research Varshney RK, Nayak SN, May GD, Jackson SA. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends in Biotechnology. Volume 27, Issue 9, September 2009, Pages 522-530.
Using next-generation sequencing technologies it is possible to resequence entire plant genomes or sample entire transcriptomes more efficiently and economically and in greater depth than ever before. Rather than sequencing individual genomes, we envision the sequencing of hundreds or even thousands of related genomes to sample genetic diversity within and between germplasm pools. Identification and tracking of genetic variation are now so efficient and precise that thousands of variants can be tracked within large populations. In this review, we outline some important areas such as the large-scale development of molecular markers for linkage mapping, association mapping, wide crosses and alien introgression, epigenetic modifications, transcript profiling, population genetics and de novo genome/organellar genome assembly for which these technologies are expected to advance crop genetics and breeding, leading to crop improvement.

User Published Research Barski A, Zhao K. Genomic location analysis by ChIP-Seq. J Cell Biochem. 2009 May 1;107(1):11-8.
The interaction of a multitude of transcription factors and other chromatin proteins with the genome can influence gene expression and subsequently cell differentiation and function. Thus systematic identification of binding targets of transcription factors is key to unraveling gene regulation networks. The recent development of ChIP-Seq has revolutionized mapping of DNA-protein interactions. Now protein binding can be mapped in a truly genome-wide manner with extremely high resolution. This review discusses ChIP-Seq technology, its possible pitfalls, data analysis and several early applications.

User Published Research Batley J, Edwards D. Genome sequence data: management, storage, and visualization. Biotechniques. 2009 Apr;46(5):333-4, 336.
Over the last few years there has been a revolution in DNA sequencing technology that has brought down the cost of DNA sequencing and made the sequencing of an increasing number of genomes both feasible and cost effective. There has also been a dramatic shift in the type of sequence data being generated, with vast numbers of short reads or pairs of short reads replacing the traditional relatively long reads produced by Sanger sequencing. These changes in data quantity and format have led to a rethinking of sequence data management, storage, and visualization, and provide a challenge for bioinformatics. The vast amount of sequence data that will be generated over the next few years will require a change in what data are stored and how users query the information.

User Published Research Li, M. Modern Homology Search. GIW Dec 1-3, 2008, Brisbane.
Designing the seeds: We proved 84 lower bound theorems and constructed 84 upper bounds.