|
The following publications document the main research conducted during the development of the PatternHunter software.
|
Ma B, Tromp J, Li M. PatternHunter: Faster and More Sensitive Homology Search. Bioinformatics. 2002 Mar;18(3):440-5. |
|
MOTIVATION: Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The exploding genomic data growth presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity whereas decreasing seed size slows down computation. RESULTS: We present a new homology search algorithm 'PatternHunter' that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed. At Blast levels of sensitivity, PatternHunter is able to find homologies between sequences as large as human chromosomes, in mere hours on a desktop. AVAILABILITY: PatternHunter is available at www.bioinfor.com, as a commercial package. It runs on all platforms that support Java. |
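The seed model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the weight-11, span-18 spaced seed `111010010100110111` reported in the PatternHunter paper, where `1` marks a position that must match and `0` a position that is ignored. The function names and the dictionary-based index are illustrative assumptions, not PatternHunter's actual implementation, and the extension step that follows a seed hit is omitted.

```python
from collections import defaultdict

# Spaced seed reported in the PatternHunter paper (weight 11, span 18).
# '1' = position must match, '0' = position is ignored.
PH_SEED = "111010010100110111"


def seed_key(seq, start, seed):
    """Collect the letters of seq that fall under the seed's '1' positions."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def find_seed_hits(query, target, seed=PH_SEED):
    """Return (query_pos, target_pos) pairs where the seed matches.

    Hypothetical helper: indexes every seed-shaped window of the target,
    then looks up every seed-shaped window of the query in that index.
    """
    span = len(seed)
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, seed)].append(t)

    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, seed), []):
            hits.append((q, t))
    return hits


target = "ACGTTGCAAGCTTAGGCT"
# Same region with substitutions at offsets 3 and 10 (both '0' positions of the seed).
query = "ACGATGCAAGGTTAGGCT"

spaced_hits = find_seed_hits(query, target)
# A contiguous seed of the same weight, for comparison.
contiguous_hits = find_seed_hits(query, target, seed="1" * 11)
```

In this toy region the two substitutions fall on ignored positions, so the spaced seed still reports a hit at `(0, 0)`. A contiguous seed of the same weight finds nothing, because no run of 11 identical letters survives.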
|
Chen X, Li M, Ma B, Tromp J. DNACompress: Fast and Effective DNA Sequence Compression. Bioinformatics. 2002 Dec;18(12):1696-8. |
|
While achieving the best compression ratios for DNA sequences, our new DNACompress program significantly improves the running time of all previous DNA compression programs. |
|
Li M, Brown D. Mouse Genome Sequencing Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002 Dec;420(6915):520-62. |
|
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism. |
|
Li M, Ma B, Kisman D, Tromp J. PatternHunter II: Highly Sensitive and Fast Homology Search. Genome Inform. 2003;14:164-75. |
|
Extending the single optimized spaced seed of PatternHunter to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman, for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search technology back to a full circle. |
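The effect of using several seeds at once can be sketched on a match/mismatch string, where `1` is a matching column of an alignment and `0` a mismatch. A region counts as detected if any seed fits at any offset. The two short seeds below are hypothetical toy patterns chosen for illustration; they are not the optimized seeds used by PatternHunter II, and the function names are assumptions.

```python
def seed_hits(alignment, seed):
    """True if the seed fits somewhere in the alignment.

    alignment: string of '1' (match) and '0' (mismatch) columns.
    seed: string of '1' (must match) and '0' (ignored) positions.
    """
    span = len(seed)
    return any(
        all(alignment[off + i] == "1" for i, c in enumerate(seed) if c == "1")
        for off in range(len(alignment) - span + 1)
    )


def any_seed_hits(alignment, seeds):
    """True if at least one seed in the set hits the alignment."""
    return any(seed_hits(alignment, s) for s in seeds)


# Hypothetical toy seeds, each of weight 4 and span 5.
SEED_A = "11011"
SEED_B = "10111"

region = "1011101"  # an alignment with two mismatching columns

single = any_seed_hits(region, [SEED_A])
multiple = any_seed_hits(region, [SEED_A, SEED_B])
```

Here `SEED_A` alone misses the region, while the pair detects it through `SEED_B`. This is the sense in which adding seeds raises sensitivity; the cost is one index lookup per seed at each position.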
|