BSI Papers

Here's a collection of some of the research that has gone into our products. Check out the product-specific pages:
[PEAKS papers], [RAPTOR and PROSPECT papers], [PatternHunter papers]
Note: The posted papers are in either PostScript (PS) or PDF (Adobe Portable Document Format).

RAPTOR

Yu, L., Using RAPTOR to Find Homologous Protein Structures for Molecular Replacement Phasing of X-Ray Crystallography, Bioinformatics Solutions [download 45.317 Kb]

X-ray crystallography is the most commonly used method in chemistry and biochemistry for determining protein structures. By finding homologous protein structures, RAPTOR can improve and speed up molecular replacement (MR) phasing of a target protein.

PEAKS

Jiaxi Wang, Bin Ma, Weiwu Chen. Disulfide-Bonded Dipeptide Analysis with PEAKS and Q-TOF Mass Spectrometry, ASMS 2007 poster MPK 171 [download 269.152 Kb]

Here we present an algorithmic solution for the analysis of MS/MS data of disulfide-bonded dipeptides.
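
As background for the mass arithmetic involved (not the poster's algorithm), the neutral monoisotopic mass of two peptides joined by a disulfide bond is the sum of the two peptide masses minus the two hydrogen atoms lost when the cysteine thiols form the S-S bond. A minimal sketch using standard monoisotopic residue masses; the function names and the example peptides ACK and GCR are arbitrary illustrations.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once per linear peptide (N-terminal H + C-terminal OH)
HYDROGEN = 1.007825   # monoisotopic mass of a hydrogen atom
PROTON = 1.007276     # mass of a proton, used for [M + zH]z+ ions

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def disulfide_dipeptide_mass(pep1: str, pep2: str) -> float:
    """Neutral mass of two peptides linked by one disulfide bond (2 H lost)."""
    if "C" not in pep1 or "C" not in pep2:
        raise ValueError("both peptides need a cysteine to form a disulfide bond")
    return peptide_mass(pep1) + peptide_mass(pep2) - 2 * HYDROGEN

def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the protonated [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge

if __name__ == "__main__":
    m = disulfide_dipeptide_mass("ACK", "GCR")
    for z in (2, 3):
        print(f"{z}+ precursor m/z = {mz(m, z):.4f}")
```

In practice, the difficulty the poster addresses is that the MS/MS spectrum of such a precursor mixes fragments from both chains, which the simple mass bookkeeping above does not model.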

PEAKS

Denis Yuen, Bin Ma, Iain Rogers. Peptide Sequence Reconstruction from de novo Sequences and their Homologues, ASMS 2007 poster ThPP 269 [download 188.051 Kb]

Here we present a technique for constructing the real peptide sequences from de novo sequences derived by PEAKS Studio and homologous entries from a database.

PEAKS

Weijie Yang, Denis Yuen, Bin Ma, Iain Rogers. Improving Protein Coverage by de novo Sequence Homology Searching with SPIDER, ASMS 2007 poster MPK 176 [download 223.669 Kb]

In this work we build and evaluate a workflow combining PEAKS auto de novo sequencing with SPIDER, a unique tool for homology searching based on peptide sequence tags.

PEAKS

Denis Yuen, Bin Ma, Iain Rogers. Improving de novo Sequencing Accuracy for Ion Trap data in PEAKS Software, ASMS 2007 poster MPK 175 [download 333.416 Kb]

In this work, the optimal weighting between multiple de novo sequencing score components is trained on a large dataset, and is demonstrated to provide a significant accuracy improvement in PEAKS Studio.
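
The general idea of training a weight between score components can be illustrated with a small grid search: for each candidate weight, combine two component scores linearly, take the top-ranked candidate sequence for every spectrum, and keep the weight that makes the most top candidates correct. This is a hypothetical sketch; the poster's actual training procedure, score components, and dataset are not described here, and the data layout (score_a, score_b, is_correct) and toy values are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical data layout: each spectrum is a list of candidate sequences,
# each candidate given as (score_a, score_b, is_correct).
Candidate = Tuple[float, float, bool]

def top1_correct(spectra: List[List[Candidate]], w: float) -> int:
    """Count spectra whose best candidate under w*a + (1 - w)*b is correct."""
    hits = 0
    for cands in spectra:
        best = max(cands, key=lambda c: w * c[0] + (1 - w) * c[1])
        if best[2]:
            hits += 1
    return hits

def train_weight(spectra: List[List[Candidate]], steps: int = 20) -> Tuple[float, int]:
    """Grid-search w over [0, 1]; return (best weight, number of correct spectra)."""
    best_w, best_hits = 0.0, -1
    for i in range(steps + 1):
        w = i / steps
        hits = top1_correct(spectra, w)
        if hits > best_hits:  # keep the first weight that reaches the best count
            best_w, best_hits = w, hits
    return best_w, best_hits

if __name__ == "__main__":
    # Toy data: neither component alone ranks both correct candidates first.
    toy = [
        [(0.9, 0.1, False), (0.6, 0.6, True)],
        [(0.1, 0.9, False), (0.6, 0.6, True)],
    ]
    print(train_weight(toy))
```

On the toy data, using either score alone gets one spectrum right, while an intermediate weight gets both right, which is the motivation for combining score components rather than relying on one.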

PEAKS

Bin Ma, Iain Rogers, Search for the Undiscovered Peptide; Using de novo sequencing and sequence tag homology search to improve protein characterization, Biotechniques Journal, Vol. 42, No. 5, 2007. [download 255.786 Kb]

A new tool, SPIDER, is used to discover hidden peptides. Using a de novo sequence and a homologous sequence from the database, SPIDER reconstructs the real peptide, highlighting mutations and allowing for de novo sequencing errors.

RAPTOR

Jinbo Xu. Protein Fold Recognition by Predicted Alignment Accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(2):157-165. 2005. [download 152.561 Kb]

A key step in protein structure prediction by threading is choosing the best overall template for a given target sequence once all the optimal sequence-template alignments have been generated. The traditional template-selection method, the Z-score, uses a statistical test to rank all the sequence-template alignments and then chooses the top-ranked template for the sequence. However, computing Z-scores is time-consuming and unsuitable for genome-scale structure prediction. Z-scores are also hard to interpret when the scoring function is a weighted sum of several energy terms with different meanings. This paper presents a Support Vector Machine (SVM) regression approach that directly predicts the alignment accuracy of protein threading and uses it to rank all the templates for a specific target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method, and also runs much faster.
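
For readers unfamiliar with Z-score template selection, a minimal sketch of the baseline the paper improves on: the raw alignment score of the target against each template is compared with a background distribution of scores (assumed here to come from aligning shuffled versions of the target), and templates are ranked by how many standard deviations the real score lies above the background mean. The function names and numbers are hypothetical, higher scores are assumed better, and producing the background scores requires many extra alignments per template, which is the cost that SVM regression avoids.

```python
import statistics
from typing import Dict, List, Tuple

def z_score(raw: float, background: List[float]) -> float:
    """Number of standard deviations `raw` lies above the background mean."""
    mu = statistics.mean(background)
    sd = statistics.stdev(background)
    return (raw - mu) / sd

def rank_templates(results: Dict[str, Tuple[float, List[float]]]) -> List[Tuple[str, float]]:
    """Rank templates by Z-score, best first.

    results maps template id -> (raw alignment score, background scores
    obtained from shuffled target sequences).
    """
    scored = [(name, z_score(raw, bg)) for name, (raw, bg) in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    demo = {
        "templateA": (50.0, [40.0, 42.0, 38.0, 41.0, 39.0]),
        "templateB": (80.0, [75.0, 70.0, 85.0, 78.0, 72.0]),
    }
    for name, z in rank_templates(demo):
        print(name, round(z, 2))
```

In the demo, templateA ranks first despite its lower raw score, because its score stands out much more from its own background distribution.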

RAPTOR

Jinbo Xu, Feng Jiao, Bonnie Berger. A Tree-Decomposition Approach to Protein Structure Prediction. CSB 2005, Stanford, USA. [download 156.372 Kb]

This paper proposes a tree decomposition of protein structures, which can be used to efficiently solve two key subproblems of protein structure prediction: protein threading for backbone prediction and protein side-chain prediction. To develop a unified tree-decomposition-based approach to these two subproblems, we model them as a geometric neighborhood graph labeling problem. Theoretically, a low-degree polynomial-time algorithm can decompose a geometric neighborhood graph $G = (V, E)$ into components of size $O(|V|^{2/3} \log |V|)$. The computational complexity of the tree-decomposition-based graph labeling algorithms is $O(|V|\,\sigma^{tw+1})$, where $\sigma$ is the average number of possible labels for each vertex and $tw = O(|V|^{2/3} \log |V|)$ is the tree width of $G$. Empirically, $tw$ is very small, and the tree-decomposition method can solve these two problems very efficiently. This paper also compares the computational efficiency of the tree-decomposition approach with that of the linear programming approach to these two problems, and identifies the condition under which the tree-decomposition approach is more efficient. Experimental results indicate that the tree-decomposition approach is more efficient most of the time.
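
To make the labeling formulation concrete, here is a hedged sketch of the simplest case the complexity bound covers: when the neighborhood graph is itself a tree (tree width 1), a minimum-energy labeling can be found exactly by dynamic programming from the leaves up, in O(|V| σ^2) time, where σ is the number of labels per vertex. General graphs need a full tree decomposition with bags of size tw + 1, which this sketch does not build. The function name and the singleton and pairwise energies are hypothetical inputs (for side-chain prediction, labels would be rotamers).

```python
from typing import Callable, Dict, List, Tuple

PairEnergy = Callable[[int, int, int, int], float]  # (u, v, label_u, label_v) -> energy

def min_energy_tree_labeling(
    adj: Dict[int, List[int]],
    single: Dict[int, List[float]],
    pair: PairEnergy,
    root: int = 0,
) -> Tuple[float, Dict[int, int]]:
    """Exact minimum-energy labeling when `adj` is a connected tree.

    Vertices are non-negative ints; single[v][l] is the energy of giving v
    label l, and pair(u, v, lu, lv) is the energy on tree edge (u, v).
    """
    # Breadth-first order from the root: every parent precedes its children.
    parent: Dict[int, int] = {root: -1}
    order = [root]
    for v in order:
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)

    best: Dict[int, List[float]] = {}        # best[v][l]: min subtree energy if v has label l
    choice: Dict[Tuple[int, int], int] = {}  # (child, parent label) -> best child label
    for v in reversed(order):  # children are finished before their parent
        best[v] = list(single[v])
        for c in adj[v]:
            if parent[c] != v:
                continue  # skip the edge back to v's own parent
            for l in range(len(single[v])):
                costs = [best[c][lc] + pair(v, c, l, lc) for lc in range(len(single[c]))]
                lc = min(range(len(costs)), key=costs.__getitem__)
                best[v][l] += costs[lc]
                choice[(c, l)] = lc

    # Choose the root's label, then follow the stored choices downwards.
    labels = {root: min(range(len(best[root])), key=best[root].__getitem__)}
    for v in order[1:]:
        labels[v] = choice[(v, labels[parent[v]])]
    return best[root][labels[root]], labels

if __name__ == "__main__":
    # Path 0 - 1 - 2 with two labels per vertex; differing neighbours cost 2.
    adj = {0: [1], 1: [0, 2], 2: [1]}
    single = {0: [0.0, 1.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
    print(min_energy_tree_labeling(adj, single, lambda u, v, a, b: 0.0 if a == b else 2.0))
```

Each vertex is processed once, with σ × σ work per tree edge; a bag of tw + 1 vertices in a general tree decomposition has σ^(tw+1) joint labels, which is where the exponential dependence on tree width in the bound comes from.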

RAPTOR

Jinbo Xu, Ying Xu, Ming Li. Protein Threading by Linear Programming: Theoretical Analysis and Computational Results, Journal of Combinatorial Optimization, In Press 2004. [download 289.419 Kb]

In a previous paper we used an integer programming approach to implement a protein threading program, RAPTOR, for protein 3D structure prediction, based on a threading model that treats pairwise contacts rigorously and allows variable gaps. We solved the integer program by the canonical branch-and-bound method. In this paper we explain why our approach is so effective. Cutting-plane analysis shows that two well-known types of cuts for this problem are already implied by the constraint set, which gives us some intuition for why our formulation is very effective. Experimental results show that for about 99 percent of real threading instances, the linear relaxations of their integer programs directly yield integral optimal solutions. For the remaining 1 percent of real instances, integral solutions can be obtained with only a few branch nodes. Experimental results also show that no particular template or sequence features make fractional solutions more likely. This indicates that extra effort to seek cutting planes to strengthen the existing formulation is unnecessary.

RAPTOR

Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology, 1(1):95-117, 2003. [download 275.389 Kb]

This paper presents a novel linear programming approach to protein three-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large-scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem and solved by the canonical branch-and-bound method. The final solution is globally optimal with respect to the energy function. In particular, our energy function includes pairwise interaction preferences and allows variable gaps, the two key factors that make the protein threading problem NP-hard. A surprising result is that, most of the time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as a software package, RAPTOR (Rapid Protein Threading by Operation Research technique). A large-scale benchmark test for fold recognition shows that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR first among individual prediction servers in terms of recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling (HM) targets.

RAPTOR

Jinbo Xu and Ming Li. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6):579-584, Oct. 2003. Invited paper for CASP5, voted by peers as the "most innovative method in CASP5". [download 182.924 Kb]

We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our new approach provides an elegant formulation of the protein threading problem, overcomes in practice the intractability of protein threading, and allows us to use existing powerful linear programming software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. RAPTOR was ranked as the top individual (automatic) server for fold recognition by the CAFASP3 organizers. In this short paper, we describe RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, explain why it has superseded other existing automatic individual methods, and point out its strengths, limitations, extensions and prospects for improvement.

PEAKS

Rogers, I., Application Note: New tools for peptide identification on high mass accuracy data. Unpublished, August 2006. [download 63.403 Kb]

A comparative performance analysis, on Thermo LTQ Orbitrap data, of the PEAKS Protein ID search engine and another popular search engine. PEAKS more than doubles the number of spectra explained by the other engine. Data will be made available on request to anyone wishing to reproduce this test.

PEAKS

Yang, W., Chen, W., Rogers, I., Ma, B., Bendall, S., Lajoie, G., Smith, D., PEAKS Q: Software for MS-based quantification of stable isotope labeled peptides (Bioinformatics Solutions Inc., Genome BC Proteomics Centre, University of Western Ontario) ASMS 2006 poster WP531 [download 620.172 Kb]

In this work we describe a new software tool, PEAKS-Q, designed to automatically identify and quantify proteins from ICAT, SILAC and other stable isotope labeling experiments.
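
As an illustration of the kind of computation a stable-isotope quantification tool performs (not PEAKS-Q's actual algorithm), the sketch below looks for a light precursor peak and its heavy partner separated by the label mass shift divided by the charge, and reports the heavy/light intensity ratio. The shift assumes one 13C6-labelled lysine or arginine (six 13C atoms, about 6.0201 Da); the function name, peak list, and 0.01 m/z tolerance are hypothetical.

```python
from typing import List, Optional, Tuple

C13_DELTA = 1.0033548          # mass difference between 13C and 12C (Da)
SILAC_SHIFT = 6 * C13_DELTA    # one 13C6-labelled Lys or Arg, about 6.0201 Da

Peak = Tuple[float, float]     # (m/z, intensity)

def heavy_light_ratio(
    peaks: List[Peak],
    light_mz: float,
    charge: int,
    n_labels: int = 1,
    tol: float = 0.01,
) -> Optional[float]:
    """Heavy/light intensity ratio for one precursor, or None if a partner is missing."""
    # The heavy partner sits n_labels * shift / charge above the light peak.
    heavy_mz = light_mz + n_labels * SILAC_SHIFT / charge

    def intensity_near(target: float) -> Optional[float]:
        hits = [inten for mz, inten in peaks if abs(mz - target) <= tol]
        return max(hits) if hits else None

    light = intensity_near(light_mz)
    heavy = intensity_near(heavy_mz)
    if light is None or heavy is None:
        return None
    return heavy / light

if __name__ == "__main__":
    peaks = [(500.25, 1000.0), (500.25 + SILAC_SHIFT / 2, 2500.0), (612.3, 400.0)]
    print(heavy_light_ratio(peaks, light_mz=500.25, charge=2))
```

A real tool would also integrate intensities over isotope envelopes and chromatographic elution profiles rather than comparing single peaks.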

PEAKS

Clark Chen, Iain Rogers, Intact Peptide Charge Determination from Ion Trap MS/MS, ASMS 2006 poster MP327 [download 744.905 Kb]

This research presents an algorithm that allows a researcher to determine a peptide’s charge from MS/MS data alone.
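
One simple physical observation that charge determination can build on (a generic heuristic, not necessarily the poster's algorithm): a singly charged precursor cannot produce fragment ions above its own m/z, whereas a multiply charged precursor usually yields singly charged fragments that do. The hypothetical sketch below uses the fraction of fragment intensity above the precursor m/z to separate 1+ spectra from multiply charged ones; the 1.0 m/z margin and 0.05 threshold are arbitrary placeholders, and telling 2+ from 3+ would need further evidence.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def fraction_above_precursor(peaks: List[Peak], precursor_mz: float, margin: float = 1.0) -> float:
    """Fraction of total fragment intensity more than `margin` above the precursor m/z.

    The margin ignores residual precursor and isotope peaks near precursor_mz.
    """
    total = sum(inten for _, inten in peaks)
    if total == 0:
        return 0.0
    above = sum(inten for mz, inten in peaks if mz > precursor_mz + margin)
    return above / total

def looks_singly_charged(peaks: List[Peak], precursor_mz: float, threshold: float = 0.05) -> bool:
    """True if almost no fragment signal lies above the precursor m/z."""
    return fraction_above_precursor(peaks, precursor_mz) < threshold

if __name__ == "__main__":
    spectrum = [(200.1, 10.0), (700.4, 30.0), (900.5, 20.0)]
    print(looks_singly_charged(spectrum, precursor_mz=500.3))
```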

PEAKS

Bin Ma, Gilles Lajoie, Improved positional confidence score in MS/MS peptide de novo sequencing, ASMS 2006 poster MP348 [download 139.391 Kb]

A new “positional confidence score” is developed to indicate which parts of the de novo sequencing results are correct.

PEAKS

Clark Chen, John Morey, Iain Rogers, Filtering out MS/MS spectra of insufficient quality before database searching, ASMS 2006 poster MP329 [download 963.777 Kb]

A method for filtering out poor-quality spectra prior to de novo sequencing or database searching, so as to reduce the risk of false positives and improve search speed.

PEAKS

Rogers, I., Haskins, W., Drastically increased coverage by using four search engines for Protein Identification (Bioinformatics Solutions Inc, Genentech), ASMS 2006 poster MP328 [download 140.836 Kb]

This poster demonstrates the improvement in coverage from using more than one search engine. It should not be viewed as a benchmark comparison of search engines, as the performance shown depends on arbitrary score filter values. More important is the low error rate and high sensitivity achieved when a sequence tag hybrid approach (PEAKS) and a pure peptide fragment fingerprinting approach (such as SEQUEST or MASCOT) are used together, regardless of score!
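
The benefit of combining engines can be sketched by pooling peptide identifications and counting how many engines report each one: the union gives the increased coverage, while peptides reported by two or more engines form a higher-confidence core. This is a hypothetical illustration with made-up engine names and peptides, not the poster's procedure.

```python
from collections import Counter
from typing import Dict, Set, Tuple

def combine_identifications(
    results: Dict[str, Set[str]], min_engines: int = 2
) -> Tuple[Set[str], Set[str]]:
    """Return (union of all peptides, peptides reported by >= min_engines engines)."""
    # Each engine contributes each peptide at most once because results are sets.
    counts = Counter(pep for peps in results.values() for pep in peps)
    union = set(counts)
    consensus = {pep for pep, n in counts.items() if n >= min_engines}
    return union, consensus

if __name__ == "__main__":
    results = {
        "engine1": {"PEPTIDEK", "LVNELTEFAK"},
        "engine2": {"LVNELTEFAK", "YLYEIAR"},
        "engine3": {"YLYEIAR"},
    }
    union, consensus = combine_identifications(results)
    print(sorted(union), sorted(consensus))
```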

PEAKS

Ma, B., Rogers, I., Application Note: PEAKS de novo performance on LTQ Orbitrap data. Unpublished, June 2006. [download 41.472 Kb]

A demonstration of the accuracy of PEAKS de novo sequencing on data from a Thermo LTQ Orbitrap mass spectrometer. 97% accuracy is achieved!

RAPTOR

Jinbo Xu, Libo Yu, Tina Li, Cassandra Wigmore. BLAST-like E-value for Protein Threading in Drug Discovery, Beyond Genome 2005. [download 1876.947 Kb]

An improvement to RAPTOR output, allowing users to more easily evaluate results.
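
For context, BLAST's E-value comes from Karlin-Altschul statistics: the expected number of chance alignments with score at least S, when comparing sequences of lengths m and n, is E = K m n e^(-λS), where K and λ depend on the scoring system. How RAPTOR derives its threading E-value is not described here; the sketch below only evaluates the BLAST formula and its bit-score form, and the K and λ values in the example are illustrative placeholders, not RAPTOR parameters.

```python
import math

def karlin_altschul_evalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of chance alignments scoring >= score: E = K*m*n*exp(-lambda*S)."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score: float, K: float, lam: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2^(-S')."""
    return (lam * score - math.log(K)) / math.log(2)

if __name__ == "__main__":
    K, lam = 0.041, 0.267          # illustrative values only
    m, n = 300, 1_000_000          # query length and database size (residues)
    for s in (40, 50, 60):
        print(s, f"{karlin_altschul_evalue(s, m, n, K, lam):.3g}")
```

The appeal of an E-value for users is that it has a direct interpretation (an expected number of false hits) that does not depend on knowing the raw scoring scale.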

PEAKS

Y. Han, B. Ma, and K. Zhang. SPIDER: Software for Protein Identification from Sequence Tags Containing De Novo Sequencing Error. Journal of Bioinformatics and Computational Biology 3(3):697-716. 2005. [download 145.316 Kb]

To identify a protein by searching de novo sequencing results against a protein database, the search software must handle mass gaps and de novo sequencing errors. Accounting for both, we developed a software system, SPIDER (Software Protein Identifier), for rapidly identifying the proteins that contain the peptides best matching the given tags. SPIDER differs from, and is superior to, the MS BLAST system (Altschul et al.), which does not account for de novo sequencing errors and mass gaps.
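
A simplified illustration of why masses matter when comparing a de novo tag with a database peptide: de novo errors often replace a residue pair with a different pair or a single residue of nearly the same mass (for example GG versus N), and cannot tell I from L. The dynamic program below is a hypothetical sketch, not SPIDER's published algorithm: segments of up to two residues on each side match for free whenever their masses agree within a tolerance (modelling mass-preserving de novo errors), while a one-for-one substitution or a single inserted or deleted residue costs 1. The function names, costs, and 0.05 Da tolerance are assumptions.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def seg_mass(seg: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in seg)

def tag_distance(tag: str, db: str, tol: float = 0.05, max_seg: int = 2) -> int:
    """Minimum number of substitutions/indels needed to explain `tag` by `db`,
    letting mass-equivalent segments (up to max_seg residues) match at no cost."""
    n, m = len(tag), len(db)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    # Forward DP: every transition moves to a larger (i, j), so d[i][j] is
    # final by the time it is reached in row-major order.
    for i in range(n + 1):
        for j in range(m + 1):
            cur = d[i][j]
            if cur == INF:
                continue
            if i < n:  # residue only in the tag
                d[i + 1][j] = min(d[i + 1][j], cur + 1)
            if j < m:  # residue only in the database peptide
                d[i][j + 1] = min(d[i][j + 1], cur + 1)
            if i < n and j < m:  # one-for-one substitution (a real mutation)
                d[i + 1][j + 1] = min(d[i + 1][j + 1], cur + 1)
            # Mass-equivalent segments: identical residues, I/L, GG vs N, ...
            for a in range(1, max_seg + 1):
                for b in range(1, max_seg + 1):
                    if i + a <= n and j + b <= m and \
                            abs(seg_mass(tag[i:i + a]) - seg_mass(db[j:j + b])) <= tol:
                        d[i + a][j + b] = min(d[i + a][j + b], cur)
    return d[n][m]

if __name__ == "__main__":
    print(tag_distance("AGGK", "ANK"))        # GG and N have the same mass
    print(tag_distance("PEPTIDE", "PEPTWDE"))  # one genuine substitution
```

Separating mass-preserving differences (likely sequencing errors) from mass-changing ones (possible mutations) is the distinction the SPIDER abstract draws against tools that ignore de novo errors.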

PEAKS

Bin Ma, Gilles Lajoie (Departments of Computer Science and Biochemistry at the University of Western Ontario). Improving the de novo Sequencing Accuracy by Combining Two Independent Scoring Functions in PEAKS Software, ASMS 2005. [download 217.6 Kb]

By combining the original PEAKS scoring function and a new scoring function, the accuracy of PEAKS de novo sequencing is remarkably improved.

PEAKS

Iain Rogers. Assessment of an Amalgamative Approach to Protein Identification, ASMS 2005. [download 2034.572 Kb]

The following shows how two or more protein identification tools used in concert, each confirming the results of the others, can improve the quality of, and confidence in, the results.

PEAKS

Jennifer Locke, Jason Rogalski, Lei Guo, Bin Ma, Juergen Kast, Gilles Lajoie (University of British Columbia, Bioinformatics Solutions Inc. & University of Western Ontario). Automated de novo Sequencing Using ToF-ToF MS/MS Data, ASMS 2005. [download 267.776 Kb]

PEAKS software works well for both de novo sequencing (without a protein database) and protein identification (with a protein database) on MS/MS data obtained from a MALDI ToF/ToF instrument.

RAPTOR

M. Li, T. Li, I. Rogers, C. Wigmore, J. Xu. Comprehensive Protein Functional Annotation Pipeline Combining Threading and Homology Search, Drug Discovery Technology Conference, 2004. [download 49.642 Kb]

For those who need to find the function of a protein: an annotation pipeline that combines threading with homology search.

RAPTOR

T. Li, I. Rogers, C. Wigmore, J. Xu. New Approach for Protein Structure Prediction by Linear Based Threading, Protein Society Meeting Poster, 2004. [download 650.176 Kb]

The first conference poster describing RAPTOR's approach.

PEAKS

Iain Rogers, Christopher Hendrie, Ming Li. Protein ID: Comparing De Novo Based and Database Search Methods, ASMS Poster, 2004. [download 208.207 Kb]

A demonstration of the de novo based protein ID approach used in PEAKS.

PEAKS

Bin Ma, Amanda Doherty-Kirby, Aaron Booy, Bob Olafson, Gilles Lajoie. A Comprehensive Comparison of the de novo Sequencing Accuracies of PEAKS and Other Software. ASMS Poster, 2004. [download 33.834 Kb]

A second successful comparison between PEAKS and other de novo software tools.

PH

Uri Keich, Ming Li, Bin Ma, John Tromp. On Spaced Seeds for Similarity Search. Discrete Applied Mathematics, 2004. [download 130.957 Kb]

A study of the optimal use of spaced seeds.

PH

Ming Li, Bin Ma, Derek Kisman, John Tromp. PatternHunter II: Highly Sensitive and Fast Homology Search. Journal of Bioinformatics and Computational Biology, 2004. To appear. Early version in GIW 2003. [download 301.479 Kb]

PatternHunter revolutionizes homology search with the use of multiple spaced seeds.
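
The spaced-seed idea can be sketched in a few lines: a seed such as 111010010100110111 (the weight-11, length-18 seed of the original PatternHunter) requires matching characters only at its 1 positions, so a window pair registers a hit even when the sequences differ at the 0 positions. The sketch below indexes the target by the characters under the seed's 1 positions and reports candidate hit offsets; a real homology search would then extend each hit into an alignment, which is omitted. Multiple seeds, as in PatternHunter II, would amount to repeating the lookup with each seed and merging the hits. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PH_SEED = "111010010100110111"  # weight 11, span 18

def seed_key(seq: str, start: int, care: List[int]) -> str:
    """Characters of seq at the seed's '1' positions, relative to start."""
    return "".join(seq[start + k] for k in care)

def spaced_seed_hits(query: str, target: str, seed: str) -> List[Tuple[int, int]]:
    """All (query_pos, target_pos) windows that agree at every '1' of the seed."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[seed_key(target, j, care)].append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, care), []):
            hits.append((i, j))
    return hits

if __name__ == "__main__":
    s = "ACGTTGCAACGTAGCTAG"
    t = s[:3] + "A" + s[4:]   # mismatch at seed position 3, a "0" (don't care)
    print(spaced_seed_hits(s, t, PH_SEED))   # spaced seed still hits
    print(spaced_seed_hits(s, t, "1" * 18))  # contiguous seed of the same span misses
```

The don't-care positions absorb mismatches; the PatternHunter papers show that, at equal weight, well-chosen spaced seeds are more sensitive than contiguous ones.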

PROSPECT

Dongsup Kim, Dong Xu, Jun-tao Guo, Kyle Ellrott and Ying Xu. PROSPECT II: protein structure prediction program for genome-scale applications. Protein Engineering, vol. 16, no. 9, pp. 641-650, 2003. [download 233.777 Kb]

Building on earlier work, PROSPECT becomes a more complete tool.

PROSPECT

Manesh Shah, Sergei Passovets, Dongsup Kim, Kyle Ellrott, Li Wang, Inna Vokler, Philip LoCascio, Dong Xu and Ying Xu. A computational pipeline for protein structure prediction and analysis at genome scale. Bioinformatics, Vol. 19, no. 15, 2003, pages 1985-1996. [download 406.495 Kb]

The authors integrate PROSPECT in a pipeline for prediction.

RAPTOR

Jinbo Xu, Ming Li. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6):579-584. October 2003. [download 834.532 Kb]

RAPTOR's first real-world test. RAPTOR captured the top spot and has held the title for several years.

PEAKS

Chengzhi Liang, Jeffrey C. Smith, Christopher Hendrie, Ming Li, K. W. Michael Siu. A Comparative Study of Peptide Sequencing Software Tools for MS/MS. ASMS Poster, 2003. [download 64.157 Kb]

PEAKS sets a standard in the first of several comparisons against other de novo software.

RAPTOR

Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: Optimal Protein Threading by Linear Programming. Journal of Bioinformatics and Computational Biology, 1(1):95-117. 2003. [download 1981.685 Kb]

If you plan to cite RAPTOR in your work, please use this reference.

PH

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915):520-562. December 2002.

A large scale study using PatternHunter, which finished a remarkably sensitive comparison in record time! Ming Li and Bin Ma are members of MGSC.

PEAKS

Bin Ma, Kaizhong Zhang, Christopher Hendrie, Chengzhi Liang, Ming Li, Amanda Doherty-Kirby, Gilles Lajoie. PEAKS: Powerful Software for Peptide De Novo Sequencing by MS/MS. Rapid Communications in Mass Spectrometry, 17(20):2337-2342. 2003. Early version appeared in 50th ASMS Conference 2002. [download 126.345 Kb]

If you plan to cite PEAKS in your research, please refer to this paper. PEAKS has come a long way since the original version, but the principles are the same.

PH

Bin Ma, John Tromp, Ming Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440-445. March 2002. [download 301.479 Kb]

The original PatternHunter paper.

PROSPECT

Dong Xu, Oakley H. Crawford, Philip F. LoCascio, Ying Xu. Application of PROSPECT in CASP4: Characterizing protein structures with new folds. Proteins: Structure, Function, and Genetics, Volume 45, Issue S5, Pages 140-148, 2001. [download 442.997 Kb]

Evaluation of PROSPECT on hard targets: proteins with new folds.

PROSPECT

Ying Xu, Dong Xu. Protein threading using PROSPECT: Design and evaluation. Proteins: Structure, Function, and Genetics, Volume 40, Issue 3, Pages 343-354, 2000. [download 551.38 Kb]

The design and methodology behind PROSPECT Pro.

PROSPECT

Ying Xu, Dong Xu, Oakley H. Crawford, J. Ralph Einstein, Frank Larimer, Ed Uberbacher, Michael A. Unseren and Ge Zhang. Protein threading by PROSPECT: a prediction experiment in CASP3. Protein Engineering, Vol. 12, No. 11, 899-907, November 1999. [download 673.848 Kb]

This is the original PROSPECT research paper.