|
|
Our Research
Here's a list of the research that we wrote up along the way to building the software. Please share your research with us.
|   |
Yu, L., Using RAPTOR to Find Homologous Protein Structures for Molecular Replacement Phasing of X-Ray Crystallography, Bioinformatics Solutions [download 45.317 Kb] X-ray crystallography is the most commonly used method in chemistry and biochemistry for determination of protein structure. RAPTOR software can improve and speed up the molecular replacement (MR) phasing of a target protein.
|
|   |
Jinbo Xu. Protein Fold Recognition by Predicted Alignment Accuracy. ACM/IEEE Transactions on Computational Biology and Bioinformatics, 2(2):157-165. 2005. [download 152.561 Kb] One of the key components in protein structure prediction by protein threading technique is to
choose the best overall template for a given target sequence after all the optimal sequence-template
alignments are generated. The traditional method for template selection is called Z-score, which uses a
statistical test to rank all the sequence-template alignments and then chooses the first-ranked template for
the sequence. However, the calculation of Z-scores is time-consuming and not suitable for genome-scale
structure prediction. Z-scores are also hard to interpret when the scoring function is the weighted sum
of several energy items of different meanings. This paper presents a Support Vector Machine (SVM)
regression approach to directly predict the alignment accuracy of protein threading, which is used to
rank all the templates for a specific target sequence. Experimental results on a large-scale benchmark
demonstrate that SVM regression performs much better than the composition-corrected Z-score method.
SVM regression also runs much faster than the Z-score method.
|
|   |
Jinbo Xu, Feng Jiao, Bonnie Berger. A Tree-Decomposition Approach to Protein Structure Prediction. CSB 2005, Stanford, USA. [download 156.372 Kb] This paper proposes a tree decomposition of protein
structures, which can be used to efficiently solve two key
subproblems of protein structure prediction: protein threading for backbone prediction and protein side-chain prediction. To develop a unified tree-decomposition based approach to these two subproblems, we model them as a
geometric neighborhood graph labeling problem. Theoretically, we can have a low-degree polynomial time algorithm to decompose a geometric neighborhood graph
G = (V;E) into components with size O(jV j23 log jV j).
The computational complexity of the tree-decomposition
based graph labeling algorithms is O(jV jtw+1) where
is the average number of possible labels for each vertex
and tw(= O(jV j23 log jV j)) the tree width of G. Empirically, tw is very small and the tree-decomposition method can solve these two problems very efficiently. This paper also compares the computational efficiency of the treedecomposition approach with the linear programming approach to these two problems and identifies the condition
under which the tree-decomposition approach is more efficient than the linear programming approach. Experimental result indicates that the tree-decomposition approach is more efficient most of the time.
|
|   |
Jinbo Xu, Ying Xu, Ming Li. Protein Threading by Linear Programming: Theoretical Analysis and Computational Results, Journal of Combinatorial Optimization, In Press 2004. [download 289.419 Kb] In a previous paper we have used an integer programming approach to implement a protein threading program RAPTOR for protein 3D structure prediction, based on the threading model treating pairwise contacts rigorously and allowing variable gaps. We have solved the integer program by the canonical branch-and-bound method. In this paper we present why our approach is so effective. The result of cutting plane analysis is that two types of well-known cuts for this problem are already implied in the constraint set, which provides us some intuition that our formulation would be very effective. Experimental results show that for about 99 percent of real threading instances, the linear relaxations of their integer programs solve to integral optimal solutions directly. For the rest, one percent of real instances, the integral solutions can be obtained with only several branch nodes. Experimental results also show that no special template or sequence features result in more possibilities of fractional solutions. This indicates that extra effort to seek for cutting planes to strengthen the existing formulation is unnecessary.
|
|   |
Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology 1:1(2003) 95-117. [download 275.389 Kb] This paper presents a novel linear programming approach to do protein 3-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem, and then solved by the canonical
branch-and-bound method. The final solution is globally optimal with respect to energy functions. In particular, our energy function includes pairwise interaction preferences and allowing variable gaps which are two key factors in making the protein threading problem NP-hard. A surprising result is that, most of time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as a software package RAPTOR – Rapid Protein Threading by Operation Research
technique. Large scale benchmark test for fold recognition shows that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR as top 1, among individual prediction servers, in terms of the recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling(HM) targets.
|
|   |
Jinbo Xu and Ming Li. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6): 579--584, Oct. 2003. Invited paper for CASP5, voted by peers as the "most innovative method in CASP5". [download 182.924 Kb] We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our new approach provides an elegant formulation of the protein threading problem, overcomes the intractability problem of protein threading, in practice, and allows us to use existing powerful linear programming software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. RAPTOR was ranked as the top individual (automatic) server for fold recognition by the CAFASP3 organizers. In this short paper, we descrive RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, explain why it has superceded other existing automatic individual methods, and point out its strengths, limitations, extensions and prospects for improvement.
|
|   |
Jinbo Xu, Libo Yu, Tina Li, Cassandra Wigmore. BLAST-like E-value for Protein Threading in Drug Discovery, Beyond Genome 2005. [download 1876.947 Kb] An improvement to RAPTOR output, allowing users to more easily evaluate results.
|
|   |
M. Li, T. Li, I. Rogers, C. Wigmore, J. Xu. Comprehensive Protein Functional Annotation Pipeline Combining Threading and Homology Search, Drug Discovery Technology Conferece, 2004. [download 49.642 Kb] For those who need to find the function of a protein.
|
|   |
T. Li, I. Rogers, C. Wigmore, J. Xu. New Approach for Protein Structure Prediction by Linear Based Threading, Protein Society Meeting Poster, 2004. [download 650.176 Kb] The first conference poster describing RAPTORs approach.
|
|   |
Dongsup Kim, Dong Xu, Jun-tao Guo, Kyle Ellrott and Ying Xu. PROSPECT II: protein structure prediction program for genome-scale applications. Protein Engineering, , vol. 16 no. 9 pp. 641-650, 2003. [download 233.777 Kb] Building on earlier work, PROSPECT becomes a more complete tool.
|
|   |
Manesh Shah, Sergei Passovets, Dongsup Kim, Kyle Ellrott, Li Wang, Inna Vokler, Philip LoCascio, Dong Xu and Ying Xu. A computational pipeline for protein structure prediction and analysis at genome scale. Bioinformatics , Vol. 19 no. 15 2003, Pages 1985-1996. [download 406.495 Kb] The authors integrate PROSPECT in a pipeline for prediction.
|
|   |
Jinbo Xu, Ming Li. Assessment of RAPTORs linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6):579-584. October 2003. [download 834.532 Kb] RAPTORs first real world test. RAPTOR captured the top spot and has held the title for several years.
|
|   |
Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: Optimal Protein Threading by Linear programming. Journal of Bioinformatics and Computational Biology, 1(1):95-117. 2003. [download 1981.685 Kb] If you plan to cite RAPTOR in your work, please use this reference.
|
|   |
Dong Xu, Oakley H. Crawford, Philip F. LoCascio, Ying Xu. Application of PROSPECT in CASP4: Characterizing protein structures with new folds . Proteins: Structure, Function, and Genetics , Volume 45, Issue S5 , Pages 140 - 148, 2001. [download 442.997 Kb] Evaluation of prospect on hard targets -- proteins with new folds.
|
|   |
Ying Xu , Dong Xu. Protein threading using PROSPECT: Design and evaluation. Proteins: Structure, Function, and Genetics , Volume 40, Issue 3 , Pages 343 - 354, 2000. [download 551.38 Kb] The design and methodology behind PROSPECT Pro
|
|   |
Ying Xu, Dong Xu, Oakley H. Crawford, J.ralph Einstein, Frank Larimer, Ed Uberbacher, Michael A. Unseren and Ge Zhang . Protein threading by PROSPECT: a prediction experiment in CASP3. Protein Engineering, , Vol. 12, No. 11, 899-907, November 1999. [download 673.848 Kb] This is the original PROSPECT research paper.
|
 
|
 |