RAPTOR

Benchmarks: Lindahl's and Fischer et al.'s

Fischer et al.'s benchmark set consists of 68 target sequences and 301 templates. RAPTOR ranks the correct template first for 56 of the 68 pairs, a prediction rate of about 82%. The fold recognition performance of RAPTOR was further tested on Lindahl's benchmark set, which consists of 976 protein sequences. Threading them all against all gives 976 × 975 threading pairs. We measured RAPTOR's performance at three similarity levels: fold, superfamily and family. Results are shown in Table 1 (data for other methods are taken from Shi et al.'s paper). Prediction correctness is assessed based on the SCOP classification.
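The two figures quoted above follow from simple counting. The sketch below reproduces them and shows one plausible way a top-1 check could be scored. The function names, the score dictionary and the fold labels are illustrative assumptions, not part of RAPTOR or of either benchmark's tooling.

```python
def top1_rate(correct_top1: int, total_targets: int) -> float:
    """Fraction of targets whose top-ranked template is the correct one."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    return correct_top1 / total_targets


def all_against_all_pairs(n_sequences: int) -> int:
    """Ordered (query, template) pairs, excluding each sequence against itself."""
    return n_sequences * (n_sequences - 1)


def is_top1_correct(scores: dict, query_label: str, template_labels: dict) -> bool:
    """True if the best-scoring template shares the query's class label.

    Assumption: a higher score means a better threading result, and the
    labels are class names (for example SCOP folds) supplied by the caller.
    """
    best_template = max(scores, key=scores.get)
    return template_labels[best_template] == query_label


fischer_rate = top1_rate(56, 68)            # 56 of 68 targets ranked top 1
lindahl_pairs = all_against_all_pairs(976)  # 976 x 975 threading pairs

print(f"Fischer top-1 rate: {fischer_rate:.1%}")
print(f"Lindahl threading pairs: {lindahl_pairs}")

# Hypothetical toy data, only to exercise the top-1 check.
toy_scores = {"tmplA": 12.5, "tmplB": 30.1, "tmplC": 7.0}
toy_labels = {"tmplA": "fold1", "tmplB": "fold2", "tmplC": "fold1"}
print(is_top1_correct(toy_scores, "fold2", toy_labels))
```

The same top-1 check can be applied at the fold, superfamily or family level by changing which labels are passed in.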

Table 1. The performance of RAPTOR at three different similarity levels

As shown in Table 1, RAPTOR performs better than other methods at all similarity levels, especially the fold level. At the family level, RAPTOR's recognition performance is comparable to that of FUGUE, the best method at the family and superfamily levels other than RAPTOR. We may conclude that a strict treatment of pairwise interactions is necessary for fold and superfamily level recognition. At the family level, sequence (or profile) alignment can attain satisfactory results.

Specific Examples

We now present several structure prediction examples generated by RAPTOR in CAFASP3 and LiveBench6. The experimental structures of most CAFASP3 targets have not yet been released for publication. Therefore we chose some targets from LiveBench.

Figure 2 (taken from CAFASP3's website, generated by RasMol and MaxSub) presents the superimposition of the experimental structure (grey) and RAPTOR's predicted structure (black) of T0136_1. According to MaxSub's evaluation, 17 of 54 servers generated correct fold recognitions for this target, and RAPTOR produced the best alignment among them. MaxSub could superimpose a segment of 118 residues (the sequence length is 144) of the predicted structure onto the experimental structure with an RMSD of a mere 1.9 Å.

Fig. 2. The superimposition of the experimental structure (grey) and the predicted structure (black) of CAFASP3 target T0136_1.
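For readers unfamiliar with the two numbers reported for this target, the sketch below shows how an RMSD and a coverage fraction are computed. It is not MaxSub: MaxSub also searches for the residue subset and the optimal superposition, whereas this sketch assumes the two coordinate sets are already superimposed and paired residue by residue. The function names and toy coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumption: the two structures are already superimposed and the lists
    are in the same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


def coverage(n_superimposed: int, sequence_length: int) -> float:
    """Fraction of the target sequence covered by the superimposed segment."""
    return n_superimposed / sequence_length


# Toy coordinates: every atom is displaced by 2 units along z.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
print(rmsd(experimental, predicted))

# Coverage for the target above: 118 superimposed residues out of 144.
print(f"{coverage(118, 144):.1%}")
```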

The following two figures were generated by RasMol from the evaluation results of LiveBench6. Figure 3 shows an almost perfect prediction for target 1ll8A. The alignment accuracy score measured by MaxSub is more than 9 (on a scale of 10). Figure 4 presents a good structure prediction for target 1j53A, with an alignment accuracy score of more than 6. Considering the length of the target sequence, this prediction is very successful.

Fig. 3. The experimental structure (left) and the predicted structure (right) of 1kvzA.

Fig. 4. The experimental structure (left) and the predicted structure (right) of 1j53A.

Computing Efficiency Issues

A key advantage of our algorithm is that the memory requirement is only about O(|E| n^2), where E is the edge set of the contact graph of a protein template structure and n is the query sequence length. The observed memory usage is 100 to 200 MB for most threading pairs. In practice, the computing time does not increase exponentially with target sequence size. Figure 5 shows the CPU time of threading 100 sequences (chosen randomly from Lindahl's benchmark), with sizes ranging from 25 to 572, to a typical template, 119l, of length 162. CPU time was measured on a single 400 MHz MIPS R12000 CPU of a Silicon Graphics Origin 3800 system with 20 GB of RAM. The figure shows that the computing time of our algorithm increases very slowly with sequence size. In fact, we found that for real protein data, our relaxed linear programs directly output integral solutions 99% of the time and generated only a few branch nodes when the solution was fractional.
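To give a feel for what a bound of |E| times n squared means, the sketch below counts that product for the shortest and longest queries in the experiment above. The bound is asymptotic, so constant factors are ignored and the result is a term count, not a memory figure in bytes. The edge count of 50 is a hypothetical value chosen for illustration, not the contact graph size of template 119l.

```python
def pair_term_count(num_edges: int, query_length: int) -> int:
    """Evaluate |E| * n^2, the growth term in the stated memory bound.

    Assumption: this ignores constant factors and lower-order terms, so it
    only shows how the requirement scales, not the actual memory used.
    """
    if num_edges < 0 or query_length < 0:
        raise ValueError("arguments must be non-negative")
    return num_edges * query_length ** 2


HYPOTHETICAL_EDGES = 50  # illustrative contact-graph size, not measured

for n in (25, 162, 572):
    terms = pair_term_count(HYPOTHETICAL_EDGES, n)
    print(f"n = {n:>3}: |E| * n^2 = {terms}")

# Doubling the query length multiplies the term count by four.
ratio = pair_term_count(HYPOTHETICAL_EDGES, 200) / pair_term_count(HYPOTHETICAL_EDGES, 100)
print(ratio)
```

The growth is quadratic in the query length and linear in the number of contact-graph edges, which is consistent with the polynomial memory behaviour described above.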

Figure 6 shows the CPU time used for the prediction of each CAFASP3/CASP5 target sequence. There were 62 targets in total and 3236 protein templates in our template database. CPU time increased very slowly with sequence size except for one target (t0174), which took about 45 hours. After careful inspection, we found that there were 30 templates, each of which took about 15 hours threading time. These templates are being examined further.

Conclusions

In this paper, we have presented performance benchmarks of the software package RAPTOR, which adopts a novel integer programming approach to treat pairwise interactions rigorously in protein threading. Experimental results show that RAPTOR performs very well in terms of alignment accuracy and fold recognition for FR targets. As for computational efficiency, RAPTOR is also much better than algorithms that treat the pairwise potentials strictly when dealing with templates with complex interaction topology and long sequences.

Fig. 5. CPU time of threading 100 sequences to template 119l (1s=0.01s).

Fig. 6. CPU time of threading 62 CAFASP3 target sequences to 3236 templates.

References

  • J. Xu, M. Li, D. Kim and Y. Xu, RAPTOR: Optimal Protein Threading by Linear Programming, Journal of Bioinformatics and Computational Biology, Vol. 1, No. 1 (2003) 95-117
  • J. Xu and M. Li, Assessing RAPTOR's New Linear Programming Approach for Fold Recognition in CAFASP3, Proteins: Structure, Function, and Genetics, 53(S6): 579-584. 2003