RAPTOR


Fold Recognition/Protein Threading Basics

This page introduces homology modeling and fold recognition. The information is not novel and can be found in most up-to-date protein structure textbooks and websites.

Before we look at homology modeling, we should first understand its building blocks. A protein sequence is the order of amino acid residues connected by peptide bonds. The structure of a protein can be described at three levels.

  • Primary Structure: the linear sequence of amino acids, written with each amino acid represented by a letter.
  • Secondary Structure: the local arrangement of the chain into shapes such as the helix, beta strand and loop. These shapes are stabilized by hydrogen bonds within the biopolymer.
  • Tertiary Structure: the overall three dimensional shape, also known as its fold. The tertiary structure is essentially the spatial organization of the secondary structure elements. The fold arises as the chain seeks its native conformation, that is, a state of minimized energy.

Each level of structure can tell us a lot about a protein's function; however, not every structure has been solved. This is where homology fits in.

Sequence homology is based on the observation that proteins with similar sequences generally produce similar three dimensional structures.

Homology search software, such as PatternHunter, aligns a query sequence (one without a known structure) to a sequence with a known structure from the Protein Data Bank (PDB). Sequences in the NR database can also be searched. The alignment is built by inserting spaces (gaps) into the sequences until an optimal match is found.
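The idea of aligning by inserting gaps can be sketched with the textbook Needleman-Wunsch global alignment. This is only an illustration of gapped alignment in general, not a description of how PatternHunter works internally; the function name `global_align` and the match, mismatch and gap values are assumptions chosen for the example.

```python
def global_align(query, template, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (illustrative scores, not a real substitution matrix).

    Returns (score, aligned_query, aligned_template), with '-' marking a gap.
    """
    n, m = len(query), len(template)
    # score[i][j] = best score for aligning query[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align the two residues
                score[i - 1][j] + gap,    # gap in the template
                score[i][j - 1] + gap,    # gap in the query
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                aligned_q.append(query[i - 1])
                aligned_t.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


if __name__ == "__main__":
    print(global_align("ACDEFG", "ACEFG"))
```

Real tools score residue pairs with a substitution matrix and usually treat opening a gap differently from extending one; the flat scores here only keep the sketch short.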

For homology modeling, the software searches a sequence database by aligning (through spacing) the query sequence with each sequence in the database. It scores each position of the alignment and then sums the results. If the sequence homology (the percentage of matching positions) is less than 25%, modeling is likely to fail; greater than 50% is considered very good. Alignments are scored with functions such as z scores or e values: a high z score or a low e value indicates a significant alignment. The optimum alignment, along with its template (the known structure), is then fed to modeling software to produce a three dimensional structure. Recall that because similar sequences produce similar structures, it is appropriate to use the known sequence's structure as a template for the structure of the query sequence.
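As a rough illustration of the numbers mentioned above, the sketch below computes the percentage of matching positions in an alignment and a z score for a raw alignment score. It assumes that the percentage is taken over columns where neither sequence has a gap, and that the z score compares the raw score with a set of background scores (for example, scores from shuffled sequences). Both conventions, and all function names, are assumptions made for this example; the page does not specify them.

```python
from statistics import mean, pstdev


def percent_identity(aligned_query, aligned_template):
    """Percentage of identical residues over columns without a gap ('-')."""
    pairs = [
        (a, b)
        for a, b in zip(aligned_query, aligned_template)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def z_score(raw_score, background_scores):
    """Number of standard deviations by which raw_score exceeds the background mean."""
    spread = pstdev(background_scores)
    if spread == 0:
        return 0.0
    return (raw_score - mean(background_scores)) / spread


def rate_homology(identity):
    """Apply the 25% and 50% rules of thumb quoted in the text."""
    if identity < 25:
        return "likely to fail"
    if identity > 50:
        return "very good"
    return "intermediate"


if __name__ == "__main__":
    identity = percent_identity("ACDE", "ACGG")
    print(identity, rate_homology(identity))
    print(z_score(10, [2, 4, 4, 4, 5, 5, 7, 9]))
```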

Fold recognition, also known as protein threading, uses both sequence homology and structure homology. Fold recognition is based on folds, which are families of tertiary structures. More than 1000 folds have been found in nature and, most notably, proteins in the same fold have similar structures.

Structure homology scoring combines several terms: sequence homology, secondary structure (of the template), solvent accessibility (exposed/buried state) and pair-wise contact potential (the interaction between two residues in three dimensional space). Each of these contributes to a more accurate prediction of the tertiary structure.
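To make the four terms concrete, here is a toy scoring function that adds them up for one given placement of a query onto a template. It is a sketch under heavy assumptions and not RAPTOR's scoring function: each term is a simple 0/1 agreement check, the weights default to 1, gaps are not penalized, and the names `TemplatePosition`, `threading_score` and `HYDROPHOBIC` are hypothetical.

```python
from dataclasses import dataclass

# Crude hydrophobic residue set, used here as a stand-in for real energy terms.
HYDROPHOBIC = set("AVLIMFWC")


@dataclass
class TemplatePosition:
    residue: str     # amino acid found in the template structure
    sec_struct: str  # 'H' helix, 'E' beta strand, 'C' loop
    buried: bool     # solvent accessibility: True if buried, False if exposed


def threading_score(query, predicted_ss, template, contacts, mapping,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Score one placement of the query on the template.

    mapping[i] is the template index aligned to query[i], or None for a gap.
    contacts is a list of (a, b) template index pairs that touch in 3D.
    """
    w_seq, w_ss, w_sa, w_pair = weights
    total = 0.0
    for i, t in enumerate(mapping):
        if t is None:
            continue
        pos = template[t]
        # Sequence homology term: same residue as the template.
        total += w_seq * (query[i] == pos.residue)
        # Secondary structure term: predicted state agrees with the template.
        total += w_ss * (predicted_ss[i] == pos.sec_struct)
        # Solvent accessibility term: hydrophobic residues prefer buried sites.
        total += w_sa * ((query[i] in HYDROPHOBIC) == pos.buried)
    # Pair-wise contact term: depends on which residues occupy BOTH positions.
    placed = {t: query[i] for i, t in enumerate(mapping) if t is not None}
    for a, b in contacts:
        if a in placed and b in placed:
            if placed[a] in HYDROPHOBIC and placed[b] in HYDROPHOBIC:
                total += w_pair
    return total


if __name__ == "__main__":
    template = [
        TemplatePosition("L", "H", True),
        TemplatePosition("K", "H", False),
        TemplatePosition("V", "H", True),
    ]
    print(threading_score("LKV", "HHH", template, [(0, 2)], [0, 1, 2]))
```

The first three terms look at one aligned position at a time, while the pair-wise term needs to know what was placed at two different template positions.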

Within structure homology, there are two types of targets: easy and hard. An easy target is a sequence for which similar sequences, and therefore significant homology, can be found in the database.
A hard target is a sequence without any significantly similar sequences in the database.

RAPTOR makes use of both sequence homology and structure homology when generating alignments. When no significant sequence homology is found in the database, protein threading can still find a good structure template for the target sequence by using structure homology, which makes RAPTOR more effective for hard targets. Software such as PSI-BLAST does not work well here because it considers only sequence homology, not structure homology.

For fold recognition, RAPTOR threads the query sequence into each template (structure) in a representative database of known structures by optimizing a scoring function, producing a set of alignments. This representative database is a subset of PDB with more than 8000 known sequences and structures. Using a representative database means that not every structure in the original PDB is compared against the query sequence. However, because similar sequences produce similar structures, we can predict with relatively high confidence that the representative database will produce results very similar to those from the entire database. In short, the representative database is necessary to reduce run time and does not significantly restrict the final predicted structure. As noted before, fold recognition takes into consideration both sequence homology and structure homology. The z score or e value is calculated for each alignment, and the best sequence-structure alignment is then fed to a modeling application to produce a three dimensional structure.
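One generic way to build a representative subset is to keep a sequence only if it is not too similar to any sequence already kept. The sketch below shows that greedy idea; the page does not say how RAPTOR's representative database was actually constructed, so the method, the 0.7 threshold and the use of Python's `difflib.SequenceMatcher` as a quick similarity measure (instead of a real sequence alignment) are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def representative_set(entries, threshold=0.7):
    """Greedily keep entries that are below `threshold` similarity to all kept ones.

    entries is a list of (name, sequence) pairs; earlier entries win ties.
    """
    kept = []
    for name, seq in entries:
        if all(similarity(seq, kept_seq) < threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


if __name__ == "__main__":
    entries = [
        ("a", "ACDEFGHIKL"),
        ("b", "ACDEFGHIKM"),  # nearly identical to "a", so it is dropped
        ("c", "WWWWYYYYPP"),
    ]
    print([name for name, _ in representative_set(entries)])
```

Because near-duplicates are expected to share a fold, dropping them shrinks the search without removing many distinct templates, which is the trade-off the paragraph above describes.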

Why is RAPTOR better than the competition?

Two factors dominate: the use of integer programming for threading and the use of a support vector machine for ranking after threading. RAPTOR uses integer programming to thread a sequence to a template, while most software uses dynamic programming. The advantage of integer programming is that when pair-wise contact potential is included in the scoring function, the integer programming algorithm can still find the globally optimal alignment, whereas dynamic programming cannot. Further, most dynamic programming-based software ignores pair-wise contact potential in its scoring function. Dynamic programming can identify a globally optimal alignment only when there is no pair-wise contact potential.
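A tiny example shows why pair-wise contact potential changes the problem. The sketch below tries every placement of a two-residue query on a three-position template and reports the best one, with and without a contact bonus. Exhaustive search is used here only as a simple stand-in for a global optimizer; it is not integer programming and would not scale to real proteins. The scoring rule, the `HYDROPHOBIC` set and the function names are assumptions made for the example.

```python
from itertools import combinations

HYDROPHOBIC = set("AVLIMFWC")


def placement_score(query, template, contacts, placement, pair_weight):
    """Score a placement: placement[i] is the template index for query[i]."""
    # Position-by-position term: 1 point per residue identical to the template.
    total = sum(1.0 for q, t in zip(query, placement) if q == template[t])
    # Pair-wise term: bonus when two hydrophobic residues fill a contact pair.
    occupant = dict(zip(placement, query))
    for a, b in contacts:
        if a in occupant and b in occupant:
            if occupant[a] in HYDROPHOBIC and occupant[b] in HYDROPHOBIC:
                total += pair_weight
    return total


def best_threading(query, template, contacts, pair_weight):
    """Exhaustively search all order-preserving placements (exponential time)."""
    placements = combinations(range(len(template)), len(query))
    return max(
        (placement_score(query, template, contacts, p, pair_weight), p)
        for p in placements
    )


if __name__ == "__main__":
    query, template, contacts = "LV", "LVA", [(0, 2)]
    print(best_threading(query, template, contacts, pair_weight=0.0))
    print(best_threading(query, template, contacts, pair_weight=3.0))
```

Without the contact bonus the best placement is simply the one with the most matching positions. With the bonus, the value of placing V at the last position depends on what was placed at the first position, so the score can no longer be built up one position at a time, which is the assumption ordinary dynamic programming relies on.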

Another difference is that after threading, RAPTOR uses a support vector machine (SVM), a form of machine learning from computer science, to pick out the best alignment, while many others rely strictly on the z score (a statistical method). The SVM method, which produces e values, is more sensitive. In short, RAPTOR can generate better alignments than other software.
