RAPTOR User Manual 4.0 for Linux
Introduction
What is Homology Modeling (HM)?

Suppose you know the amino acid sequence of a target protein and want to know its three-dimensional (3D) structure, which has not yet been solved experimentally by X-ray crystallography or NMR. The underlying premise of homology modeling is that when a set of proteins are homologous, their 3D structures are more conserved than their sequences. Homology modeling constructs the 3D structure of a target sequence from the known structures of its homologous proteins.

General Procedures to Create Homologous Models
Does Homology Modeling Always Work?

Given a target sequence, if no homologous proteins are found in the structure database, you cannot use homology modeling. In practice, when the sequence identity in the alignment is below 25%, the homology is insignificant and you cannot expect homology modeling to produce a good model.

Why Fold Recognition (Protein Threading)

Fold recognition is based on the observation that the number of distinct structures does not grow as fast as the PDB as a whole, and that 90% of the new structures submitted to the PDB in the past several years have folds similar to structures already in the PDB. Currently, there are more than 1000 folds.

Fold recognition involves the following procedures:
Fold recognition is most effective for hard targets that homology modeling cannot handle.

RAPTOR (RApid Protein Threading predictOR) is a protein threading software package developed by Dr. Jinbo Xu and Dr. Ming Li. It applies novel linear programming techniques to the protein threading problem and has achieved great success. RAPTOR minimizes the scoring function (i.e., seeks the optimal alignment between sequence and template) using an integer programming method. The scoring function rigorously takes the pairwise contact potential into account. The threading problem is formulated as a large-scale integer programming problem, and RAPTOR can find a globally optimal alignment. As a result, RAPTOR produces highly accurate alignments and is most effective for hard targets.

Installation
Note that this installation guide is for Linux only.

Computing Requirements

To run RAPTOR, the PC must have at least 512 MB of memory. For time efficiency, multiple high-speed CPUs are preferred. The RAPTOR package takes up to 3 GB of space on the hard drive.

Required Files:

RAPTOR1.tar.gz    Executable and Template Library

How to Install RAPTOR

First create a temporary directory on your hard drive. Copy all the installation files to the temporary directory and enter that directory. You may need to run “chmod u+x *.sh” to make the two script files executable.

If You Do Not Have RAPTOR2.tar.gz (or Want to Download REFSEQ or NR Database by Yourself)

PSI-BLAST is used internally by RAPTOR. The database searched by PSI-BLAST can be either NR or REFSEQ, a representative subset of NR that is half the size of NR. By default, RAPTOR comes with REFSEQ, which is compressed in RAPTOR2.tar.gz. Optionally, you can download REFSEQ or NR yourself and install it manually, which is quite straightforward. To do that, install RAPTOR1.tar.gz first. Alternatively, you can download the REFSEQ database, which is much smaller than NR.

Registration

When you run install.sh, a registration window will pop up after the installation is finished, and you need to enter the key obtained from BSI to register. If you do not register during the installation, the registration window will pop up again before you run a protein sequence.

Organization of Directories (and Important files)

RAPTOR
bin/                          Binaries

Quick Tour
Load Sequence

To test RAPTOR, you can load a test sequence and run it. Click “File” in the menu and select “Load Sequence/XML File”. In the file browser, go to RAPTOR/data/seq/ and load one test sequence into the workspace. An icon then appears on the left panel, and the content of the sequence is displayed in a window on the right.

Run Sequence

Click “Run” in the menu and select “Run Selected” from the dropdown menu. A configuration panel will pop up. The only option that you may need to change is the path of the database used by PSI-BLAST, depending on how you installed the database. Click the “Advanced” tab and find “Database for PSI-BLAST”. If you have installed the NR database, the path should be [home directory]/RAPTOR/data/nr/nr. If you have installed the RefSeq database, the path should be [home directory]/RAPTOR/data/RefSeq/refseq_protein. Here [home directory] is the path of your home directory.

Click “Run” and RAPTOR will start to run. One sequence takes about one hour, depending on the sequence length. When the run is finished, a tabbed window appears on the right. It contains the PSP matrix obtained by PSI-BLAST, the predicted secondary structure, the ranking list of templates, and all the alignments.

Menu System
Launch RAPTOR

In RAPTOR/, run RAPTOR_GUI.sh to launch the RAPTOR GUI.

File

File->Load
File->Close Selected
File->Close All
File->Delete Output
File->Exit

Edit

Edit->RAPTOR Config

Run

Run->Run Selected
Run->Run All

Window

This menu lets you select a different window from the dropdown menu.

Help

This menu launches a browser that lets you read this manual or visit the BSI website.

Configuration Panel

Basic Options

Threading Method
3D Modeling
Output Path
Output Files
Keep raw Files

Advanced Options

Template Settings
Database for PSI-BLAST
PDB File Viewer
Template Ranking Method

Navigation Panel and Output Panel
Navigation Panel

The left-hand side is the navigation panel. Each sequence is represented by an icon.

Output Window

PSI-BLAST Profile

The output window is composed of a set of tab windows. The first tab window is the PSI-BLAST profile. It is a 20-row matrix, each row corresponding to one amino acid. Each frequency ranges from 0 to 1. To make the profile easier to read, the frequency range is divided into 10 segments, each represented by a color. The matrix is thus drawn as a rectangle composed of many small square cells, where the color of each cell is determined by the frequency of occurrence. You can easily tell conserved residues from non-conserved residues by their colors.

Secondary Structure

Different colors are used to represent helices, beta sheets, and loops.

Some acronyms

Rank by Score

Top Window

Each method is represented by a folder icon. If you double-click it, the templates are displayed, ranked by their E-values. The smaller the E-value, the better. Other scores used internally are also displayed.

Table fields:

Bottom Window

If you click a template, its alignment is displayed in a dropdown window. The color of the template is consistent with its actual secondary structure, and the color of the target is consistent with its predicted secondary structure. If you click “View 3D structure with RasMol”, a RasMol window pops up and displays the structure. If you click “Export pdb file”, a file browser pops up and you can save the 3D structure in a pdb file. If you click the “Functional Annotation” tab, a window drops down and shows the functional information extracted from the template pdb file. If you click the template name, a browser opens and connects to the RCSB PDB website.

Alignments

The left side of the toolbar allows you to select some session(s) and specify how many templates you want to display.
The right side of the toolbar allows you to compare any two alignments. To specify an alignment, use its method name and rank.

Error

This window displays the errors that occurred during the run. Because of incorrectly generated template files or for other reasons, the target sequence may fail to thread to some templates. These templates can be ignored; because there are so few of them, they have no significant influence on the threading results.

Using RAPTOR
Input File and Output File

RAPTOR accepts sequence files in FASTA format as input. To load a sequence file, click the “File” menu and select “Load File”. In the popup file browser, select the appropriate file filter to display all .seq files. Here is an example of a FASTA format sequence:

>2acy(len=98)

All the raw files of RAPTOR are stored in the output directory, in a subdirectory named after the sequence. Suppose the sequence name is XXXX.

The structure of the output directory:

XXXX

Here [method name] can be NoCore, NPCore, or IP. Directories enclosed in <> are only generated when the corresponding checkbox is selected and the path is specified in the configuration panel.

PSI-BLAST Database

In RAPTOR, PSI-BLAST is used to generate the position-specific matrix (sequence profile) of a target sequence. By default, PSI-BLAST uses the NR database, but NR is very large (1 GB after compression). An alternative is RefSeq, a curated non-redundant sequence database of genomes, transcripts and proteins maintained by NCBI. RefSeq is much smaller, about half the size of NR. We compared the two, and the profiles obtained from them are almost the same, so you can always use RefSeq in place of NR.

The NR database can be downloaded from ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz and ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz

RefSeq can be downloaded from ftp://ftp.ncbi.nih.gov/blast/db/refseq_protein.tar.gz.

Threading Methods

Dynamic Programming vs. Integer Programming

NoCore vs. NPCore

Running One Sequence with Different Methods

The first step of RAPTOR is to run PSI-BLAST. If you have already run NoCore, this step is skipped when you run NPCore, because the PSI-BLAST output is stored in PSP/ under the output directory. If the program finds those files, PSI-BLAST is skipped, which saves running time.
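As a rough illustration of the caching behavior described above, the following Python sketch checks whether the output directory of a sequence already contains PSI-BLAST results in PSP/. The function name psiblast_is_cached and the example paths are hypothetical. Two further assumptions are made: that PSP/ sits inside the per-sequence directory, and that any file found in PSP/ counts as a cached result. This manual does not specify which files RAPTOR actually looks for.

```python
from pathlib import Path


def psiblast_is_cached(output_dir, sequence_name):
    """Return True if <output_dir>/<sequence_name>/PSP/ holds at least one file.

    Assumption: any file inside PSP/ counts as a cached PSI-BLAST result.
    The check that RAPTOR itself performs is not documented in this manual.
    """
    psp_dir = Path(output_dir) / sequence_name / "PSP"
    if not psp_dir.is_dir():
        return False
    return any(entry.is_file() for entry in psp_dir.iterdir())


if __name__ == "__main__":
    # Hypothetical output path and sequence name, for illustration only.
    if psiblast_is_cached("/tmp/raptor_output", "XXXX"):
        print("PSP/ has files: PSI-BLAST would likely be skipped.")
    else:
        print("No cached profile found: PSI-BLAST would run first.")
```

A check like this can help estimate running time before starting a second threading method on the same sequence.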
Judging Prediction Quality from Alignment

First, you can compare the actual secondary structure of the template with the predicted secondary structure of the query sequence. As the accuracy of secondary structure prediction is around 80%, this is an important measure of the prediction quality.

Then you can look at the gaps in the alignment. The fewer and the shorter the gaps, the better the prediction quality. Ending gaps can normally be ignored. Sometimes the ending gaps are very long, which means the program can only give a good prediction for part of the query sequence.

What if the ending gaps are too long?

Long sequences often have more than one domain, which can make the ending gaps very long. In that case, you can cut the sequence into domains first and run each domain with RAPTOR.

Using Modeller

If you are an academic user, you can download Modeller for free. You need to register to get a license key in order to install Modeller. After you install it, you also need to specify the Modeller path in the configuration panel, e.g., /home/usr/modeller8v2/bin/mod8v2 under Linux and C:\modeller8v2\bin\mod8v2 under Windows.

Because Modeller8v2 uses Python internally, it may give the following error message while running, due to a bug in Python:

'import site' failed; use -v for traceback

Just ignore it.

Customizing Templates

RAPTOR/data/parameters/fssp.list stores the names of all the templates in the template library. If you are interested in a specific template, you can save its name in another file and specify the path of that file in the configuration panel.

You can also create your own template library. You need a pdb file, from which you generate a PSM file and an fssp file. Then put the PSM file in RAPTOR/data/PSM and the fssp file in RAPTOR/data/fssp.

Using RasMol

The default viewer for pdb files is RasMol. The default display mode is cartoon. The structure is colored according to its secondary structure.
You can rotate the structure by pressing the left mouse button and dragging. To move the structure, press the right mouse button and drag. To shrink or enlarge the display, hold the “Shift” key, press the right mouse button, and drag. For a full reference on RasMol, visit http://www.umass.edu/microbio/rasmol/.

If you run RAPTOR on a remote machine, rasmol_32BIT may not work properly with the X server. Instead, configure the configuration panel to run the wrapper shell program “rasmol”, which launches RasMol.

RAPTOR Reference List

Feng Jiao, Jinbo Xu, Libo Yu, Dale Schuurmans. Protein Fold Recognition Using Gradient Boost Algorithm. Accepted by CSB 2006.

Jinbo Xu. Protein Fold Recognition by Predicted Alignment Accuracy. ACM/IEEE Transactions on Computational Biology and Bioinformatics, 2(2):157-165, 2005.

Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology, 1(1):95-117, 2003.

Jinbo Xu and Ming Li. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6):579-584, Oct. 2003. Invited paper for CASP5, voted by peers as the "most innovative method in CASP5".

Bioinformatics Solutions Inc. Technical Support