Superior
protein structure prediction
Software
RAPTOR User Manual 4.0 for Windows
Introduction
What is Homology Modeling (HM)?
Suppose you know the amino acid sequence of a target protein and you want to know its three-dimensional (3D) structure, which has not yet been solved experimentally by X-ray crystallography or NMR. The underlying premise of homology modeling is that when a set of proteins are homologous, their 3D structures are more conserved than their sequences. Homology modeling constructs the 3D structure of a target sequence from the known structures of its homologous proteins.

General Procedures to Create Homologous Models
Does Homology Modeling Always Work?
Given a target sequence, if no homologous proteins are found in the structure database, you cannot use homology modeling. In practice, when the sequence identity in the alignment is below 25%, the homology is insignificant and you cannot expect to obtain a good model from homology modeling.

Why Fold Recognition (Protein Threading)
Fold recognition is based on the observation that the number of distinct structural folds does not grow as fast as the PDB as a whole: 90% of the new structures submitted to the PDB in the past several years have folds similar to structures already in the PDB. Currently, there are more than 1000 folds.

Given a Target Sequence
Fold recognition is most effective for hard targets that homology modeling cannot handle.

RAPTOR (RApid Protein Threading predictOR) is a protein threading software package developed by Dr. Jinbo Xu and Dr. Ming Li. It applies novel linear programming techniques to the protein threading problem and has achieved great success. RAPTOR minimizes the scoring function (i.e., seeks the optimal alignment between sequence and template) using integer programming. The scoring function used by RAPTOR rigorously takes the pairwise contact potential into account. The threading problem is formulated as a large-scale integer programming problem, for which RAPTOR can find a globally optimal alignment. As a result, RAPTOR produces high-accuracy alignments and is most effective for hard targets.

Installation
Note that this installation guide is for Windows only.

Computing Requirements
To run RAPTOR, the PC must have at least 512 MB of memory. For time efficiency, multiple high-speed CPUs are preferred. After installation, the RAPTOR package will take up to 3 GB of space on the hard drive.

Required Files:
RAPTOR1.exe    Executable and Template Library

How to Install RAPTOR
First create a temporary directory on your hard drive. Copy all the installation files to the temporary directory and enter that directory.

If You Do Not Have RAPTOR2.exe (or Want to Download REFSEQ or NR Database by Yourself)
PSI-BLAST is used internally by RAPTOR. The database searched by PSI-BLAST can be either NR or REFSEQ, a representative subset of NR that is about half its size. By default, RAPTOR comes with REFSEQ, which is compressed in RAPTOR2.exe. Alternatively, you can download REFSEQ or NR yourself and install it manually, which is quite straightforward. To do that, first install RAPTOR1.exe by running Install.sh. Then you can download NR or REFSEQ yourself from here.
Here are instructions for downloading NR database:
Alternatively, you can download the REFSEQ database, which is much smaller than NR.

Registration
After installation, when you run RAPTOR with a sequence for the first time, a registration window will pop up asking you for a key.

Organization of Directories (and Important Files)
RAPTOR
bin\                          Binaries

Quick Tour

Load Sequence
To test RAPTOR, you can load a test sequence and run it. Click “File” in the menu and select “Load Sequence/XML File”. In the file browser, go to RAPTOR\data\seq\ and load one test sequence into the workspace. You will then see an icon in the left panel, and the content of the sequence will be displayed in a window on the right. Alternatively, you can paste your FASTA-format sequence into Notepad, save it as a .seq file, and load it into RAPTOR.

Run Sequence
Next, click “Run” in the menu and select “Run Selected” from the drop-down menu. A configuration panel will pop up. The only option that you may need to change is the path of the database used by PSI-BLAST, depending on how you installed the database. Click the “Advanced” tab and find “Database for PSI-BLAST”. If you have installed the NR database, the path should be [home directory]\RAPTOR\data\nr\nr. If you have installed the RefSeq database, the path should be [home directory]\RAPTOR\data\RefSeq\refseq_protein. Here [home directory] is the path of your home directory. Click “Run” and RAPTOR will start to run. It takes about one hour to run one sequence, depending on the sequence length. After the run finishes, a tabbed window will appear on the right. It shows the PSP matrix obtained by PSI-BLAST, the predicted secondary structure, the ranked list of templates, and all the alignments.

Menu System

Launch RAPTOR
In RAPTOR\, run RAPTOR_GUI.bat to launch the RAPTOR GUI, or double-click the RAPTOR icon on the Windows desktop.

File
File->Load
File->Close Selected
File->Close All
File->Delete Output
File->Exit

Edit
Edit->RAPTOR Config

Run
Run->Run Selected
Run->Run All

Window
This lets you select a different window from the drop-down menu.

Help
This launches a browser so that you can read this manual or visit the BSI website.
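As a rough illustration of the database paths given under Run Sequence, the following Python sketch assembles the value to enter in “Database for PSI-BLAST”. The function name psiblast_db_path and the example home directory are hypothetical; only the directory layout beneath [home directory] is taken from this manual.

```python
from pathlib import PureWindowsPath


def psiblast_db_path(home_dir: str, database: str) -> str:
    """Build the PSI-BLAST database path for an 'nr' or 'refseq' install.

    The layout below [home directory] follows the manual; everything else
    (function name, accepted keywords) is an illustrative assumption.
    """
    base = PureWindowsPath(home_dir) / "RAPTOR" / "data"
    choice = database.lower()
    if choice == "nr":
        return str(base / "nr" / "nr")
    if choice == "refseq":
        return str(base / "RefSeq" / "refseq_protein")
    raise ValueError("database must be 'nr' or 'refseq'")


if __name__ == "__main__":
    # Hypothetical home directory, used only for demonstration.
    print(psiblast_db_path(r"C:\Users\alice", "refseq"))
```

PureWindowsPath is used so that the sketch produces Windows-style paths even when it is run on another operating system.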
Configuration Panel

Basic Options
Threading Method
3D Modeling
Output Path
Output Files
Keep raw Files

Advanced Options
Template Settings
Database for PSI-BLAST
PDB File Viewer
Template Ranking Method

Navigation Panel and Output Panel

Navigation Panel
The left-hand side is the navigation panel. Each sequence is represented by an icon.

Output Window

PSI-BLAST Profile
The output window is composed of a set of tab windows. The first tab window is the PSI-BLAST profile. It is a 20-row matrix, with each row corresponding to one amino acid. Each frequency ranges from 0 to 1. To make the profile easier to read, the frequency range is divided into 10 segments, and each segment is represented by a color. The matrix is thus drawn as a rectangle composed of many small square cells, where the color of each cell is determined by the frequency of occurrence. You can easily tell conserved residues from non-conserved residues by their colors.

Secondary Structure
Different colors are used to represent helices, beta sheets, and loops.

Some acronyms

Rank by Score

Top Window
Each method is represented by a folder icon. If you double-click it, the templates will be displayed, ranked by their E-values. The smaller the E-value, the better. Other scores used internally are also displayed.

Table fields:

Bottom Window
If you click a template, its alignment will be displayed in a drop-down window. The color of the template is consistent with its actual secondary structure, and the color of the target is consistent with its predicted secondary structure. If you click “View 3D structure with RasMol”, a RasMol window will pop up and display the structure. If you click “Export pdb file”, a file browser will pop up and you can save the 3D structure in a pdb file. If you click the “Functional Annotation” tab, a window will drop down and show the functional information extracted from the template pdb file.
If you click the template name, a browser will pop up and connect to the RCSB PDB website.

Alignments
The left side of the toolbar allows you to select one or more sessions and specify how many templates you want to display. The right side of the toolbar allows you to compare any two alignments. To specify an alignment, use its method name and rank.

Error
This window displays the errors that occurred during the run. Because of incorrectly generated template files or other reasons, the target sequence may fail to thread to some templates. These templates can be ignored; because there are so few of them, they will not significantly influence the threading results.

Using RAPTOR
Input File and Output File
RAPTOR accepts a FASTA-format sequence file as input. To create a .seq file, copy your FASTA-format sequence, paste it into Notepad, and save it as a .seq file (instead of .txt).
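If you prefer to create the .seq file with a script rather than Notepad, a minimal Python sketch could look like the following. The function names, the 60-character line width, and the example sequence are illustrative assumptions; the manual only requires a FASTA-format file saved with the .seq extension.

```python
from pathlib import Path


def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line followed by wrapped residues."""
    # Drop any whitespace and normalize to upper case.
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must be non-empty and contain letters only")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def write_seq_file(path: str, name: str, sequence: str) -> Path:
    """Write the record to a file whose name must end in .seq."""
    target = Path(path)
    if target.suffix.lower() != ".seq":
        raise ValueError("file name must end in .seq")
    target.write_text(format_fasta(name, sequence), encoding="ascii")
    return target


if __name__ == "__main__":
    # Hypothetical sequence name and residues, for demonstration only.
    print(format_fasta("example", "MKTAYIAKQR QISFVKSHFS", width=10), end="")
```

The resulting file can then be loaded through the “File” menu like any other .seq file.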
To load a sequence file, click the “File” menu and select “Load File”. In the pop-up file browser, select the right file filter to display all .seq files.

Here is an example of FASTA format sequence:

All the raw files of RAPTOR are stored in the output directory, in a subdirectory named after the sequence. Suppose the sequence name is XXXX.

XXXX

The structure of output directory:

Here [method name] can be NoCore, NPCore, or IP. Directories enclosed in <> are generated only when the corresponding checkbox is selected and the path is specified in the configuration panel.

PSI-BLAST Database
In RAPTOR, PSI-BLAST is used to generate the position-specific matrix (sequence profile) of a target sequence. By default, PSI-BLAST uses the NR database, but NR is very large (1 GB after compression). An alternative is RefSeq, a curated non-redundant sequence database of genomes, transcripts, and proteins maintained by NCBI. RefSeq is much smaller, about half the size of NR. We compared the two, and the profiles obtained from them are almost the same, so you can always use RefSeq in place of NR. The NR database can be downloaded from ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz and RefSeq can be downloaded from ftp://ftp.ncbi.nih.gov/blast/db/refseq_protein.tar.gz.

Threading Methods

Dynamic Programming vs. Integer Programming

NoCore vs. NPCore

Running One Sequence with Different Methods
The first step of RAPTOR is to run PSI-BLAST. The PSI-BLAST output is stored in PSP/ under the output directory. If you have already run NoCore and then run NPCore, the program finds those files and skips the PSI-BLAST step, which saves running time.

Judging Prediction Quality from Alignment
First, you can compare the actual secondary structure of the template with the predicted secondary structure of the query sequence. As the accuracy of secondary structure prediction is around 80%, this is an important measure of the prediction quality.
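The secondary structure comparison can also be made numerically. The Python sketch below computes the fraction of aligned positions at which the template's actual secondary structure matches the query's predicted one. It assumes that you have transcribed the two secondary structure rows of an alignment as equal-length strings, with one letter per state (for example H, E, and C) and "-" for gaps; these encodings and the function name are assumptions for illustration, not a RAPTOR file format.

```python
def ss_agreement(template_ss: str, predicted_ss: str, gap: str = "-") -> float:
    """Fraction of aligned, non-gap positions where both states agree."""
    if len(template_ss) != len(predicted_ss):
        raise ValueError("both strings must cover the same alignment columns")
    # Keep only the columns in which neither row has a gap.
    pairs = [
        (t, p)
        for t, p in zip(template_ss, predicted_ss)
        if t != gap and p != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for t, p in pairs if t == p)
    return matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical rows: five of the six non-gap columns agree.
    print(round(ss_agreement("HHHEEC-", "HHHCEC-"), 3))
```

Because secondary structure prediction itself is only about 80% accurate, an agreement well below that level is a hint, not proof, that the template is a poor match.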
Then you can look at the gaps in the alignment. The fewer and shorter the gaps, the better the prediction quality. Ending gaps can normally be ignored. Sometimes, however, the ending gaps are very long, which means the program can give a good prediction for only part of the query sequence.

What if the ending gaps are too long?
Long sequences often contain more than one domain, which can produce very long ending gaps. You can cut the sequence into domains first and run each domain with RAPTOR.

Using Modeller
If you are an academic user, you can download Modeller for free from here. You also need to register here to get a license key in order to install Modeller. After you install it, you need to specify the Modeller path in the configuration panel, e.g., C:\modeller8v2\bin\mod8v2. Because Modeller8v2 uses Python internally, it may give the following error message while running, due to a bug in Python: “'import site' failed; use -v for traceback”. Just ignore it. To use Modeller8 on Windows, several Modeller environment variables must be defined. First, click Start -> All Programs -> Modeller8v2 -> Modeller, and a black command-line window will appear. Type "set" and press Return, and all environment variables will be displayed. You will find:

Customizing Templates
RAPTOR/data/parameters/fssp.list stores the names of all the templates in the template library. If you are interested in a specific template, you can save its name in another file and specify the path in the configuration panel. You can also create your own template library. To do so, you need a pdb file, from which you generate a PSM file and an fssp file. Then put the PSM file in RAPTOR/data/PSM and the fssp file in RAPTOR/data/fssp.

Using RasMol
The default viewer for pdb files is RasMol. The default display mode is cartoon, and the structure is colored according to its secondary structure. You can rotate the structure by pressing the left mouse button and dragging.
To move the structure, press the right mouse button and drag. To shrink or enlarge the display, hold the “Shift” key, press the right mouse button, and drag. For a full reference of RasMol, you can click here. If you run RAPTOR on a remote machine, rasmol_32BIT may not work properly with the X server. Instead, you need to set the configuration panel to run the wrapper shell program “rasmol”, which will launch RasMol.

Reporting Bugs
If you find any problem when you run RAPTOR, you can report it to us and we will try to help you as soon as possible. RAPTOR’s configuration files are in .raptor/ under your home directory. To report a bug, please send us the two .conf files in .raptor/. You can also take screenshots of the RAPTOR GUI and of the terminal from which you launched RAPTOR and send them to us.

RAPTOR Reference List
Feng Jiao, Jinbo Xu, Libo Yu, Dale Schuurmans. Protein Fold Recognition Using Gradient Boost Algorithm. Accepted by CSB 2006.
Jinbo Xu. Protein Fold Recognition by Predicted Alignment Accuracy. ACM/IEEE Transactions on Computational Biology and Bioinformatics, 2(2):157-165, 2005.
Jinbo Xu, Ming Li, Dongsup Kim, Ying Xu. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology, 1(1):95-117, 2003.
Jinbo Xu and Ming Li. Assessment of RAPTOR's linear programming approach in CAFASP3. Proteins: Structure, Function, and Genetics, 53(S6):579-584, Oct. 2003. Invited paper for CASP5, voted by peers as the "most innovative method in CASP5".

Bioinformatics Solutions Inc. Technical Support
Email: raptor@bioinfor.com
Phone: 1-519-8858288 ext. 16