
RosettaRemodel basic tutorial

In this tutorial, we learn to modify a protein structure using RosettaRemodel.
1. Download a PDB file. In this case we use protein G as an example (PDB code: 1PGA).
   mkdir remodel
   cd remodel
   perl ~/tools/ 1pga A > 1pgaA.pdb

2. As before, relax the structure so that it conforms to the Rosetta force field.
Repack or relax the template structure. Repacking is often necessary to remove small clashes that the score function identifies in the crystal structure. See the attached "REPACK.OPTIONS" and "REPACK.XML".

rosetta_scripts.default.linuxgccrelease @repack.options -s 1pgaA.pdb -parser:protocol 

Follow the instructions here:

3. Now that the template is ready, we create a "blueprint" that describes the content of the PDB file.
For this step, you need to download a Perl script from the link at the bottom of this page (REMODEL FILES):

This script is part of the Rosetta release, under Rosetta/tools/remodel (the Rosetta source code is normally in Rosetta/main).

For this tutorial using VirtualBox:
First, change the permissions on the downloaded script to make it executable:

cd ~/Downloads
chmod 755
cd ~/remodel

Then run the script in this directory:

~/Downloads/ -pdbfile 1pgaA_0001.pdb > template.blueprint

The blueprint file is simply a three-column text file (nothing fancy) that encodes the peptide chain position, the native amino acid type, and a dot ("."). The number of lines in the blueprint should match the number of residues in the PDB file, and for convenience it is strongly recommended that the PDB numbering start from 1.
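A quick sanity check of the format described above can be sketched in shell. The helper name check_blueprint is hypothetical (it is not part of Rosetta), and the three-residue input is a made-up demo rather than the real template.blueprint.

```shell
# check_blueprint: hypothetical helper, not part of Rosetta.
# Flags lines with fewer than three columns and checks that the first
# position is 1. Reads the named file, or standard input if none is given.
check_blueprint() {
    awk '
        NF < 3             { printf "line %d: expected at least 3 columns\n", NR; bad = 1 }
        NR == 1 && $1 != 1 { printf "line 1: numbering starts at %s, not 1\n", $1; bad = 1 }
        END                { if (!bad) print "OK: " NR " positions"; exit (bad ? 1 : 0) }
    ' "$@"
}

# Made-up three-residue demo input.
printf '1 M .\n2 T .\n3 Y .\n' | check_blueprint
```

On the real file this would be called as "check_blueprint template.blueprint". The function returns a non-zero status when it finds a problem, so it can be chained with && in front of a Remodel run.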

4. First, we can use Remodel to report the secondary structure of the input structure by running it with the unmodified blueprint.

 ~/Rosetta/source/bin/remodel.linuxgccrelease -s 1pgaA_0001.pdb -blueprint template.blueprint -jd2:no_output 
In the log file, you will see:

protocols.forge.remodel.RemodelMover: apply(): input PDB dssp assignment: (based on start structure)

The blueprint file doubles as a "resfile", which specifies the packing behavior of side chains. For each residue you want to mutate, you can use any of the commands available in a resfile.
For resfile commands, see this website:

Based on the secondary structure and what you know about the protein, you can make mutations to the model simply by editing the blueprint file.  Here's an example of mutating residue 10 to a phenylalanine:

Using your favorite text editor, change the line
10 K .
to
10 K . PIKAA F

Then run the same command as before:
~/Rosetta/source/bin/remodel.linuxgccrelease -s 1pgaA_0001.pdb -blueprint template.blueprint -jd2:no_output

You get five PDB files, because the default behavior of Remodel is to run 10 trajectories and save the best 5 results.
Open any of the PDB files and you will see that residue 10 is now a phenylalanine rather than the native lysine.
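If you prefer to script the mutation step instead of using a text editor, a minimal sketch could look like this. The helper name set_resfile_cmd is hypothetical and not part of Rosetta. It only appends text to the matching blueprint line and does not check that the resfile command is valid.

```shell
# set_resfile_cmd: hypothetical helper, not part of Rosetta.
# Usage: set_resfile_cmd POSITION "RESFILE COMMAND" [blueprint]
# Appends the command to the line whose first column equals POSITION
# and prints the edited blueprint to standard output.
set_resfile_cmd() {
    pos=$1; cmd=$2; shift 2
    awk -v pos="$pos" -v cmd="$cmd" '$1 == pos { $0 = $0 " " cmd } { print }' "$@"
}

# Two-line excerpt of the blueprint; only residue 10 should change.
printf '10 K .\n20 A .\n' | set_resfile_cmd 10 "PIKAA F"
```

On the real file you would redirect the output to a new blueprint, for example "set_resfile_cmd 10 "PIKAA F" template.blueprint > mutated.blueprint" (the output file name is an assumption), and pass that file to -blueprint. Do not redirect onto the input file itself, since the shell truncates it before awk reads it.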

[TASK3] MODIFY LOOPS  -- longer loops
Looking back at the DSSP secondary structure assignment you ran earlier, residues 20, 21, and 22 are loop residues, denoted by "L".
Now we want to make the loop longer without changing anything else in the protein.

Again, open the blueprint file. Then change the three lines
20 A .
21 V .
22 D .
to
20 A L
0 x L
0 x L
0 x L
0 x L
22 D L
This tells the program to build 4 new residues between residues 20 and 22, in place of the original residue 21.
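The same loop edit can be generated with a short awk script. This is only a sketch: lengthen_loop is a hypothetical helper name, not a Rosetta tool, and it assumes the blueprint still uses the original residue numbers in its first column.

```shell
# lengthen_loop: hypothetical helper, not part of Rosetta.
# Usage: lengthen_loop START END N [blueprint]
# Marks START and END as loop ("L"), drops the original residues strictly
# between them, and writes N inserted positions ("0 x L") in their place.
lengthen_loop() {
    start=$1; end=$2; n=$3; shift 3
    awk -v s="$start" -v e="$end" -v n="$n" '
        $1 == s          { print $1, $2, "L"; for (i = 0; i < n; i++) print "0 x L"; next }
        $1 > s && $1 < e { next }
        $1 == e          { print $1, $2, "L"; next }
                         { print }
    ' "$@"
}

# Excerpt around the loop; prints the six lines shown above plus residue 23.
printf '20 A .\n21 V .\n22 D .\n23 A .\n' | lengthen_loop 20 22 4
```

On the real file, write the result to a new blueprint rather than back onto the input, for example "lengthen_loop 20 22 4 template.blueprint > long.blueprint" (the output file name is an assumption).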

Run the same command again, but this time restrict the number of trajectories to 1. To make sure the resulting PDB does not overwrite the previous results, also give this run the prefix "long_loop":
~/Rosetta/source/bin/remodel.linuxgccrelease -s 1pgaA_0001.pdb -blueprint template.blueprint -jd2:no_output -num_trajectory 1 -save_top 1 -out:prefix long_loop -remodel:quick_and_dirty
The first two new flags control the number of trajectories and the number of top-ranked results to output. The third assigns a prefix to the output. The last skips the high-resolution refinement steps.

You will get a result called "long_loop_1.pdb" if Remodel can build the loop with no problem.

[TASK3] MODIFY LOOPS  -- shorter loops
First, make a copy of the blueprint file so we don't lose the previous version:

cp template.blueprint short.bp

Now try changing your blueprint assignment for the region to:
20 A L
22 D L

Run the same command again, but change the blueprint to the new copy, short.bp, and change the prefix to short_loop:
~/Rosetta/source/bin/remodel.linuxgccrelease -s 1pgaA_0001.pdb -blueprint short.bp -jd2:no_output -num_trajectory 1 -save_top 1 -out:prefix short_loop -remodel:quick_and_dirty

It is very likely that the program cannot complete the task, because you cannot delete residues arbitrarily and expect the gap to close on its own. Now let's allow more residues in the region to move:

17 T L
18 T L
19 E L
20 A L
22 D L
23 A L
24 A L
25 T L
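The widened assignment above can also be produced by a script. As a sketch under the same assumptions as before (original residue numbers in the first column), the hypothetical helper shorten_loop below deletes one residue and marks a window around it as loop. It is not part of Rosetta.

```shell
# shorten_loop: hypothetical helper, not part of Rosetta.
# Usage: shorten_loop FIRST LAST DELETE [blueprint]
# Drops the line for residue DELETE and marks every remaining residue
# from FIRST to LAST (inclusive) as loop ("L") so Remodel may rebuild it.
shorten_loop() {
    first=$1; last=$2; del=$3; shift 3
    awk -v f="$first" -v l="$last" -v d="$del" '
        $1 == d            { next }
        $1 >= f && $1 <= l { print $1, $2, "L"; next }
                           { print }
    ' "$@"
}

# Excerpt covering residues 17 to 25; residue 21 is removed.
printf '17 T .\n18 T .\n19 E .\n20 A .\n21 V .\n22 D .\n23 A .\n24 A .\n25 T .\n' |
    shorten_loop 17 25 21
```

The window boundaries (17 and 25 here) are a judgment call. A wider window gives Remodel more freedom to close the gap, at the cost of rebuilding more of the native backbone.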

Try the command again and see what happens. 

If the run is successful, you will get a pdb file "short_loop_1.pdb".   
Now if you check the number of residues in either long_loop_1.pdb or short_loop_1.pdb, it should agree with the number of lines in the respective blueprint file. This saves you from doing the arithmetic when you change the length of a protein.
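One way to make this comparison is to count the C-alpha atoms in the PDB file and the non-empty lines in the blueprint. The helper names count_ca and count_positions are hypothetical. The sketch assumes a single chain with no alternate locations, and the demo input is a fabricated fragment without coordinates, used only to show the column layout.

```shell
# count_ca: hypothetical helper. Counts ATOM records whose atom name
# (columns 13-16 of the PDB format) is " CA ", which gives one per residue
# when there are no alternate locations.
count_ca() {
    awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " { n++ }
         END { print n + 0 }' "$@"
}

# count_positions: hypothetical helper. Counts non-empty blueprint lines.
count_positions() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$@"
}

# Fabricated two-residue fragment (no coordinates) and a matching blueprint.
n_res=$(printf 'ATOM  %5d %-4s %3s %s%4d\n' \
    1 " N  " ALA A 20  2 " CA " ALA A 20 \
    3 " N  " ASP A 22  4 " CA " ASP A 22 | count_ca)
n_bp=$(printf '20 A L\n22 D L\n' | count_positions)
echo "$n_res residues, $n_bp blueprint lines"
```

For the tutorial files this would be "count_ca short_loop_1.pdb" and "count_positions short.bp". The two numbers should be equal.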

Let's make a new blueprint file, this time called disulf.bp:

 ~/Downloads/ -pdbfile 1pgaA_0001.pdb > disulf.bp

We are going to treat this file a little differently: we assign every position by replacing each dot (".") with H, E, or L. All three behave exactly the same way here, because we are going to bypass the low-resolution building stage. The assignments only tell the program to include these positions when scanning for disulfide-forming pairs.
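Replacing every dot by hand is tedious, so here is a minimal sketch that does it with awk. The helper name fill_dots is hypothetical and not part of Rosetta. It only touches lines whose third column is exactly ".", so positions that already carry an assignment are left alone.

```shell
# fill_dots: hypothetical helper, not part of Rosetta.
# Usage: fill_dots H|E|L [blueprint]
# Replaces a "." in the third column with the given secondary-structure code.
fill_dots() {
    case $1 in
        H|E|L) ;;
        *) echo "usage: fill_dots H|E|L [blueprint]" >&2; return 1 ;;
    esac
    ss=$1; shift
    awk -v ss="$ss" '$3 == "." { $3 = ss } { print }' "$@"
}

# Two-line excerpt; both dots become L.
printf '10 K .\n20 A .\n' | fill_dots L
```

Because the input and output cannot be the same file, write to a temporary name first, for example "fill_dots L disulf.bp > disulf.tmp && mv disulf.tmp disulf.bp" (the temporary file name is an assumption).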

Run the same command as before, but change the prefix again and add three new flags to handle disulfides:
~/Rosetta/source/bin/remodel.linuxgccrelease -s 1pgaA_0001.pdb -blueprint disulf.bp -jd2:no_output -num_trajectory 1 -save_top 1 -out:prefix disulfide -remodel:quick_and_dirty -build_disulf -match_rt_limit 1 -bypass_fragments 

-build_disulf turns on the disulfide building mode.
-match_rt_limit 1 sets the threshold for a disulfide match to 1 Å RMSD.
-bypass_fragments skips low-resolution building, because we only want to scan a fixed structure for disulfides.