Tutorial: Protein Folding with Co-evolution restraints!

You can see files generated here:

WARNING BEFORE YOU START: to fix the pick_frag script, download the updated version:
wget -O ~/tools/pick_frag/pick_frag.pl https://gremlin2.bakerlab.org/rosninja2016/KTSC/pick_frag.pl

In this tutorial we will try to predict the structure of the following sequence:
  • From PFAM (PF13619):
    This short domain is named after Lysine tRNA synthetase C-terminal domain. It is found at the C-terminus of some Lysyl tRNA synthetases as well as a single domain in bacterial proteins. The domain is about 60 amino acids in length and contains a reasonably conserved YXY motif in the centre of the sequence. The function of this domain is unknown but it could be an RNA binding domain.
  1. Set up a working directory
    mkdir KTSC
    cd KTSC
  2. Submit the sequence to the GREMLIN (co-evolution) server: http://gremlin.bakerlab.org/submit.php
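    In this case, I've already submitted the sequence; the results are stored under job ID 1475969195.
  3. Download the restraints to use in Rosetta by clicking on "Generate restraints for ROSETTA Structure Modeling". The flags files in the later steps expect this file to be named 1475969195_cb.cst.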
  4. Pick fragments
    • Download the alignment
      wget -O KTSC.fas "http://gremlin.bakerlab.org/sub_fasta.php?id=1475969195"
    • Run script!
      perl ~/tools/pick_frag/pick_frag.pl -msa KTSC.fas
      This script does the following:
      • Generates a PSSM (position-specific scoring matrix) from the input alignment, using csbuild to add pseudo-counts.
      • The PSSM is used to predict the secondary structure with PSIPRED.
      • The fragment picker picks both 3mers and 9mers based on similarity to the PSSM and secondary structure prediction.
  5. Create a "flags" file:
    -abinitio::fastrelax
    -abinitio::increase_cycles 10 # increase for larger proteins
    -abinitio::rg_reweight 0.5 # radius of gyration; set to 0.0 for long extended proteins
    -abinitio::rsd_wt_helix 0.5
    -abinitio::rsd_wt_loop 0.5
    -in:file:native 4RGIA.pdb
    -in:file:fasta KTSC.fas.fasta
    -in:file:frag3 KTSC.fas.200.3mers
    -in:file:frag9 KTSC.fas.200.9mers
    -nstruct 10
    -constraints:cst_weight 3
    -constraints:cst_file 1475969195_cb.cst
    -constraints:cst_fa_weight 3
    -constraints:cst_fa_file 1475969195_cb.cst
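Before launching anything, it can be worth confirming that every input named in the flags file is actually in the working directory. The following is only a sketch: `missing_inputs` is a hypothetical helper (not part of Rosetta), and it assumes that the only path-taking options are the ones used in this tutorial (`native`, `fasta`, `frag3`, `frag9`, `cst_file`, `cst_fa_file`).

```shell
# Hypothetical helper (not a Rosetta tool): print every input path named in a
# flags file that does not exist on disk. Prints nothing if all are present.
missing_inputs() {
    # Split the file into whitespace-separated tokens, one per line, so the
    # check works whether options share a line or not.
    tr -s '[:space:]' '\n' < "$1" | {
        prev=""
        while read -r tok || [ -n "$tok" ]; do
            case $prev in
                # Assumption: these are the only options that take a path.
                -*:native|-*:fasta|-*:frag3|-*:frag9|-*:cst_file|-*:cst_fa_file)
                    [ -e "$tok" ] || echo "$tok"
                    ;;
            esac
            prev=$tok
        done
    }
}

# Usage (from the KTSC directory):
#   missing_inputs flags
```

Anything it prints is a file to download or generate before moving on to the next step.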
  6. Launch PyMOL (because we want to watch!). Run the following command inside PyMOL; it starts a "listening" script that can take data from Rosetta and display it.
    run /home/work/Rosetta/source/src/python/bindings/PyMOLPyRosettaServer.py
  7. Launch abinitio relax!
    AbinitioRelax.default.linuxgccrelease @flags -show_simulation_in_pymol 1
  8. Extract models
    I've already generated lots of decoys for you guys to play with:

    wget https://gremlin2.bakerlab.org/rosninja2016/KTSC/default.out.zip
    unzip default.out.zip
    By default, the protocol saves all models to a single file called "default.out". To extract ALL models (not a good idea), you can run the following command:
    extract_pdbs.default.linuxgccrelease -in:file:silent default.out
    Instead, you will usually want to extract only the top 5 (or more) models. Grep the SCORE lines from default.out (skipping the header line, which ends in "description"), sort by the score column (lowest first), and take the top 5 names:
    grep ^SCORE default.out | grep -v description | awk '{print $2,$NF}' | sort -k1n | head -5
    Give the extract app the top 5 tags:
    extract_pdbs.default.linuxgccrelease \
    -in:file:silent default.out \
    -in:file:tags S_00000041_27 S_00000052_22 S_00000054_5 S_00000052_49 S_00000024_34
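Picking the tags and pasting them into the extract command by hand can be folded into one step. This is a minimal sketch rather than an official workflow: `top_tags` is a hypothetical helper, and it assumes (as the grep/awk command above does) that the score is the second column of each SCORE line and the model tag is the last.

```shell
# Hypothetical helper: print the tags of the N lowest-scoring models in a
# Rosetta silent file, one tag per line (N defaults to 5).
top_tags() {
    # $1 = silent file, $2 = number of models to report
    grep '^SCORE:' "$1" \
        | awk '$NF != "description" {print $2, $NF}' \
        | LC_ALL=C sort -k1,1n \
        | head -n "${2:-5}" \
        | awk '{print $2}'
}

# Usage (from the KTSC directory):
#   extract_pdbs.default.linuxgccrelease \
#       -in:file:silent default.out \
#       -in:file:tags $(top_tags default.out 5)
```

The header line is dropped by its trailing "description" column, so it cannot end up among the selected tags.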
  9. Let's hybridize!
    mkdir hyb; cd hyb
    Create the "hyb.flags" file:
    -frag_weight_aligned 0.1
    -beta # this flag enables the latest Rosetta score function
    -in:file:fasta ../KTSC.fas.fasta
    -in:file:native ../4RGIA.pdb
    -parser:protocol hyb.xml # rosetta script (see below)
    -relax:jump_move true
    -default_max_cycles 200
    -relax:min_type lbfgs_armijo_nonmonotone
    -hybridize:stage1_probability 1.0
    -hybridize:stage1_4_cycles 400
    -nstruct 1
    Create the "hyb.xml" file:
    <ROSETTASCRIPTS>
        <SCOREFXNS>
            <stage1 weights="stage1.wts" symmetric=0>
                <Reweight scoretype=atom_pair_constraint weight=3/>
            </stage1>
            <stage2 weights="stage2.wts" symmetric=0>
                <Reweight scoretype=atom_pair_constraint weight=3/>
            </stage2>
            <fullatom weights="beta_cart.wts" symmetric=0>
                <Reweight scoretype=atom_pair_constraint weight=3/>
            </fullatom>
        </SCOREFXNS>
        <MOVERS>
            <Hybridize name=hybridize stage1_scorefxn=stage1 stage2_scorefxn=stage2 fa_cst_file="../1475969195_cb.cst" fa_scorefxn=fullatom batch=1 stage1_increase_cycles=2.0 stage2_increase_cycles=1.0 linmin_only=0 skip_long_min=1>
                <Fragments 3mers="../KTSC.fas.200.3mers" 9mers="../KTSC.fas.200.9mers"/>
                <Template pdb="../S_00000041_27.pdb" weight="1" cst_file="../1475969195_cb.cst"/>
                <Template pdb="../S_00000052_22.pdb" weight="1" cst_file="../1475969195_cb.cst"/>
                <Template pdb="../S_00000054_5.pdb"  weight="1" cst_file="../1475969195_cb.cst"/>
                <Template pdb="../S_00000052_49.pdb" weight="1" cst_file="../1475969195_cb.cst"/>
                <Template pdb="../S_00000024_34.pdb" weight="1" cst_file="../1475969195_cb.cst"/>
            </Hybridize>
        </MOVERS>
        <PROTOCOLS>
            <Add mover=hybridize/>
        </PROTOCOLS>
        <OUTPUT scorefxn=fullatom/>
    </ROSETTASCRIPTS>
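The Template lines in hyb.xml repeat the five tags extracted in step 8, so if you pick a different set of models they can be generated rather than typed. A small sketch under that assumption: `template_lines` is a hypothetical helper that reads one tag per line and assumes the extracted PDB files sit one directory up, as in the XML above.

```shell
# Hypothetical helper: read model tags from stdin (one per line) and print a
# matching <Template> line for each, ready to paste into hyb.xml.
template_lines() {
    # $1 = constraint file path exactly as it should appear in hyb.xml
    while read -r tag; do
        [ -n "$tag" ] || continue   # skip blank lines
        printf '<Template pdb="../%s.pdb" weight="1" cst_file="%s"/>\n' "$tag" "$1"
    done
}

# Usage:
#   printf '%s\n' S_00000041_27 S_00000052_22 | template_lines ../1475969195_cb.cst
```

Every template gets weight 1 here, matching the XML above; adjust the printf format if you want to weight templates differently.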
    Download stage weight files:
    wget https://gremlin2.bakerlab.org/rosninja2016/rattata/stage1.wts
    wget https://gremlin2.bakerlab.org/rosninja2016/rattata/stage2.wts
  10. Run hybridize:
    rosetta_scripts.default.linuxgccrelease @hyb.flags