Tutorial: Protein-protein docking

  1. For this tutorial we'll dock two proteins from a known complex: the E. coli Dha kinase DhaK-DhaL complex (PDB code: 3pnl). The inputs and outputs of the tutorial can be found here: https://gremlin2.bakerlab.org/rosninja2016/PPI/ (or download zip)
  2. Set up the directory:
    mkdir PPI
    cd PPI
  3. Download needed files:
    mkdir -p ~/tools
    wget -O ~/tools/get_pdb.pl https://gremlin2.bakerlab.org/rosninja2016/tools/get_pdb.pl
    perl ~/tools/get_pdb.pl 3pnk A > 3pnkA.pdb

    perl ~/tools/get_pdb.pl 2btd A > 2btdA.pdb
    perl ~/tools/get_pdb.pl 3pnl A B > 3pnlAB.pdb
    Question: Why are we not using 3pnl (the known answer) as the docking input for this benchmark?
  4. Repack or relax the template structures. Repacking is often necessary to remove small clashes that the score function identifies in the crystal structure. See "repack.options" and "repack.xml", downloaded below.
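    A quick sanity check on the downloaded files can be sketched as below. This is a minimal sketch, assuming get_pdb.pl writes standard fixed-column PDB ATOM records; the helper name pdb_chain_summary is hypothetical and not part of the tutorial tools.

```shell
# Hypothetical helper: list each chain ID in a PDB file with its residue count.
# Assumes standard PDB columns: chain ID in column 22, residue number plus
# insertion code in columns 23-27.
pdb_chain_summary() {
    awk '
        /^ATOM/ {
            chain = substr($0, 22, 1)
            res = chain substr($0, 23, 5)
            # Count each residue once, no matter how many atoms it has
            if (!(res in seen)) { seen[res] = 1; count[chain]++ }
        }
        END { for (c in count) print c, count[c] }
    ' "$1" | sort
}
```

    For example, "pdb_chain_summary 3pnlAB.pdb" should list two chains (A and B) if the download worked as intended.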
    wget https://gremlin2.bakerlab.org/rosninja2016/PPI/repack.options
    wget https://gremlin2.bakerlab.org/rosninja2016/PPI/repack.xml


    rosetta_scripts.default.linuxgccrelease @repack.options -s 3pnkA.pdb -parser:protocol repack.xml
    rosetta_scripts.default.linuxgccrelease @repack.options -s 2btdA.pdb -parser:protocol repack.xml
    Question: How are these flags and XML different from the previous XML we wrote?
  5. Submit proteins to PatchDock: http://bioinfo3d.cs.tau.ac.il/PatchDock/
    PatchDock is a very fast, "low resolution" docking algorithm based on shape complementarity principles.
    Receptor: 3pnkA_0001.pdb
    Ligand: 2btdA_0001.pdb
    I've already submitted the structures for you; see the results here:
    http://bioinfo3d.cs.tau.ac.il/PatchDock/runs/3pnkA_0001.pdb_2btdA_0001.pdb_32_30_8_11_9_116/
  6. Let's prepare 3pnlAB.pdb for comparison. Open it with PyMOL and trim the PDB so that chain A matches the sequence of 3pnkA and chain B matches the sequence of 2btdA. Save the result as "3pnlAB_trim.pdb".
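    To decide which residues to trim, it may help to print the sequence of each chain and compare them side by side. The following is a minimal sketch, assuming standard fixed-column PDB records and the 20 canonical amino acids (anything else is printed as X); the helper name pdb_chain_seq is hypothetical.

```shell
# Hypothetical helper: print the one-letter sequence of one chain,
# read from the CA atoms of a PDB file.
# $1 = PDB file, $2 = chain ID
pdb_chain_seq() {
    awk -v chain="$2" '
        BEGIN {
            n = split("ALA A ARG R ASN N ASP D CYS C GLN Q GLU E GLY G HIS H ILE I LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V", t, " ")
            for (i = 1; i < n; i += 2) one[t[i]] = t[i + 1]
        }
        # Atom name is columns 13-16, altLoc column 17, residue name 18-20, chain 22.
        # Keep only the first alternate location so each residue is counted once.
        /^ATOM/ && substr($0, 22, 1) == chain && substr($0, 13, 4) == " CA " && substr($0, 17, 1) ~ /[ A]/ {
            aa = substr($0, 18, 3)
            seq = seq ((aa in one) ? one[aa] : "X")
        }
        END { print seq }
    ' "$1"
}
```

    For example, compare the output of "pdb_chain_seq 3pnlAB.pdb A" with that of "pdb_chain_seq 3pnkA.pdb A"; residues present in only one of the two are candidates for trimming.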
  7. Extract the top 100 solutions from PatchDock:
    mkdir top; cd top
    wget http://bioinfo3d.cs.tau.ac.il/PatchDock/runs/3pnkA_0001.pdb_2btdA_0001.pdb_32_30_8_11_9_116/topSolutions.zip
    unzip topSolutions.zip
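    Before moving on, it can be worth confirming how many solution files were actually unpacked. A minimal sketch, assuming the archive unpacks to files named docking.res.N.pdb (the naming used later in this tutorial); the helper name count_solutions is hypothetical.

```shell
# Hypothetical helper: count the docking.res.*.pdb files in a directory.
# $1 = directory to look in (defaults to the current directory)
count_solutions() {
    dir=${1:-.}
    n=0
    for f in "$dir"/docking.res.*.pdb; do
        # The -e test skips the unexpanded pattern when nothing matches
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

    Running "count_solutions ." inside the top directory should report the number of extracted solutions.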
  8. Calculate the iRMSD (interface root-mean-square deviation) for the PatchDock solutions
    Make "calc_irmsd.xml"
    <ROSETTASCRIPTS>
        <FILTERS>
            <IRmsd name="calc_irmsd" jump="1" threshold="1000" scorefxn="talaris2013" />
        </FILTERS>
        <PROTOCOLS>
            <Add filter="calc_irmsd" />
        </PROTOCOLS>
    </ROSETTASCRIPTS>
    Question: Why would we want to use "interface" RMSD instead of plain RMSD?
    Run RosettaScripts:

    rosetta_scripts.default.linuxgccrelease \
    -in:file:native ../3pnlAB_trim.pdb \
    -s *.pdb \
    -parser:protocol calc_irmsd.xml \
    -renumber_pdb
    Note: if there are only a few flags, you do not need to make a flags file; you can specify them directly. The -renumber_pdb flag renumbers and rechains the PDB. The "TER" records in the input files are important: that is where Rosetta sets the jumps (if chains are unavailable).
  9. Look inside the score.sc file (find the column "calc_irmsd"). In RosettaScripts, each filter adds a score column to the score file.
    Question: How far down the list (of docking.res.????.pdb) do we have to go before we see a solution within 2 Å iRMSD? What is the iRMSD of solution #1?
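    Since Rosetta places jumps at the TER records when chains are unavailable, it may be worth confirming that each input file contains them. A minimal sketch; the helper name ter_counts is hypothetical.

```shell
# Hypothetical helper: print the number of TER records in each PDB file given.
ter_counts() {
    for f in "$@"; do
        # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
        n=$(grep -c '^TER' "$f" || true)
        printf '%s %s\n' "$n" "$f"
    done
}
```

    For example, "ter_counts docking.res.*.pdb" lists the count next to each file name; a file reporting 0 would deserve a closer look before scoring.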
    wget https://gremlin2.bakerlab.org/rosninja2016/PPI/excol.pl
    cat score.sc | perl excol.pl calc_irmsd description | sort -k1n | head
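    If excol.pl is not at hand, the same column extraction can be sketched with awk. This assumes the usual score-file layout, in which the first line starting with "SCORE:" holds the column names and every later "SCORE:" line holds one structure; the helper name score_cols is hypothetical and is not a reimplementation of excol.pl.

```shell
# Hypothetical helper: print the named columns of a Rosetta score file.
# $1 = score file; remaining arguments = column names, in output order
score_cols() {
    file=$1; shift
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        # First SCORE: line is the header: remember each column position
        $1 == "SCORE:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            for (j = 1; j <= n; j++)
                if (!(names[j] in col)) { print "no column: " names[j] > "/dev/stderr"; exit 1 }
            have_header = 1
            next
        }
        # Every later SCORE: line is one structure
        $1 == "SCORE:" {
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? " " : "") $(col[names[j]])
            print line
        }
    ' "$file"
}
```

    Usage mirrors the command above: score_cols score.sc calc_irmsd description | sort -k1n | head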
  10. Refine results with Rosetta!
    mkdir dock; cd dock
    Make "local_dock.xml":
    <ROSETTASCRIPTS>
        <MOVERS>
            <Docking name="local_dock" score_low="score_docking_low" score_high="talaris2014" fullatom="1" local_refine="1" optimize_fold_tree="1" conserve_foldtree="0" design="0" />
            <FastRelax name="relax" scorefxn="talaris2014" repeats="1" />
        </MOVERS>
        <PROTOCOLS>
            <Add mover="local_dock" />
            <Add mover="relax" />
        </PROTOCOLS>
    </ROSETTASCRIPTS>
    Run RosettaScripts:
    rosetta_scripts.default.linuxgccrelease \
    -in:file:native ../../3pnlAB_trim.pdb \
    -s ../docking.res.[1-9].pdb ../docking.res.10.pdb \
    -parser:protocol local_dock.xml \
    -renumber_pdb \
    -nstruct 5
  11. Look inside the score.sc file (find the columns "Irms" and "total_score").
    Question: If you sort by total_score (lower = better; energy), what is the iRMSD of the top solution? If you instead sort by "I_sc" (interface score), what is the iRMSD of the top solution?
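    One way to answer this question from the command line is sketched below. It assumes the score file has "Irms" and "description" columns alongside the column being ranked, and that the first "SCORE:" line is the header; the helper name top_by is hypothetical.

```shell
# Hypothetical helper: print the lowest-scoring structure for a given column.
# $1 = score file, $2 = column to rank by (lower = better)
# Output: <value of ranked column> <Irms> <description>
top_by() {
    awk -v key="$2" '
        # First SCORE: line is the header: remember each column position
        $1 == "SCORE:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            if (!(key in col) || !("Irms" in col) || !("description" in col)) {
                print "missing column" > "/dev/stderr"; exit 1
            }
            have_header = 1
            next
        }
        $1 == "SCORE:" { print $(col[key]), $(col["Irms"]), $(col["description"]) }
    ' "$1" | sort -k1,1n | head -n 1
}
```

    For example, compare the output of "top_by score.sc total_score" with that of "top_by score.sc I_sc".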