To come back to the tutorial index, click here.
benzene.pdb
- the structure of the benzene molecule in PDB formatprotein.pdb
- the structure of the T4 lysozyme in PDB format
python $PROTOMSHOME/protoms.py -s dualtopology -p protein.pdb -l benzene.pdb --absolute
this sets up one simulation for the bound leg and one for the free leg of the double-decoupling free energy cycle.
The simulations will run 5 m equilibration steps and 50 m production steps for each of the 16 λ-values. Output will be printed to files every 100 k moves and Hamiltonian replica-exchanges between neighbouring λ-values will be attempted every 100 k moves.
the -s dualtopology
argument tells the script to setup a dual topology free energy simulation and the --absolute
argument tells the script that no second ligand exists and that the given ligand should be perturbed to a dummy particle (nothing). The -p
and -l
arguments specifies the protein and the ligand, respectively.
You can read more about the files that the setup script creates further down on this page.
You can visualise the systems that will be simulated with (for instance) VMD:
vmd -m benzene.pdb benzene_dummy.pdb benzene_box.pdb vmd -m benzene.pdb benzene_dummy.pdb protein_scoop.pdb water.pdb
mpirun -np 16 $PROTOMSHOME/protoms3 run_bnd.cmd mpirun -np 16 $PROTOMSHOME/protoms3 run_free.cmdThis is most conveniently done on a computer cluster. The calculations will take up to 24 h to complete.
python $PROTOMSHOME/tools/calc_series.py -f out_free/lam-0.000/resultsto look at time series for the free leg at λ=0.0. In the wizard that appears you can type
total
followed by enter twice. This will plot the total energy as a function of simulation snapshot. It should look something like this:Next, we will analyse how effective the λ replica exchange was. Type
grep "lambda swaps" out_free/lam-0.000/infoand this will give you the overall acceptance ratio of the λ swaps. It should be around 60% for this perturbation, indicating that the overall effectiveness is acceptable. Next, we will look at individual replicas and see how they exchange λ-value throughout the simulation. This can be done with
python $PROTOMSHOME/tools/calc_replicapath.py -f out_free/lam-*/results -p 0.0 0.2 0.8 1.0 -o replica_path_free.png
to plot the path of the first few λ-values. It should look something like this.
as is clear, these the replicas was not able to traverse the full λ-space within the simulation time. Also the replica starting with λ=1.0, did not exchange a single time throughout the simulation. Hence, the replica exchange was not fully efficient.
Now, we will estimate the free energy. We will do this both with thermodynamic integration, Bennett's acceptance ratio (BAR) and multistate BAR (MBAR). To do this you can type
python $PROTOMSHOME/tools/calc_dg.py -d out_free/ -l 0.025
A typical result is between 0 and 1 kcal/mol. The experimental hydration free energy of benzene is -0.9 kcal/mol, thus this free-leg should be fairly accurate. Notice that only 10 snapshots (2.5% of the simulation) are removed from the 400 snapshots when calculating the free energy, because the analysis with calc_series.py
indicated that the simulation was equilibrated very much from the start.
It is important to study the gradient of the TI calculation. It should be smooth in order for the TI to work properly. For the free leg it should look something like this
Now, we need to repeat the analysis for the bound leg. When you plot the evolution of the total energy using calc_series.py
you will often find that it takes a very long time for the total energy to converge. This is because in the bound simulation, the total energy is dominated by solvent-solvent and protein-solvent interactions that have very little effect on the free energy of binding. Thus we should not look at the total energy when investigating convergence, rather we should look at the interaction between the ligand and either protein or solvent.
Type
python $PROTOMSHOME/tools/calc_series.py -f out_bnd/lam-0.000/resultsand when you're prompted for the series to analyze, type the following series
total inter/solvent-solvent/sum inter/protein1-solvent/sum inter/bnz-solvent/sum inter/protein1-bnz/sumand when asked for the type of plot, simply select option
5
. The plot should look something like this.As we have decoupled the ligand whilst using a restraint there are two additional correction terms that must be calculated. These are the free energy to introduce the restraint whilst the ligand is in the protein (ΔGrest1) and the free energy to release the restraint in the gas phase (ΔGrest2). As the second term is for the gas phase it has a simple analytical form that also incorporates a standard state correction. This is evaluated as the following: ΔGrest2 = RT ln ( Vsim / V0 ), where RT is the gas constant multiplied by the absolute temperature and V0=1660 A3. The simulation volume of benzene Vsim can be estimated from (2πRT / k)3/2 where k is the harmonic spring constant. With a spring constant of 2 kcal / mol. ΔGrest2 should evaluate to -3.84 kcal/mol. (Note that ProtoMS defines the harmonic restraint using k, rather than 0.5 k, the above formula for the simulation volume is correct for a restraint with 0.5 k).
Unfortunately, the bound phase restraint term, ΔGrest1, has no simple solution and must be calculated from an additional simulation. The file run_bnd_rstr.cmd was also created by the setup tools. It performs a replica exchange free simulation for the bound phase system with and without the restraint. This can be executed with:
mpirun -np 2 $PROTOMSHOME/protoms3 run_bnd_rstr.cmdand will create the output folder out_bnd_rstr. The free energy to introduce the restraint is then calculated similarly to before:
python $PROTOMSHOME/tools/calc_dg.py -d out_bnd_rstr/ -l 0.375 -e barwhere the amount of skipped data is chosen as above. Only the free energy difference calculated using BAR is requested as the trapezium rule used to evaluate TI cannot be considered accurate with only 2 λ values as here. This should evaluate to ~1 kcal/mol.
The standard binding free energy of benzene can then be estimated from:
ΔGbind = ΔGfree – ΔGbound – ΔGrest1 – ΔGrest2
and it should evaluate to between -5.4 and -5.9 kcal/mol. The experimental value is -5.2 kcal/mol, so the computational prediction is pretty good!
protoms.py
as above, you accept the standard number of λ-values, simulation length etc. The values of these parameters were chosen from experience and should work for most systems. However, there are situations when you may want to do something else. Here, we will go through some of the many available options. To know about other options you have to read the manuals for the tools or the overall MC program. You might also have to set up your system by executing the individual tools separately.
--nequil
- this controls the number of equilibration steps--nprod
- this controls the number of production stepspython $PROTOMSHOME/protoms.py -s dualtopology -p protein.pdb -l benzene.pdb --absolute --nequil 10E6 --nprod 50E6you will run 10 m equiliration steps and 50 m production steps (instead of the 5 m and 40 m that is default)
--lambdas
.
by typing for instance
python $PROTOMSHOME/protoms.py -s dualtopology -p protein.pdb -l benzene.pdb --absolute --lambdas 24you will initiate 24 λ-values rather than default 16. You can also give individual λ-values to the argument. For instance
python $PROTOMSHOME/protoms.py -s dualtopology -p protein.pdb -l benzene.pdb --absolute --lambdas 0.000 0.033 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 0.967 1.000
will add two new λ-values at 0.033 and 0.967 to the 16 created by default.
--repeats
or just -r
.
by typing for instance
python $PROTOMSHOME/protoms.py -s dualtopology -p protein.pdb -l benzene.pdb --absolute -r 5you will create 5 input files for the bound leg and 5 input files for the free leg. Therefore, you also need to execute ProtoMS 10 times with the different input files. The output will be in 10 different folders, e.g.
out1_bnd
and out2_bnd
.
temperaturere 1E5 25.0 27.0 30.0 34.0 39.0 45.0 52.0 60.0 69.0 temperatureladder lambda 0.00 1.00 lambdaladder temperature 25.0The last line indicates that the lambda replica exchange will occur at 25 degrees Celsius. It should be noted that the selection of temperatures is a non-trivial task and it is likely to be system dependent. The above lines should therefore be regarded as a suggestion. ProtoMS then needs to be executed with additional nodes, for instance
mpirun -np 32 $PROTOMSHOME/protoms3 run_bnd.cmd
protoms.py
is by default 1 kcal/mol/A2. It can sometimes be necessary to optimize it so that the free energy is not affected by the magnitude of the restraint. This can be done by running a short unrestrained simulation and estimate the force constant from the root mean square deviation by applying the equipartition theorem.
To setup the short unrestrained simulation, first make a separate folder for the simulation. Then you can type
python $PROTOMSHOME/protoms.py -s sampling -p protein.pdb -l benzene.pdb --nprod 500E3 --nequil 0 --dumpfreq 1E3 $PROTOMSHOME/protoms3 run_bnd.cmdthis 50 k simulation should only take about five minutes to execute. The simulation output can be found in
out_bnd
.
To calculate the RMSD and estimate an appropriate force constant, type
python $PROTOMSHOME/tools/calc_rmsd.py -i benzene.pdb -f out_bnd/all.pdb -l BNZ -a C4where
BNZ
is the residue name of the ligand, and C4
is the atom the restraint is applied to. The output should be something like this.
The RMSD of the atom c4 is 0.898 A This corresponds to a spring constant of 2.205 kcal/mol/A2 (to use this as an harmonic restraint in ProtoMS, specify 1.102)and we can see that the default force constant is appropriate. The force constant is estimated from k = 3 RT / <RMSD2>, where R and T are the gas constant and the absolute temperature, respectively. Notice that in the ProtoMS input you specify 1/2k rather than k.
benzene.prepi
= the z-matrix and atom types of benzene in Amber formatbenzene.frcmod
= additional parameters not in GAFFbenzene.zmat
= the z-matrix of benzene used to sample it in the MC simulationbenzene.tem
= the complete template (force field) file for the ligand in ProtoMS formatprotein_scoop.pdb
= the truncated protein structure
benzene_box.pdb
= the box of water solvating benzene in the free leg simulationwater.pdb
= the cap of water solvating the protein-ligand system in the bound leg simulationbenzene_dummy.pdb
= the dummy particle that benzene will be perturbed intobnz-dummy.tem
= the combined template file for benzene and the dummy, used only in this simulationrun_bnd.cmd
= the ProtoMS input file for the bound-leg simulationrun_free.cmd
= the ProtoMS input file for the free-leg simulation
dualtopology1 1 2 synctrans syncrot softcore1 solute 1 softcore2 solute 2 softcoreparams coul 1 delta 0.2 deltacoul 2.0 power 6 soft66
lambdare 100000 0.000 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 1.000
chunk id add 1 solute 1 C4 BNZ chunk restraint add 1 cartesian harmonic 27.837 6.921 3.569 1 chunk id add 2 solute 2 C1 DDD chunk restraint add 2 cartesian harmonic 26.911 6.126 4.178 1
protoms.py
benzene.pdb
contains a directive, telling ProtoMS the name of the solute. The line should read HEADER BNZ
and can be added by typing
sed -i "1iHEADER BNZ" benzene.pdb
python $PROTOMSHOME/tools/ambertools.py -f benzene.pdb -n BNZ
and this will execute the AmberTools programs antechamber
and parmchk
, creating the files benzene.prepi
and benzene.frcmod
, respectively.
python $PROTOMSHOME/tools/build_template.py -p benzene.prepi -f benzene.frcmod -o benzene.tem -n BNZ
this will creates the files benzene.tem
containing the ProtoMS template file and benzene.zmat
. It is a good idea to check these files to see if the script has defined the molecule properly.
python $PROTOMSHOME/tools/solvate.py -b $PROTOMSHOME/data/wbox_tip4p.pdb -s benzene.pdb -o benzene_box.pdb
this will solvate benzene using standard settings, i.e. it will be 10 A between the solute and the edge of the box. A pre-equilibrated box of TIP4P water molecules located in $PROTOMSHOME/data/
is used. The box is written to the file benzene_box.pdb
.
python $PROTOMSHOME/tools/make_dummy.py -f benzene.pdb -o benzene_dummy.pdb
creating benzene_dummy.pdb
that contains the particle.
python $PROTOMSHOME/tools/merge_templates.py -f benzene.tem $PROTOMSHOME/data/dummy.tem -o bnz-dummy.tem
creating bnz-dummy.tem
. The template file of the dummy particle is located in $PROTOMSHOME/data/
.
Now we have all the files to run the free-leg of the simulation. The input file for ProtoMS will be created when we have prepared the bound-leg.
python $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -o protein_pms.pdb -s amber -c $PROTOMSHOME/data/atomnamesmap.dat
The converted structure will be in protein_pms.pdb
. This execution assumes that the Amber naming convention is used in protein.pdb
.
python $PROTOMSHOME/tools/convertwater.py -p protein_pms.pdb -o protein_pms_t4p.pdb
creating protein_pms_t4p.pdb
.
python $PROTOMSHOME/tools/scoop.py -p protein_pms_t4p.pdb -l benzene.pdb -o protein_scoop.pdb
The protein scoop is centred on the benzene molecule and all residues further than 20 A are cut-away. The scoop is written to protein_scoop.pdb
python $PROTOMSHOME/tools/solvate.py -b $PROTOMSHOME/data/wbox_tip4p.pdb -s benzene.pdb -pr protein_scoop.pdb -o water.pdb -g droplet
this will create a droplet with 30 A radius centred on the benzene molecule. The droplet is written to water.pdb
solvate.py
script adds the crystallographic waters from the scoop to the droplet. Therefore, we need to remove them from the scoop PDB-file.
sed -i -e "/T4P/d" -e "/TER/d" protein_scoop.pdb
Now we have all the files need to run the simulation. As you noticed, this step-by-step procedure create a few files that protoms.py
does not generate.
python $PROTOMSHOME/tools/generate_input.py -s dualtopology -p protein_scoop.pdb -l benzene.pdb benzene_dummy.pdb -t benzene.tem $PROTOMSHOME/data/dummy.tem -pw water.pdb -lw benzene_box.pdb --absolute -o run
creating run_bnd.cmd
and run_free.cmd
. Notice that we had to specify the template file for the dummy atom, as it wasn't automatically created.