GROMACS-2-runMD
Published:
In this blog post, I will introduce my second study note on using GROMACS package – running protein MD. The purpose of this post is to provide a concise example of the key steps and functions involved in setting up and running MD simulations for proteins. For a more comprehensive guide, I highly recommend referring to the tutorials available at mdtutorials.
Step 1: Preparation
get prepared
The GROMACS can be loaded by source the GMXRC, find the location that you install GROMACS and the GMXRC file is under the bin
directory.
source GMXRC
And then gmx is ready to run, you can use the help
command to check gmx modules.
gmx help
Next, download the pdb file of lysozyme from PDB website, code 1AKI.
Upload the 1aki.pdb
file to your working directory (make one and run jobs under a empty folder will be very helpful). And now the structure and gmx are both ready!
First we need to remove the water molecules in the protein. This is down by a linux command, which means get all lines without HOH
and write them to a new file 1AKI_clean.pdb
The -v option instructs grep to print all lines that do not contain or match the expression.
grep -v HOH 1aki.pdb > 1AKI_clean.pdb
The method described above is a simple and straightforward way to remove water molecules from a protein structure using GROMACS. However, in real-world scenarios, the process can be more complicated due to various factors. For example:
- corrupted PDB files
- Non-standard PDB files
- Uncleared PDB files
- Complex protein structures
Step 2: Get initial topology file
Next, the pdb2gmx
module helps to generate topology files. This will generate three files:
- The topology for the molecule. (topol.top)
- A position restraint file. (posre.itp)
- post-processed structure file. (1AKI_processed.gro)
gmx pdb2gmx -f 1AKI_clean.pdb -o 1AKI_processed.gro -water spce
-o
specify the name of the structure file and -water
define the force field for water molecule.
After hitting enter, you will be prompted to choose the force field.
Select the Force Field:
From '/home/wtguo/opt/gromacs/share/gromacs/top':
1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
......
14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40,, 843-856, DOI: 10.1007/s00249-011-0700-9)
15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
For this task, choose 15, OPLS for all atoms. Topology file contains all the force field information and the force field should be selected carfully.
For a full list of options can be passed to pdb2gmx
please refer to the documentation
The topology (topol.top) file can be visulized by plain text readers. Some key contents in the files are:
- force field name
#include "oplsaa.ff/forcefield.itp"
This line calls the parameters within the OPLS-AA force field.
- moleculetype
; Name nrexcl Protein_A 3
atoms
- remainders
constrains:
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
water topology and restraints
; Include water topology
#include "oplsaa.ff/spce.itp"
#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
; i funct fcx fcy fcz
1 1 1000 1000 1000
#endif
ion parameter
; Include generic topology for ions
#include "oplsaa.ff/ions.itp"
system level definition
[ system ]
; Name
LYSOZYME
[ molecules ]
; Compound #mols
Protein_A 1
Step 3: Box and solvate
Next, we will put the protein into a box. It does not have to be a real cubic box. Periodic boundary conditions (PBC) boxes will save computational time as it takes smaller volumn. The module for generating box is editconf
. See the documentation for options. Here we use simplist cubic box to generate the gromacs file 1AKI_newbox.gro
.
gmx editconf -f 1AKI_processed.gro -o 1AKI_newbox.gro -c -d 1.0 -bt cubic
After defining the box, put water inside using solvate
(documentation)[https://manual.gromacs.org/current/onlinehelp/gmx-solvate.html]:
gmx solvate -cp 1AKI_newbox.gro -cs spc216.gro -o 1AKI_solv.gro -p topol.top
The -cp
takes the protein configuration from the inbox structure 1AKI_newbox.gro
and the -cs
indicates the water configuration spc216.gro
, which is a generic equilibrated 3-point solvent model. The orginal topol.top
file is updated and you will see that the [ molecule ] section (in the end of the file) now includes SOL. You can also tell that the original topology file is backuped by reading the outputs Back Off! I just backed up topol.top to ./#topol.top.1#
[ molecules ]
; Compound #mols
Protein_chain_A 1
SOL 10644
Step 4: Add ions
The protein is charged. The qtot
in the topology file (.top) record the total of chage for each residue. Since the protein is neutral, we have to add ions to balance it.
To add ions in GROMACS, call genion
module. What genion does is read through the topology and replace water molecules with the ions that the user specifies. Before that, we need prompp
to generate an atomic-level input file .tpr
. That will require a .mdp
(molecular dyanmics parameter file).
A sample of mdp file that contains energy minimization looks like:
; ions.mdp - used as input into grompp to generate ions.tpr
; Parameters describing what to do, when to stop and what to save
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep = 0.01 ; Minimization step size
nsteps = 50000 ; Maximum number of (minimization) steps to perform
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1 ; Frequency to update the neighbor list and long range forces
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; Method to determine neighbor list (simple, grid)
coulombtype = cutoff ; Treatment of long range electrostatic interactions
rcoulomb = 1.0 ; Short-range electrostatic cut-off
rvdw = 1.0 ; Short-range Van der Waals cut-off
pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions
Under your working directory, you can write a .mdp
file with the above content using vim
.
vim ions.mdp
Afterwards, we can generate the ions.tpr
file:
gmx grompp -f ions.mdp -c 1AKI_solv.gro -p topol.top -o ions.tpr
Next, we can use the genion
to add ions, specifically Na+ and Cl- here:
gmx genion -s ions.tpr -o 1AKI_solv_ions.gro -p topol.top -pname NA -nname CL -neutral
The prompt after hitting enter will ask for the continuous group.
Reading file ions.tpr, VERSION 2023.5 (single precision)
Reading file ions.tpr, VERSION 2023.5 (single precision)
Will try to add 0 NA ions and 8 CL ions.
Select a continuous group of solvent molecules
Group 0 ( System) has 33892 elements
Group 1 ( Protein) has 1960 elements
Group 2 ( Protein-H) has 1001 elements
Group 3 ( C-alpha) has 129 elements
Group 4 ( Backbone) has 387 elements
Group 5 ( MainChain) has 517 elements
Group 6 ( MainChain+Cb) has 634 elements
Group 7 ( MainChain+H) has 646 elements
Group 8 ( SideChain) has 1314 elements
Group 9 ( SideChain-H) has 484 elements
Group 10 ( Prot-Masses) has 1960 elements
Group 11 ( non-Protein) has 31932 elements
Group 12 ( Water) has 31932 elements
Group 13 ( SOL) has 31932 elements
Group 14 ( non-Water) has 1960 elements
Here we choose Group 13 for embedding ions, which means we will replace SOL to ions.
Again, it will back up the old topology and update the previous .top
file. And eight solvent molecules are substituted to Cl- to off set the +8 charges.
Back Off! I just backed up topol.top to ./#topol.top.2#
Using random seed 1863253503.
Replacing solvent molecule 713 (atom 4099) with CL
Replacing solvent molecule 5377 (atom 18091) with CL
Replacing solvent molecule 1311 (atom 5893) with CL
Replacing solvent molecule 6960 (atom 22840) with CL
Replacing solvent molecule 4007 (atom 13981) with CL
Replacing solvent molecule 7807 (atom 25381) with CL
Replacing solvent molecule 1775 (atom 7285) with CL
Replacing solvent molecule 3845 (atom 13495) with CL
Within the command, -p
= topology; -s
= structure/state file; -o
= output file.
Now the topology file ends up with
[ molecules ]
; Compound #mols
Protein_A 1
SOL 10636
CL 8
Step 5: Energy minimization
The solvated, electroneutral system is now assembled. Before running dynamics, we still have to make sure that the structure is proper. Therefore, we naeed to relax the structure through energy minimzation.
Now that we need another mdp
file for energy minimization, here is the sample:
; minim.mdp - used as input into grompp to generate em.tpr
; Parameters describing what to do, when to stop and what to save
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep = 0.01 ; Minimization step size
nsteps = 50000 ; Maximum number of (minimization) steps to perform
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1 ; Frequency to update the neighbor list and long range forces
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; Method to determine neighbor list (simple, grid)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 1.0 ; Short-range electrostatic cut-off
rvdw = 1.0 ; Short-range Van der Waals cut-off
pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions
After writing a new file called minim.mdp
with the above countent using vim
, it is time to get the em.tpr
file.
gmx grompp -f minim.mdp -c 1AKI_solv_ions.gro -p topol.top -o em.tpr
Next, you can submit a job to run the energy minimzation task with module mdrun
gmx mdrun -deffnm em
The -v flag makes mdrun verbose, such that it prints its progress to the screen at every step. (I am not using it) And -deffnm
flag define the file names of the input and output. Details about mdrun can be found here.
mdrun produces at least four output files. A single log file (-g) is written. The trajectory file (-o), contains coordinates, velocities and optionally forces. The structure file (-c) contains the coordinates and velocities of the last step. The energy file (-e) contains energies, the temperature, pressure, etc, a lot of these things are also printed in the log file. Optionally coordinates can be written to a compressed trajectory file (-x).
I am using 16 CPU cores and it takes 20 seconds to converge the energy in ~900 steps.
The four generated output files contain information during the process:
.gro
file: This file contains the final minimized structure in the GROMACS coordinate file format. It includes the positions of all the atoms in the system after the energy minimization..edr
file: The .edr file is the energy file that contains the energy terms calculated during the minimization process. It stores information such as potential energy, kinetic energy, bonded and non-bonded interactions, and restraint energies. You may usegmx energy
to extract and analyze the data..log
file: The .log file is a log file that records the progress and details of the energy minimization process. It includes information about the minimization algorithm used, the number of steps performed, and the convergence criteria. It also reports the initial and final energy values of the system. You can examine this file to check if the minimization converged successfully and to get an overview of the minimization process..trr
file: The .trr file is a trajectory file that contains the coordinates, velocities, and forces of the system at regular intervals during the minimization process. It allows you to visualize the progression of the minimization and observe how the structure evolves over the course of the minimization. You can use tools likegmx dump
to extract information from the .trr file or visualize the trajectory using visualization software like VMD or PyMOL.
The analyze of energy can be accessed by
gmx energy -f em.edr -o potential.xvg
The prompt will ask you the terms for analysis:
Select the terms you want from the following list by
selecting either (part of) the name or the number or a combination.
End your selection with an empty line or a zero.
-------------------------------------------------------------------
1 Bond 2 Angle 3 Proper-Dih. 4 Ryckaert-Bell.
5 LJ-14 6 Coulomb-14 7 LJ-(SR) 8 Coulomb-(SR)
9 Coul.-recip. 10 Potential 11 Pressure 12 Vir-XX
13 Vir-XY 14 Vir-XZ 15 Vir-YX 16 Vir-YY
17 Vir-YZ 18 Vir-ZX 19 Vir-ZY 20 Vir-ZZ
21 Pres-XX 22 Pres-XY 23 Pres-XZ 24 Pres-YX
25 Pres-YY 26 Pres-YZ 27 Pres-ZX 28 Pres-ZY
29 Pres-ZZ 30 #Surf*SurfTen 31 T-rest
To visulize energy changes, type in
10
0
This will give average energy and write a potential.xvg
file:
Statistics over 912 steps [ 0.0000 through 911.0000 ps ], 1 data sets
All statistics are over 722 points (frames)
Energy Average Err.Est. RMSD Tot-Drift
-------------------------------------------------------------------------------
Potential -567075 11000 28211 -73433.1 (kJ/mol)
Energies along the trajectory is summarzied in the .xvg
file:
# Command line:
# gmx energy -f em.edr -o potential.xvg
# gmx energy is part of G R O M A C S:
#
# Guyana Rwanda Oman Macau Angola Cameroon Senegal
#
@ title "GROMACS Energies"
@ xaxis label "Time (ps)"
@ yaxis label "(kJ/mol)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "Potential"
0.000000 -330454.843750
1.000000 -347175.593750
2.000000 -370154.187500
3.000000 -398829.406250
4.000000 -427371.062500
5.000000 -452330.031250
6.000000 -458704.843750
7.000000 -464386.156250
8.000000 -464746.093750
To make the plot, the original tutorial recommends Grace. It is easy to get it through the mirros:
The software is available from https://plasma-gate.weizmann.ac.il/pub/grace/ and it is daily mirrored at
- Germany (FTP) (ftp://ftp.fu-berlin.de/unix/graphics/grace/)
- Japan (FTP) (ftp://ftp.u-aizu.ac.jp/pub/SciEng/math/grace/)
The energy converges and we are ready for next step!
Step 6: NVT equilibrium
After energy minimization, the structure is relaxed to a stationary point on the potential energy surface. The following equilibrium MD will equilibrate the solvent and ions around the protein. This step requires restraints, which is writtenin the file posre.itp
we generated in step 2.
The purpose of posre.itp is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.
Equilibration is a crucial step in molecular dynamics simulations that involves two phases: NVT (constant Number of particles, Volume, and Temperature) and NPT (constant Number of particles, Pressure, and Temperature).
The concept of ensembles is a fundamental aspect of thermodynamics, where we define systems based on consistent thermodynamic properties such as temperature, pressure, and volume. Ensembles describe the possible states of a system under specific constraints.
- In the NVT ensemble, also known as the canonical ensemble, the system is coupled to a heat bath that maintains a constant temperature. The volume of the system remains fixed, while the particles can exchange energy with the heat bath. This phase allows the system to reach thermal equilibrium at the desired temperature.
- Following the NVT phase, the system is subjected to the NPT ensemble, also called the isothermal-isobaric ensemble. In this phase, both the temperature and pressure of the system are kept constant. The volume of the system is allowed to fluctuate to maintain the specified pressure. The NPT phase ensures that the system reaches both thermal and mechanical equilibrium, mimicking realistic experimental conditions.
The NVT equilibrium MD also requires a .mdp
file, you may write it as nvt.mdp
. The parameters below are common settings and is used for the test case here. For a comprehensive list of molecular dynamics parameters in GROMACS, see the documentation
title = OPLS Lysozyme NVT equilibration
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 50000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 500 ; save coordinates every 1.0 ps
nstvout = 500 ; save velocities every 1.0 ps
nstenergy = 500 ; save energies every 1.0 ps
nstlog = 500 ; update log file every 1.0 ps
; Bond parameters
continuation = no ; first dynamics run
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Nonbonded settings
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 10 ; 20 fs, largely irrelevant with Verlet
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
DispCorr = EnerPres ; account for cut-off vdW scheme
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein Non-Protein ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling is off
pcoupl = no ; no pressure coupling in NVT
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Velocity generation
gen_vel = yes ; assign velocities from Maxwell distribution
gen_temp = 300 ; temperature for Maxwell distribution
gen_seed = -1 ; generate a random seed
And we repeat the step 5: generate tpr file through grompp
and then run it with mdrun
:
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -deffnm nvt
Using 16 cores, it takes around 5 minutes for NVT equilibrium (50000 steps, 100 picoseconds). Typically it requires 50-100 ps to equilibrate tempreture.
To analyze the tempreture, use energy
module to process file nvt.edr
:
gmx energy -f nvt.edr -o temperature.xvg
choose 16 0 at the prompt to extract tempreture.
The following prompt shows the average tempreture – which is very close to the target we set (300K).
Last energy frame read 100 time 100.000
Statistics over 50001 steps [ 0.0000 through 100.0000 ps ], 1 data sets
All statistics are over 501 points
Energy Average Err.Est. RMSD Tot-Drift
-------------------------------------------------------------------------------
Temperature 299.51 0.24 3.06328 1.30655 (K)
And you may read the file temprature.xvg
or plot the variance of energy along timesteps to check if the energy converges.
Step 7: NPT equilibrium
After the protein reaches its thermal equilibrium at the given tempreture, we then run NPT condition to achieve pressure equilibration.
The sample file given by the tutorial for npt.mdp
is:
title = OPLS Lysozyme NPT equilibration
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 50000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 500 ; save coordinates every 1.0 ps
nstvout = 500 ; save velocities every 1.0 ps
nstenergy = 500 ; save energies every 1.0 ps
nstlog = 500 ; update log file every 1.0 ps
; Bond parameters
continuation = yes ; Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Nonbonded settings
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 10 ; 20 fs, largely irrelevant with Verlet scheme
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
DispCorr = EnerPres ; account for cut-off vdW scheme
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein Non-Protein ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
refcoord_scaling = com
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Velocity generation
gen_vel = no ; Velocity generation is off
Taking the nvt outputs as input, let’s repeat the MD simulation:
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
gmx mdrun -deffnm npt
The NPT equilibration phase typically takes longer than the NVT phase due to the additional step of adjusting the simulation box size. The duration of the NPT phase depends on several factors, such as the size of the system, the initial configuration, and the target pressure. Larger systems or systems with significant pressure differences from the initial state may require longer NPT equilibration times to achieve stable box dimensions and pressure values.
Since the trajectory is not super long, it also takes 5 minutes around…
To analyze the result, we can read both pressure and density (option 18 and 24).
gmx energy -f npt.edr -o pressure.xvg
gmx energy -f npt.edr -o density.xvg
Interestingly, if you take a look at the figure of pressure:
The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is 7.5 ± 160.5 bar. Note that the reference pressure was set to 1 bar, so is this outcome acceptable? Pressure is a quantity that fluctuates widely over the course of an MD simulation, as is clear from the large root-mean-square fluctuation (160.5 bar), so statistically speaking, one cannot distinguish a difference between the obtained average (7.5 ± 160.5 bar) and the target/reference value (1 bar).
On the other hand, density is more informative and is very close to the experimental results:
As with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is 1019 ± 3 kg m-3, close to the experimental value of 1000 kg m-3 and the expected density of the SPC/E model of 1008 kg m-3. The parameters for the SPC/E water model closely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and density.
The author also bring up the question of why density values obtained do not match my results. A long ger NPT is safer.
Step 8 Unrestricted MD simulation
After equilibrium at both stable temperature and pressure, we are able to release the strain on protein and run production MD trajectories for 1 ns
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
Yes, again we need a md.mdp
file:
title = OPLS Lysozyme NPT equilibration
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps (1 ns)
dt = 0.002 ; 2 fs
; Output control
nstxout = 0 ; suppress bulky .trr file by specifying
nstvout = 0 ; 0 for output frequency of nstxout,
nstfout = 0 ; nstvout, and nstfout
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
nstxout-compressed = 5000 ; save compressed coordinates every 10.0 ps
compressed-x-grps = System ; save the whole system
; Bond parameters
continuation = yes ; Restarting after NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 10 ; 20 fs, largely irrelevant with Verlet scheme
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein Non-Protein ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; Velocity generation is off
mrompp
print an estimate for PME load:
Estimate for the relative computational load of the PME mesh part: 0.22
PME (Particle Mesh Ewald) is a method used in molecular dynamics simulations to efficiently calculate long-range electrostatic interactions. The term “PME load” refers to the computational load or the amount of work associated with the PME calculations during the simulation. In molecular dynamics simulations, electrostatic interactions play a crucial role in determining the behavior of the system. However, calculating these interactions directly for all pairs of particles can be computationally expensive, especially for large systems. The PME method addresses this issue by splitting the electrostatic calculations into two parts: short-range and long-range interactions.
According to the original tutorial:
For a cubic box, the optimal setup will have a PME load of 0.25 (3:1 PP:PME - we’re very close to optimal!); for a dodecahedral box, the optimal PME load is 0.33 (2:1 PP:PME). When executing mdrun, the program should automatically determine the best number of processors to assign for the PP and PME calculations. Thus, make sure you indicate an appropriate number of threads/cores for your calculation (the value of -nt X), so that you can get the best performance.
Seems that the optimal value for PME is very customized, based on the system and MD setup.
Next we can run the GROMACS, again. And the file we generated before is md_0_1.tpr
gmx mdrun -deffnm md_0_1
If you are using the GPU accelerator, try
gmx mdrun -deffnm md_0_1 -nb gpu
Step 9 Analysis
Finally! The end of this tutorial is trajectory analysis – lets try some modules in gmx
that help process trajectories.
trjconv
This command is used to post-process trajectory files and can help produce the coordinates for subsequent analysis.
gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center
The above command takes the following options:
-s md_0_1.tpr
: The input structure file (TPR) that contains the system topology.-f md_0_1.xtc
: The input trajectory file (XTC) that contains the coordinates over time.-o md_0_1_noPBC.xtc
: The output trajectory file (XTC) with periodic boundary conditions (PBC) removed.-pbc mol
: Specifies that PBC should be removed by shifting molecules into the central box.-center
: Centers the protein in the box.
For comprehensive options, check here.
rms
This is the GROMACS built-in module for calculating the root mean square deviation (RMSD) of the protein structure over time.gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns
The command options are as follows:
-s md_0_1.tpr
: The input structure file (TPR) that contains the system topology.-f md_0_1_noPBC.xtc
: The input trajectory file (XTC) with PBC removed.-o rmsd.xvg
: The output file (XVG) that will contain the RMSD values over time.-tu ns
: Specifies that the time unit should be nanoseconds.
Check here for other flags.
- Calculating RMSD relative to the crystal structure: To calculate the RMSD relative to the crystal structure, you can compare the trajectory with PBC removed (md_0_1_noPBC.xtc) to the crystal structure file. Use the following command:
gmx rms -s crystal_structure.pdb -f md_0_1_noPBC.xtc -o rmsd_crystal.xvg -tu ns
Here, crystal_structure.pdb represents the file containing the crystal structure coordinates. The output file rmsd_crystal.xvg will contain the RMSD values calculated relative to the crystal structure.