GROMACS-2-runMD

23 minute read

Published:

In this blog post, I will introduce my second study note on using GROMACS package – running protein MD. The purpose of this post is to provide a concise example of the key steps and functions involved in setting up and running MD simulations for proteins. For a more comprehensive guide, I highly recommend referring to the tutorials available at mdtutorials.

Step 1: Preparation

get prepared

The GROMACS can be loaded by source the GMXRC, find the location that you install GROMACS and the GMXRC file is under the bin directory.

source GMXRC

And then gmx is ready to run, you can use the help command to check gmx modules.

gmx help

Next, download the pdb file of lysozyme from PDB website, code 1AKI.

Upload the 1aki.pdb file to your working directory (make one and run jobs under a empty folder will be very helpful). And now the structure and gmx are both ready!

First we need to remove the water molecules in the protein. This is down by a linux command, which means get all lines without HOH and write them to a new file 1AKI_clean.pdb

The -v option instructs grep to print all lines that do not contain or match the expression.

grep -v HOH 1aki.pdb > 1AKI_clean.pdb

The method described above is a simple and straightforward way to remove water molecules from a protein structure using GROMACS. However, in real-world scenarios, the process can be more complicated due to various factors. For example:

  1. corrupted PDB files
  2. Non-standard PDB files
  3. Uncleared PDB files
  4. Complex protein structures

Step 2: Get initial topology file

Next, the pdb2gmx module helps to generate topology files. This will generate three files:

  1. The topology for the molecule. (topol.top)
  2. A position restraint file. (posre.itp)
  3. post-processed structure file. (1AKI_processed.gro)
gmx pdb2gmx -f 1AKI_clean.pdb -o 1AKI_processed.gro -water spce

-o specify the name of the structure file and -water define the force field for water molecule.

After hitting enter, you will be prompted to choose the force field.

Select the Force Field:

From '/home/wtguo/opt/gromacs/share/gromacs/top':
 1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)

 2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)

......

14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40,, 843-856, DOI: 10.1007/s00249-011-0700-9)

15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)

For this task, choose 15, OPLS for all atoms. Topology file contains all the force field information and the force field should be selected carfully.

For a full list of options can be passed to pdb2gmx please refer to the documentation

The topology (topol.top) file can be visulized by plain text readers. Some key contents in the files are:

  1. force field name
    #include "oplsaa.ff/forcefield.itp"
    

    This line calls the parameters within the OPLS-AA force field.

  2. moleculetype
    ; Name       nrexcl
    Protein_A    3
    
  3. atoms

  4. remainders

constrains:

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

water topology and restraints

; Include water topology
#include "oplsaa.ff/spce.itp"

#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
;  i funct       fcx        fcy        fcz
   1    1       1000       1000       1000
#endif

ion parameter

; Include generic topology for ions
#include "oplsaa.ff/ions.itp"

system level definition

[ system ]
; Name
LYSOZYME

[ molecules ]
; Compound        #mols
Protein_A           1

Step 3: Box and solvate

Next, we will put the protein into a box. It does not have to be a real cubic box. Periodic boundary conditions (PBC) boxes will save computational time as it takes smaller volumn. The module for generating box is editconf. See the documentation for options. Here we use simplist cubic box to generate the gromacs file 1AKI_newbox.gro.

gmx editconf -f 1AKI_processed.gro -o 1AKI_newbox.gro -c -d 1.0 -bt cubic

After defining the box, put water inside using solvate (documentation)[https://manual.gromacs.org/current/onlinehelp/gmx-solvate.html]:

gmx solvate -cp 1AKI_newbox.gro -cs spc216.gro -o 1AKI_solv.gro -p topol.top

The -cp takes the protein configuration from the inbox structure 1AKI_newbox.gro and the -cs indicates the water configuration spc216.gro, which is a generic equilibrated 3-point solvent model. The orginal topol.top file is updated and you will see that the [ molecule ] section (in the end of the file) now includes SOL. You can also tell that the original topology file is backuped by reading the outputs Back Off! I just backed up topol.top to ./#topol.top.1#

[ molecules ]
; Compound        #mols
Protein_chain_A     1
SOL             10644

Step 4: Add ions

The protein is charged. The qtot in the topology file (.top) record the total of chage for each residue. Since the protein is neutral, we have to add ions to balance it.

To add ions in GROMACS, call genion module. What genion does is read through the topology and replace water molecules with the ions that the user specifies. Before that, we need prompp to generate an atomic-level input file .tpr. That will require a .mdp (molecular dyanmics parameter file).

A sample of mdp file that contains energy minimization looks like:

; ions.mdp - used as input into grompp to generate ions.tpr
; Parameters describing what to do, when to stop and what to save
integrator  = steep         ; Algorithm (steep = steepest descent minimization)
emtol       = 1000.0        ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep      = 0.01          ; Minimization step size
nsteps      = 50000         ; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist         = 1         ; Frequency to update the neighbor list and long range forces
cutoff-scheme	= Verlet    ; Buffered neighbor searching 
ns_type         = grid      ; Method to determine neighbor list (simple, grid)
coulombtype     = cutoff    ; Treatment of long range electrostatic interactions
rcoulomb        = 1.0       ; Short-range electrostatic cut-off
rvdw            = 1.0       ; Short-range Van der Waals cut-off
pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions

Under your working directory, you can write a .mdp file with the above content using vim.

vim ions.mdp

Afterwards, we can generate the ions.tpr file:

gmx grompp -f ions.mdp -c 1AKI_solv.gro -p topol.top -o ions.tpr

Next, we can use the genion to add ions, specifically Na+ and Cl- here:

gmx genion -s ions.tpr -o 1AKI_solv_ions.gro -p topol.top -pname NA -nname CL -neutral

The prompt after hitting enter will ask for the continuous group.

Reading file ions.tpr, VERSION 2023.5 (single precision)
Reading file ions.tpr, VERSION 2023.5 (single precision)
Will try to add 0 NA ions and 8 CL ions.
Select a continuous group of solvent molecules
Group     0 (         System) has 33892 elements
Group     1 (        Protein) has  1960 elements
Group     2 (      Protein-H) has  1001 elements
Group     3 (        C-alpha) has   129 elements
Group     4 (       Backbone) has   387 elements
Group     5 (      MainChain) has   517 elements
Group     6 (   MainChain+Cb) has   634 elements
Group     7 (    MainChain+H) has   646 elements
Group     8 (      SideChain) has  1314 elements
Group     9 (    SideChain-H) has   484 elements
Group    10 (    Prot-Masses) has  1960 elements
Group    11 (    non-Protein) has 31932 elements
Group    12 (          Water) has 31932 elements
Group    13 (            SOL) has 31932 elements
Group    14 (      non-Water) has  1960 elements

Here we choose Group 13 for embedding ions, which means we will replace SOL to ions.

Again, it will back up the old topology and update the previous .top file. And eight solvent molecules are substituted to Cl- to off set the +8 charges.

Back Off! I just backed up topol.top to ./#topol.top.2#
Using random seed 1863253503.
Replacing solvent molecule 713 (atom 4099) with CL
Replacing solvent molecule 5377 (atom 18091) with CL
Replacing solvent molecule 1311 (atom 5893) with CL
Replacing solvent molecule 6960 (atom 22840) with CL
Replacing solvent molecule 4007 (atom 13981) with CL
Replacing solvent molecule 7807 (atom 25381) with CL
Replacing solvent molecule 1775 (atom 7285) with CL
Replacing solvent molecule 3845 (atom 13495) with CL

Within the command, -p = topology; -s = structure/state file; -o = output file.

Now the topology file ends up with

[ molecules ]
; Compound      #mols
Protein_A         1
SOL           10636
CL                8

Step 5: Energy minimization

The solvated, electroneutral system is now assembled. Before running dynamics, we still have to make sure that the structure is proper. Therefore, we naeed to relax the structure through energy minimzation.

Now that we need another mdp file for energy minimization, here is the sample:

; minim.mdp - used as input into grompp to generate em.tpr
; Parameters describing what to do, when to stop and what to save
integrator  = steep         ; Algorithm (steep = steepest descent minimization)
emtol       = 1000.0        ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep      = 0.01          ; Minimization step size
nsteps      = 50000         ; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist         = 1         ; Frequency to update the neighbor list and long range forces
cutoff-scheme   = Verlet    ; Buffered neighbor searching
ns_type         = grid      ; Method to determine neighbor list (simple, grid)
coulombtype     = PME       ; Treatment of long range electrostatic interactions
rcoulomb        = 1.0       ; Short-range electrostatic cut-off
rvdw            = 1.0       ; Short-range Van der Waals cut-off
pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions

After writing a new file called minim.mdp with the above countent using vim, it is time to get the em.tpr file.

gmx grompp -f minim.mdp -c 1AKI_solv_ions.gro -p topol.top -o em.tpr

Next, you can submit a job to run the energy minimzation task with module mdrun

gmx mdrun -deffnm em

The -v flag makes mdrun verbose, such that it prints its progress to the screen at every step. (I am not using it) And -deffnm flag define the file names of the input and output. Details about mdrun can be found here.

mdrun produces at least four output files. A single log file (-g) is written. The trajectory file (-o), contains coordinates, velocities and optionally forces. The structure file (-c) contains the coordinates and velocities of the last step. The energy file (-e) contains energies, the temperature, pressure, etc, a lot of these things are also printed in the log file. Optionally coordinates can be written to a compressed trajectory file (-x).

I am using 16 CPU cores and it takes 20 seconds to converge the energy in ~900 steps.

The four generated output files contain information during the process:

  1. .gro file: This file contains the final minimized structure in the GROMACS coordinate file format. It includes the positions of all the atoms in the system after the energy minimization.

  2. .edr file: The .edr file is the energy file that contains the energy terms calculated during the minimization process. It stores information such as potential energy, kinetic energy, bonded and non-bonded interactions, and restraint energies. You may use gmx energy to extract and analyze the data.

  3. .log file: The .log file is a log file that records the progress and details of the energy minimization process. It includes information about the minimization algorithm used, the number of steps performed, and the convergence criteria. It also reports the initial and final energy values of the system. You can examine this file to check if the minimization converged successfully and to get an overview of the minimization process.

  4. .trr file: The .trr file is a trajectory file that contains the coordinates, velocities, and forces of the system at regular intervals during the minimization process. It allows you to visualize the progression of the minimization and observe how the structure evolves over the course of the minimization. You can use tools like gmx dump to extract information from the .trr file or visualize the trajectory using visualization software like VMD or PyMOL.

The analyze of energy can be accessed by

gmx energy -f em.edr -o potential.xvg

The prompt will ask you the terms for analysis:

Select the terms you want from the following list by
selecting either (part of) the name or the number or a combination.
End your selection with an empty line or a zero.
-------------------------------------------------------------------
  1  Bond             2  Angle            3  Proper-Dih.      4  Ryckaert-Bell.
  5  LJ-14            6  Coulomb-14       7  LJ-(SR)          8  Coulomb-(SR)
  9  Coul.-recip.    10  Potential       11  Pressure        12  Vir-XX
 13  Vir-XY          14  Vir-XZ          15  Vir-YX          16  Vir-YY
 17  Vir-YZ          18  Vir-ZX          19  Vir-ZY          20  Vir-ZZ
 21  Pres-XX         22  Pres-XY         23  Pres-XZ         24  Pres-YX
 25  Pres-YY         26  Pres-YZ         27  Pres-ZX         28  Pres-ZY
 29  Pres-ZZ         30  #Surf*SurfTen   31  T-rest

To visulize energy changes, type in

 10
 0

This will give average energy and write a potential.xvg file:

 Statistics over 912 steps [ 0.0000 through 911.0000 ps ], 1 data sets
All statistics are over 722 points (frames)

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                   -567075      11000      28211   -73433.1  (kJ/mol)

Energies along the trajectory is summarzied in the .xvg file:

# Command line:
#   gmx energy -f em.edr -o potential.xvg
# gmx energy is part of G R O M A C S:
#
# Guyana Rwanda Oman Macau Angola Cameroon Senegal
#
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@    yaxis  label "(kJ/mol)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "Potential"
    0.000000  -330454.843750
    1.000000  -347175.593750
    2.000000  -370154.187500
    3.000000  -398829.406250
    4.000000  -427371.062500
    5.000000  -452330.031250
    6.000000  -458704.843750
    7.000000  -464386.156250
    8.000000  -464746.093750

To make the plot, the original tutorial recommends Grace. It is easy to get it through the mirros:

The software is available from https://plasma-gate.weizmann.ac.il/pub/grace/ and it is daily mirrored at

  • Germany (FTP) (ftp://ftp.fu-berlin.de/unix/graphics/grace/)
  • Japan (FTP) (ftp://ftp.u-aizu.ac.jp/pub/SciEng/math/grace/)

The energy converges and we are ready for next step!

Step 6: NVT equilibrium

After energy minimization, the structure is relaxed to a stationary point on the potential energy surface. The following equilibrium MD will equilibrate the solvent and ions around the protein. This step requires restraints, which is writtenin the file posre.itp we generated in step 2.

The purpose of posre.itp is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.

Equilibration is a crucial step in molecular dynamics simulations that involves two phases: NVT (constant Number of particles, Volume, and Temperature) and NPT (constant Number of particles, Pressure, and Temperature).

The concept of ensembles is a fundamental aspect of thermodynamics, where we define systems based on consistent thermodynamic properties such as temperature, pressure, and volume. Ensembles describe the possible states of a system under specific constraints.

  1. In the NVT ensemble, also known as the canonical ensemble, the system is coupled to a heat bath that maintains a constant temperature. The volume of the system remains fixed, while the particles can exchange energy with the heat bath. This phase allows the system to reach thermal equilibrium at the desired temperature.
  2. Following the NVT phase, the system is subjected to the NPT ensemble, also called the isothermal-isobaric ensemble. In this phase, both the temperature and pressure of the system are kept constant. The volume of the system is allowed to fluctuate to maintain the specified pressure. The NPT phase ensures that the system reaches both thermal and mechanical equilibrium, mimicking realistic experimental conditions.

The NVT equilibrium MD also requires a .mdp file, you may write it as nvt.mdp. The parameters below are common settings and is used for the test case here. For a comprehensive list of molecular dynamics parameters in GROMACS, see the documentation

title                   = OPLS Lysozyme NVT equilibration 
define                  = -DPOSRES  ; position restrain the protein
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 50000     ; 2 * 50000 = 100 ps
dt                      = 0.002     ; 2 fs
; Output control
nstxout                 = 500       ; save coordinates every 1.0 ps
nstvout                 = 500       ; save velocities every 1.0 ps
nstenergy               = 500       ; save energies every 1.0 ps
nstlog                  = 500       ; update log file every 1.0 ps
; Bond parameters
continuation            = no        ; first dynamics run
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds involving H are constrained
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Nonbonded settings 
cutoff-scheme           = Verlet    ; Buffered neighbor searching
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 10        ; 20 fs, largely irrelevant with Verlet
rcoulomb                = 1.0       ; short-range electrostatic cutoff (in nm)
rvdw                    = 1.0       ; short-range van der Waals cutoff (in nm)
DispCorr                = EnerPres  ; account for cut-off vdW scheme
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling is on
tcoupl                  = V-rescale             ; modified Berendsen thermostat
tc-grps                 = Protein Non-Protein   ; two coupling groups - more accurate
tau_t                   = 0.1     0.1           ; time constant, in ps
ref_t                   = 300     300           ; reference temperature, one for each group, in K
; Pressure coupling is off
pcoupl                  = no        ; no pressure coupling in NVT
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Velocity generation
gen_vel                 = yes       ; assign velocities from Maxwell distribution
gen_temp                = 300       ; temperature for Maxwell distribution
gen_seed                = -1        ; generate a random seed

And we repeat the step 5: generate tpr file through grompp and then run it with mdrun:

gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -deffnm nvt

Using 16 cores, it takes around 5 minutes for NVT equilibrium (50000 steps, 100 picoseconds). Typically it requires 50-100 ps to equilibrate tempreture.

To analyze the tempreture, use energy module to process file nvt.edr:

gmx energy -f nvt.edr -o temperature.xvg

choose 16 0 at the prompt to extract tempreture.

The following prompt shows the average tempreture – which is very close to the target we set (300K).

Last energy frame read 100 time  100.000

Statistics over 50001 steps [ 0.0000 through 100.0000 ps ], 1 data sets
All statistics are over 501 points

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Temperature                  299.51       0.24    3.06328    1.30655  (K)

And you may read the file temprature.xvg or plot the variance of energy along timesteps to check if the energy converges.

Step 7: NPT equilibrium

After the protein reaches its thermal equilibrium at the given tempreture, we then run NPT condition to achieve pressure equilibration.

The sample file given by the tutorial for npt.mdp is:

title                   = OPLS Lysozyme NPT equilibration 
define                  = -DPOSRES  ; position restrain the protein
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 50000     ; 2 * 50000 = 100 ps
dt                      = 0.002     ; 2 fs
; Output control
nstxout                 = 500       ; save coordinates every 1.0 ps
nstvout                 = 500       ; save velocities every 1.0 ps
nstenergy               = 500       ; save energies every 1.0 ps
nstlog                  = 500       ; update log file every 1.0 ps
; Bond parameters
continuation            = yes       ; Restarting after NVT 
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds involving H are constrained
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Nonbonded settings 
cutoff-scheme           = Verlet    ; Buffered neighbor searching
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 10        ; 20 fs, largely irrelevant with Verlet scheme
rcoulomb                = 1.0       ; short-range electrostatic cutoff (in nm)
rvdw                    = 1.0       ; short-range van der Waals cutoff (in nm)
DispCorr                = EnerPres  ; account for cut-off vdW scheme
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling is on
tcoupl                  = V-rescale             ; modified Berendsen thermostat
tc-grps                 = Protein Non-Protein   ; two coupling groups - more accurate
tau_t                   = 0.1     0.1           ; time constant, in ps
ref_t                   = 300     300           ; reference temperature, one for each group, in K
; Pressure coupling is on
pcoupl                  = Parrinello-Rahman     ; Pressure coupling on in NPT
pcoupltype              = isotropic             ; uniform scaling of box vectors
tau_p                   = 2.0                   ; time constant, in ps
ref_p                   = 1.0                   ; reference pressure, in bar
compressibility         = 4.5e-5                ; isothermal compressibility of water, bar^-1
refcoord_scaling        = com
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Velocity generation
gen_vel                 = no        ; Velocity generation is off 

Taking the nvt outputs as input, let’s repeat the MD simulation:

gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
gmx mdrun -deffnm npt

The NPT equilibration phase typically takes longer than the NVT phase due to the additional step of adjusting the simulation box size. The duration of the NPT phase depends on several factors, such as the size of the system, the initial configuration, and the target pressure. Larger systems or systems with significant pressure differences from the initial state may require longer NPT equilibration times to achieve stable box dimensions and pressure values.

Since the trajectory is not super long, it also takes 5 minutes around…

To analyze the result, we can read both pressure and density (option 18 and 24).

gmx energy -f npt.edr -o pressure.xvg
gmx energy -f npt.edr -o density.xvg

Interestingly, if you take a look at the figure of pressure:

The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is 7.5 ± 160.5 bar. Note that the reference pressure was set to 1 bar, so is this outcome acceptable? Pressure is a quantity that fluctuates widely over the course of an MD simulation, as is clear from the large root-mean-square fluctuation (160.5 bar), so statistically speaking, one cannot distinguish a difference between the obtained average (7.5 ± 160.5 bar) and the target/reference value (1 bar).

On the other hand, density is more informative and is very close to the experimental results:

As with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is 1019 ± 3 kg m-3, close to the experimental value of 1000 kg m-3 and the expected density of the SPC/E model of 1008 kg m-3. The parameters for the SPC/E water model closely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and density.

The author also bring up the question of why density values obtained do not match my results. A long ger NPT is safer.

Step 8 Unrestricted MD simulation

After equilibrium at both stable temperature and pressure, we are able to release the strain on protein and run production MD trajectories for 1 ns

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr

Yes, again we need a md.mdp file:

title                   = OPLS Lysozyme NPT equilibration 
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 500000    ; 2 * 500000 = 1000 ps (1 ns)
dt                      = 0.002     ; 2 fs
; Output control
nstxout                 = 0         ; suppress bulky .trr file by specifying 
nstvout                 = 0         ; 0 for output frequency of nstxout,
nstfout                 = 0         ; nstvout, and nstfout
nstenergy               = 5000      ; save energies every 10.0 ps
nstlog                  = 5000      ; update log file every 10.0 ps
nstxout-compressed      = 5000      ; save compressed coordinates every 10.0 ps
compressed-x-grps       = System    ; save the whole system
; Bond parameters
continuation            = yes       ; Restarting after NPT 
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds involving H are constrained
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Neighborsearching
cutoff-scheme           = Verlet    ; Buffered neighbor searching
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 10        ; 20 fs, largely irrelevant with Verlet scheme
rcoulomb                = 1.0       ; short-range electrostatic cutoff (in nm)
rvdw                    = 1.0       ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling is on
tcoupl                  = V-rescale             ; modified Berendsen thermostat
tc-grps                 = Protein Non-Protein   ; two coupling groups - more accurate
tau_t                   = 0.1     0.1           ; time constant, in ps
ref_t                   = 300     300           ; reference temperature, one for each group, in K
; Pressure coupling is on
pcoupl                  = Parrinello-Rahman     ; Pressure coupling on in NPT
pcoupltype              = isotropic             ; uniform scaling of box vectors
tau_p                   = 2.0                   ; time constant, in ps
ref_p                   = 1.0                   ; reference pressure, in bar
compressibility         = 4.5e-5                ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Dispersion correction
DispCorr                = EnerPres  ; account for cut-off vdW scheme
; Velocity generation
gen_vel                 = no        ; Velocity generation is off 

mrompp print an estimate for PME load:

Estimate for the relative computational load of the PME mesh part: 0.22

PME (Particle Mesh Ewald) is a method used in molecular dynamics simulations to efficiently calculate long-range electrostatic interactions. The term “PME load” refers to the computational load or the amount of work associated with the PME calculations during the simulation. In molecular dynamics simulations, electrostatic interactions play a crucial role in determining the behavior of the system. However, calculating these interactions directly for all pairs of particles can be computationally expensive, especially for large systems. The PME method addresses this issue by splitting the electrostatic calculations into two parts: short-range and long-range interactions.

According to the original tutorial:

For a cubic box, the optimal setup will have a PME load of 0.25 (3:1 PP:PME - we’re very close to optimal!); for a dodecahedral box, the optimal PME load is 0.33 (2:1 PP:PME). When executing mdrun, the program should automatically determine the best number of processors to assign for the PP and PME calculations. Thus, make sure you indicate an appropriate number of threads/cores for your calculation (the value of -nt X), so that you can get the best performance.

Seems that the optimal value for PME is very customized, based on the system and MD setup.

Next we can run the GROMACS, again. And the file we generated before is md_0_1.tpr

gmx mdrun -deffnm md_0_1

If you are using the GPU accelerator, try

gmx mdrun -deffnm md_0_1 -nb gpu

Step 9 Analysis

Finally! The end of this tutorial is trajectory analysis – lets try some modules in gmx that help process trajectories.

  1. trjconv

This command is used to post-process trajectory files and can help produce the coordinates for subsequent analysis.

gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center

The above command takes the following options:

  • -s md_0_1.tpr: The input structure file (TPR) that contains the system topology.
  • -f md_0_1.xtc: The input trajectory file (XTC) that contains the coordinates over time.
  • -o md_0_1_noPBC.xtc: The output trajectory file (XTC) with periodic boundary conditions (PBC) removed.
  • -pbc mol: Specifies that PBC should be removed by shifting molecules into the central box.
  • -center: Centers the protein in the box.

For comprehensive options, check here.

  1. rms This is the GROMACS built-in module for calculating the root mean square deviation (RMSD) of the protein structure over time.
    gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns
    

The command options are as follows:

  • -s md_0_1.tpr: The input structure file (TPR) that contains the system topology.
  • -f md_0_1_noPBC.xtc: The input trajectory file (XTC) with PBC removed.
  • -o rmsd.xvg: The output file (XVG) that will contain the RMSD values over time.
  • -tu ns: Specifies that the time unit should be nanoseconds.

Check here for other flags.

  1. Calculating RMSD relative to the crystal structure: To calculate the RMSD relative to the crystal structure, you can compare the trajectory with PBC removed (md_0_1_noPBC.xtc) to the crystal structure file. Use the following command:
gmx rms -s crystal_structure.pdb -f md_0_1_noPBC.xtc -o rmsd_crystal.xvg -tu ns

Here, crystal_structure.pdb represents the file containing the crystal structure coordinates. The output file rmsd_crystal.xvg will contain the RMSD values calculated relative to the crystal structure.