Example 1: Lysozyme

In this example, we use hen egg white lysozyme (PDB ID 1AKI) to run an MD simulation.

Prepare structure and topology files

Create a working directory:

mkdir lysozyme

cd lysozyme

Download the PDB file from the Protein Data Bank:

getpdb 1aki

The PyMOL molecular structure viewer is preinstalled on the system. Because of its Python dependency, you need to deactivate conda before running PyMOL.

conda deactivate

pymol 1aki.pdb

After verifying the structure, quit PyMOL and activate conda again.

conda activate

Delete the crystal water molecules. The water molecules (HOH) in the PDB file are crystal waters, and Gromacs has its own way to add solvent water.

grep -v HOH 1aki.pdb > 1aki_clean.pdb
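
Note that grep -v HOH drops every line that contains the text HOH anywhere, including remark lines. A stricter filter can be sketched in Python, assuming the fixed-column PDB layout in which the residue name sits in columns 18-20 of ATOM and HETATM records. The function name strip_waters and the two sample records are made up for illustration.

```python
def strip_waters(pdb_lines):
    """Return the PDB lines without water (HOH) atom records."""
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Fixed-column PDB format: residue name is in columns 18-20,
        # which is the Python slice [17:20].
        if is_atom_record and line[17:20] == "HOH":
            continue
        kept.append(line)
    return kept


# Made-up records, only to show the column layout.
example = [
    "ATOM      1  N   LYS A   1       1.000   2.000   3.000  1.00 20.00           N",
    "HETATM 1002  O   HOH A 130       4.000   5.000   6.000  1.00 20.00           O",
    "END",
]
for line in strip_waters(example):
    print(line)
```

For 1AKI the simple grep command is sufficient; the sketch only matters for files where HOH could appear in other records.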

Create the topology file and process the structure file:

gmx pdb2gmx -f 1aki_clean.pdb -o 1AKI_processed.gro -water spce

When prompted to select a force field, choose option 15, the OPLS force field.

The explanation of pdb2gmx module can be found by running:

gmx help pdb2gmx

-f 1aki_clean.pdb:    Input file

-o 1AKI_processed.gro:   Output file for Gromacs use

-water spce:   The water model (SPC/E) used for the solvent

The other two output files are:

topol.top:   Topology file that defines atom and bond parameters

posre.itp:   Position restraint file

Define the simulation box and solvent

Step 1: Define the box dimensions using the editconf module

gmx editconf -f 1AKI_processed.gro -o 1AKI_newbox.gro -c -d 1.0 -bt cubic

This command uses the editconf module to center the protein and define a box.

-f 1AKI_processed.gro:  Input file

-o 1AKI_newbox.gro: Output file. The box origin is at (0, 0, 0), and the box dimensions are written on the last line of the file.

-c:   Center the protein in the box

-d 1.0:  Leave at least 1.0 nm between the protein and the box edge

-bt cubic: Use a cubic box. There are other choices, such as a rhombic dodecahedron.
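
To get a feeling for the resulting box size, here is a rough sketch in Python. It assumes the rule given in the editconf help text for cubic boxes: the edge length is the diameter of the system (the largest distance between two atoms) plus twice the -d distance. The function name cubic_box_edge and the three coordinates are hypothetical, and the all-pairs loop is only practical for small inputs.

```python
from itertools import combinations
from math import dist


def cubic_box_edge(coords_nm, d_nm=1.0):
    """Estimate the cubic box edge (nm) as system diameter + 2 * d."""
    # Diameter = largest distance between any two atoms (O(n^2) search).
    diameter = max(
        (dist(a, b) for a, b in combinations(coords_nm, 2)), default=0.0
    )
    return diameter + 2.0 * d_nm


# Made-up coordinates in nm; the largest pair distance is 5.0.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(cubic_box_edge(coords, d_nm=1.0))  # 7.0
```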

Step 2: Fill the box with water using the solvate module

gmx solvate -cp 1AKI_newbox.gro -cs spc216.gro -o 1AKI_solv.gro -p topol.top

This command uses the solvate module to add water molecules to the box.

-cp 1AKI_newbox.gro: Configuration of the protein, read from the named file

-cs spc216.gro: Configuration of the solvent. spc216.gro is a generic equilibrated 3-point solvent model suitable for SPC, SPC/E, or TIP3P water.

-o 1AKI_solv.gro: Output file name

-p topol.top: Topology file name. The solvate module updates this file so that it lists both the protein and the solvent (SOL).

Add ions

In the [ atoms ] section of the topology file, the running total charge of the protein is recorded in the qtot comment of each line. The qtot value on the last line of this section is the net charge.

1960   opls_272    129    LEU     O2    682       -0.8    15.9994   ; qtot 8

In this example, the net charge is +8.
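
Instead of reading the value by eye, the qtot comment can be extracted with a few lines of Python. This is a minimal sketch that assumes the comment has the form "; qtot N" as in the line above; the function name net_charge_from_atoms_line is hypothetical.

```python
import re


def net_charge_from_atoms_line(line):
    """Extract the running total charge from a '; qtot N' comment."""
    match = re.search(r";\s*qtot\s+(-?\d+(?:\.\d+)?)", line)
    if match is None:
        raise ValueError("no qtot comment found")
    return float(match.group(1))


# Last line of the [ atoms ] section, copied from the example above.
last_atom = (
    "1960   opls_272    129    LEU     O2    682"
    "       -0.8    15.9994   ; qtot 8"
)
print(net_charge_from_atoms_line(last_atom))  # 8.0
```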

In an MD simulation, we need to balance the charge with ions so that the system is neutral. This is a two-step procedure.

Step 1: Prepare a run input file (extension .tpr) for the genion module

The MD parameter file ions.mdp contains instructions for the Gromacs preprocessor module grompp, which assembles the coordinates and topology into an atomic-level input .tpr file.

Sample ions.mdp file

; ions.mdp - used as input into grompp to generate ions.tpr

; Parameters describing what to do, when to stop and what to save

integrator = steep ; Algorithm (steep = steepest descent minimization)

emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm

emstep = 0.01 ; Minimization step size

nsteps = 50000 ; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions

nstlist = 1 ; Frequency to update the neighbor list and long range forces

cutoff-scheme	= Verlet ; Buffered neighbor searching 

ns_type = grid ; Method to determine neighbor list (simple, grid)

coulombtype = cutoff ; Treatment of long range electrostatic interactions

rcoulomb = 1.0 ; Short-range electrostatic cut-off

rvdw = 1.0 ; Short-range Van der Waals cut-off

pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions

This mdp file describes an energy minimization. No minimization is run at this point; grompp only needs the file to build ions.tpr.

gmx grompp -f ions.mdp -c 1AKI_solv.gro -p topol.top -o ions.tpr

The grompp module is the Gromacs preprocessor. Its job is to make a .tpr file.

-f ions.mdp: Read instructions from the ions.mdp file.

-c 1AKI_solv.gro: Coordinate file

-p topol.top: Topology file

-o ions.tpr: Output file. This is an atomic-level input file with the coordinates and topology assembled. A .tpr file is normally the input of an MD simulation; in this case, it is the input of the genion module.

Step 2: Use the genion module to replace some water molecules with ions

gmx genion -s ions.tpr -o 1AKI_solv_ions.gro -p topol.top -pname NA -nname CL -neutral

-s ions.tpr: Run input file that provides the structure.

-o 1AKI_solv_ions.gro: Write to this output file.

-p topol.top: Update the topology file to reflect the removal of water and the addition of ions.

-pname NA: Use NA as the name of the positive ion.

-nname CL: Use CL as the name of the negative ion.

-neutral: Neutralize the system. In this case, it replaces 8 water molecules with CL ions to offset the net charge of +8.

When prompted, choose option 13 (SOL) so that genion replaces solvent molecules.
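
Several commands in this example ask for a group number at an interactive prompt. When scripting the workflow, the answers can be fed through standard input instead. The following Python sketch shows one way to do this; the helper names answers_to_stdin and run_with_answers are hypothetical, and the group number 13 is taken from this example and may differ for other systems.

```python
import subprocess


def answers_to_stdin(answers):
    """Join prompt answers into the text sent to the program's stdin."""
    return "\n".join(str(answer) for answer in answers) + "\n"


def run_with_answers(cmd, answers):
    """Run a command and answer its prompts through stdin."""
    return subprocess.run(
        cmd, input=answers_to_stdin(answers), text=True, check=True
    )


# The genion command from this step, written as an argument list.
GENION_CMD = [
    "gmx", "genion", "-s", "ions.tpr", "-o", "1AKI_solv_ions.gro",
    "-p", "topol.top", "-pname", "NA", "-nname", "CL", "-neutral",
]
print(repr(answers_to_stdin([13])))  # '13\n'
```

Calling run_with_answers(GENION_CMD, [13]) in the working directory would then run genion without waiting for keyboard input. The same helper could be used for the later gmx energy, trjconv, and rms prompts.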

After this step, the topol.top file will include CL in its [ molecules ] section:

[ molecules ]
; Compound        #mols
Protein_chain_A     1
SOL         10636
CL               8
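
To confirm that genion updated the topology as expected, the [ molecules ] section can be read back with a small script. This is a minimal sketch, assuming the usual layout of one "name count" pair per line with ";" starting a comment; the function name read_molecules is hypothetical.

```python
def read_molecules(top_text):
    """Return {molecule name: count} from the [ molecules ] section."""
    counts = {}
    in_section = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("["):
            # Section header such as "[ molecules ]"
            in_section = line.strip("[] ").lower() == "molecules"
            continue
        if in_section:
            name, number = line.split()[:2]
            # The same name may appear on several lines, so add up.
            counts[name] = counts.get(name, 0) + int(number)
    return counts


sample = """
[ molecules ]
; Compound        #mols
Protein_chain_A     1
SOL         10636
CL               8
"""
print(read_molecules(sample))
# {'Protein_chain_A': 1, 'SOL': 10636, 'CL': 8}
```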

Energy minimization:

We now have a solvated, charge-neutral system in the coordinate file 1AKI_solv_ions.gro, with the molecule topology in the file topol.top. Before we run production MD, we have a few more preparation steps:

Energy minimization: Remove structural clashes.

Equilibration: Bring the system from a high-energy state to an equilibrated state.

As in the add-ions step, we need to build an all-inclusive atomic-level .tpr file from 3 pieces of information:

.mdp file: MD parameter file that serves as instruction script

.gro file: Gromacs coordinate file

.top file: Topology file

Sample minim.mdp file:

; minim.mdp - used as input into grompp to generate em.tpr

; Parameters describing what to do, when to stop and what to save

integrator = steep ; Algorithm (steep = steepest descent minimization)

emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm

emstep = 0.01 ; Minimization step size

nsteps = 50000 ; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions

nstlist = 1 ; Frequency to update the neighbor list and long range forces

cutoff-scheme = Verlet ; Buffered neighbor searching

ns_type = grid ; Method to determine neighbor list (simple, grid)

coulombtype = PME ; Treatment of long range electrostatic interactions

rcoulomb = 1.0 ; Short-range electrostatic cut-off

rvdw = 1.0 ; Short-range Van der Waals cut-off

pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions

This .mdp file is the same as ions.mdp except for coulombtype. PME is the fast smooth Particle-Mesh Ewald (SPME) electrostatics method, which is more accurate than the plain cutoff used in ions.mdp.
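
The difference between two .mdp files can also be checked programmatically. The sketch below parses "key = value ; comment" lines and reports the keys whose values differ. It assumes that option names can be compared case-insensitively with "-" and "_" treated as the same character; the function names parse_mdp and diff_mdp are hypothetical, and the two strings are abbreviated copies of the files shown above.

```python
def parse_mdp(text):
    """Parse .mdp text into a {key: value} dictionary."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: keys are case-insensitive and '-' equals '_'.
        params[key.strip().lower().replace("_", "-")] = value.strip()
    return params


def diff_mdp(text_a, text_b):
    """Return {key: (value in a, value in b)} for differing keys."""
    a, b = parse_mdp(text_a), parse_mdp(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


ions = """
integrator = steep ; Algorithm (steep = steepest descent minimization)
coulombtype = cutoff ; Treatment of long range electrostatic interactions
rcoulomb = 1.0 ; Short-range electrostatic cut-off
"""
minim = """
integrator = steep ; Algorithm (steep = steepest descent minimization)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 1.0 ; Short-range electrostatic cut-off
"""
print(diff_mdp(ions, minim))  # {'coulombtype': ('cutoff', 'PME')}
```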

gmx grompp -f minim.mdp -c 1AKI_solv_ions.gro -p topol.top -o em.tpr

grompp: Gromacs preprocessor that assembles the .mdp, .gro, and .top files into a .tpr file.

-f minim.mdp: MD parameter file

-c 1AKI_solv_ions.gro: Coordinate file

-p topol.top: Topology file

-o em.tpr: Output file

Once em.tpr is ready, we can pass it to the mdrun module to run the energy minimization.

gmx mdrun -v -deffnm em

mdrun: MD simulation module

-v:  Verbose mode

-deffnm: Default file name for the input and output files. If you did not name your grompp output "em.tpr", you will have to specify its name explicitly with the mdrun -s flag.

Since we passed the name em, mdrun takes em.tpr as input and writes out 4 files whose names start with em. Gromacs detects the CPU and uses OpenMP to run on the maximum number of available threads. If you would like to control the number of threads manually, use the option

-ntomp

For example,

gmx mdrun -v -ntomp 8 -deffnm em

This uses 8 OpenMP threads.

This step produces the following output files:

(base) jmao@gromacs:~/demo/lysozyme$ ls -lt
total 9044
-rw-rw-r-- 1 jmao jmao  305030 Oct 18 13:32  em.log
-rw-rw-r-- 1 jmao jmao 1524475 Oct 18 13:32  em.gro
-rw-rw-r-- 1 jmao jmao  406632 Oct 18 13:32  em.trr
-rw-rw-r-- 1 jmao jmao  129712 Oct 18 13:32  em.edr
-rw-rw-r-- 1 jmao jmao  848248 Oct 18 13:27  em.tpr

These files are:

em.log: ASCII-text log file of the EM process

em.edr: Binary energy file

em.trr: Binary full-precision trajectory

em.gro: Energy-minimized structure

The energy file is a binary file and not directly viewable. To see the minimization process and evaluate the convergence quality, we can convert this energy file to a plot.

gmx energy -f em.edr -o potential.xvg

energy: The energy module extracts energy components from an energy file.

-f em.edr: The energy file as input

-o potential.xvg: Write the energy trace to this file.

When prompted, enter "10 0". "10" selects Potential and "0" ends the input.

To view the plot, run

xmgrace potential.xvg

You need to enable X11 forwarding when you ssh to the server, that is, use the -X option of the ssh command, as in "ssh -X username@serverIP".
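
If X11 forwarding is not available, the .xvg file can still be inspected, since it is a plain text file. The sketch below reads the numeric rows and skips the header, assuming that header lines start with "#" (comments) or "@" (plotting directives). The function name read_xvg is hypothetical and the sample values are made up.

```python
def read_xvg(text):
    """Return the numeric rows of an .xvg file as tuples of floats."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comments and '@' plotting directives.
        if not line or line[0] in "#@":
            continue
        rows.append(tuple(float(field) for field in line.split()))
    return rows


# Made-up sample in the same layout: step or time, then the energy.
sample_xvg = """
# comment line
@ directive line
0.000000  -300000.0
1.000000  -350000.0
2.000000  -400000.0
"""
rows = read_xvg(sample_xvg)
print(len(rows), "rows, last value:", rows[-1][1])
```

For a real file, pass the contents of potential.xvg, for example open("potential.xvg").read(). A potential energy that decreases and then levels off indicates that the minimization converged.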

Energy equilibration:

While energy minimization quickly brings the structure to a low-energy state, the system is not yet equilibrated at the desired temperature, and the equilibrated state must include both the solvent and the solute.

The equilibration is often conducted in two phases.

The first phase is conducted under an NVT ensemble (constant Number of particles, Volume, and Temperature). In NVT, the temperature of the system should reach a plateau at the desired value. Typically, 50-100 ps should suffice.

The second phase is conducted under an NPT ensemble (constant Number of particles, Pressure, and Temperature). After NPT equilibration, the pressure and density might fluctuate, but their running averages should be stable. This phase generally takes longer to reach equilibrium than NVT.

Phase 1: NVT equilibration

Again, we need 3 input files to assemble the atomic-level MD .tpr file:

.mdp file: MD parameter file that serves as instruction script

.gro file: Gromacs coordinate file. Use em.gro from energy minimization

.top file: Topology file. This file keeps the same name topol.top.

nvt.mdp

title = OPLS Lysozyme NVT equilibration 

define = -DPOSRES ; position restrain the protein

; Run parameters

integrator = md ; leap-frog integrator

nsteps = 50000 ; 2 * 50000 = 100 ps

dt = 0.002 ; 2 fs

; Output control

nstxout = 500 ; save coordinates every 1.0 ps

nstvout = 500 ; save velocities every 1.0 ps

nstenergy = 500 ; save energies every 1.0 ps

nstlog = 500 ; update log file every 1.0 ps

; Bond parameters

continuation = no ; first dynamics run

constraint_algorithm = lincs ; holonomic constraints 

constraints = h-bonds ; bonds involving H are constrained

lincs_iter = 1 ; accuracy of LINCS

lincs_order = 4 ; also related to accuracy

; Nonbonded settings 

cutoff-scheme = Verlet ; Buffered neighbor searching

ns_type = grid ; search neighboring grid cells

nstlist = 10 ; 20 fs, largely irrelevant with Verlet

rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)

rvdw = 1.0 ; short-range van der Waals cutoff (in nm)

DispCorr = EnerPres ; account for cut-off vdW scheme

; Electrostatics

coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics

pme_order = 4 ; cubic interpolation

fourierspacing = 0.16 ; grid spacing for FFT

; Temperature coupling is on

tcoupl = V-rescale ; modified Berendsen thermostat

tc-grps = Protein Non-Protein ; two coupling groups - more accurate

tau_t = 0.1 0.1 ; time constant, in ps

ref_t = 300 300 ; reference temperature, one for each group, in K

; Pressure coupling is off

pcoupl = no ; no pressure coupling in NVT

; Periodic boundary conditions

pbc = xyz ; 3-D PBC

; Velocity generation

gen_vel = yes ; assign velocities from Maxwell distribution

gen_temp = 300 ; temperature for Maxwell distribution

gen_seed = -1 ; generate a random seed

Assemble the tpr file:

gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr

This command is similar to the commands used for adding ions and for energy minimization, but with a different .mdp parameter file. The option -r em.gro supplies the reference coordinates for the position restraints. Some parameters worth attention are:

nsteps = 50000 and dt = 0.002: Together they define the run time of 100 ps

pcoupl = no: No pressure coupling for NVT equilibration

ref_t = 300: Temperature of 300 K
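
The run length and the output intervals follow directly from the step count and the time step (dt is given in ps). A short Python check of the arithmetic, using the values from the .mdp files in this example; the function name run_length_ps is hypothetical.

```python
def run_length_ps(nsteps, dt_ps):
    """Simulated time in ps = number of steps * time step in ps."""
    return nsteps * dt_ps


# NVT and NPT equilibration: 50000 steps of 0.002 ps
print(f"{run_length_ps(50000, 0.002):.1f} ps")   # 100.0 ps
# Output interval in equilibration: every 500 steps
print(f"{run_length_ps(500, 0.002):.1f} ps")     # 1.0 ps
# Production run in md.mdp: 500000 steps of 0.002 ps
print(f"{run_length_ps(500000, 0.002):.1f} ps")  # 1000.0 ps
```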

This will generate file nvt.tpr as input for mdrun:

gmx mdrun -deffnm nvt

The output files are named nvt.*

To check the temperature convergence, use the energy module and select choice 16 (Temperature).

gmx energy -f nvt.edr -o temperature.xvg

Type "16 0" at the prompt to select Temperature. Use xmgrace temperature.xvg to view the system temperature.

Phase 2: NPT equilibration

We need 3 input files to assemble the atomic-level md .tpr file

.mdp file: MD parameter file that serves as instruction script

.gro file: Gromacs coordinate file. Use nvt.gro from NVT equilibration

.top file: Topology file. This file keeps the same name topol.top.

npt.mdp

title = OPLS Lysozyme NPT equilibration 

define = -DPOSRES ; position restrain the protein

; Run parameters

integrator = md ; leap-frog integrator

nsteps = 50000 ; 2 * 50000 = 100 ps

dt = 0.002 ; 2 fs

; Output control

nstxout = 500 ; save coordinates every 1.0 ps

nstvout = 500 ; save velocities every 1.0 ps

nstenergy = 500 ; save energies every 1.0 ps

nstlog = 500 ; update log file every 1.0 ps

; Bond parameters

continuation = yes ; Restarting after NVT 

constraint_algorithm = lincs ; holonomic constraints 

constraints = h-bonds ; bonds involving H are constrained

lincs_iter = 1 ; accuracy of LINCS

lincs_order = 4 ; also related to accuracy

; Nonbonded settings 

cutoff-scheme = Verlet ; Buffered neighbor searching

ns_type = grid ; search neighboring grid cells

nstlist = 10 ; 20 fs, largely irrelevant with Verlet scheme

rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)

rvdw = 1.0 ; short-range van der Waals cutoff (in nm)

DispCorr = EnerPres ; account for cut-off vdW scheme

; Electrostatics

coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics

pme_order = 4 ; cubic interpolation

fourierspacing = 0.16 ; grid spacing for FFT

; Temperature coupling is on

tcoupl = V-rescale ; modified Berendsen thermostat

tc-grps = Protein Non-Protein ; two coupling groups - more accurate

tau_t = 0.1 0.1 ; time constant, in ps

ref_t = 300 300 ; reference temperature, one for each group, in K

; Pressure coupling is on

pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT

pcoupltype = isotropic ; uniform scaling of box vectors

tau_p = 2.0 ; time constant, in ps

ref_p = 1.0 ; reference pressure, in bar

compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1

refcoord_scaling = com

; Periodic boundary conditions

pbc = xyz ; 3-D PBC

; Velocity generation

gen_vel = no ; Velocity generation is off 

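Some differences from nvt.mdp are:

continuation = yes: This run continues from the NVT phase instead of starting fresh.

gen_vel = no: Velocities are not generated again; they are taken from the NVT run.

pcoupl = Parrinello-Rahman: Pressure coupling for NPT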

gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr

In this command, the extra option -t nvt.cpt is added because we specified continuation = yes, so the run starts from the checkpoint file of the NVT phase.

After we have npt.tpr, we run the NPT phase of equilibration:

gmx mdrun -deffnm npt

The output files have names npt.*

Now check the pressure:

gmx energy -f npt.edr -o pressure.xvg

Choose option "18 0" to select Pressure.

View the plot:

xmgrace pressure.xvg

Unlike the temperature, the pressure fluctuates widely. This is normal behavior for an MD simulation, as long as the running average of the pressure is steady.

To generate a running average in xmgrace, go to Data -> Transformation -> Running average, select a set (Set 0 in our case), and specify the length of the average (10 ps in our case).
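
The same smoothing can be done without xmgrace. Below is a minimal Python sketch of a simple running average over a fixed number of samples. With energies saved every 1.0 ps (nstenergy = 500 and dt = 0.002), a 10 ps window corresponds to about 10 samples. The function name running_average is hypothetical, and the result is not guaranteed to match the xmgrace output point for point.

```python
def running_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


print(running_average([1.0, 3.0, 5.0, 7.0], 2))  # [2.0, 4.0, 6.0]
```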

Production MD

The system is ready for production MD after equilibration at the desired temperature and pressure. We run MD for a longer time, usually nanoseconds, to collect statistically meaningful results.

md.mdp

title = OPLS Lysozyme production MD 

; Run parameters

integrator = md ; leap-frog integrator

nsteps = 500000 ; 2 * 500000 = 1000 ps (1 ns)

dt = 0.002 ; 2 fs

; Output control

nstxout = 0 ; suppress bulky .trr file by specifying 

nstvout = 0 ; 0 for output frequency of nstxout,

nstfout = 0 ; nstvout, and nstfout

nstenergy = 5000 ; save energies every 10.0 ps

nstlog = 5000 ; update log file every 10.0 ps

nstxout-compressed = 5000 ; save compressed coordinates every 10.0 ps

compressed-x-grps = System ; save the whole system

; Bond parameters

continuation = yes ; Restarting after NPT 

constraint_algorithm = lincs ; holonomic constraints 

constraints = h-bonds ; bonds involving H are constrained

lincs_iter = 1 ; accuracy of LINCS

lincs_order = 4 ; also related to accuracy

; Neighborsearching

cutoff-scheme = Verlet ; Buffered neighbor searching

ns_type = grid ; search neighboring grid cells

nstlist = 10 ; 20 fs, largely irrelevant with Verlet scheme

rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)

rvdw = 1.0 ; short-range van der Waals cutoff (in nm)

; Electrostatics

coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics

pme_order = 4 ; cubic interpolation

fourierspacing = 0.16 ; grid spacing for FFT

; Temperature coupling is on

tcoupl = V-rescale ; modified Berendsen thermostat

tc-grps = Protein Non-Protein ; two coupling groups - more accurate

tau_t = 0.1 0.1 ; time constant, in ps

ref_t = 300 300 ; reference temperature, one for each group, in K

; Pressure coupling is on

pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT

pcoupltype = isotropic ; uniform scaling of box vectors

tau_p = 2.0 ; time constant, in ps

ref_p = 1.0 ; reference pressure, in bar

compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1

; Periodic boundary conditions

pbc = xyz ; 3-D PBC

; Dispersion correction

DispCorr = EnerPres ; account for cut-off vdW scheme

; Velocity generation

gen_vel = no ; Velocity generation is off 

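This file is largely the same as the NPT equilibration file, but the run is longer (1 ns), the position restraints are removed (no define = -DPOSRES), and output is saved less frequently (every 10 ps, as compressed coordinates).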

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr

gmx mdrun -deffnm md_0_1

This run takes a long time.
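
The four grompp/mdrun pairs used so far (minimization, NVT, NPT, production) follow one pattern, so they can be driven from a short script. The Python sketch below only rebuilds the command lines shown in this example; the names STAGES, stage_commands, and run_all are hypothetical, and the -v flag of the minimization run is left out. By default it prints the commands instead of running them.

```python
import subprocess

# (mdp file, input coordinates, restraint reference, checkpoint, name)
STAGES = [
    ("minim.mdp", "1AKI_solv_ions.gro", None, None, "em"),
    ("nvt.mdp", "em.gro", "em.gro", None, "nvt"),
    ("npt.mdp", "nvt.gro", "nvt.gro", "nvt.cpt", "npt"),
    ("md.mdp", "npt.gro", None, "npt.cpt", "md_0_1"),
]


def stage_commands(mdp, coords, restraint_ref, checkpoint, name):
    """Build the grompp and mdrun argument lists for one stage."""
    grompp = ["gmx", "grompp", "-f", mdp, "-c", coords]
    if restraint_ref:
        grompp += ["-r", restraint_ref]  # position restraint reference
    if checkpoint:
        grompp += ["-t", checkpoint]     # continue from previous stage
    grompp += ["-p", "topol.top", "-o", f"{name}.tpr"]
    mdrun = ["gmx", "mdrun", "-deffnm", name]
    return [grompp, mdrun]


def run_all(dry_run=True):
    """Print (dry run) or execute every stage in order."""
    for stage in STAGES:
        for cmd in stage_commands(*stage):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)


run_all()  # dry run: only prints the eight commands
```

Calling run_all(dry_run=False) would execute the stages one after another and stop at the first command that fails.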

Analysis

Trajectory conversion:

As molecules may drift out of the box during a long simulation, the trajectory needs post-processing. The trjconv module can strip out coordinates, correct for periodicity, or manually alter the trajectory (time units, frame frequency, etc.).

gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center

Select "1 Protein" to be centered and "0 System" to be written out.

trjconv: Gromacs trajectory conversion module

-s md_0_1.tpr: all-atom run input file

-f md_0_1.xtc:  trajectory file as input

-o md_0_1_noPBC.xtc: converted trajectory file as output

-pbc mol: Periodic boundary condition (PBC) treatment; mol puts the center of mass of each molecule in the box

-center:  center the selected group in the box

All analysis will be conducted on the converted trajectory md_0_1_noPBC.xtc.
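
The idea behind the periodicity correction can be illustrated with a small Python sketch: a molecule that has drifted out of a rectangular box is shifted by whole box lengths until its center lies inside the box again, which keeps the molecule intact. This is only an illustration of the concept, not the trjconv implementation. It uses the geometric center with equal weights instead of the mass-weighted center, and the function name wrap_molecule and the coordinates are made up.

```python
from math import floor


def wrap_molecule(coords, box):
    """Shift a whole molecule so that its center lies inside the box."""
    n = len(coords)
    # Geometric center (assumption: all atoms weighted equally).
    center = [sum(atom[i] for atom in coords) / n for i in range(3)]
    # Whole number of box lengths to move along each axis.
    shift = [-floor(center[i] / box[i]) * box[i] for i in range(3)]
    return [
        tuple(atom[i] + shift[i] for i in range(3)) for atom in coords
    ]


# A two-atom "molecule" just outside a 5 nm box along x.
molecule = [(5.2, 1.0, 1.0), (5.6, 1.0, 1.0)]
for atom in wrap_molecule(molecule, (5.0, 5.0, 5.0)):
    print(tuple(round(value, 3) for value in atom))
```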

RMSD - Structure stability

The rms module calculates the RMSD between a trajectory and a reference structure.

gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns

Select "4 Backbone" for fitting and "4 Backbone" for RMSD calculation.

-s md_0_1.tpr: all-atom run input file

-f md_0_1_noPBC.xtc: trajectory file

-o rmsd.xvg: write rmsd plot

-tu ns: specify ns as time unit
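
For reference, the quantity being plotted is the root mean square deviation between matching atoms of two structures. The Python sketch below computes it for two coordinate lists. It is a simplified illustration: unlike gmx rms, it does not superimpose the structures first and it weights all atoms equally. The function name rmsd and the coordinates are made up.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matching atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom moved by 1 in z
print(rmsd(reference, frame))  # 1.0
```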

Write out snapshots in PDB format for viewing

Write out the whole structure at 100 ps:

gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -dump 100 -o snapshot100.pdb

Choose "0 System" for the whole system.

Write out the protein structure every 2 ps (frames are only available as often as they were saved, here every 10 ps):

gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -dt 2 -o snapshot.pdb

Choose "1 Protein" for the protein only.

Multiple frames are marked as models in the output PDB file. When viewing the structure with PyMOL, the "play" button shows the frames as a movie sequence.
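
A quick way to check how many frames ended up in the file is to count the MODEL records. This is a minimal Python sketch, assuming each frame starts with a line beginning with MODEL; the function name count_models and the sample text are made up.

```python
def count_models(pdb_text):
    """Count the frames (MODEL records) in a multi-model PDB file."""
    return sum(
        1 for line in pdb_text.splitlines() if line.startswith("MODEL")
    )


# Made-up skeleton of a two-frame file (atom records omitted).
sample_pdb = """MODEL        1
ENDMDL
MODEL        2
ENDMDL
"""
print(count_models(sample_pdb))  # 2
```

For the real output, pass the contents of the file, for example count_models(open("snapshot.pdb").read()).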