This chapter describes various operations that are useful in many situations.
MMTK provides an easy way to store (almost) arbitrary objects in files and retrieve them later. All objects of interest to users can be stored, including chemical objects, collections, universes, normal modes, configurations, etc. It is also possible to store standard Python objects such as numbers, lists, dictionaries etc. Some objects used internally in MMTK, e.g. force field evaluators, cannot be stored in files, but that would not make sense anyway. Storage is based on the standard Python module pickle.
Objects are saved with save(object, filename) and restored with object = load(filename). If several objects are to be stored in a single file, use tuples: save((object1, object2), filename) and object1, object2 = load(filename) to retrieve the objects.
It should be noted that when saving an object, all objects that this object refers to are also saved in the same file (otherwise the restored object would be missing some references). In practice this means that saving any chemical object, even a single atom, involves saving the whole universe that this object is part of. However, when saving several objects in one file, objects referenced several times are saved only once.
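Because storage is based on the standard pickle module, the tuple idiom and the handling of shared references can be illustrated without MMTK. The following is a minimal sketch in present-day Python that uses pickle directly, with ordinary dictionaries and lists standing in for MMTK objects. The helper names save_objects and load_objects are hypothetical and are not MMTK's own save and load.

```python
import os
import pickle
import tempfile


def save_objects(obj, filename):
    """Hypothetical stand-in for save(): pickle obj into filename."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def load_objects(filename):
    """Hypothetical stand-in for load(): unpickle the stored object."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Two containers that refer to the same inner object.
shared = ["shared", "data"]
object1 = {"name": "first", "ref": shared}
object2 = {"name": "second", "ref": shared}

path = os.path.join(tempfile.mkdtemp(), "objects.pickle")
save_objects((object1, object2), path)
restored1, restored2 = load_objects(path)

# The shared object was stored only once, so both restored containers
# still refer to one and the same object.
same_after_load = restored1["ref"] is restored2["ref"]
print(same_after_load)
```

Saving the two containers in separate files instead would give each restored container its own private copy of the shared list.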
Frequently it is also useful to copy an object, such as a molecule or a configuration. There are two functions (which are actually taken from the Python standard library module copy) for this purpose, which have a somewhat different behaviour for container-type objects (lists, dictionaries, collections etc.). copy(object) returns a copy of the given object. For a container object, it returns a new container object which contains the same objects as the original one. If the intention is to get a container object which contains copies of the original contents, then deepcopy(object) should be used. For objects that are not container-type objects, there is no difference between the two functions.
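The difference between the two functions can be seen with the standard library module copy on plain Python lists. This is a small sketch in present-day Python; the lists merely stand in for MMTK collections and their contents.

```python
from copy import copy, deepcopy

# Stand-ins for two contained objects and a container holding them.
molecule_a = ["C", "H", "H"]
molecule_b = ["O", "H"]
container = [molecule_a, molecule_b]

shallow = copy(container)
deep = deepcopy(container)

# copy() builds a new container that holds the same elements.
shallow_is_new = shallow is not container
shallow_shares = shallow[0] is container[0]

# deepcopy() builds a new container that holds copies of the elements.
deep_shares = deep[0] is container[0]
deep_equal = deep == container

print(shallow_is_new, shallow_shares, deep_shares, deep_equal)
```

Modifying molecule_a afterwards would therefore be visible through shallow but not through deep.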
MMTK can write objects in specific file formats that can be used by other programs. Currently two file formats are available: the PDB format, widely used in computational chemistry, and the VRML format understood by VRML browsers as a representation of a three-dimensional scene for visualization. MMTK also provides a more general interface that can generate graphics objects in any representation if a special module for that representation exists. In addition to facilitating the implementation of new graphics file formats, this approach also permits the addition of custom graphics elements (lines, arrows, spheres, etc.) to molecular representations.
External files are generated with OutputFile(filename, format), where the string filename specifies the file name and the string format must be either "pdb" or "vrml". The return value is a special file object with two methods: write(object) writes an object (a chemical object, a collection, or a universe) to the file, and close() closes the file.
The most common need for file export is visualization. There is a special function view(object) which creates a temporary export file, starts a visualization program, and deletes the temporary file. An optional argument indicates the configuration to be used (default is the current configuration). A second optional argument can be given to specify the visualization program and model:
MMTK also allows visualization of normal modes and trajectories using animation. Since not all visualization programs permit animation, and since there is no standard way to ask for it, animation is implemented only for the programs XMol and VMD.
Normal modes are visualized using view(mode), where mode is a single-mode object. Optional parameters are a scaling factor for the atomic displacements (default is 1.) and an object or a collection of objects to be shown (by default the whole universe for which the normal modes have been calculated).
Trajectories are animated using view(trajectory) with optional parameters first (number of first configuration; default is 1), last (number of last configuration; default is None meaning the last configuration in the trajectory), step (number of configurations between two frames; default is 1), and object (object to be displayed; default is None meaning the whole universe).
There is also a function to animate an arbitrary list of configurations, e.g. configurations read in from PDB files. It is viewSequence(object, list_of_configurations).
For more specialized needs, MMTK permits the creation of graphical representations of most of its objects via general graphics modules that have to be provided externally. Such graphics modules currently exist for VRML (version 1), VRML2 (aka VRML97), and for the visualization program VMD. The VRML module is part of the required modules for MMTK, so any MMTK user should have it. Modules for other representations (e.g. rendering programs) can be written easily; it is recommended to use the existing modules as an example.
To generate a graphical representation of a visualizable object (i.e. all chemical objects, universes, collections, and fields), the method graphicsObjects() must be called. It returns a list of graphics objects, created from classes in the graphics module. There are several optional keyword arguments:
The following example will generate a backbone representation for a protein and add arrows indicating the principal axes of inertia:
    from MMTK import *
    import Numeric, LinearAlgebra
    import VRML

    protein = Protein('insulin')
    center, inertia = protein.centerAndMomentOfInertia()
    mass = protein.mass()

    # Principal axes and their lengths from the tensor of inertia
    diagonal, directions = LinearAlgebra.eigenvectors(inertia.array)
    diagonal = Numeric.sqrt(diagonal/mass)

    # Backbone representation of the protein
    graphics = protein.graphicsObjects(graphics_module = VRML,
                                       model = 'backbone', color = 'red')

    # One green arrow per principal axis
    for length, axis in map(None, diagonal, directions):
        graphics.append(VRML.Arrow(center, center+length*Vector(axis), 0.02,
                                   material=VRML.EmissiveMaterial('green')))

    scene = VRML.Scene(graphics)
    scene.view()
MMTK can read and write trajectories in the DCD format used by the programs CHARMM and X-PLOR. Note that, unlike MMTK trajectory files, DCD files use a machine-dependent format; it is not always possible to read files on a different system than the one they were produced on. If you try to read DCD files on an incompatible system, MMTK will probably print an error message (most likely "number of atoms in DCD file does not match universe"), but it might also read the file and produce meaningless data.
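One frequent source of this machine dependence is byte order. The following is a minimal sketch, not MMTK code, that guesses the byte order of a DCD file from its first eight bytes. It assumes the commonly described layout in which the file begins with a 4-byte record length of 84 followed by the characters CORD. This layout is an assumption that is not stated in this chapter, and the function name dcd_byte_order is hypothetical.

```python
import struct


def dcd_byte_order(header):
    """Return '<' or '>' for a recognised header, or None otherwise.

    Assumption: the file starts with a 4-byte integer 84 followed by
    the four characters CORD.
    """
    if len(header) < 8 or header[4:8] != b"CORD":
        return None
    for order in ("<", ">"):
        (record_length,) = struct.unpack(order + "i", header[:4])
        if record_length == 84:
            return order
    return None


# Synthetic headers for demonstration; no real DCD file is read here.
little = struct.pack("<i", 84) + b"CORD"
big = struct.pack(">i", 84) + b"CORD"

print(dcd_byte_order(little), dcd_byte_order(big))
```

A check of this kind only detects one kind of incompatibility; differences in word size or floating-point format would need separate treatment.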
Another problem with reading DCD files is the correct identification of atoms. In MMTK, each atom has an identity that is preserved in all files, including trajectories; users need not take care of atom identification themselves. In contrast, DCD files store data with no more than an atom number as an identification. Therefore a DCD file by itself is not sufficient for any analysis; a second file, such as a topology file (PSF) or a coordinate file (CRD or PDB), is necessary to provide a full system description. MMTK can only deal with PDB files, so you must have a PDB file corresponding to your DCD file. If you only have a PSF file, you need CHARMM or X-PLOR to generate a PDB file. If you have a CRD file, you can also use the Babel utility program to create a PDB file.
The first step in reading a DCD file is creating the system from the PDB file. You must create the complete system without omitting any parts, otherwise the number of atoms in your system won't agree with the DCD file. The chapter "Constructing a molecular system" contains example code for creating a complete system from a PDB file.
The actual reading procedure is very similar to running a minimization or molecular dynamics algorithm. The DCD file is treated as the source of a trajectory and can be used just like, for example, the trajectory generated by an MD integrator. In particular, it can be written to an MMTK trajectory file. All the options described in the chapter on dynamics apply.
The following example shows the complete procedure of converting a DCD trajectory to an MMTK trajectory:
    from MMTK import *
    from sys import stdout

    pdb_file = 'example.pdb'
    dcd_file = 'example.dcd'
    mmtk_trajectory_file = 'example.nc'

    # Build the complete system from the PDB file
    world = InfiniteUniverse()
    sequence = PDBFile(pdb_file).readSequenceWithConfiguration()
    world.protein = Protein(map(PeptideChain, sequence))
    world.water = waterFromPDBSequence(sequence)
    world.other = unknownResiduesFromPDBSequence(sequence)

    # Treat the DCD file as a trajectory source and run the reader
    t = Trajectory(world, mmtk_trajectory_file, "w", "DCD conversion")
    dcd_reader = DCDReader(world,
                           trajectory=(0, None, 1, t),
                           log=(0, None, 10, stdout, ('time',)),
                           dcd_file = dcd_file)
    dcd_reader()
    t.close()
The function writeDCDPDB(configurations, dcd_file_name, pdb_file_name, delta_t) produces a compatible combination of a PDB file and a DCD file. The first argument must be a sequence of configurations, the second and third specify the two file names, and the last argument, which indicates the time step between consecutive configurations, is optional and defaults to 0.1 ps. The resulting PDB file will contain the first configuration in the sequence, the DCD file will contain all the configurations.
The following example shows the complete procedure of converting a subset of an MMTK trajectory to a DCD trajectory:
    from MMTK import *

    mmtk_trajectory_file = 'example.nc'
    first_step = 0
    last_step = 1000   # actually the first one that is *not* written
    skip = 10          # take every tenth step
    pdb_file = 'example.pdb'
    dcd_file = 'example.dcd'

    trajectory = Trajectory(None, mmtk_trajectory_file, 'r')
    configurations = trajectory.configuration[first_step:last_step:skip]
    # Time between two written configurations
    delta_t = skip*(trajectory.time[1]-trajectory.time[0])
    writeDCDPDB(configurations, dcd_file, pdb_file, delta_t)
    trajectory.close()
MMTK has a few functions that return random points and other random objects:
Sometimes it is necessary to generate objects (atoms or molecules) on a lattice. To facilitate this task, MMTK defines lattice objects which are essentially sequence objects containing points or objects at points. Lattices can therefore be used like lists with indexing and for-loops. The contents of a lattice are determined by a function that is passed to the lattice constructor as an argument. This function is called for every point on the lattice (with a single vector argument), and its return value is stored in the lattice. If no function is passed, or if None is passed, the points themselves are stored.
A general rhombic lattice is constructed with RhombicLattice(elementary_cell, lattice_vectors, cells, function). The first argument is a list of points in the elementary cell. The second argument is a list of lattice vectors. Each lattice vector defines a lattice dimension (only values from one to three make sense) and indicates the displacement along this dimension from one cell to the next. The third argument is a list of integers whose length must equal the number of dimensions. Each entry specifies how often a cell is repeated along this dimension. The last argument is the object creation function described in the previous paragraph.
The important special case of a rhombic lattice with only one point per elementary cell can be created with BravaisLattice(lattice_vectors, cells, function). The even more specialized case of a simple cubic lattice is obtained with SCLattice(cell_size, cells, function), where cell_size defines the edge length of the elementary cell and cells specifies the number of repetitions along each axis.
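The construction described above can be sketched in a few lines of plain Python. This is an illustration of the idea only and not MMTK's implementation: points and vectors are ordinary 3-tuples, the result is a plain list, and the function names rhombic_lattice and sc_lattice are hypothetical.

```python
from itertools import product


def rhombic_lattice(elementary_cell, lattice_vectors, cells, function=None):
    """Return a list with one entry per lattice point.

    Hypothetical pure-Python illustration; points and vectors are 3-tuples.
    """
    if len(lattice_vectors) != len(cells):
        raise ValueError("need one repetition count per lattice vector")
    result = []
    # Iterate over all cell indices, e.g. (0, 0, 0), (0, 0, 1), ...
    for indices in product(*(range(n) for n in cells)):
        # Displacement of this cell from the first cell.
        shift = [sum(i * v[k] for i, v in zip(indices, lattice_vectors))
                 for k in range(3)]
        for point in elementary_cell:
            position = tuple(p + s for p, s in zip(point, shift))
            # Store the point itself, or whatever the function returns.
            result.append(position if function is None else function(position))
    return result


def sc_lattice(cell_size, cells, function=None):
    """Simple cubic lattice: one point per cell, orthogonal vectors."""
    vectors = [(cell_size, 0.0, 0.0),
               (0.0, cell_size, 0.0),
               (0.0, 0.0, cell_size)]
    return rhombic_lattice([(0.0, 0.0, 0.0)], vectors, cells, function)


points = sc_lattice(0.5, (2, 2, 2))
labels = sc_lattice(0.5, (2, 1, 1), lambda p: ("atom", p))
print(len(points), labels)
```

In the second call the function wraps each point in a tuple; in a real application it would typically create an atom or molecule at that position.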
It is often necessary to identify all objects that are within a certain distance of each other or from a given point. A straightforward search procedure is very inefficient for a large system. A more efficient solution is to divide the whole system into cubic partitions first and do explicit distance calculations only for objects whose partition is within the distance limit. In MMTK there are two special types of collections for this purpose.
PartitionedCollection(partition_size) creates such a partition with partition_size as the edge length of the cubic partitions. An optional second argument specifies an object or a list/collection of objects to be added to the collection. The resulting object has all the operations that are defined on standard collections. In addition, the following methods are defined:
All distances are calculated between the centers of mass of the elements of the collection. For some applications it is necessary to choose the objects on the basis of atomic distances. MMTK offers a specialized partitioned collection for this purpose, which is created with PartitionedAtomCollection(partition_size). When objects are added to such a collection, it is not the objects themselves but their atoms that become the elements of the collection.
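The partitioning idea itself can be sketched independently of MMTK. In the following illustration, plain coordinate tuples stand in for the centers of mass (or atoms), and the function names partition and pairs_within are hypothetical. The sketch assumes that the distance limit does not exceed the partition size, so that only the 27 partitions surrounding a point need to be searched, and it treats points with identical coordinates as one point.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def partition(points, partition_size):
    """Map integer cell indices to the points lying in that cubic cell."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / partition_size) for x in p)
        cells[key].append(p)
    return cells


def pairs_within(points, partition_size, limit):
    """Return all pairs of points closer than limit, sorted.

    Assumes limit <= partition_size, so only the 27 cells around
    each cell need to be inspected.
    """
    if limit > partition_size:
        raise ValueError("limit must not exceed partition_size")
    cells = partition(points, partition_size)
    found = []
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for p in members:
                for q in cells.get(neighbour, ()):
                    # p < q keeps each pair once and skips p == q.
                    if p < q and dist(p, q) < limit:
                        found.append((p, q))
    return sorted(found)


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.9, 0.0, 0.0),
          (1.05, 0.0, 0.0), (3.0, 3.0, 3.0)]
close_pairs = pairs_within(points, 1.0, 0.2)
print(close_pairs)
```

The second pair found here straddles a partition boundary, which is why neighbouring partitions have to be searched as well.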
For analyzing or visualizing atomic properties that change little over short distances, it is often convenient to represent these properties as functions of position instead of one value per atom. Functions of position are also known as fields, and mathematical techniques for the analysis of fields have proven useful in many branches of physics. Such a field can be obtained by averaging over the values corresponding to the atoms in a small region of space.
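The averaging step can be illustrated with a short stand-alone sketch. It is not MMTK's implementation: positions are plain tuples, the field is returned as a dictionary keyed by integer cell indices, and the function name scalar_field is hypothetical.

```python
from collections import defaultdict
from math import floor


def scalar_field(positions, values, grid_size):
    """Average per-atom values over cubic regions of edge length grid_size.

    Returns a dict mapping integer cell indices to the mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, value in zip(positions, values):
        cell = tuple(floor(x / grid_size) for x in position)
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Two atoms share the first region; the third lies in the next one along x.
positions = [(0.1, 0.1, 0.1), (0.3, 0.2, 0.4), (0.7, 0.1, 0.1)]
values = [1.0, 3.0, 10.0]
field = scalar_field(positions, values, 0.5)
print(field)
```

A vector field would be built the same way, with the average taken separately for each vector component.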
MMTK provides two types of fields, scalar fields (i.e. the function of position is a number) and vector fields (i.e. the function of position is a vector). A scalar field is created by AtomicScalarField(object, grid_size, values). The first argument, object, specifies the object whose atoms define the region and values of the field. The second argument, grid_size, specifies the edge length of the cubic regions over which the values are averaged. The last argument, values, must be a mapping object that defines a value for each atom in object. Usually this mapping object is an object of type ParticleScalar. Vector fields are created analogously by AtomicVectorField, whose arguments are identical, except that the values must yield vectors. Usually the last argument for AtomicVectorField is an object of type ParticleVector.
AtomicScalarField objects offer the following methods:
AtomicVectorField objects offer the following methods:
Both types of fields also offer the graphicsObjects method described elsewhere, and the method particleVariable(), which creates a ParticleScalar or ParticleVector object with values for each atom by interpolation.
The following example produces a visualization of the difference between two conformations of a protein. For simplicity the second conformation is generated by a rotation; in a real application it would be read from a file or obtained by some calculation. The atomic displacement vectors are turned into a vector field, and this vector field is shown by arrows superimposed on the molecule.
    from MMTK import *
    import VRML

    universe = InfiniteUniverse()
    protein = Protein('insulin')
    universe.addObject(protein)

    # Keep a copy of the first conformation, then rotate to obtain the second
    configuration1 = copy(universe.configuration())
    universe.rotateAroundCenter(Vector(0.,0.,1.), 0.5)
    configuration2 = universe.configuration()

    # Turn the atomic displacements into a vector field
    displacement = configuration1-configuration2
    field = AtomicVectorField(universe, 0.5, displacement)

    # Protein backbone in red with the field arrows in black
    graphics = protein.graphicsObjects(graphics_module = VRML,
                                       model = 'backbone', color = 'red') + \
               field.graphicsObjects(graphics_module = VRML, color = 'black')
    scene = VRML.Scene(graphics)
    scene.view()
A frequent problem in determining force field parameters is the determination of partial charges for the atoms of a molecule by fitting to the electrostatic potential around the molecule, which is obtained from quantum chemistry programs. Although this is essentially a straightforward linear least-squares problem, many procedures that are in common use do not use state-of-the-art techniques and may yield erroneous results. MMTK provides a charge fitting method that is numerically stable and allows the imposition of constraints on the charges.
The first step in charge fitting is the generation of test points around the molecule at which the electrostatic potential is evaluated and fitted. A useful approach is the use of random points that are uniformly distributed in a shell around the molecule. The function evaluationPoints(object, n, lower, upper) provides a list of n points around the atoms of the specified object, where each point is at least a distance lower away from all non-hydrogen atoms and at most a distance upper away from any atom. There are default values of 0.3 for lower and 0.5 for upper.
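One way to obtain such points is rejection sampling, sketched below in plain Python. This is an illustration under stated assumptions and not the code behind evaluationPoints: atoms are plain coordinate tuples, every atom is treated as a non-hydrogen atom, and "at most a distance upper away from any atom" is interpreted as "within upper of the nearest atom". The function name shell_points and the seed parameter are hypothetical.

```python
import random
from math import dist


def shell_points(atoms, n, lower=0.3, upper=0.5, seed=0):
    """Return n random points in a shell around the given atom positions.

    Simplification: every atom counts as a non-hydrogen atom, so each
    point is at least `lower` from all atoms and at most `upper` from
    the nearest one.
    """
    rng = random.Random(seed)
    # Bounding box of the atoms, enlarged by `upper` on every side.
    low = [min(a[k] for a in atoms) - upper for k in range(3)]
    high = [max(a[k] for a in atoms) + upper for k in range(3)]
    points = []
    while len(points) < n:
        candidate = tuple(rng.uniform(low[k], high[k]) for k in range(3))
        nearest = min(dist(candidate, a) for a in atoms)
        # Rejecting candidates outside the shell keeps the accepted
        # points uniformly distributed inside it.
        if lower <= nearest <= upper:
            points.append(candidate)
    return points


atoms = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0)]
points = shell_points(atoms, 20)
print(len(points))
```

Candidates are drawn uniformly from a box enclosing the shell, so the fraction accepted falls as the shell becomes thin relative to the molecule.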
The next step is calculating the electrostatic potential at each point using a quantum chemistry program. MMTK cannot provide any direct help, but it is usually easy to write a small Python program which generates the necessary input files and extracts the results from the output file.
The final fitting step is done by ChargeFit(object, points, constraints). Here points can be a list of (point, potential) tuples or a dictionary whose entries have a configuration object as their key and a list of (point, potential) tuples as their value. The second variety permits multiple-configuration fits; obviously the test points must have been generated separately for each configuration. The last argument specifies the charge constraints. It can be None (no constraints) or a list of constraint objects. The following constraint objects are available:
The result of a charge fit is a ChargeFit object, which can be used like a dictionary mapping atoms to charges. In addition, it has some attributes which return global information about the fit:
For running large applications on appropriate machines, MMTK contains an experimental interface to the molecular dynamics program DL_POLY developed at Daresbury Laboratory. The interface consists of a way to generate the files FIELD and CONFIG that DL_POLY reads to specify the force field and initial configuration. The third input file, CONTROL, must be written manually. At the moment, the DL_POLY interface works only for the standard Amber force field.
Since the DL_POLY interface is not an integral part of MMTK, it must be imported explicitly by from DLPOLY import DLPOLY. Then the command DLPOLY(system, force_field, text) will generate the two DL_POLY input files. The last argument is a string that will be put into the first line of the text files as an identifier.
Note that the DL_POLY interface is not supported and may disappear or stop working in future versions.