Chemical objects and the database

The module ChemicalObjects

The class ChemicalObject

The most frequently used object hierarchy is the one that describes atoms, groups, molecules, and complexes. It is defined in the module ChemicalObjects. The common base class is the abstract class ChemicalObject, which itself inherits from another class, Collection.GroupOfAtoms. The latter is a mix-in class containing methods that are appropriate for any object that contains atoms, e.g. chemical objects, collections, and universes. Examples of such methods are centerOfMass() and rmsDifference().

Chemical objects are constructed from blueprint objects, which are explained below. This is handled by the method __init__ of class ChemicalObject. Chemical objects also have a type, which is the database entry from which they were made. The type's attributes can be used as if they were attributes of the object itself; this is arranged by the method __getattr__.
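The attribute forwarding described above can be illustrated with a minimal sketch. The classes ToyType and ToyChemicalObject are hypothetical stand-ins written for this example; they are not the actual classes of the module ChemicalObjects, and the real __getattr__ may differ in detail.

```python
class ToyType:
    """Hypothetical stand-in for a database entry."""

    def __init__(self, name, **attributes):
        self.name = name
        self.__dict__.update(attributes)


class ToyChemicalObject:
    """Hypothetical stand-in for a chemical object made from a type."""

    def __init__(self, object_type):
        self.type = object_type

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails, so
        # attributes stored on the object take precedence over the type.
        if attr == 'type' or attr.startswith('__'):
            # Avoid infinite recursion and leave special methods alone.
            raise AttributeError(attr)
        return getattr(self.type, attr)


carbon_type = ToyType('carbon', symbol='C', mass=12.011)
atom = ToyChemicalObject(carbon_type)
print(atom.symbol)   # forwarded to carbon_type
atom.mass = 13.0     # stored on the object, hides the type's value
print(atom.mass, carbon_type.mass)
```

Because the forwarding happens only on failed lookups, assigning to an attribute of the object never modifies the shared type.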

Chemical objects have a hierarchical structure. The object created by a user application is called a top-level object. It may consist of subobjects (e.g. groups or atoms), which may themselves have subobjects, etc. The lowest level in the hierarchy is the atom. Each chemical object has an attribute parent that points to the next level up in the hierarchy. For a top-level object, it points to the universe of which the object is a part. The methods topLevelChemicalObject() and universe() follow this chain of references to find the top-level object and the universe for any chemical object. The method fullName() uses the same chain to construct the complete name of a given object.
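A minimal sketch of how such a parent chain can be followed is given here. ToyUniverse and ToyObject are hypothetical classes, the is_universe flag is an assumption made for this example, and the use of '.' as the separator in full names is likewise assumed rather than taken from the actual implementation.

```python
class ToyUniverse:
    """Hypothetical universe: the end of every parent chain."""
    is_universe = True


class ToyObject:
    """Hypothetical chemical object with a name and a parent reference."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def topLevelChemicalObject(self):
        # Climb until the parent is missing or is a universe.
        obj = self
        while obj.parent is not None and not getattr(obj.parent, 'is_universe', False):
            obj = obj.parent
        return obj

    def universe(self):
        # The parent of the top-level object is the universe (or None).
        return self.topLevelChemicalObject().parent

    def fullName(self):
        # Collect names from this object up to the top-level object,
        # then join them from the top down.
        names = []
        obj = self
        while obj is not None and not getattr(obj, 'is_universe', False):
            names.append(obj.name)
            obj = obj.parent
        return '.'.join(reversed(names))


universe = ToyUniverse()
water = ToyObject('water', universe)
oxygen = ToyObject('O', water)
print(oxygen.fullName())
```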

The method __copy__ makes sure that copying always copies all subobjects, by using deepcopy() for the actual work. Both copying and pickling of chemical objects rely on the existence of the method __getinitargs__ and on properly defined initialization methods.
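The idea of routing every copy through deepcopy() can be sketched as follows. ToyComposite is a hypothetical class; the sketch relies on Python's default copying machinery and does not model __getinitargs__.

```python
import copy


class ToyComposite:
    """Hypothetical composite object holding a list of subobjects."""

    def __init__(self, name, parts=()):
        self.name = name
        self.parts = list(parts)

    def __copy__(self):
        # A shallow copy would share the subobjects with the original,
        # so even copy.copy() is delegated to deepcopy().
        return copy.deepcopy(self)


original = ToyComposite('molecule', [ToyComposite('group')])
duplicate = copy.copy(original)
print(duplicate.parts[0] is original.parts[0])
```

The copy owns its own subobjects, so modifying it cannot affect the original.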

The class Atom

The simplest chemical objects are atoms, represented by instances of class Atom. Most of its methods are concerned with managing the position attribute. The position of an atom can either be defined by a vector or, if the atom is part of a universe, by a reference to the universe's coordinate array. The purpose of this coordinate array is to allow efficient coordinate access in C modules. To ensure consistency, the coordinates should be kept in one place only. Therefore the Python modules access positions via the same array that the C modules use. However, this detail is hidden in the class Atom; other classes access positions only via the method position().
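The two ways of storing a position can be illustrated with a small sketch. ToyAtom and its method attach() are hypothetical, and nested Python lists stand in for the universe's coordinate array so that the example needs no numerical library.

```python
class ToyAtom:
    """Hypothetical atom whose position lives in exactly one place."""

    def __init__(self, name, position):
        self.name = name
        self._vector = tuple(position)  # used while outside any universe
        self._array = None              # shared coordinate array, if any
        self._index = None

    def attach(self, array, index):
        # Move the coordinates into the shared array and drop the
        # private copy, so that only one copy exists.
        array[index] = list(self._vector)
        self._array, self._index = array, index
        self._vector = None

    def position(self):
        # Callers never need to know where the coordinates are stored.
        if self._array is not None:
            return tuple(self._array[self._index])
        return self._vector

    def setPosition(self, position):
        if self._array is not None:
            self._array[self._index] = list(position)
        else:
            self._vector = tuple(position)


coordinates = [[0.0, 0.0, 0.0] for _ in range(2)]
hydrogen = ToyAtom('H', (1.0, 2.0, 3.0))
hydrogen.attach(coordinates, 1)
coordinates[1][0] = 5.0          # e.g. a change made by low-level code
print(hydrogen.position())
```

A change made directly in the array is immediately visible through position(), which is the point of keeping the coordinates in one place.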

The classes CompositeChemicalObject, Group, Molecule and Complex

The class CompositeChemicalObject is the base class for all composite objects. It takes care of initialization and other common operations. Its subclasses Group, Molecule, and Complex are very similar. They differ mostly in the database they refer to and in the situations in which they can be used (e.g. groups cannot be created directly, whereas complexes cannot be part of another composite object).

Atom properties

The most essential information stored in the database about an object is its structure, i.e. its subobjects and the bonds between them. But the database also stores many atom properties, especially force field parameters. It is important to understand how these can be accessed and modified.

For memory efficiency, atom properties are usually not assigned to attributes of each atom. They are kept in dictionaries in the type definitions in the database. However, an atom attribute of the same name always overrides the value in the database, so changing atom properties is easy: just assign a new value to an attribute (e.g. atom.amber_charge = 1.).

To retrieve a property, the method getAtomProperty() is called on the top-level chemical object that contains the atom. Only the top-level object can find the correct value, since objects higher up in the hierarchy can override the values given at a lower level. For example, the definition of a methyl group may contain the charge of the carbon atom, but a molecule that uses a methyl group may override this charge with a more specific value.
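The lookup order described above (atom attribute first, then the outermost definition, then inner definitions) can be sketched as follows. ToyNode is a hypothetical class, and storing properties in a dictionary keyed by (relative path, property name) is an assumption made for this example, not the layout used by the real database types.

```python
class ToyNode:
    """Hypothetical object in a parent/child hierarchy."""

    def __init__(self, name, parent=None, properties=None):
        self.name = name
        self.parent = parent
        # Maps (path relative to this node, property name) to a value.
        self.properties = properties or {}

    def getAtomProperty(self, atom, prop):
        # 1. An attribute set directly on the atom always wins.
        if prop in vars(atom):
            return vars(atom)[prop]
        # 2. Build the chain from the atom up to this object.
        chain = [atom]
        while chain[-1] is not self:
            chain.append(chain[-1].parent)
        # 3. Search from this object downwards, so that outer levels
        #    override values given at inner levels.
        for depth in range(len(chain) - 1, -1, -1):
            owner = chain[depth]
            path = '.'.join(node.name for node in reversed(chain[:depth]))
            if (path, prop) in owner.properties:
                return owner.properties[(path, prop)]
        raise KeyError(prop)


molecule = ToyNode('ethanol', properties={('methyl.C', 'charge'): -0.2})
methyl = ToyNode('methyl', molecule, {('C', 'charge'): -0.1})
carbon = ToyNode('C', methyl)
print(molecule.getAtomProperty(carbon, 'charge'))
```

Asking the group instead of the molecule would return the group's own default, which is why the query has to start at the top-level object.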

A general expression for obtaining the property p for atom a is a.topLevelChemicalObject().getAtomProperty(a, p). However, applying this expression to many atoms is rather inefficient, since the search for the top-level object is repeated for every atom. In many situations, the top-level object is already known or can be obtained once for several atoms, which is much more efficient. For this reason, the general expression has not been implemented as a method on atoms.
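The difference between the two approaches can be made visible with a sketch that counts how often the parent chain is searched. CountingNode and the two helper functions are hypothetical, and the property table is deliberately simplified to a single dictionary on the top-level object.

```python
class CountingNode:
    """Hypothetical object that counts parent-chain searches."""
    searches = 0

    def __init__(self, name, parent=None, charges=None):
        self.name = name
        self.parent = parent
        self.charges = charges or {}

    def topLevelChemicalObject(self):
        CountingNode.searches += 1
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def getAtomProperty(self, atom, prop):
        # Simplified: only one property table, stored on this object;
        # the argument prop is ignored in this sketch.
        return self.charges[atom.name]


def charges_naive(atoms):
    # General expression: one chain search per atom.
    return [a.topLevelChemicalObject().getAtomProperty(a, 'charge')
            for a in atoms]


def charges_shared(atoms):
    # Valid only if all atoms belong to the same top-level object.
    top = atoms[0].topLevelChemicalObject()
    return [top.getAtomProperty(a, 'charge') for a in atoms]


water = CountingNode('water', charges={'O': -0.8, 'H1': 0.4, 'H2': 0.4})
atoms = [CountingNode(name, water) for name in ('O', 'H1', 'H2')]
```

Both functions return the same values, but the second one searches the hierarchy only once regardless of the number of atoms.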

The module Database

As described in the user's guide, the chemical database contains definitions for object types in the form of short Python programs. Internally, each directory with definitions (atoms, molecules, etc.) corresponds to an instance of class Database. When a specific object type is requested for the first time, the method findType() calls the type constructor (e.g. AtomType), which executes the definition file in an environment determined by the appropriate environment module (e.g. AtomEnvironment). The resulting type object is kept in the database for future use.
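The mechanism of executing a definition once and keeping the result can be sketched as follows. ToyAtomType and ToyDatabase are hypothetical classes; a dictionary of source strings stands in for the definition files, and a plain dictionary stands in for the environment module.

```python
class ToyAtomType:
    """Hypothetical type constructor: runs a definition in a namespace."""

    def __init__(self, name, source, environment):
        namespace = dict(environment)
        exec(source, namespace)
        self.name = name
        # Keep only the names that the definition itself introduced.
        for key, value in namespace.items():
            if key not in environment and not key.startswith('__'):
                setattr(self, key, value)


class ToyDatabase:
    """Hypothetical database for one kind of definition."""

    def __init__(self, definitions, type_constructor, environment):
        self.definitions = definitions      # name -> source text
        self.type_constructor = type_constructor
        self.environment = environment
        self.types = {}                     # cache of constructed types

    def findType(self, name):
        # Execute the definition on first request only.
        if name not in self.types:
            self.types[name] = self.type_constructor(
                name, self.definitions[name], self.environment)
        return self.types[name]


definitions = {'hydrogen': "symbol = 'H'\nmass = 1.008 * amu\n"}
atoms = ToyDatabase(definitions, ToyAtomType, {'amu': 1.0})
hydrogen = atoms.findType('hydrogen')
print(hydrogen.symbol, hydrogen.mass)
```

A second request for the same name returns the cached type object without executing the definition again.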

Subobjects in type definitions are not instances of the real chemical object classes, but instances of blueprint classes, which are also defined in the module Database. When a real object is constructed from a type (this happens in the class ChemicalObject), each blueprint object is used as a template for the real subobject. The details of this procedure are subject to change in the future.

Specialized objects

The ChemicalObject hierarchy was designed for small molecules of known structure. For macromolecules, especially polymeric ones, the database approach does not make sense; it would be very inconvenient, for example, to write a database definition file for a protein, defining all residues and peptide bonds manually. Therefore specializations of the general hierarchy are unavoidable.

The module Protein

The module Protein defines several specialized objects as subclasses of the ChemicalObject hierarchy: residues (special groups), peptide chains (special molecules), and proteins (special complexes). The main task of the corresponding classes is to construct the objects from an amino acid sequence specification. Residues are defined in the database in several forms (with all, some, or no hydrogens; there are also special terminal versions). The class PeptideChain can also construct the positions of hydrogen atoms, and class Protein knows about sulfur bridges. Furthermore, the classes define special operations that make sense only for proteins (e.g. finding the backbone).

