Xiang-Jun's Corner: PDB ATOM coordinates record

Friday, May 15, 2009

PDB ATOM coordinates record

The PDB format is one of the standard formats for biological macromolecular structures (proteins, DNA/RNA, their complexes, etc.). It came into existence when the original Brookhaven PDB was established in 1971. Over the years, PDBML and mmCIF have been proposed as substitutes that allow for more flexibility, yet the PDB format is still the most commonly used one. Each software package dealing with PDB structures has its own parser, at least for the Coordinate Section (especially the ATOM and HETATM records).

Overall, the PDB format is simple and very well documented. Its simplicity lies precisely in its 'rigidity': every field sits in fixed columns, in FORTRAN 77 style. The ATOM/HETATM record description is excerpted below for easy reference:


COLUMNS        DATA  TYPE    FIELD        DEFINITION
-------------------------------------------------------------------------------------
1 -  6         Record name   "ATOM  "
7 - 11         Integer       serial       Atom  serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name  resName      Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     occupancy    Occupancy.
61 - 66        Real(6.2)     tempFactor   Temperature  factor.
77 - 78        LString(2)    element      Element symbol, right-justified.
79 - 80        LString(2)    charge       Charge  on the atom.

It doesn't take much time, or many lines, in a scripting language such as Perl, Python, or Ruby to extract the specific information one is interested in, e.g., atomic coordinates. However, there are some subtleties that simple script parsers tend to miss. On top of that, one needs to understand that not all self-proclaimed PDB files are standard-compliant.
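As a minimal sketch of such a script, the Python function below slices one ATOM/HETATM line by the column ranges in the table above. The names `Atom`, `_real`, and `parse_atom_line` are made up for illustration, and the fallback values for blank occupancy and temperature factor fields (1.0 and 0.0) are assumptions rather than anything the format prescribes.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    record: str        # "ATOM" or "HETATM"
    serial: int
    name: str          # all four characters, spaces preserved
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str
    charge: str


def _real(field: str, default: float) -> float:
    """Read a Real(w.d) field; a blank field falls back to `default`."""
    field = field.strip()
    return float(field) if field else default


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one ATOM/HETATM record; return None for any other record."""
    # Pad to 80 columns so that indexing into a short line cannot fail.
    line = line.rstrip("\r\n").ljust(80)
    record = line[0:6]
    if record not in ("ATOM  ", "HETATM"):
        return None
    return Atom(
        record=record.strip(),
        serial=int(line[6:11]),              # columns  7-11
        name=line[12:16],                    # columns 13-16
        alt_loc=line[16],                    # column  17
        res_name=line[17:20],                # columns 18-20
        chain_id=line[21],                   # column  22
        res_seq=int(line[22:26]),            # columns 23-26
        i_code=line[26],                     # column  27
        x=float(line[30:38]),                # columns 31-38
        y=float(line[38:46]),                # columns 39-46
        z=float(line[46:54]),                # columns 47-54
        occupancy=_real(line[54:60], 1.0),   # assumed default if blank
        temp_factor=_real(line[60:66], 0.0), # assumed default if blank
        element=line[76:78].strip(),         # columns 77-78
        charge=line[78:80].strip(),          # columns 79-80
    )
```

Note that the slices are by column position, not by whitespace splitting: adjacent fields in a PDB line are allowed to touch (a large coordinate can fill all eight of its columns), so splitting on spaces breaks on perfectly valid files.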

More specifically, a decent PDB format parser must take the following into consideration:

The four-character atom name specified in columns 13 to 16. Each type of biological molecule has its own convention for naming atoms. For example, the two H-bonds of the A-T pair are between " N1 " (A) and " N3 " (T), and between " N6 " (A) and " O4 " (T). In this regard, it is worth noting that PDB files converted by babel/openbabel do not follow this naming convention.
The one-character alternate location indicator (altLoc) in column 17.
The one-character residue insertion code (iCode) in column 27.
Other details to follow ...
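The three points above can be sketched in a few lines of Python. The function names `residue_id` and `group_atoms` are hypothetical, and the altLoc policy used here (keep the first location encountered for each atom) is just one simple assumption; keeping the highest-occupancy location is another reasonable choice.

```python
def residue_id(line):
    """Identify a residue by (chainID, resSeq, iCode), not by resSeq alone."""
    return (line[21], int(line[22:26]), line[26])


def group_atoms(lines):
    """Map residue id -> {atom name: record line}, one altLoc per atom."""
    residues = {}
    for line in lines:
        # Pad so that fixed-column indexing is safe on short lines.
        line = line.rstrip("\r\n").ljust(80)
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        atoms = residues.setdefault(residue_id(line), {})
        # Key on the full 4-character name: " CA " (alpha carbon) and
        # "CA  " (calcium) differ only in their column alignment.
        name = line[12:16]
        # Assumed policy: the first location seen for an atom wins,
        # whatever its altLoc character (column 17) is.
        atoms.setdefault(name, line)
    return residues
```

With the insertion code in the key, residues such as 10 and 10A on the same chain stay separate instead of being merged into one, and an atom listed with altLoc A and B is counted once rather than twice.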

If you are serious about your PDB format parser, there is a very simple thing to do: run it against all the NDB entries if you are working on nucleic acid structures (that's what I did for 3DNA), and against all the PDB entries if you are interested in protein structures. If it does not crash and does what it has been designed for, then congratulate yourself!
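A rough sketch of such a stress test in Python follows. It assumes the entries have already been downloaded into a local directory tree; `check_parser` and its `parse_file` argument are hypothetical names standing in for whatever parser is being tested, and the `*.pdb` pattern is likewise an assumption about how the files are named.

```python
import pathlib


def check_parser(parse_file, root, pattern="*.pdb"):
    """Run parse_file(path) on every matching file under root.

    Returns a list of (path, error description) for the files on which
    the parser raised an exception, so one bad entry does not stop the run.
    """
    failures = []
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        try:
            parse_file(path)
        except Exception as exc:  # deliberately broad: we want every crash
            failures.append((path, repr(exc)))
    return failures
```

An empty list only shows that the parser did not crash; whether it "does what it has been designed for" still needs checks on the parsed results themselves.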
