Overall, the PDB format is simple and well documented. Its simplicity comes from its rigidity: every record is a fixed-column line in FORTRAN 77 style. The ATOM/HETATM record description is excerpted below for easy reference:
COLUMNS        DATA TYPE     FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name   "ATOM  "
 7 - 11        Integer       serial       Atom serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name  resName      Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     occupancy    Occupancy.
61 - 66        Real(6.2)     tempFactor   Temperature factor.
77 - 78        LString(2)    element      Element symbol, right-justified.
79 - 80        LString(2)    charge       Charge on the atom.
In a scripting language such as Perl, Python or Ruby, a few lines suffice to extract a specific piece of information, e.g., atomic coordinates. However, some subtleties are beyond such simple script parsers. On top of that, not every file that claims to be in PDB format actually complies with the standard.
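A common shortcut is to split each line on whitespace and take tokens 7 to 9 as x, y, z. The Python sketch below, built on a hypothetical line writer `make_atom_line`, shows two ways this breaks. First, a blank chainID removes a token and silently shifts every field. Second, a coordinate of -100.000 or below fills all 8 columns of its Real(8.3) field, so it fuses with the neighbouring field. Slicing by column handles both cases.

```python
def make_atom_line(x: float, y: float, z: float, chain_id: str = "A") -> str:
    """Hypothetical helper: write a minimal 80-column ATOM line per the table."""
    name, alt_loc, res_name, i_code = " N  ", " ", "MET", " "
    return (f"{'ATOM':<6}{1:>5} {name}{alt_loc}{res_name} {chain_id}{1:>4}{i_code}   "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{'N':>2}  ")


def coords_by_split(line: str) -> tuple:
    """Naive: take whitespace-separated tokens 7-9 as x, y, z."""
    tokens = line.split()
    return tuple(float(t) for t in tokens[6:9])


def coords_by_columns(line: str) -> tuple:
    """Fixed columns 31-38, 39-46 and 47-54 (as 0-based slices)."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


for label, line in [
    ("typical", make_atom_line(27.34, 24.43, 2.614)),
    ("blank chainID", make_atom_line(27.34, 24.43, 2.614, chain_id=" ")),
    ("fused fields", make_atom_line(-100.123, -200.456, -30.789)),
]:
    try:
        naive = coords_by_split(line)
    except ValueError as err:
        naive = f"ValueError: {err}"
    print(f"{label:14} split -> {naive}   columns -> {coords_by_columns(line)}")
```

The blank-chainID case is the worse of the two, because it returns plausible numbers instead of failing.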
More specifically, a decent PDB format parser must take the following into consideration:
- The four-character atom name in columns 13 to 16, including its padding spaces. Each type of biological molecule follows its own atom-naming convention. For example, the two H-bonds of the A-T pair are between " N1 " (A) and " N3 " (T), and between " N6 " (A) and " O4 " (T). In this regard, it is worth noting that PDB files converted by babel/openbabel do not follow this naming convention.
- The one-character alternate location indicator (altLoc) in column 17, which distinguishes alternative positions modeled for the same atom.
- The one-character residue insertion code (iCode) in column 27, which distinguishes residues that share the same residue sequence number.
- Other details to follow...
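The three points above can be sketched in a few lines of Python. This is only an illustration under my own assumptions. The `atom_line` writer and the other helper names are hypothetical. Keeping atoms with a blank altLoc or altLoc "A" is one simple policy among several. The calcium example relies on the usual convention that two-letter element symbols start in column 13.

```python
def atom_line(name, alt_loc=" ", res_seq=1, i_code=" ",
              res_name="ALA", chain_id="A", record="ATOM"):
    """Hypothetical helper: build columns 1-66 of an ATOM/HETATM line."""
    return (f"{record:<6}{1:>5} {name}{alt_loc}{res_name:>3} {chain_id}"
            f"{res_seq:>4}{i_code}   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


def atom_name(line):
    return line[12:16]          # columns 13-16, kept verbatim (no strip)


def alt_loc(line):
    return line[16]             # column 17


def residue_key(line):
    # chainID (22), resSeq (23-26) and iCode (27): resSeq alone is not
    # unique once insertion codes are used.
    return line[21], int(line[22:26]), line[26]


def select_conformer(lines, wanted="A"):
    """Assumed policy: keep atoms with no altLoc or with altLoc == wanted."""
    return [ln for ln in lines if alt_loc(ln) in (" ", wanted)]


def group_residues(lines):
    """Map residue key -> list of atom names, in file order."""
    residues = {}
    for ln in lines:
        residues.setdefault(residue_key(ln), []).append(atom_name(ln))
    return residues


lines = [
    atom_line(" N  ", res_seq=52),
    atom_line(" CA ", res_seq=52),                 # alpha carbon
    atom_line(" N  ", res_seq=52, i_code="A"),     # inserted residue 52A
    atom_line(" CB ", alt_loc="A", res_seq=53),    # two modeled positions of CB
    atom_line(" CB ", alt_loc="B", res_seq=53),
    atom_line("CA  ", res_name="CA", res_seq=101, record="HETATM"),  # calcium ion
]

# " CA " and "CA  " differ only in padding; stripping would conflate them.
print(repr(atom_name(lines[1])), repr(atom_name(lines[-1])))
for key, names in group_residues(select_conformer(lines)).items():
    print(key, names)
```

Residues 52 and 52A stay separate only because iCode is part of the key. Similarly, the alpha carbon and the calcium ion stay distinct only because atom names are compared as all four characters.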