Friday, May 28, 2010

Naming conventions: RasMol, PyMOL and Jmol

Over the years, the three molecular graphics programs I've used most are RasMol, PyMOL and Jmol. Their names all end with m-o-l (for molecule, I believe), yet the conventional capitalization differs in each: Mol in RasMol, MOL in PyMOL, and mol in Jmol.

For Jmol, there is actually an FAQ item "How do you write Jmol?" that reads "Capital J, lower case mol. Please do not write it any other way." For PyMOL, I am not aware of an explicit rule about its name; I only know from its home page that "PyMOL" is the "official name." As for RasMol, that is the most common spelling, even though the title of its 1995 publication is "RASMOL: biomolecular graphics for all."

As for the prefixes, J in Jmol stands for Java; Py in PyMOL for Python; and Ras in RasMol could well be the initials of Roger A. Sayle, the original author of the program.

Sunday, May 23, 2010

MODEL/ENDMDL records in PDB X-ray crystal structures

Over the years, I have always used the PDB format for nucleic acid structures, and naturally I thought I knew the format well, especially the "Coordinate Section." According to the PDB documentation,
"MODEL/ENDMDL records are used only when more than one structure is presented in the entry, as is often the case with NMR entries."
I had always associated the MODEL/ENDMDL pair only with the different models in an NMR entry. However, I was recently bitten by a subtlety of the PDB format: MODEL/ENDMDL records also appear in X-ray crystal structures whose biological assembly contains more than one asymmetric unit.

As always, the point is best illustrated with an example: here I am using the four-stranded DNA Holliday junction X-ray crystal structure (1zf4/ud0061) solved by Ho and colleagues [PNAS 2005 May 17;102(20):7157-62]. As shown on the NDB website, the asymmetric unit of 1zf4/ud0061 contains only two chains; it is the biological assembly that has the four-stranded DNA junction.

I downloaded the biological assembly coordinates (in PDB format) from the NDB, saved them as '1zf4.pdb', and ran blocview on the file: blocview -i=1zf4.png 1zf4.pdb. However, the generated image [see Figure (A) below] showed only half of what I expected, as if the downloaded file contained only the coordinates of one asymmetric unit. The mystery was solved when I checked the PDB file and realized that MODEL/ENDMDL records are now also used in X-ray crystal structures to delineate symmetry-related units. Since 3DNA is designed to handle only one structure, it stops processing as soon as it encounters an END or ENDMDL record. Simply changing each 'ENDMDL' record to ' ENDMDL' (i.e., adding a leading space, or any character for that matter) and running blocview again produces the expected image [Figure (B)].
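For anyone who would rather not edit the file by hand, the ENDMDL workaround described above can be scripted. What follows is only a minimal Python sketch, not part of 3DNA: the function names count_models and hide_endmdl are my own, hypothetical choices, and the sketch assumes that record names start in column 1, as the PDB format specifies.

```python
def count_models(pdb_text: str) -> int:
    """Count MODEL records; the record name occupies columns 1-6."""
    return sum(
        1 for line in pdb_text.splitlines()
        if line[:6].strip() == "MODEL"
    )


def hide_endmdl(pdb_text: str) -> str:
    """Prefix every ENDMDL record with a space.

    This mirrors the manual edit: the shifted line no longer starts
    with 'ENDMDL', so a reader that stops at that record keeps going.
    All other lines are returned unchanged.
    """
    fixed = []
    for line in pdb_text.splitlines(keepends=True):
        if line.startswith("ENDMDL"):
            line = " " + line
        fixed.append(line)
    return "".join(fixed)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    with open("1zf4.pdb") as handle:
        text = handle.read()
    print("MODEL records found:", count_models(text))
    with open("1zf4_all.pdb", "w") as handle:
        handle.write(hide_endmdl(text))
```

The rewritten file can then be passed to blocview in place of the original. Counting the MODEL records first is a cheap way to tell whether a downloaded biological assembly is split into several models at all.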

Note that RasMol also displays only the first model in a PDB file by default. To see multiple structures, the '-nmrpdb' option must be specified on the command line.