Saturday, August 14, 2010

PDB format, how many variants are there?

In the field of structural biology of macromolecules (proteins and nucleic acids), the PDB format is still the primary one, even though it has shown its age/limitations, and mmCIF and PDBML have been introduced as alternatives.

In 3DNA, PDB is currently the only accepted format for the analysis routines (analyze/cehs), the primary format for rebuilt structures (rebuild/fiber), and various other output structure files (e.g., bestpairs.pdb etc.). By and large, 3DNA follows the PDB format documented at the RCSB website. In my maintaining and supporting of 3DNA and the forum over the years , I have come across other variants of the PDB format. Software packages customize PDB in subtle, sometimes substantial, ways that cause problems with 3DNA. Some examples follow:
  • Recently, a user tried to analyze an RNA duplex using w3DNA, but found that "the server seems not to recognize the file". It turned out that the so-called PDB format does not have residue name/number and chain id aligned in columns properly. For example, among other issues, the chain id in the problematic PDB is not at column #22 as specified by the standard.
  • A couple of months back, another user asked how to handle AMBER-introduced ribonucleotide abreviations (RA/RC/RG/RU). Compared to the above "PDB" variant, the AMBER PDB is well-behavored, and can be easily recognized by 3DNA after introducing corresponding one letter matches in file baselist.dat. Interestingly, a couple of years ago, DA/DC/DG/DT was introduced into the PDB format to distinguish deoxyribonucleotide (DNA) from ribonucleotide A/C/G/U (RNA). The AMBER convention, by using RA/RC/RG/RU for RNA, appears to be more symmetric.
  • From my (limited) experience in helping a 3DNA user several years ago, I know that CHARMM uses three-letter residue names for DNA (e.g., ADE/CYT/GUA/THY). There is actually a short Perl utility program (x3dna2charmm_pdb) in 3DNA for converting 3DNA generated PDB file to that recognized by CHARMM.
  • As noted in another post, "PDB ATOM coordinates record", Babel/Open Babel converted PDB files (mostly of small molecules) do not follow atom naming conventions.
As widely used and standard as PDB format is, there are still many variants in common use. Thus, when you meet a problem related to a PDB file, it helps to check if it conforms to the standard. With 3DNA, I've always tried to incorporate as many PDB variants as possible, so users can run 3DNA on them transparently.