Sunday, June 26, 2011

PDB v3.3 and partial atom occupancy

From the PDB mailing list, I learned of the recent announcement "PDB Archive Version 4.0 to be Released July 13, 2011". This "ambitious" review of the PDB archive has resulted in a new set of corrected files in ten categories, including biological assemblies, residual B factors, peptide inhibitors/antibiotics, polymers containing nonstandard polymer linkages, and partial occupancy. "These data reflect the wwPDB's continuing commitment to providing accurate and detailed data to users worldwide."

I am interested in changes to the PDB format, so I read the PDF document "Description of Changes and Corrections for PDB July 2011 Remediation Release", which states:
"PDB format files updated in this remediation release comply with PDB Format Version 3.30. PDBx and PDBML data files comply with the PDB Exchange Dictionary v.4.0, and PDBML XML Schema V4.0, respectively."
In particular, I read carefully the section on "Partial occupancy", which is quoted in full below:

In the 2009 remediation, occupancies were corrected in 490 X-ray
and neutron entries. A mistake was made in 104 of these entries:
for atoms with alternate conformer labels and with summed total
occupancy less than 1.0, the occupancies were re-scaled as 1.0/n,
where n is the number of conformers.


The originally deposited occupancies of the affected atoms were
restored and the remediation was then carried out properly, via:
  • Atoms with multiple conformations but identical coordinates and B-values were merged and their occupancies were summed.
  • Atoms which now have (total) occupancies <= 1.0 were left as deposited.
  • Atoms with (total) occupancies > 1 were rescaled proportionally to a sum of 1.0

The occupancies have been corrected in these entries.
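
To make the three corrected rules concrete, here is a minimal Python sketch of how they might be applied to the alternate conformers of a single atom. It is only my reading of the description quoted above, not the wwPDB's actual code; the function name `remediate_occupancies`, the dictionary keys, and the small tolerance `tol` are all assumptions made for illustration.

```python
def remediate_occupancies(conformers, tol=1e-6):
    """Apply the three quoted rules to the conformers of ONE atom.

    `conformers` is a list of dicts with the (hypothetical) keys
    'altloc', 'xyz' (a tuple), 'bfactor', and 'occupancy'.
    Returns a new list; the input is left untouched.
    """
    # Rule 1: merge conformers that share coordinates and B-value,
    # summing their occupancies (the first alt-loc label is kept).
    merged = {}
    for conf in conformers:
        key = (conf["xyz"], conf["bfactor"])
        if key in merged:
            merged[key]["occupancy"] += conf["occupancy"]
        else:
            merged[key] = dict(conf)  # copy, so the caller's data is safe
    result = list(merged.values())

    total = sum(conf["occupancy"] for conf in result)

    # Rule 2: a total occupancy <= 1.0 is left as deposited.
    # `tol` is an assumed allowance for floating-point round-off.
    if total <= 1.0 + tol:
        return result

    # Rule 3: a total occupancy > 1.0 is rescaled proportionally to sum to 1.0.
    for conf in result:
        conf["occupancy"] /= total
    return result
```

For example, two distinct conformers deposited with occupancies 0.9 and 0.3 (sum 1.2) come out as roughly 0.75 and 0.25, whereas a 1.0/n rescaling would assign 0.5 to each regardless of the deposited values.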

This partial occupancy issue reminds me of an extensive and very informative discussion a few months back on CCP4BB, under the thread "what to do with disordered side chains" and its derivatives, about setting "zero" occupancy and/or high B values for PDB ATOM/HETATM records in disordered regions. Over the course of the discussion, Frances Bernstein made the following comment:
"I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54)."
3DNA does pay attention beyond column 54 (up to 80, actually) for ATOM/HETATM records, but internally it does not make use of the occupancy info. In future releases of 3DNA, I am planning to take occupancy/B-factor into consideration, probably through configurable parameters.
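
As an illustration of what lies beyond column 54, the sketch below pulls the occupancy, B-factor, and element symbol out of an ATOM/HETATM record using the fixed-column layout of the PDB format (occupancy in columns 55-60, B-factor in 61-66, element in 77-78). The function name `parse_atom_tail` is hypothetical, and this is not how 3DNA itself handles these records; it is just a minimal example of reading the fields in question.

```python
def parse_atom_tail(line):
    """Return (occupancy, bfactor, element) for an ATOM/HETATM record.

    Returns None for any other record type. Missing numeric fields
    come back as None, since some files truncate lines after column 54.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return None

    # Pad to 80 columns so slicing never runs past a short line.
    line = line.rstrip("\r\n").ljust(80)

    occ_field = line[54:60].strip()   # columns 55-60
    b_field = line[60:66].strip()     # columns 61-66
    element = line[76:78].strip()     # columns 77-78

    occupancy = float(occ_field) if occ_field else None
    bfactor = float(b_field) if b_field else None
    return occupancy, bfactor, element


# Build a sample record field by field so the column alignment is explicit.
sample = (
    "ATOM  " + "%5d" % 1 + " " + " N  " + "A" + "MET" + " " + "A"
    + "%4d" % 1 + " " + "   "
    + "%8.3f%8.3f%8.3f" % (27.340, 24.430, 2.614)
    + "%6.2f%6.2f" % (0.50, 9.67)
    + " " * 10 + " N" + "  "
)
print(parse_atom_tail(sample))
```

A program that stops reading at column 54 would treat this half-occupied conformer exactly like a fully occupied atom, which is the concern raised in the comment above.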

Reading through the "Description of Changes and Corrections for PDB July 2011 Remediation Release", I noticed cases where wwPDB remediation efforts had to re-correct their own earlier corrections. A concrete example is partial occupancy in the 2009 remediation: among the 490 corrected X-ray and neutron entries, a mistake was made in 104 of them. As a side note, Morten Kjeldgaard posted a message titled "PDB changes data in entries?" on CCP4BB early this year, in which partial occupancy was used as an example.