Sunday, July 5, 2009

Errors in PDB entries

In the June 24, 2009 issue of Nature (v459, pp.1038-1039), there is a news item titled "New protein structures replace the old" by Katharine Sanderson, about 'Dutch software to weed out errors in Protein Data Bank'.

From my experience developing software and using the PDB/NDB, it is certainly no surprise that there are errors of various types in the macromolecular databases: whenever I apply an algorithm consistently to all the entries in the NDB (which is part of the PDB and consists only of nucleic-acid-containing structures), I always notice some inconsistencies. As a more concrete example, blocview, a visualization tool initially developed as a by-product of another project while I was still at Rutgers (and partially involved with the NDB), was once used for correcting errors in the NDB as well.
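To illustrate what "applying an algorithm consistently to all the entries" can look like, here is a minimal sketch in Python. It is not the algorithm used in blocview or by the NDB; it simply flags atoms that appear more than once within the same residue, assuming the standard fixed-column layout of ATOM/HETATM records. The names `find_duplicate_atoms` and `atom_line` are hypothetical, chosen only for this example.

```python
from collections import Counter


def find_duplicate_atoms(pdb_text):
    """Return keys (chain, resSeq, iCode, resName, atomName) seen more than once.

    Only the first model is inspected, and alternate locations other than
    blank or 'A' are skipped so that genuine alternate conformations are
    not reported as duplicates.
    """
    seen = Counter()
    for raw in pdb_text.splitlines():
        if raw.startswith("ENDMDL"):
            break  # stop after the first model
        if not raw.startswith(("ATOM  ", "HETATM")):
            continue
        line = raw.ljust(80)  # pad short lines so fixed-column slicing is safe
        if line[16] not in (" ", "A"):
            continue  # ignore additional alternate locations
        key = (
            line[21],              # chain identifier (column 22)
            line[22:26].strip(),   # residue sequence number (columns 23-26)
            line[26],              # insertion code (column 27)
            line[17:20].strip(),   # residue name (columns 18-20)
            line[12:16].strip(),   # atom name (columns 13-16)
        )
        seen[key] += 1
    return sorted(key for key, count in seen.items() if count > 1)


def atom_line(serial, name, res_name, chain, res_seq, x=0.0, y=0.0, z=0.0):
    """Build a simplified ATOM record for testing (hypothetical helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)


# A made-up residue in which the phosphorus atom is listed twice.
sample = "\n".join([
    atom_line(1, "P", "G", "A", 1),
    atom_line(2, "OP1", "G", "A", 1),
    atom_line(3, "P", "G", "A", 1),
])
print(find_duplicate_atoms(sample))
```

Running the same function over every entry and collecting the non-empty results is what turns a one-off check into a survey. The check itself is deliberately simple, and a real validation pipeline would cover far more than duplicated atoms.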

Pure 're-refinement' of existing structures with software is surely helpful in catching obvious, systemic errors. However, it is impossible to catch all problems, no matter how sophisticated the software is. Moreover, as put in the comment by yet another phd: "i have hard enough time getting the RCSB to change four atoms in a structure for me." and "scientists must remember that when they click the re-refine button, you read the paper where the structure was reported."

Errors will always be there; that is just a basic fact of life. It is thus crucial for those who perform structural analysis to base their conclusions not on one or a few purposely selected structures, but on a broader and more objectively chosen set.
