Friday, December 11, 2009

Not all PDB entries are reliable; some could be plain fake

With interest, I have browsed the recent thread on the PDB mailing list (pdb-l), "Retraction of 12 Structures", posted by Michael Sadowski and followed up by Kevin Karplus et al. The story is about Krishna Murthy, a former scientist at the University of Alabama at Birmingham (UAB), who is alleged to have fabricated protein structures and published papers on them. Here is an informative comment by firebug36 from the above link:
I am a protein crystallographer myself, so just trust me - the results this gentleman [Murthy] published were falsified, and not in a smart way. The structures [for C3b] deposited in the Protein Data Bank made no physical sense.

Allegations against UAB group were first brought to light by several prominent people in the field, and not UAB officials:

http://www.nature.com/nature/journal/v448/n7154/full/nature06102.html

According to Kevin Karplus's post, "several of the PDB files by Krishna Murthy's group were identified as problematic in the RosettaHoles paper". Naturally, then, comes the question, "should we remove ALL the PDB files from Krishna Murthy's group as suspect?"

The way Murthy's case came to light may be the exception rather than the norm. Had he not published his C3b structure in Nature, where it caught the attention of leading crystallographers (Bert Janssen, Randy Read, Axel Brünger and Piet Gros), Murthy might still be publishing protein structures today. In a sense, it is hard to believe that Murthy could falsify 12 protein structures and publish 9 papers in prestigious journals (including Nature, Cell, PNAS, JMB, Biochemistry and JBC) that have been cited 449 times.

The PDB contains state-of-the-art experimental data on biomacromolecular structures. Yet the archive certainly contains inconsistencies and errors of various types. It would be helpful to know how many PDB entries are largely or partially wrong, and which can be taken as a "gold standard" as far as data quality is concerned.

This case offers an excellent lesson for those performing data mining on macromolecular structures. PDB structures are now numerous and their number keeps growing rapidly, but they are clearly of varying quality. Structural bioinformatics is about solving biological problems using informatics tools. Thus, knowing the caveats of your data (how reliable are they?) and tools (what are their limitations?) is a prerequisite for drawing sound scientific conclusions.
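
As a small illustration of checking your data before mining it, here is a minimal Python sketch that reads the reported resolution from the REMARK 2 record of a PDB-format file and applies a cutoff. The function names and the 2.0 Å threshold are my own assumptions for illustration, not a recommendation from the pdb-l thread, and a reported resolution is only a coarse first filter: it says nothing about whether the model itself makes physical sense.

```python
import re
from typing import Optional

# Matches lines such as "REMARK   2 RESOLUTION.    1.74 ANGSTROMS."
# Entries without a numeric value (e.g. "NOT APPLICABLE.") do not match.
_RESOLUTION_RE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9]*\.?[0-9]+)\s+ANGSTROMS"
)


def parse_resolution(pdb_text: str) -> Optional[float]:
    """Return the reported resolution in angstroms, or None if absent."""
    for line in pdb_text.splitlines():
        match = _RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def passes_resolution_cutoff(pdb_text: str, max_resolution: float = 2.0) -> bool:
    """Hypothetical filter: keep entries at or below an illustrative cutoff."""
    resolution = parse_resolution(pdb_text)
    return resolution is not None and resolution <= max_resolution
```

In practice one would combine such metadata checks with dedicated validation tools rather than rely on them alone.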
