Sunday, November 14, 2010

Proper labeling of O1P and O2P atoms in a phosphate group

Recently, a question in the forum about the 3DNA utility program o1p_o2p led me to reflect on the proper labeling of the O1P/O2P atoms in a phosphate group. As is well known, in DNA/RNA structures the phosphate group (see figure below left) connects two neighboring nucleosides.

The two nonbridging oxygen atoms of the phosphate group (the horizontal O atoms on the O-P=O line in the left figure) are named O1P and O2P in PDB files (see also the figure to the right). Stereochemically, O1P and O2P are designated as the pro-R and pro-S oxygens, respectively.

One would expect the structure files in the PDB and NDB databases to be consistent and to follow the standard nomenclature. In practice, however, some NDB entries had mislabeled O1P/O2P atoms (e.g., adh026). I first noticed this issue when I superposed the A-DNA octamer adh026 onto its 3DNA-rebuilt version, which includes the sugar-phosphate backbone. The RMSD was unreasonably large for adh026 only, whereas it was much smaller (as expected) for the B-DNA dodecamer bdl084 and the 146-bp nucleosomal DNA in pd0001 (see $X3DNA/examples/analyze_rebuild, distributed with 3DNA v2.0). Since 3DNA applies its standard building blocks consistently, I traced the large RMSD to the PDB file of adh026 itself and eventually identified the cause: mislabeled O1P/O2P atoms.
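A small calculation shows why a label swap stands out so clearly in the RMSD. The Python sketch below is only an illustration: it assumes the two structures are already superposed (3DNA's fitting step is not reproduced) and pairs atoms by name, and the function name `rmsd_by_name` is hypothetical. It compares the second phosphate group of adh026 with a copy in which only the O1P/O2P labels are exchanged.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def rmsd_by_name(ref: Dict[str, Vec], model: Dict[str, Vec]) -> float:
    """RMSD over atoms matched by name.

    Assumes the two structures are already superposed; no fitting is done here.
    """
    names = sorted(set(ref) & set(model))
    if not names:
        raise ValueError("no atom names in common")
    total = sum(
        sum((r - m) ** 2 for r, m in zip(ref[name], model[name]))
        for name in names
    )
    return math.sqrt(total / len(names))


# Second phosphate group of adh026: atom name -> (x, y, z) in angstroms.
PHOSPHATE: Dict[str, Vec] = {
    "P": (8.163, -3.069, -0.619),
    "O1P": (7.401, -1.917, -1.218),
    "O2P": (7.280, -3.934, 0.195),
    "O5*": (9.600, -2.800, -0.121),
}

# Identical coordinates, but with the O1P/O2P labels exchanged.
SWAPPED: Dict[str, Vec] = dict(PHOSPHATE, O1P=PHOSPHATE["O2P"], O2P=PHOSPHATE["O1P"])

if __name__ == "__main__":
    print(f"{rmsd_by_name(PHOSPHATE, PHOSPHATE):.2f}")  # 0.00
    print(f"{rmsd_by_name(PHOSPHATE, SWAPPED):.2f}")    # 1.74
```

No atom has moved, yet each of the two oxygens is now paired with its partner about 2.47 Å away, so the RMSD over these four atoms jumps to about 1.74 Å.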

The utility program o1p_o2p was written specifically to check whether the O1P/O2P atoms in a PDB file are properly labeled. In a correctly labeled phosphate group, curling the fingers of the right hand along O1P-->O2P-->O5' points the thumb in the direction from P toward O3' of the preceding nucleotide (see the figure at upper right). As always, how this works is best illustrated with an example. The GNU Octave script below uses the second phosphate group of adh026 (as distributed with 3DNA). Here the O1P/O2P atoms are mislabeled, since direction is negative; for a properly labeled phosphate group, direction should be positive.

#ATOM      6  O3*   G A   1       8.396  -3.995  -1.948  1.00 30.86           O  
#ATOM     20  P     G A   2       8.163  -3.069  -0.619  1.00 32.38           P  
#ATOM     21  O1P   G A   2       7.401  -1.917  -1.218  1.00 32.09           O  
#ATOM     22  O2P   G A   2       7.280  -3.934   0.195  1.00 34.05           O  
#ATOM     23  O5*   G A   2       9.600  -2.800  -0.121  1.00 29.41           O  

P   = [8.163  -3.069  -0.619]
O1P = [7.401  -1.917  -1.218]
O2P = [7.280  -3.934   0.195]
O3  = [8.396  -3.995  -1.948]
O5  = [9.600  -2.800  -0.121]

O1P_to_O2P = O2P - O1P                    # -0.12100  -2.01700   1.41300
O2P_to_O5 = O5 - O2P                      #  2.32000   1.13400  -0.31600
O1P_O2P_O5 = cross(O1P_to_O2P, O2P_to_O5) # -0.96497   3.23992   4.54223
P_to_O3 = O3 - P                          #  0.23300  -0.92600  -1.32900

direction = dot(O1P_O2P_O5, P_to_O3)    # -9.2616 < 0: O1P/O2P mislabeled
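For readers who prefer Python, here is a sketch of the same check. It is a transliteration of the Octave arithmetic, not the source code of o1p_o2p, and the helper names (`parse_atoms`, `phosphate_direction`) are hypothetical. It reads the coordinates from the fixed PDB columns of the five records quoted above (without the leading `#`), and it assumes atom names are unique in the input, which holds for this small excerpt but not for a whole file.

```python
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Vector difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    """Cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec, b: Vec) -> float:
    """Dot product a . b."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def parse_atoms(records: Sequence[str]) -> Dict[str, Vec]:
    """Map atom name -> (x, y, z) from PDB ATOM/HETATM records.

    Fixed PDB columns (1-based): atom name 13-16, x 31-38, y 39-46, z 47-54.
    Assumes atom names are unique within the given records.
    """
    atoms: Dict[str, Vec] = {}
    for line in records:
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            atoms[name] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def phosphate_direction(o1p: Vec, o2p: Vec, o5: Vec, p: Vec, o3: Vec) -> float:
    """Triple product of the O1P->O2P->O5' turn with the P->O3' vector.

    Positive: O1P/O2P labeled correctly; negative: the two labels are swapped.
    o3 is the O3' atom of the preceding nucleotide, bonded to this P.
    """
    normal = cross(sub(o2p, o1p), sub(o5, o2p))
    return dot(normal, sub(o3, p))


ADH026_RECORDS = [
    "ATOM      6  O3*   G A   1       8.396  -3.995  -1.948  1.00 30.86           O",
    "ATOM     20  P     G A   2       8.163  -3.069  -0.619  1.00 32.38           P",
    "ATOM     21  O1P   G A   2       7.401  -1.917  -1.218  1.00 32.09           O",
    "ATOM     22  O2P   G A   2       7.280  -3.934   0.195  1.00 34.05           O",
    "ATOM     23  O5*   G A   2       9.600  -2.800  -0.121  1.00 29.41           O",
]

atoms = parse_atoms(ADH026_RECORDS)
direction = phosphate_direction(
    atoms["O1P"], atoms["O2P"], atoms["O5*"], atoms["P"], atoms["O3*"]
)
# Exchanging the two labels flips the sign of the triple product.
relabeled = phosphate_direction(
    atoms["O2P"], atoms["O1P"], atoms["O5*"], atoms["P"], atoms["O3*"]
)

if __name__ == "__main__":
    print(f"{direction:.4f}")  # -9.2616
    print(f"{relabeled:.4f}")  # 9.2616
```

The sign flip is not a coincidence: writing a, b, c for O1P, O2P, O5', the normal (b - a) x (c - b) expands to a x b + b x c + c x a, which changes sign when a and b are exchanged. A negative result therefore means that simply swapping the two labels would make the group consistent.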

The O1P/O2P labeling issue is just a small detail I came across while developing 3DNA. Nevertheless, it serves as an excellent example of the subtleties that must be taken care of in scientific programming.

Please note that as of 2008, the mislabeled O1P/O2P pair in the remediated PDB/NDB entry adh026 has been corrected. More generally, the O1P/O2P atoms have now been renamed OP1/OP2, respectively. 3DNA v2.0 handles such naming changes internally; in the PDB files it generates, however, 3DNA still uses the conventional O1P/O2P labels.
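Supporting both naming schemes can amount to normalizing atom names when a file is read and converting them back on output. The Python sketch below is not 3DNA's code: the lookup tables and function names are hypothetical, names are assumed to be uppercase, and only the OP1/OP2 versus O1P/O2P renaming mentioned here is covered.

```python
# Hypothetical lookup between remediated (OP1/OP2) and conventional (O1P/O2P) names.
REMEDIATED_TO_LEGACY = {"OP1": "O1P", "OP2": "O2P"}
LEGACY_TO_REMEDIATED = {legacy: new for new, legacy in REMEDIATED_TO_LEGACY.items()}


def to_legacy_name(atom_name: str) -> str:
    """Map OP1/OP2 to O1P/O2P; any other name is returned stripped but otherwise unchanged."""
    name = atom_name.strip()
    return REMEDIATED_TO_LEGACY.get(name, name)


def to_remediated_name(atom_name: str) -> str:
    """Map O1P/O2P to OP1/OP2; any other name is returned stripped but otherwise unchanged."""
    name = atom_name.strip()
    return LEGACY_TO_REMEDIATED.get(name, name)


if __name__ == "__main__":
    # Raw 4-character PDB name fields, as sliced from columns 13-16.
    for raw in (" OP1", " OP2", " P  ", " O1P"):
        print(repr(raw), "->", to_legacy_name(raw), "/", to_remediated_name(raw))
```

Note that such a table only renames atoms; it does not detect or repair a swapped pair, which still requires the geometric check described earlier.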