Recently, a question in the forum about the 3DNA o1p_o2p utility program led me to reflect on the proper labeling of the O1P/O2P atoms in a phosphate group. As is well known, in DNA/RNA structures the phosphate group (see figure below left) connects two neighboring nucleosides.
The two nonbridging oxygen atoms of the phosphate group (the horizontal Os in the O-P=O line, left) are named O1P and O2P in PDB files (see also the figure to the right). Stereochemically, O1P and O2P are also designated the pro-R and pro-S oxygens, respectively.
Presumably, the structure files in the PDB and NDB databases should be consistent and follow the standard nomenclature. In practice, however, some NDB entries had mislabeled O1P/O2P atoms (e.g., adh026). I first noticed this issue when I superposed the A-DNA octamer adh026 onto its 3DNA-rebuilt version with the sugar-phosphate backbone. I observed an unreasonably large RMSD only for adh026, while the RMSDs were much smaller, as expected, for the B-DNA dodecamer bdl084 and the 146-bp nucleosomal DNA in pd0001 (see $X3DNA/examples/analyze_rebuild distributed with 3DNA v2.0). Since the standard building blocks in 3DNA had been applied consistently, I traced the cause of the large RMSD to the PDB file of adh026 itself, and finally identified the mislabeled O1P/O2P atoms as the culprit.
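Why would a label swap alone inflate the RMSD? Below is a toy sketch in GNU Octave, assuming two hypothetical models that share the exact coordinates of the second phosphate of adh026 (as distributed with 3DNA) and differ only in which nonbridging oxygen is called O1P. When atoms are matched by name, each of the two swapped atoms deviates by the full O1P...O2P distance d, so over N matched atoms the RMSD is d*sqrt(2/N). The names model_a and model_b are illustrative only.

```octave
# Toy illustration (hypothetical models): identical phosphate geometry,
# differing only in which oxygen carries the O1P vs O2P label.
P   = [8.163 -3.069 -0.619];
O1P = [7.401 -1.917 -1.218];
O2P = [7.280 -3.934  0.195];
O5  = [9.600 -2.800 -0.121];
O3  = [8.396 -3.995 -1.948];   # O3' of the preceding nucleotide

# Rows are matched by atom name: P, O1P, O2P, O5', O3'
model_a = [P; O1P; O2P; O5; O3];
model_b = [P; O2P; O1P; O5; O3];   # same atoms, O1P/O2P labels exchanged

# The coordinates are already identical, so no fitting step is needed here
rmsd = sqrt(mean(sum((model_a - model_b) .^ 2, 2)));
d_op = norm(O1P - O2P);            # O1P...O2P distance
printf('O1P-O2P distance: %.3f A, RMSD over 5 atoms: %.3f A\n', d_op, rmsd);
```

Here d is about 2.47 Å, so this single mislabeled pair already gives an RMSD of about 1.56 Å over five atoms. In a real comparison, each mislabeled phosphate adds an error of this kind that a rigid-body superposition largely cannot absorb.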
The utility program o1p_o2p was written specifically to check whether the O1P/O2P atoms in a PDB file are properly labeled. In a correctly labeled phosphate group, curling the fingers of the right hand along O1P-->O2P-->O5' makes the thumb point toward the O3' atom of the preceding nucleotide (see the figure at upper right). As always, how this actually works is best illustrated with an example. The GNU Octave script below uses the second phosphate group of adh026 (as distributed with 3DNA). Here the O1P/O2P atoms are mislabeled, since the variable direction is negative. In contrast, for a properly labeled phosphate group, direction should be positive.
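Written as a formula, the quantity computed by the script is a scalar triple product, where the symbol r_X (introduced here for convenience) denotes the position vector of atom X:

$$
d = \left[(\mathbf{r}_{\mathrm{O2P}} - \mathbf{r}_{\mathrm{O1P}}) \times (\mathbf{r}_{\mathrm{O5'}} - \mathbf{r}_{\mathrm{O2P}})\right] \cdot (\mathbf{r}_{\mathrm{O3'}} - \mathbf{r}_{\mathrm{P}}), \qquad d > 0 \iff \text{O1P/O2P correctly labeled}
$$

$$
(\mathbf{r}_{\mathrm{O2P}} - \mathbf{r}_{\mathrm{O1P}}) \times (\mathbf{r}_{\mathrm{O5'}} - \mathbf{r}_{\mathrm{O2P}}) = (\mathbf{r}_{\mathrm{O2P}} - \mathbf{r}_{\mathrm{O1P}}) \times (\mathbf{r}_{\mathrm{O5'}} - \mathbf{r}_{\mathrm{O1P}})
$$

The second identity holds because the cross product of a vector with itself vanishes. It shows that exchanging the O1P and O2P labels negates the cross product, and hence d, exactly, so a mislabeled pair is fixed simply by swapping the two names.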
```octave
# Second phosphate group of adh026 (O3* belongs to the preceding residue)
#ATOM 6 O3* G A 1 8.396 -3.995 -1.948 1.00 30.86 O
#ATOM 20 P G A 2 8.163 -3.069 -0.619 1.00 32.38 P
#ATOM 21 O1P G A 2 7.401 -1.917 -1.218 1.00 32.09 O
#ATOM 22 O2P G A 2 7.280 -3.934 0.195 1.00 34.05 O
#ATOM 23 O5* G A 2 9.600 -2.800 -0.121 1.00 29.41 O
P = [8.163 -3.069 -0.619]
O1P = [7.401 -1.917 -1.218]
O2P = [7.280 -3.934 0.195]
O3 = [8.396 -3.995 -1.948]
O5 = [9.600 -2.800 -0.121]
O1P_to_O2P = O2P - O1P # -0.12100 -2.01700 1.41300
O2P_to_O5 = O5 - O2P # 2.32000 1.13400 -0.31600
O1P_O2P_O5 = cross(O1P_to_O2P, O2P_to_O5) # -0.96497 3.23992 4.54223
P_to_O3 = O3 - P # 0.23300 -0.92600 -1.32900
direction = dot(O1P_O2P_O5, P_to_O3) # -9.2616 < 0: O1P/O2P mislabeled
```
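The same test can be packaged as a small reusable helper. The sketch below is hypothetical: op_direction is an illustrative name, not a 3DNA function, and the script repeats the coordinates so it runs on its own. It checks that exchanging the two labels flips the sign, then relabels the pair by swapping the coordinates whenever the value is negative.

```octave
# Hypothetical helper (not part of 3DNA): signed triple product for one
# phosphate; positive means O1P/O2P follow the right-handed convention.
op_direction = @(P, O1P, O2P, O5, O3) dot(cross(O2P - O1P, O5 - O2P), O3 - P);

P   = [8.163 -3.069 -0.619];
O1P = [7.401 -1.917 -1.218];
O2P = [7.280 -3.934  0.195];
O5  = [9.600 -2.800 -0.121];
O3  = [8.396 -3.995 -1.948];   # O3' of the preceding nucleotide

d_as_labeled = op_direction(P, O1P, O2P, O5, O3);
d_swapped    = op_direction(P, O2P, O1P, O5, O3);   # labels exchanged

if d_as_labeled < 0
  # Relabel: exchange the coordinates stored under O1P and O2P
  [O1P, O2P] = deal(O2P, O1P);
end
printf('as labeled: %8.4f   swapped: %8.4f\n', d_as_labeled, d_swapped);
```

For this phosphate the two values have the same magnitude (about 9.26) and opposite signs, and after the swap O1P holds the coordinates originally labeled O2P.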
The O1P/O2P labeling issue is just a small detail I came across while developing 3DNA. Nevertheless, it serves as an excellent example of the subtleties that must be taken care of in scientific programming.
Please note that as of 2008, the mislabeled O1P/O2P pair in the remediated PDB/NDB entry adh026 has been corrected. More generally, the O1P/O2P atoms have now been renamed OP1/OP2, respectively. 3DNA v2.0 handles such naming changes internally; for the PDB files it generates, however, 3DNA still adopts the conventional O1P/O2P labeling.
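Scripts that read both older and remediated files need to treat the two naming schemes as equivalent. The minimal sketch below covers only the O1P/O2P and OP1/OP2 renaming described above. The helper names to_legacy and to_remediated are hypothetical, and the sketch does not show how 3DNA handles the change internally.

```octave
# Hypothetical helpers (not 3DNA code): convert between the remediated
# OP1/OP2 names and the conventional O1P/O2P labels. Surrounding spaces from
# the fixed-width PDB atom-name field are stripped; other names pass through.
to_legacy     = @(name) regexprep(strtrim(name), '^OP([12])$', 'O$1P');
to_remediated = @(name) regexprep(strtrim(name), '^O([12])P$', 'OP$1');

names  = {' OP1', ' OP2', ' P  ', " O5'"};
legacy = cellfun(to_legacy, names, 'UniformOutput', false)
```

Mapping everything to one scheme on input, as sketched here, keeps the rest of the analysis code independent of which PDB format version a file uses.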