Saturday, June 11, 2011

Conformation of the sugar ring in nucleic acid structures

The conformation of the five-membered sugar ring in DNA/RNA structure can be characterized using the five corresponding endocyclic torsion angles (see Figure below).
v0: C4'-O4'-C1'-C2'
v1: O4'-C1'-C2'-C3'
v2: C1'-C2'-C3'-C4'
v3: C2'-C3'-C4'-O4'
v4: C3'-C4'-O4'-C1'
Due to the ring constraint, the conformation can be characterized approximately by 5 - 3 = 2 parameters. Using the concept of pseudorotation of the sugar ring, the two parameters are the amplitude (τm) and phase angle (P).
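To make the two-parameter description concrete, here is a minimal Python sketch (Python rather than Octave, purely for illustration) of the pseudorotation relation of Altona & Sundaralingam, νj = τm·cos(P + 144°·(j − 2)) for j = 0..4, which maps an amplitude and a phase angle back to five idealized torsion angles. The function name `torsions_from_pseudorotation` is my own choice, and the sample inputs are the values obtained for G4 (chain A) of 1bna later in this post.

```python
import math

def torsions_from_pseudorotation(tm, P_deg):
    """Return idealized [v0, v1, v2, v3, v4] in degrees.

    tm    : pseudorotation amplitude (degrees)
    P_deg : pseudorotation phase angle (degrees)
    """
    return [tm * math.cos(math.radians(P_deg + 144.0 * (j - 2)))
            for j in range(5)]

# Amplitude and phase angle of G4 (chain A, 1bna), Altona & Sundaralingam values
v = torsions_from_pseudorotation(48.469, 166.35)

# Torsion angles measured directly from the coordinates, for comparison
observed = [-26.7, 46.3, -47.1, 33.4, -4.4]

for j, (ideal, obs) in enumerate(zip(v, observed)):
    print(f"v{j}: idealized {ideal:6.1f}   observed {obs:6.1f}")
```

The idealized angles always sum to zero and agree with the observed ones to within a couple of degrees, which is the sense in which two parameters describe the ring only approximately.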

One widely used set of formulas for converting the five torsion angles to the pseudorotation parameters is due to Altona & Sundaralingam: "Conformational Analysis of the Sugar Ring in Nucleosides and Nucleotides. A New Description Using the Concept of Pseudorotation" [J. Am. Chem. Soc., 1972, 94(23), pp 8205–8212]. As always, the concept is best illustrated with an example. Here I use the sugar ring of G4 (chain A) of the Dickerson-Drew dodecamer (1bna/bdl001), with Matlab/Octave code:
# xyz coordinates of the sugar ring: G4 (chain A), 1bna/bdl001
ATOM     63  C4'  DG A   4      21.393  16.960  18.505  1.00 53.00
ATOM     64  O4'  DG A   4      20.353  17.952  18.496  1.00 38.79
ATOM     65  C3'  DG A   4      21.264  16.229  17.176  1.00 56.72
ATOM     67  C2'  DG A   4      20.793  17.368  16.288  1.00 40.81
ATOM     68  C1'  DG A   4      19.716  17.901  17.218  1.00 30.52

# endocyclic torsion angles:
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4
Pconst = sin(pi/5) + sin(pi/2.5)  # 1.5388
P0 = atan2(v4 + v1 - v3 - v0, 2.0 * v2 * Pconst);  # 2.9034
tm = v2 / cos(P0);  # amplitude: 48.469
P = 180/pi * P0;  # phase angle: 166.35 [P + 360 if P0 < 0]
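The five endocyclic torsion angles quoted above can be recomputed from the listed coordinates. The following is a rough Python sketch (not the 3DNA code itself) that uses the standard atan2-based dihedral formula; the helper names `sub`, `dot`, `cross` and `torsion` are illustrative only.

```python
import math

# Ring-atom coordinates of G4 (chain A, 1bna), copied from the ATOM records above
xyz = {
    "C4'": (21.393, 16.960, 18.505),
    "O4'": (20.353, 17.952, 18.496),
    "C3'": (21.264, 16.229, 17.176),
    "C2'": (20.793, 17.368, 16.288),
    "C1'": (19.716, 17.901, 17.218),
}

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Atoms in ring order; v_j spans four consecutive atoms, starting at C4' for v0
ring = ["C4'", "O4'", "C1'", "C2'", "C3'"]
v = [torsion(*(xyz[ring[(j + k) % 5]] for k in range(4))) for j in range(5)]

for j, angle in enumerate(v):
    print(f"v{j} = {angle:6.1f}")
```

Walking around the ring with a cyclic index keeps the atom order of each torsion consistent with the v0 to v4 definitions given at the top of this post.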
The Altona & Sundaralingam (1972) pseudorotation parameters are the ones adopted in 3DNA. The Curves+ program, however, uses another set of formulas, due to Westhof & Sundaralingam: "A Method for the Analysis of Puckering Disorder in Five-Membered Rings: The Relative Mobilities of Furanose and Proline Rings and Their Effects on Polynucleotide and Polypeptide Backbone Flexibility" [J. Am. Chem. Soc., 1983, 105(4), pp 970–976]. The two sets of formulas by Altona & Sundaralingam (1972) and Westhof & Sundaralingam (1983) give slightly different numerical values for the two pseudorotation parameters (amplitude τm and phase angle P).

Since 3DNA and Curves+ are currently the most commonly used programs for conformational analysis of nucleic acid structures, the subtle differences in pseudorotation parameters may confuse users who use both programs. With the same G4 (chain A, 1bna) sugar ring, here is the Matlab/Octave script showing how Curves+ calculates the pseudorotation parameters:
# xyz coordinates of sugar ring G4 (chain A, 1bna/bdl001)

# endocyclic torsion angles, same as above
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4

v = [v2, v3, v4, v0, v1]; # reorder them into vector v[]
A = 0; B = 0;
for i = 1:5
    t = 0.8 * pi * (i - 1);
    A += v(i) * cos(t);
    B += v(i) * sin(t);
end
A *= 0.4;   # -48.476
B *= -0.4;  # 11.516

tm = sqrt(A * A + B * B);  # 49.825

c = A/tm; s = B/tm;
P = atan2(s, c) * 180 / pi;  # 166.64

For this specific example, i.e., the sugar ring G4 (chain A, 1bna/bdl001), the pseudorotation parameters as calculated by 3DNA following Altona & Sundaralingam (1972) and Curves+ following Westhof & Sundaralingam (1983) are as follows:

         amplitude (τm)     phase angle (P)
3DNA        48.469             166.35
Curves+     49.825             166.64
Needless to say, the differences are subtle, and few people will notice/bother at all. For those who do care about such little details, however, hopefully this post will help you understand where the differences actually come from.
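For readers who want to check the table themselves, the two calculations can be put side by side. The Python sketch below is a direct port of the two Octave snippets in this post, not code taken from 3DNA or Curves+; the function names are my own, and wrapping P into the range 0 to 360 with `% 360.0` is an added convenience.

```python
import math

def altona_sundaralingam(v):
    """(tm, P) from [v0, v1, v2, v3, v4] in degrees, 1972 formulas."""
    v0, v1, v2, v3, v4 = v
    pconst = math.sin(math.pi / 5) + math.sin(math.pi / 2.5)
    p0 = math.atan2(v4 + v1 - v3 - v0, 2.0 * v2 * pconst)
    tm = v2 / math.cos(p0)
    return tm, math.degrees(p0) % 360.0

def westhof_sundaralingam(v):
    """(tm, P) from [v0, v1, v2, v3, v4] in degrees, 1983 formulas."""
    v0, v1, v2, v3, v4 = v
    order = [v2, v3, v4, v0, v1]  # same reordering as in the script above
    a = 0.4 * sum(x * math.cos(0.8 * math.pi * i) for i, x in enumerate(order))
    b = -0.4 * sum(x * math.sin(0.8 * math.pi * i) for i, x in enumerate(order))
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0

# Endocyclic torsion angles of G4 (chain A, 1bna)
v = [-26.7, 46.3, -47.1, 33.4, -4.4]

tm_as, p_as = altona_sundaralingam(v)
tm_ws, p_ws = westhof_sundaralingam(v)
print(f"Altona & Sundaralingam : tm = {tm_as:.3f}, P = {p_as:.2f}")
print(f"Westhof & Sundaralingam: tm = {tm_ws:.3f}, P = {p_ws:.2f}")
```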

Sunday, June 5, 2011

Lower case chain identifiers in PDB format

First formulated in the early 1970s, the PDB format is rigid, with fixed columns for designated contents in its ATOM/HETATM records. Specifically, a single column, #22, is assigned to the chain identifier (id). Traditionally, the 26 upper case letters of the English alphabet (A-Z), the space (i.e., ' '), and the single digits (0-9) have been used as chain ids. Until the ribosomal structures came along, I guess, those 26 + 1 + 10 = 37 characters had been sufficient for the chain ids.

To the best of my knowledge, for a long time most PDB parsers assumed upper case chain ids. Indeed, 3DNA v1.5 automatically converts each ATOM/HETATM record to upper case. The first time I became aware of lower case chain ids was when I saw a post in the 3DNA forum, titled "Small bug in find_pair", where a user reported the 'w' vs 'W' chain ids in PDB entry 1VSP. I then refined 3DNA so that the case of chain ids can be preserved through an undocumented command line option (a feature for internal testing purposes).

My view that 3DNA chain ids should be case-sensitive was reinforced when I read the article "Crystal structures of CGG RNA repeats with implications for fragile X-associated tremor ataxia syndrome", recently published in Nucleic Acids Research. The asymmetric unit of the unmodified CGG-repeats-containing duplex (GCGGCGGC)2, NDB entry NA1017 / PDB entry 3R1C, contains a total of 36 chains, designated as A-Z plus a-j. Without distinguishing the case of the chain ids, the 3DNA output would become quite confusing.
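The effect of case folding is easy to demonstrate. Below is a small Python sketch, not 3DNA code: `chain_id` is a hypothetical helper that reads column 22 of a record, the first record is the C4' line of G4 from 1bna, and the lower case variant is made up for illustration.

```python
import string

def chain_id(record):
    """Return the chain identifier stored in column 22 of an ATOM/HETATM record."""
    return record[21]  # column 22 is index 21 when counting from zero

line = "ATOM     63  C4'  DG A   4      21.393  16.960  18.505  1.00 53.00"
lower_line = line[:21] + "a" + line[22:]  # made-up record with chain id 'a'

print(chain_id(line), chain_id(lower_line))

# Chain ids A-Z plus a-j, as described for 3R1C
chains = string.ascii_uppercase + "abcdefghij"
folded = {c.upper() for c in chains}  # what upper-casing every record does
print(len(set(chains)), "chains before folding,", len(folded), "after")
```

Upper-casing merges the ten lower case chains into their upper case namesakes, so only 26 of the 36 chains remain distinguishable.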

Thus, in future releases of 3DNA, the default will be to preserve the case of chain ids. This chain id 'case' serves as an excellent example that scientific software products, unlike publications per se, are not fixed but need continuous care and maintenance to meet the challenges of an evolving world.