Saturday, August 7, 2010

Calculation of DNA bending angle

Once in a while, I receive the question (via email or on the 3DNA forum) on how to calculate DNA bending angle, mostly associated with DNA-protein complexes. This question certainly qualifies as an FAQ; Indeed, I did consider writing an entry on it in the 3DNA website. However, I've hesitated to do so because the topic is more subtle and complicated than it appears.

On its face, an angle is defined by two vectors; let's call them a and b, and if each is normalized, then the angle (in degrees) between them is: acos(dot(a, b)) * 180/pi. Geometrically, after moving the tails of the vectors into the same position (e.g., origin), the heads would normally define a plane, unless a and b are strictly parallel (0°) or anti-parallel (180°).

DNA structures are three-dimensional, normally far more complicated than a single number can quantify. The concept of DNA bending angle, as I understand it, is only applicable to DNA structures with two relatively straight fragments (as in CAP-DNA complexes). Under such situations, one can fit a least-squares (ls) linear helical axis to each of the two fragments, and calculate the angle between them. Towards this end, 3DNA outputs the following section when it judges that the input structure is not strongly curved. Using 355d/bdl084, which is distributed with 3DNA, as an example:
Global linear helical axis defined by equivalent C1' and RN9/YN1 atom pairs                                    
Deviation from regular linear helix: 3.30(0.52)
Helix:    -0.127  -0.275  -0.953
HETATM 9998  XS    X X 999      17.536  25.713  25.665
HETATM 9999  XE    X X 999      12.911  15.677  -9.080
Average and standard deviation of helix radius:
P: 9.42(0.82), O4': 6.37(0.85),  C1': 5.85(0.86)
Where the Helix: line gives the normalized vector along the "best-fit" helical axis. The two HETATM records provides the two end points on the helix, and they are directly related to the Helix: line by a simple equation. Following the above example, we have (Octave/Matlab code):

XE = [12.911 15.677 -9.080];
XS = [17.536 25.713 25.665];

dd = XE - XS
% -4.6250 -10.0360 -34.7450

Helix = dd / norm(dd)
% -0.12685 -0.27526 -0.95296 ==> [-0.127 -0.275 -0.953]

With the two HETATM records, one can easily add them into the original PDB file to display the helical axis using a molecular graphics programs (e.g., RasMol, Jmol, PyMol). Moreover, the two helix vectors can be used to reorient the original PDB structure into a view so that one helical fragment lies along the x-axis, and the other in the xy-plane. As documented in detail in recipes #4 on "Automatic identification of double-helical regions in a DNA–RNA junction" (2008 3DNA Nature Protocols paper), "The chosen view allows for easy visualization and protractor measurement of the overall bending angle between the two relatively straight helices."

The following points are well worth noting:
  • The ls fitting procedure used in 3DNA follows SCHNAaP, which was based on the algorithm in the well-known NewHelix program, maintained by Dr. Richard Dickerson upto the 1990s. While fitting a global linear helical axis to strongly curved DNA structures clearly makes no much sense with derived parameters (NewHelix itself has been replaced by FreeHelix, also from Dickerson), I do believe it is meaningful to fit a linear helix with relatively straight DNA fragment. That's why I have kept this functionality in SCHNArP and 3DNA; 3DNA bending angle calculation serves as an example illustrating the point it provides an "intuitive" way for biologists to understand how the bending angle is calculated; it can actually be measured directly.
  • Instead of directly ls-fitting a linear helical axis with 3DNA, one can alternatively superimpose a regular fiber model into the DNA fragment, and then derive the straight helical axis from the fitted coordinates. The two approaches normally gives slightly different numerical values, as would be expected.
  • Overall, bending angle is (at most) an approximate measure of DNA curvature. In my opinion, the concept is only applicable for comparing a set of structures, each with two relatively straight fragments. Even in such cases, the relative spatial relationship between two helical fragments is more complicated than a simple (bending) angle could quantify. Specially, it is not meaningful to make a selling point in slight differences of DNA bending angles.

Friday, August 6, 2010

Genomic code for nucleosome positioning?

Over the past few years, the research on nucleosome positioning has been a hot topic: Segal et al., through a series of high-profile publications, have proposed that "the intrinsic nucleosome sequence preferences contribute substantially to nucleosome organization and chromatin function in vivo" (i.e., there exists a "genomic code") and developed bioinformatics tools to predict patterns of nucleosome positions based on genomic DNA sequence. However, as is clear from the title of a paper by Zhang et al., "Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo", it is highly debatable if the nucleosome sequence preferences is strong enough to be called a "code", in the canonical sense as for the genetic code.

While I am (generally) aware of the controversy about the "genomic code", my understanding of the issue has become much clearer after reading the correspondences between both sides published in the 17(8), August 2010 issue of Nature Structural & Molecular Biology [NSMB]:
  • "Nucleosome sequence preferences influence in vivo nucleosome organization" (pp918 - 920) by Kaplan … and Segal
  • "Evidence against a genomic code for nucleosome positioning" Reply to “Nucleosome sequence preferences influence in vivo nucleosome organization” (pp920 - 923) by Zhang … and Struhl
  • "A preoccupied position on nucleosomes" (p923) by Pugh
Reading carefully through the text, it is clear how important the little details could be: 20-bp vs 40-bp windows? How significant the preference should be to be claimed as a "code"? Pugh clarifies another source of confusion, in the terminology of nucleosome ‘occupancy’ vs ‘positioning’. In my understanding, the dispute on such details itself speaks strongly against the existence of "genomic code".

It should be noted that early work by Drew & Travers (JMB85) and Satchwell et al. (JMB86) provided evidence that the rotational positioning of DNA about the histone octamer is regulated by DNA sequence-dependent bending preferences. Goodsell & Dickerson quantified the Satchwell trinucleotide model by assigning a set of roll angles to predict DNA bending from base sequence. This bending model, along with five others, was implemented by SCHNArP for building approximate 3D DNA structures (with fixed values for slide, twist etc other parameters). However, this functionality is dropped out from 3DNA, simply because of the arbitrariness/subtleties in picking up the parameters.

Sunday, August 1, 2010

Principle axes of inertia as a mean to reorient a molecule

In structural biology, it is often helpful to reorient a molecule in a specific view for easy visualization and comparison. With no a priori knowledge of a feature, the (objective) principle axes of inertia can be used as a mean to automatically reset a structure into its most extended orientation (and two other orthogonal views). In 3DNA, rotate_mol has such a functionality which is explored by blocview to generate the characteristic, simplified molecular images adopted by the RCSB PDB and NDB.

Recently, the thread (initiated by Dr. Pascal Auffinger) at the 3DNA forum titled "rotate_mol ambiguous orientations" has led me to think the topic deeper. As always, the issue is best illustrated with an example. The file 355d_A6_T19.pdb contains the coordinates of the base pair A6-T19 of PDB entry 355d. Let xyz be a variable in Octave (Matlab) containing the coordinates of all atoms in file 355d_A6_T19.pdb, the following m-file provides the detailed procedure:
xyz = [
13.954 12.080 15.071
13.092 11.246 15.953
14.769 11.453 14.020
13.056 13.202 14.406
12.184 13.965 15.222
11.308 14.846 14.376
12.033 15.991 13.896
10.654 14.188 13.168
9.262 14.480 13.256
11.373 14.820 11.974
11.810 16.185 12.511
13.051 16.742 11.953
14.253 16.097 11.802
15.197 16.853 11.305
14.585 18.084 11.106
15.066 19.322 10.605
16.329 19.520 10.209
14.198 20.357 10.535
12.938 20.162 10.946
12.370 19.048 11.444
13.259 18.034 11.497
14.865 29.619 5.710
14.172 30.844 5.281
16.107 29.695 6.515
13.846 28.701 6.510
12.684 28.174 5.864
11.909 27.311 6.826
12.626 26.090 7.129
11.584 27.960 8.173
10.189 27.833 8.395
12.409 27.166 9.184
12.544 25.808 8.521
13.716 24.979 8.867
13.496 23.688 9.310
12.381 23.202 9.467
14.631 22.971 9.551
15.943 23.388 9.393
16.852 22.613 9.632
16.107 24.759 8.938
17.488 25.309 8.745
14.995 25.475 8.711 ]

cov_mtx = cov(xyz)

% cov_mtx =
% 3.4094 2.3860 -1.4222
% 2.3860 33.7089 -15.7175
% -1.4222 -15.7175 7.8786

[v, lambda] = eig(cov_mtx)

% v =
% 0.094830 0.992835 -0.072709
% 0.419520 -0.106092 -0.901525
% 0.902779 -0.054989 0.426575

% lambda =
% 0.42535 0 0
% 0 3.23319 0
% 0 0 41.33836

x = v(:, 2);
y = v(:, 3);
z = cross(x, y);

% z =
% -0.094830
% -0.419520
% -0.902779

num = size(xyz, 1);
xyz2 = (xyz - ones(num, 1) * mean(xyz)) * [x y z]

% xyz2 =
% 1.148136 10.032849 -0.514731
% 0.332293 11.223635 -0.879359
% 2.081609 10.090517 0.619843
% .................................
% 2.277813 -4.170312 -0.501254
% 3.601180 -4.848890 -0.688714
% 1.110301 -4.831784 -0.491249
Among other subtitles, note that:
  1. lambda 3 has the largest value (41.33836), i.e., with the biggest variance, and its corresponding eigenvector is set as the vertical y-axis.
  2. lambda 2 shows intermediate variation (3.23319), and its corresponding eigenvector is set as the horizontal x-axis.
  3. The z-axis is derived as the cross-product of x- and y-axes to ensure a right-handed frame. Note that it is the opposite of eigenvector 1.
  4. The original xyz coordinates are moved to its geometric center [xyz - ones(num, 1) * mean(xyz)], and then projected onto the preferred frame to get the transformed coordinates (xyz2) with the molecule in the most extended view (vertical then horizontal).