Saturday, September 5, 2009

Double helix groove width parameters from 3DNA

In the 3DNA output (from the analyze program) for a DNA/RNA duplex structure, there is a section on "Minor and major groove widths: direct P-P distances and refined P-P distances which take into account the directions of the sugar-phosphate backbones". The underlying algorithm is that of El Hassan and Calladine (1998), "Two Distinct Modes of Protein-induced Bending in DNA," J. Mol. Biol., 282, 331-343. Note that 5.8 Å needs to be subtracted from the P-P distances to account for the vdW radii of the two phosphate groups (2.9 Å each), and to make the values comparable with those from NewHelix/FreeHelix and Curves.

Using 3DNA fiber models #1 for A-DNA (calf thymus) and #4 for B-DNA (calf thymus), the groove widths (in Å) are as follows:
                  Minor Groove        Major Groove
                 P-P    Refined      P-P    Refined
-----------------------------------------------------
A-DNA (#1)      18.5     16.7       15.2     11.1
B-DNA (#4)      11.7     11.7       17.2     17.2
-----------------------------------------------------
From the above table, it is clear that for A-DNA, the minor and major groove widths in the refined set are smaller than their non-refined counterparts (i.e., those based on direct P-P distances). For B-DNA, the two sets are identical. It should be noted that in real structures (i.e., those that are not perfectly regular, such as X-ray crystal structures in the NDB/PDB), there are nearly always some differences between the refined and direct P-P distances. As a general rule, the refined set should be used.
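
As a small worked example of the 5.8 Å correction mentioned earlier, the following Python sketch subtracts it from the table values. The numbers are copied from the table above; the names (VDW_CORRECTION, pp_widths, corrected) are made up for illustration and are not part of 3DNA.

```python
# P-P groove widths (in angstroms) copied from the table above,
# stored as (direct P-P, refined P-P) for each groove.
VDW_CORRECTION = 5.8  # two phosphate-group vdW radii of 2.9 angstroms each

pp_widths = {
    "A-DNA (#1)": {"minor": (18.5, 16.7), "major": (15.2, 11.1)},
    "B-DNA (#4)": {"minor": (11.7, 11.7), "major": (17.2, 17.2)},
}


def corrected(width):
    """Subtract the vdW correction and round to one decimal, as in the table."""
    return round(width - VDW_CORRECTION, 1)


corrected_widths = {
    model: {
        groove: tuple(corrected(w) for w in pair)
        for groove, pair in grooves.items()
    }
    for model, grooves in pp_widths.items()
}

for model, grooves in corrected_widths.items():
    for groove, (direct, refined) in grooves.items():
        print(f"{model} {groove} groove: direct {direct:.1f}, refined {refined:.1f}")
```

With the correction applied, the refined widths become 10.9 Å (minor) and 5.3 Å (major) for A-DNA, and 5.9 Å (minor) and 11.4 Å (major) for B-DNA.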

One of the key structural differences between A- and B-DNA is their opposite groove dimensions: for B-DNA, the major groove (~17 Å) is about 5 Å wider than the minor groove (~12 Å), whereas for A-DNA, the major groove (~11 Å) is narrower than the minor groove (~17 Å) by a similar amount. Since the grooves provide binding sites, the difference between A- and B-DNA grooves has important implications for DNA (groove) recognition by ligands or proteins.

In retrospect, I implemented the El Hassan and Calladine algorithm for calculating the groove widths mainly because of its simplicity: I can understand clearly how it works visually. The algorithm is described in a two-page appendix of the paper cited above. To those who are interested in DNA structures in general, and in how groove widths are defined in particular, I would strongly recommend reading the appendix carefully and trying to implement it: there is no substitute for first-hand experience. For idealized cases, such as the fiber A- and B-DNA models above, the implementation should be straightforward. To be more realistic, an implementation should also account for, for example, missing phosphate groups in some structures (for testing purposes, simply delete one P atom from a structure).
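
The missing-phosphate point can be illustrated with a minimal Python sketch. Note that this is not the El Hassan and Calladine algorithm: the coordinates are invented, and pairing P atoms by the same index on the two strands is a placeholder assumption, not the pairing used in the paper or in 3DNA. It only shows the "delete one P atom" test and one way a distance routine could cope with it.

```python
import math


def pp_distance(p_a, p_b):
    """Return the distance between two P atoms, or None if either is missing."""
    if p_a is None or p_b is None:
        return None
    return math.dist(p_a, p_b)


# Hypothetical P-atom coordinates (in angstroms), keyed by residue index.
strand_1 = {1: (0.0, 0.0, 0.0), 2: (1.0, 2.0, 2.0)}
strand_2 = {1: (3.0, 4.0, 0.0), 2: (1.0, 2.0, 14.0)}

# Simulate a structure with a missing phosphate by deleting one P atom.
del strand_2[2]

# dict.get() returns None for the deleted atom, so the distance is reported
# as None instead of raising a KeyError.
distances = {
    i: pp_distance(strand_1.get(i), strand_2.get(i))
    for i in sorted(strand_1)
}
print(distances)
```

Reporting a missing value, rather than failing or silently skipping the position, keeps the output aligned with the sequence.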

As is obvious, 3DNA does not calculate groove depths. Over the years, I have been approached with requests/suggestions to provide such parameters to complement the groove widths. However, for various reasons, none of the algorithms fits well with 3DNA. As a general principle, I do not add new functionality to 3DNA simply for the sake of it. I must understand a new piece clearly in order to integrate it with the rest and to be able to respond concretely to possible questions from users.

Some emacs tricks

As a Linux/Unix fan, I am very familiar with vi and use it for quick and simple text editing purposes. Over the years, however, I have been using emacs the most: I like its color coding and programming language-specific editing mode.

Emacs is well-known for its extensibility, so there are many ways to customize it to suit one's taste. In my experience, I have found the following settings convenient:
  • Highlight the current line with a background color (here "greenyellow")
    (require 'highlight-current-line)
    (highlight-current-line-on t)
    (highlight-current-line-set-bg-color "greenyellow")
  • Set transient mark mode on, so that selected text becomes more obvious
    (setq transient-mark-mode t)
  • Show line number and column number
    (setq line-number-mode t)
    (setq column-number-mode t)
There are many features in emacs that could be handy, and I am always trying to learn more of them.

Sunday, August 30, 2009

JMB celebrates 50 years of protein structure determination

In the September 11, 2009 issue (vol. 392, issue 1) of JMB, there is a series of three articles on the determination of the first two protein structures (myoglobin and haemoglobin), an achievement accomplished by Perutz, Kendrew and their colleagues at the Cambridge MRC laboratory in the 1950s. Of special interest in this series are the three authors, Bror Strandberg, Richard Dickerson and Michael Rossmann: leading scientists in structural biology who were then postdocs actively involved in the late stages of the structure determination.

These reviews are vividly written and provide interesting background information and some technical details on X-ray crystal structure determination (especially on phase angle), and have the following titles:
  1. "Building the Ground for the First Two Protein Structures: Myoglobin and Haemoglobin" by Bror Strandberg
  2. "Myoglobin: A Whale of a Structure!" by Richard Dickerson
  3. "Recollection of the Events Leading to the Discovery of the Structure of Haemoglobin" by Michael Rossmann
With limited computing power and software support, protein structure determination was a difficult task at that time. Dickerson and Rossmann had to write their own programs to perform some of the calculations. However, this first-hand experience in both experiment and code development, on a significant project in a famous lab, may in part account for their success in structural biology.

I have known Dickerson's work on nucleic acid structures for a while, first through the famous Drew-Dickerson dodecamer (CGCGAATTCGCG), and I am intimately familiar with his NewHelix/FreeHelix programs. Nevertheless, it was only after reading his article above that I became aware of his early work on proteins. I like Dickerson's writing a lot. For example, in commenting on the different styles of Kendrew and Perutz, he wrote: "John was the mentor, guide, and organizer. …… In contrast, Max was a hands-on bench biochemist whose center of gravity was always the laboratory itself. …… Both styles had their merits: one learned from John, but one learned with Max."

An interesting point from Rossmann's article is his description of a secret he kept to himself for many decades: "I [Rossmann] had been privileged to work on the haemoglobin project with Max, but it was also a project that Max had given his whole life to develop. In my enthusiasm to look at the results, I stole the final discovery from Max. …… With the realization of what I had done, all desire to explore further was completely gone." While I vaguely remembered this story from reading the book "Max Perutz and the Secret of Life" by Georgina Ferry several months ago, Rossmann's personal account makes it unforgettable.

It is worth noting that Perutz and Rossmann were among those few who initiated the PDB in 1971 at a Cold Spring Harbor meeting. Finally, given the expertise of the three authors, it is not surprising to read in the Epilogue that "Indeed, structural biology has become the unifying factor of just about every aspect of biology."