Sunday, October 30, 2011

3DNA base-mutation functionality for the study of protein-DNA interactions

In my last blog post titled "mutate_bases, a new 3DNA tool for in silico base mutations of 3D nucleic acid structures", I outlined the availability of the mutate_bases program and some areas of its possible applications, including to "perform base-pair mutations in DNA-protein complexes".

In the September 6, 2011 issue of PNAS (vol. 108, no. 36), AlQuraishi and McAdams from Stanford University published a high-profile article "Direct inference of protein–DNA interactions using compressed sensing methods" (pp. 14819–14824). See also the thoughtful Commentary by Vijay Pande, "(Compressed) sensing and sensibility". Essentially, by combining concepts from compressed sensing and statistical mechanics, AlQuraishi and McAdams have developed a novel approach to determine the energy potential of protein–DNA complexes, leading to "an impressive advance in predictive capability": ~90% accuracy compared with ~60% for the best-performing alternative computational methods.

I am so glad to find that 3DNA (ref nos. 12 and 13) was used in the work:
In silico mutagenesis was carried out using the 3DNA software package (12, 13), which maintains the backbone atoms of the DNA molecule, but replaces the base pair atoms in a way that is consistent with the backbone orientation in the crystal. (p.14821, middle of the right column)

Hopefully, the new base mutation functionality in 3DNA will find more applications and come into wider use. As always, I'll respond promptly to users' feedback and refine related 3DNA tools as necessary.

Tuesday, October 25, 2011

mutate_bases, a new 3DNA tool for in silico base mutations of 3D nucleic acid structures

In response to repeated requests from 3DNA users over the years, I recently (well, a few months ago) added a new utility program, mutate_bases, to the 3DNA software suite. As its name implies, mutate_bases can be used for mutating bases in a nucleic acid structure, with two key and unique features: (1) the sugar-phosphate backbone conformation is untouched; (2) the base reference frame (position and orientation) is preserved, i.e., the mutated structure shares the same base-pair/step parameters as the original one.

mutate_bases is a standalone ANSI C program, on a par with other major 3DNA programs (e.g., find_pair, analyze, rebuild, fiber etc). As seen from the help message below, it can be used for any nucleic-acid-containing structures (DNA, RNA, or their complexes) in PDB format:
Usage: mutate_bases mutinfo pdbfile outfile
    'mutinfo' can contain upto 5 fields for each mutation:
        [name=residue_name] [icode=insertion_code]
        chain=chain_id seqnum=residue_number
    Alternatively, 'mutinfo' can be specified with the '-l' option
        followed by a mutation file: '-l mutations.dat'
    o The five fields per mutation can be in any order or CaSe
    o Each field can be abbreviated to its first character
    o Multiple mutations are separated by ';'
    o Fields in [] (i.e., name and icode) are optional
    o Mutation info should be QUOTED to be taken as one entry

    mutate_bases 'c=a s=2 m=DA' 355d.pdb 355d_G2A.pdb
    # mutate G2 in chain A of B-DNA 355d to A
    mutate_bases 'c=a s=2 m=DA; c=B s=23 m=DT' 355d.pdb 355d_mutfile.pdb
    # mutate base-pair G-C (C23 in chain B) to A-T
        # also create file 'mutations.dat', see below
    mutate_bases -l mutations.dat 355d.pdb 355d_mutfile.pdb
        # 355d_mutfile.pdb and 355d_mutfile.pdb would be identical
    mutate_bases 'c=A s=74 m=U' 1evv.pdb 1evv_C74U.pdb
    # mutate C74 in chain A of tRNA 1evv to U

mutate_bases is designed to solve the base mutation problem in a practical sense: robust and efficient, getting its job done and then out of the way. The program can have many possible applications: in addition to performing base-pair mutations in DNA-protein complexes, it should also prove handy in providing initial structures for QM/MM/MD energy calculations, and in DNA/RNA modeling studies.
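The 'mutinfo' rules in the help message above can be sketched as a small parser. This is only a minimal Python illustration, not code from 3DNA: the function name parse_mutinfo and the long field name "mutation" (for the m= field used in the examples) are assumptions made here, and the real program may well validate its input differently.

```python
# Hypothetical parser for the 'mutinfo' mini-format described in the help text.
# Not part of 3DNA; the long name "mutation" for the m= field is an assumption.
FIELDS = {"n": "name", "i": "icode", "c": "chain", "s": "seqnum", "m": "mutation"}
REQUIRED = ("chain", "seqnum", "mutation")  # name and icode are optional


def parse_mutinfo(mutinfo):
    """Split e.g. 'c=a s=2 m=DA; c=B s=23 m=DT' into one dict per mutation."""
    mutations = []
    for entry in mutinfo.split(";"):      # multiple mutations are separated by ';'
        tokens = entry.split()
        if not tokens:
            continue                      # tolerate a trailing ';'
        fields = {}
        for token in tokens:              # fields may come in any order
            key, sep, value = token.partition("=")
            if not (key and sep and value):
                raise ValueError(f"malformed field: {token!r}")
            first = key[0].lower()        # any case, abbreviated to first character
            if first not in FIELDS:
                raise ValueError(f"unknown field: {token!r}")
            fields[FIELDS[first]] = value
        missing = [name for name in REQUIRED if name not in fields]
        if missing:
            raise ValueError(f"missing field(s) {missing} in {entry.strip()!r}")
        fields["seqnum"] = int(fields["seqnum"])
        mutations.append(fields)
    return mutations


if __name__ == "__main__":
    for mutation in parse_mutinfo("c=a s=2 m=DA; C=B SeqNum=23 m=DT"):
        print(mutation)
```

Field values are kept exactly as typed; how mutate_bases itself matches a value such as 'c=a' against chain A is not covered by this sketch.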

Friday, July 29, 2011

Tracking 3DNA citations: Google Scholar vs. Web of Science

Over the years, I have been following citations to 3DNA on a regular basis, using both Google Scholar and Web of Science. From a personal perspective, following the citations has turned out to be an excellent way to keep myself informed of progress related to nucleic acid structures.

Up to early this year, I had found 3DNA citation counts from Google Scholar to be significantly larger than those from Web of Science. For example, for the 2003 Nucleic Acids Research (NAR) paper, the difference was well over one hundred. Over the past few months, however, I have noticed the citation numbers for the 2003 NAR paper fluctuating up and down by dozens. Moreover, I have been receiving significantly fewer email notifications of 3DNA citations from Google Scholar. Apparently, Google has been revising the algorithms for Google Scholar, leading to more conservative citation numbers that are now comparable to those from Web of Science.

Here are the current citation numbers to the three 3DNA publications, from both Google Scholar and Web of Science:

                         Google Scholar    Web of Science
2003 NAR                      544               473
2008 Nature Protocols          43                41
2009 NAR (w3DNA)               11                13

It is interesting to note that for the w3DNA paper, Web of Science gives a larger number (13) than Google Scholar (11). A careful check of the citation result from Web of Science shows that all of the 13 citations are legitimate journal articles. Clearly, Google Scholar is missing something in this case.

Overall, I trust Web of Science more, which is why I have been using it to compile citations to the 2003 NAR paper. However, I use Google Scholar more frequently for quick reference simply because it is free and easy to access.

Thursday, July 14, 2011

GpU dinucleotide platform, the smallest unit with key RNA structural features

Compared to DNA, RNA has three salient structural features: it contains ribose sugar, uses uracil instead of thymine, and is normally single-stranded. The O2'(G)...O2P(U) H-bond stabilized GpU dinucleotide platform may turn out to be the smallest unit with all those RNA hallmarks (see Figure below).

Firstly, it must have the guanosine ribose to form the O2'(G)...O2P(U) H-bond.

Secondly, it must have uracil: the methyl group at position 5 of thymine would cause a steric clash with guanosine, thus disrupting the N2(G)...O4(U) base-base H-bond needed to form the GpU dinucleotide platform.

Thirdly, a dinucleotide, by definition, is single-stranded. The two H-bonds, plus the covalent linkage, make the GpU platform extremely rigid (see Figure 1 of our 2010 NAR paper).

Moreover, the GpU platform is directional: swapping the two bases while keeping the sugar-phosphate backbone fixed does not allow for a base-base H-bond, thus no UpG dinucleotide platform.

Sunday, June 26, 2011

PDB v3.3 and partial atom occupancy

From the PDB mailing list, I know of the recent announcement "PDB Archive Version 4.0 to be Released July 13, 2011". This "ambitious" review of the PDB archive has resulted in a new set of corrected files in ten categories, including biological assemblies, residual B factors, peptide inhibitors/antibiotics, polymers containing nonstandard polymer linkages, and partial occupancy etc. "These data reflect the wwPDB's continuing commitment to providing accurate and detailed data to users worldwide."

I am interested in changes in the PDB format, and read the PDF document "Description of Changes and Corrections for PDB July 2011 Remediation Release":
PDB format files updated in this remediation release comply with PDB Format Version 3.30. PDBx and PDBML data files comply with the PDB Exchange Dictionary v.4.0, and PDBML XML Schema V4.0, respectively.
Specifically, I checked carefully the section on "Partial occupancy", which is quoted in full below:

In the 2009 remediation, occupancies were corrected in 490 X-ray
and neutron entries. A mistake was made in 104 of these entries:
for atoms with alternate conformer labels and with summed total
occupancy less than 1.0, the occupancies were re-scaled as 1.0/n,
where n is the number of conformers.


The originally deposited occupancies of the affected atoms were
restored and the remediation was then carried out properly, via:
  • Atoms with multiple conformations but identical coordinates and B-values were merged and their occupancies were summed.
  • Atoms which now have (total) occupancies <= 1.0 were left as deposited.
  • Atoms with (total) occupancies > 1 were rescaled proportionally to a sum of 1.0

The occupancies have been corrected in these entries.
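To make the three quoted rules concrete, here is a minimal Python sketch of how they could be applied to the alternate conformers of a single atom. It is not the wwPDB procedure itself: the function name, the tuple layout, and the numbers in the demo are made up for illustration.

```python
def remediate_occupancies(conformers):
    """Apply the three quoted rules to the alternate conformers of ONE atom.

    conformers: list of (altloc, (x, y, z), b_value, occupancy) tuples.
    Returns a new list in the same layout.
    """
    # Rule 1: merge conformers with identical coordinates and B-values,
    # summing their occupancies (the first alternate-location label is kept).
    merged = {}
    for altloc, xyz, b_value, occupancy in conformers:
        key = (xyz, b_value)
        if key in merged:
            merged[key][1] += occupancy
        else:
            merged[key] = [altloc, occupancy]

    total = sum(occupancy for _, occupancy in merged.values())
    # Rule 2: a total <= 1.0 is left as deposited (scale factor 1.0).
    # Rule 3: a total > 1.0 is rescaled proportionally to sum to 1.0.
    scale = 1.0 / total if total > 1.0 else 1.0
    return [(altloc, xyz, b_value, occupancy * scale)
            for (xyz, b_value), (altloc, occupancy) in merged.items()]


if __name__ == "__main__":
    # Made-up conformers: the total of 0.7 must NOT become 0.5 + 0.5.
    atom = [("A", (1.0, 2.0, 3.0), 20.0, 0.3), ("B", (1.5, 2.0, 3.0), 25.0, 0.4)]
    print(remediate_occupancies(atom))
```

Note that the 1.0/n rescaling described as the 2009 mistake would turn the demo occupancies into 0.5 and 0.5, whereas the rules above leave 0.3 and 0.4 untouched.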

This partial occupancy issue reminds me of an extensive and very informative discussion a few months back in CCP4BB, under the thread "what to do with disordered side chains" and its derivatives, about setting "zero" occupancy and/or high B values for PDB ATOM/HETATM records in disordered regions. Over the course of the discussion, Frances Bernstein made the following comment:
I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54).
3DNA does pay attention beyond column 54 (up to 80, actually) for ATOM/HETATM records, but internally it does not make use of the occupancy info. In future releases of 3DNA, I am planning to take occupancy/B-factor into consideration, probably through configurable parameters.
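For readers unfamiliar with what lies "beyond column 54", the following Python sketch reads the fixed-column fields of an ATOM/HETATM record: x, y, z in columns 31-54, occupancy in columns 55-60 and the B-factor in columns 61-66. It is an illustration only and says nothing about how 3DNA parses PDB files internally; format_atom and parse_atom are hypothetical helper names.

```python
def format_atom(serial, name, resname, chain, resseq, x, y, z, occ, bfac):
    """Build a fixed-column ATOM record (helper for this example only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}")


def parse_atom(record):
    """Extract the chain id, coordinates, occupancy and B-factor of a record."""
    def number(first, last):
        # first/last are 1-based, inclusive column numbers
        text = record[first - 1:last].strip()
        return float(text) if text else None   # None if the field is absent

    return {
        "chain": record[21:22],                # column 22
        "x": number(31, 38),
        "y": number(39, 46),
        "z": number(47, 54),
        "occupancy": number(55, 60),
        "b_factor": number(61, 66),
    }


if __name__ == "__main__":
    # C4' of G4 (chain A) in 1bna, values as quoted elsewhere in this blog
    line = format_atom(63, " C4'", "DG", "A", 4, 21.393, 16.960, 18.505, 1.00, 53.00)
    print(line)
    print(parse_atom(line))
    print(parse_atom(line[:54]))   # a record that stops right after the z coordinate
```

A program that stops reading at column 54 corresponds to the last call: the coordinates are still there, but occupancy and B-factor come back empty.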

Reading through the "Description of Changes and Corrections for PDB July 2011 Remediation Release", I noticed cases of re-corrections of previous corrections in wwPDB remediation efforts. A concrete example is about partial occupancy in the 2009 remediation: among the 490 corrected X-ray and neutron entries, a mistake was made in 104 of them. As a side note, there was a post by Morten Kjeldgaard, titled "PDB changes data in entries?" early this year in CCP4BB, where partial occupancy was used as an example.

Sunday, June 19, 2011

Sugar pucker correlates with phosphorus-related distance

The sugar puckers in DNA/RNA structures are predominately in either C3'-endo or C2'-endo (see Figure below), corresponding to the A- or B-form conformation in a DNA duplex.

Recently, I (re-)read a few articles related to the RNA backbone by Jane Richardson et al., and became interested in the correlation between sugar pucker and a simple distance parameter, as reported in these papers:
C3'-endo and C2'-endo sugar puckers are highly correlated to the perpendicular distance between the C1'–N1/9 glycosidic bond vector and the following phosphate: > 2.9 Å for C3'-endo and < 2.9 Å for C2'-endo. (p.16 of the MolProbity paper)

Out of curiosity and to get a better understanding of this correlation, I played around with some sample cases both visually in RasMol and numerically. Overall, this is a simple geometric problem, i.e., the shortest distance from a point to a line in 3-dimensional space. Given below is the Octave/Matlab script for calculating the distances for G175 and U176 of PDB entry 1JJ2 (the large ribosomal subunit of Haloarcula marismortui):
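1;  # Octave only: mark this file as a script so the function can be defined inline
function d = get_p3_nc_dist(P3, C1, N)
    N_C1 = N - C1;                 # vector from C1' to N (glycosidic bond)
    nv_N_C1 = N_C1 / norm(N_C1);   # normalized vector
    C1_P3 = P3 - C1;               # vector from C1' to P3 (the following phosphorus)
    proj = dot(C1_P3, nv_N_C1);    # component of C1_P3 along the bond
    d  = norm(C1_P3 - proj * nv_N_C1);  # perpendicular distance from P3 to the bond line
end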

## G175 (1jj2)
P3 = [70.104 112.366  44.586];
C1 = [73.017 109.666  45.304];
N = [74.445 109.380  45.288];
d1 = get_p3_nc_dist(P3, C1, N)    # 2.2 Å -- C2'-endo

## U176 (1jj2)
P3 = [66.871 116.402  46.804];
C1 = [68.213 112.454  49.279];
N = [69.678 112.480  49.438];
d2 = get_p3_nc_dist(P3, C1, N)    # 4.6 Å -- C3'-endo
The GpU used in the above example forms a dinucleotide platform, where the sugar of G175 adopts a C2'-endo conformation, and that of U176 has C3'-endo. Indeed, the distance for the G175 nucleotide is 2.2 Å, less than 2.9 Å; whilst the value for U176 is 4.6 Å, greater than 2.9 Å.
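For readers without Octave/Matlab, the same point-to-line calculation, plus the 2.9 Å rule quoted from the MolProbity paper, might look as follows in plain Python. This is just a sketch with function names of my own choosing; what the quoted rule implies for a distance of exactly 2.9 Å is not stated, so the tie-breaking below is an assumption.

```python
import math


def p3_to_glycosidic_distance(p3, c1, n):
    """Perpendicular distance from the 3' phosphorus to the C1'-N1/9 bond line."""
    bond = [a - b for a, b in zip(n, c1)]                 # C1' -> N
    length = math.sqrt(sum(a * a for a in bond))
    unit = [a / length for a in bond]
    c1_p3 = [a - b for a, b in zip(p3, c1)]               # C1' -> P3
    along = sum(a * u for a, u in zip(c1_p3, unit))       # component along the bond
    perp = [a - along * u for a, u in zip(c1_p3, unit)]   # what is left is perpendicular
    return math.sqrt(sum(a * a for a in perp))


def pucker_from_distance(d, cutoff=2.9):
    """Apply the quoted rule; d == cutoff is arbitrarily assigned to C2'-endo."""
    return "C3'-endo" if d > cutoff else "C2'-endo"


# Coordinates of G175 and U176 in 1jj2, as given in the script above
g175 = dict(p3=(70.104, 112.366, 44.586), c1=(73.017, 109.666, 45.304),
            n=(74.445, 109.380, 45.288))
u176 = dict(p3=(66.871, 116.402, 46.804), c1=(68.213, 112.454, 49.279),
            n=(69.678, 112.480, 49.438))

if __name__ == "__main__":
    for label, atoms in (("G175", g175), ("U176", u176)):
        d = p3_to_glycosidic_distance(**atoms)
        print(f"{label}: {d:.1f} A -> {pucker_from_distance(d)}")
```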

It is worth noting that the above-mentioned articles from Richardson et al. are focused on the RNA backbone, without paying attention to the base (pair) geometry. The Zp parameter, which quantifies the z-coordinate of the phosphorus atom in the mean reference frame (see "A-form conformational motifs in ligand-bound DNA structures", JMB 2000), can be easily adapted to the analysis of single-stranded RNA structures. For example, the vertical distances of the 3' phosphorus atoms to the G175 and U176 base planes are 1.9 Å and 4.4 Å, respectively.

Since base planes and the phosphorus atoms are the most accurately located entities in a given nucleic acid structure, the nucleotide-based Zp variant presumably would have some advantage over the distance from phosphorus to the glycosidic bond. Naturally, this Zp parameter will be added in future releases of 3DNA.

Saturday, June 11, 2011

Conformation of the sugar ring in nucleic acid structures

The conformation of the five-membered sugar ring in DNA/RNA structure can be characterized using the five corresponding endocyclic torsion angles (see Figure below).
v0: C4'-O4'-C1'-C2'
v1: O4'-C1'-C2'-C3'
v2: C1'-C2'-C3'-C4'
v3: C2'-C3'-C4'-O4'
v4: C3'-C4'-O4'-C1'
Due to the ring constraint, the conformation can be characterized approximately by 5 - 3 = 2 parameters. Using the concept of pseudorotation of the sugar ring, the two parameters are the amplitude (τm) and phase angle (P).

One set of widely used formulas to convert the five torsion angles to the pseudorotation parameters is due to Altona & Sundaralingam: "Conformational Analysis of the Sugar Ring in Nucleosides and Nucleotides. A New Description Using the Concept of Pseudorotation" [J. Am. Chem. Soc., 1972, 94(23), pp 8205–8212]. As always, the concept is best illustrated with an example. Here I use the sugar ring of G4 (chain A) of the Dickerson-Drew dodecamer (1bna/bdl001), with Matlab/Octave code:
# xyz coordinates of the sugar ring: G4 (chain A), 1bna/bdl001
# ATOM     63  C4'  DG A   4      21.393  16.960  18.505  1.00 53.00
# ATOM     64  O4'  DG A   4      20.353  17.952  18.496  1.00 38.79
# ATOM     65  C3'  DG A   4      21.264  16.229  17.176  1.00 56.72
# ATOM     67  C2'  DG A   4      20.793  17.368  16.288  1.00 40.81
# ATOM     68  C1'  DG A   4      19.716  17.901  17.218  1.00 30.52

# endocyclic torsion angles:
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4
Pconst = sin(pi/5) + sin(pi/2.5)  # 1.5388
P0 = atan2(v4 + v1 - v3 - v0, 2.0 * v2 * Pconst);  # 2.9034
tm = v2 / cos(P0);  # amplitude: 48.469
P = 180/pi * P0;  # phase angle: 166.35 [P + 360 if P0 < 0]
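The five endocyclic torsion angles used above can be recomputed from the quoted ATOM coordinates. The Python sketch below uses the standard atan2 form of the dihedral angle; it is an independent illustration rather than the 3DNA implementation, and the helper names are mine.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)   # carries the sign of the angle
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Sugar ring of G4 (chain A), 1bna: coordinates from the ATOM records above
RING = {
    "C4'": (21.393, 16.960, 18.505),
    "O4'": (20.353, 17.952, 18.496),
    "C3'": (21.264, 16.229, 17.176),
    "C2'": (20.793, 17.368, 16.288),
    "C1'": (19.716, 17.901, 17.218),
}
DEFINITIONS = {
    "v0": ("C4'", "O4'", "C1'", "C2'"),
    "v1": ("O4'", "C1'", "C2'", "C3'"),
    "v2": ("C1'", "C2'", "C3'", "C4'"),
    "v3": ("C2'", "C3'", "C4'", "O4'"),
    "v4": ("C3'", "C4'", "O4'", "C1'"),
}

if __name__ == "__main__":
    for name, atoms in DEFINITIONS.items():
        print(f"{name}: {torsion(*(RING[a] for a in atoms)):6.1f}")
```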
The Altona & Sundaralingam (1972) pseudorotation parameters are what have been adopted in 3DNA. The Curves+ program, however, uses another set of formulas due to Westhof & Sundaralingam: "A Method for the Analysis of Puckering Disorder in Five-Membered Rings: The Relative Mobilities of Furanose and Proline Rings and Their Effects on Polynucleotide and Polypeptide Backbone Flexibility" [J. Am. Chem. Soc., 1983, 105(4), pp 970–976]. The two sets of formulas by Altona & Sundaralingam (1972) and Westhof & Sundaralingam (1983) give slightly different numerical values for the two pseudorotation parameters (amplitude τm and phase angle P).

Since 3DNA and Curves+ are currently the most commonly used programs for conformational analysis of nucleic acid structures, the subtle differences in pseudorotation parameters may cause confusion for users who use both programs. With the same G4 (chain A, 1bna) sugar ring, here is the Matlab/Octave script showing how Curves+ calculates the pseudorotation parameters:
# xyz coordinates of sugar ring G4 (chain A, 1bna/bdl001)

# endocyclic torsion angles, same as above
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4

v = [v2, v3, v4, v0, v1]; # reorder them into vector v[]
A = 0; B = 0;
for i = 1:5
    t = 0.8 * pi * (i - 1);
    A += v(i) * cos(t);
    B += v(i) * sin(t);
end
A *= 0.4;   # -48.476
B *= -0.4;  # 11.516

tm = sqrt(A * A + B * B);  # 49.825

c = A/tm; s = B/tm;
P = atan2(s, c) * 180 / pi;  # 166.64

For this specific example, i.e., the sugar ring G4 (chain A, 1bna/bdl001), the pseudorotation parameters as calculated by 3DNA following Altona & Sundaralingam (1972) and Curves+ following Westhof & Sundaralingam (1983) are as follows:

         amplitude (τm)     phase angle (P)
3DNA        48.469             166.35
Curves+     49.825             166.64
Needless to say, the differences are subtle, and few people will notice/bother at all. For those who do care about such little details, however, hopefully this post will help you understand where the differences actually come from.
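For a side-by-side check on sugar rings other than this one, the two conversions can be wrapped as functions. The Python sketch below simply transcribes the two Octave snippets above, so it is neither the 3DNA nor the Curves+ source; the function names are mine, and the phase angle is folded into the range 0 to 360 degrees.

```python
import math


def altona_sundaralingam(v0, v1, v2, v3, v4):
    """Amplitude and phase angle (degrees) following the first snippet above."""
    p_const = math.sin(math.pi / 5) + math.sin(math.pi / 2.5)
    p0 = math.atan2(v4 + v1 - v3 - v0, 2.0 * v2 * p_const)
    amplitude = v2 / math.cos(p0)
    return amplitude, math.degrees(p0) % 360.0      # P + 360 if negative


def westhof_sundaralingam(v0, v1, v2, v3, v4):
    """Amplitude and phase angle (degrees) following the second snippet above."""
    v = [v2, v3, v4, v0, v1]                        # same reordering as in the script
    a = 0.4 * sum(vi * math.cos(0.8 * math.pi * i) for i, vi in enumerate(v))
    b = -0.4 * sum(vi * math.sin(0.8 * math.pi * i) for i, vi in enumerate(v))
    amplitude = math.hypot(a, b)
    return amplitude, math.degrees(math.atan2(b, a)) % 360.0


if __name__ == "__main__":
    torsions = (-26.7, 46.3, -47.1, 33.4, -4.4)     # G4 (chain A), 1bna
    for label, func in (("3DNA-style   ", altona_sundaralingam),
                        ("Curves+-style", westhof_sundaralingam)):
        amplitude, phase = func(*torsions)
        print(f"{label}  amplitude = {amplitude:.3f}  P = {phase:.2f}")
```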

Sunday, June 5, 2011

Lower case chain identifiers in PDB format

First formulated in the early 1970s, the PDB format is rigid with fixed columns for designated contents in its ATOM/HETATM records. Specifically, a single column, #22, is assigned for the chain identifier (id). Traditionally, the 26 upper case letters of the English alphabet (A-Z), space (i.e., ' '), and the single digits (0-9) have been used as chain ids. Up until the ribosomal structures came up, I guess, those 26 + 1 + 10 = 37 characters had been sufficient for the chain ids.

To the best of my knowledge, for a long time, most PDB parsers assumed upper case chain ids. Indeed, 3DNA v1.5 automatically converts each ATOM/HETATM record to upper case. The first time I became aware of lower case chain ids was when I saw a post in the 3DNA forum, titled "Small bug in find_pair", where a user reported the 'w' vs 'W' chain ids in PDB entry 1VSP. Then I refined 3DNA so that the case of chain ids can be preserved, through an undocumented command line option (as a feature for internal testing purposes).

My view that 3DNA chain ids should be case-sensitive was reinforced when I read the article "Crystal structures of CGG RNA repeats with implications for fragile X-associated tremor ataxia syndrome" recently published in Nucleic Acids Research. The asymmetric unit of the unmodified CGG-repeats-containing duplex (GCGGCGGC)2, NDB entry NA1017 / PDB entry 3R1C, contains a total of 36 chains: designated as A-Z, plus a-j. Without distinguishing cases of the chain ids, the 3DNA output would become quite confusing.
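The effect of upper-casing is easy to demonstrate. In the Python sketch below, stub records carrying only the columns up to the chain id are built for the 36 chain ids of 3R1C (A-Z plus a-j), and the chain id is read from column 22 with and without case preservation. The helper names and the placeholder atom/residue fields are hypothetical, and this is not how 3DNA reads PDB files.

```python
import string


def stub_record(chain, serial=1):
    """First 22 columns of an ATOM record; all fields but the chain id are placeholders."""
    return ("ATOM  "              # columns 1-6: record name
            + f"{serial:>5}"      # columns 7-11: atom serial number
            + " "                 # column 12
            + " P  "              # columns 13-16: atom name
            + " "                 # column 17: alternate location
            + "  G"               # columns 18-20: residue name
            + " "                 # column 21
            + chain)              # column 22: chain id


def chain_id(record, preserve_case=True):
    """Chain id from column 22, optionally folded to upper case."""
    cid = record[21:22]
    return cid if preserve_case else cid.upper()


if __name__ == "__main__":
    ids_3r1c = string.ascii_uppercase + "abcdefghij"     # A-Z plus a-j
    records = [stub_record(c) for c in ids_3r1c]
    kept = {chain_id(r) for r in records}
    folded = {chain_id(r, preserve_case=False) for r in records}
    print(len(kept), "chains with case preserved;", len(folded), "after upper-casing")
```

With case folding, chains a-j become indistinguishable from A-J, which is exactly the kind of confusion described above.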

Thus, in future releases of 3DNA, the default would be switched to preserve the case of chain ids. This chain id 'case' serves as an excellent example that scientific software products, unlike publications per se, are not fixed but need continuous care and maintenance to meet the challenges of an evolving world.

Sunday, May 22, 2011

NAR's top ten articles

Recently, I noticed a new feature in the website of Nucleic Acids Research (NAR), i.e., its selection of top ten articles:
NAR’s Top Ten Articles are updated monthly and show recent articles that have been most often accessed in HTML and PDF formats in the specified month.
In the age of information explosion with a flood of scientific journals and articles, it is easy to get lost. NAR's pick of top ten and featured articles draws my attention to significant work I may otherwise overlook.

The current top ten articles (March 2011) are all selected from 2009/2010 publications in 'Database', 'Methods online', and 'Survey and Summary'. I am browsing the 2009 article by Thomas LaFramboise, "Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances", to get a better understanding of SNPs.

Sunday, May 15, 2011

Posts in the 3DNA forum reach 600

As of May 6, the total number of posts in the 3DNA forum has reached 600. Created in March 2007, with my debut post titled "Welcome message from Xiang-Jun Lu", the forum is now over four years old. Overall, the forum has served its purpose pretty well. In answering questions, I've been increasingly referring to the posts in the forum. As a concrete example, see the thread of a recent question "Base pair step parameters with a missing base pair".

At less than three posts (about one question) per week on average, I've not felt too much stress in supporting the forum (and maintaining 3DNA) in my spare time. For the most part, I've enjoyed interacting with 3DNA users from everywhere in the world, and with diverse backgrounds. Following the Unix philosophy ("Write programs that do one thing and do it well. Write programs to work together."), 3DNA has proved to be robust and flexible in serving its ever-growing user community. As a matter of fact, few questions I received a couple of years ago were beyond my original consideration of the details while I wrote the code. It is this intimate knowledge of all the underlying algorithms and every bit of their implementations that allows me to answer users' questions quickly and concretely.

As time passes by, however, it has become evident to me that 3DNA needs to be further refined and extended to meet the ever-changing needs of its user community. For example, over the past few months, several questions asked in the 3DNA forum are directly relevant but clearly beyond 3DNA's current capabilities. While I'd be interested in implementing some of the requested functionality that makes sense to me, doing so is certainly over my spare time limit. On the other hand, my increased understanding of nucleic acid structures and accumulated software expertise make it simply an issue of time and effort to move 3DNA to the next level, far beyond its current application scope and impact.

With posts in the 3DNA forum reaching 600, and citations to 3DNA articles over 600 (Google Scholar), I am hopeful something good will happen to the 3DNA project. After all, 6 is a lucky number in traditional Chinese culture.

Fifty years of operon

In the latest issue of Science, there is a one-page editorial titled "The Birth of the Operon" by François Jacob, who won the Nobel Prize in Physiology or Medicine in 1965:
What is the operon, whose 50th anniversary is being celebrated this week? The word heralded the discovery of how genes are turned on and off, and it launched the now-immense field of gene regulation. ... we cannot presume to know how new ideas will arise and where scientific research will lead.
In the next three paragraphs, Jacob provides an insightful and vivid description of his research related to the discovery of the "operon", a structural gene-regulatory gene ensemble. In consonance with his comment on scientific discovery, he concludes:
Our breakthrough was the result of “night science”: a stumbling, wandering exploration of the natural world that relies on intuition as much as it does on the cold, orderly logic of “day science.” In today’s vastly expanded scientific enterprise, obsessed with impact factors and competition, we will need much more night science to unveil the many mysteries that remain about the workings of organisms.
It is worth noting that the Journal of Molecular Biology (JMB) has recently published a special issue [Volume 409, Issue 1, Pages 1-88 (27 May 2011)], titled "The Operon Model and its Impact on Modern Molecular Biology" with historical accounts and reviews to celebrate the operon's 50th anniversary. It is this event that motivated me to read the Jacob and Monod 1961 JMB review article "Genetic regulatory mechanisms in the synthesis of proteins". I have come across this paper so many times before, and should definitely have read it long ago!

Curves+ web server

Through Google Scholar, I became aware of the article online in Nucleic Acids Research (NAR), titled "CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures" by Richard Lavery's group:
Curves+, a revised version of the Curves software for analyzing the conformation of nucleic acid structures, is now available as a web server. This version, which can be freely accessed at, allows the user to upload a nucleic acid structure file, choose the nucleotides to be analyzed and after optionally setting a number of input variables, view the numerical and graphic results online or download files containing a set of helical, backbone and groove parameters that fully describe the structure. PDB format files are also provided for offline visualization of the helical axis and groove geometry.
The website looks quite streamlined, with required input information all in a single page, and the test page also ran smoothly. In less than two years following the publication of Curves+, it is nice to see the Curves+ web server version available, making this analysis tool more readily available to the nucleic acids community.

Nowadays, it seems safe (to the best of my knowledge) to say that only 3DNA and Curves+ conform to the 1999 Tsukuba convention for the description of nucleic acid base-pair geometry, and each of them provides a web interface: web 3DNA and web Curves+.

Sunday, May 1, 2011

Scientific journals on nucleic acids

To my knowledge, Nucleic Acids Research (NAR) is a highly respected scientific journal with a broad impact in the field of nucleic acids. Over the years, I have been browsing the NAR webpage on a regular basis to keep myself up to date with the latest developments in this area. It is thus no surprise that the initial 3DNA paper was submitted to and published in NAR in 2003. Among the 500+ citations to that 3DNA paper, over 1/5 (100+) are from NAR itself (as an example, please see my January 22, 2011 blog post titled "Three structural biology papers in the latest issue of NAR cite 3DNA"). My latest contribution to NAR is the GpU story, which was actually selected as a featured article.

Another related journal I am quite familiar with is RNA, a publication of the RNA society. As the "About" section of its webpage succinctly summarizes,
RNA serves as an international forum for publishing original reports on RNA research in the broadest sense. The journal aims to unify this field by cutting across established disciplinary lines and focusing on "RNA-centered" science.
RNA currently has an impact factor (IF) of 5.198 (2009), slightly lower than NAR's 7.479. It is, nevertheless, a very decent journal in RNA-related research, and I frequently visit its website. As a side note, the GpU paper was initially submitted to RNA for its RNA-specific content and as a way to diversify my publication spectrum (as mentioned above, 3DNA was initially published in NAR). Unfortunately, the GpU paper was rejected by the RNA journal after two rounds of review, spanning over 6 months.

Another journal closely related to RNA (name wise) is called RNA Biology, which even has a slightly higher IF of 5.56. Admittedly, I was not familiar with this journal at all. Browsing through its website, I am interested in seeing the journal's explicit policy to reconsider papers "rejected by high impact journals [CNS] for reasons of novelty and impact, rather than the importance of the study or the integrity of the data." By enclosing "the reviewers’ and/or editorial comments" from these high impact journals, "it is possible the article might be accepted [by RNA Biology] based on its previous review. This will allow the urgent and competitive research to be published on the day of submission."

I became aware of the journal DNA Research quite recently through an email. From its website, "DNA Research is an internationally peer-reviewed journal which aims at publishing papers of highest quality in broad aspects of DNA and genome-related research." The journal currently has an IF of 4.917. Browsing a couple of its online issues, I sense that the journal is more on genome- than structure-related research.

While following up 3DNA citations recently, I noticed the paper titled "Insights into the Structures of DNA Damaged by Hydroxyl Radical: Crystal Structures of DNA Duplexes Containing 5-Formyluracil" by Tsunoda and Takénaka. It was published in the Journal of Nucleic Acids, which I had never (but probably should have) heard of before. From its website, "Journal of Nucleic Acids is a peer-reviewed, open access journal that publishes original research articles as well as review articles in all areas of nucleic acids." By virtue of this structure paper and its citation to 3DNA, I think the journal is surely of personal interest, and I have added it to my watch list.

To sum up, there are currently five scientific journals (I know of) that are devoted to nucleic acids: Nucleic Acids Research, RNA, RNA Biology, DNA Research, and Journal of Nucleic Acids.
Do I still miss something? Please make your suggestion in the comment area.

[revised on May 17, 2011 by adding RNA Biology]

Saturday, April 23, 2011

Ebook "Gregory Petsko in Genome Biology: The first 10 years"

Over the years, I have read some of Gregory Petsko's monthly columns in Genome Biology while browsing the journal online, and I like his sensible and entertaining columns quite a bit. Recently, I became aware of the ebook from BioMed Central, "Gregory Petsko in Genome Biology: The first 10 years":
Structural biologist Gregory Petsko has contributed a thought-provoking and entertaining monthly column to the scientific journal Genome Biology every month since its launch in 2000. To mark the 10th anniversary of Genome Biology this eBook brings together 10 years of Petsko's columns.
I downloaded the epub version of the book, and googled around, trying to find a corresponding ebook reader for my MacBook Pro (Snow Leopard). Even though I have some ebooks in the generic PDF format, I am not that familiar with epub or mobi. I finally settled on NOOK for Mac from B&N. It turns out that reading ebooks with specifically designed apps such as NOOK is quite a different, yet more enjoyable, experience than through a PDF reader.

Now the ebook has become the top one in my casual reading list. I am reading it from the very beginning, one column at a time, to have a historical perspective. So far I have found the columns indeed very "thought-provoking and entertaining".

Friday, April 8, 2011

Tips and tricks from "The Geek Stuff"

As a devoted command line user, I am always interested in learning new tricks to make my life more enjoyable. Recently, I came across Ramesh Natarajan’s blog “The Geek Stuff” which is full of “instruction guides, how-to, troubleshooting tips and tricks on Linux, database, hardware, security and web” to help solve practical problems.

For example, in the section “Best of the Blog”, I recently benefitted quite a bit by reading the following posts:
There are many other helpful tips/tricks as well; since I have bookmarked the site, I will surely come back!

Sunday, April 3, 2011

Scripting in Ruby is fun

Over the years, I have played around with various scripting languages, including awk, bash, Perl, Python and Ruby. By far, I have enjoyed Ruby the most; nowadays, I write scripts nearly exclusively in Ruby.

Created by Yukihiro "Matz" Matsumoto in Japan during the mid-1990s, Ruby became popular worldwide in the mid-2000s, with the Rails web application framework. Indeed, I first dug into Ruby through Rails, and by reading David Black's book "Ruby for Rails; Ruby techniques for Rails developers". As an exercise, I implemented the current 3DNA v2.0 website with Rails v1.x. Then I quickly realized that the rapidly evolving Rails framework was beyond my time and interest to follow. However, I did begin to appreciate Ruby's simplicity, consistency and expressiveness. Over the past few years, I have collected over a dozen Ruby-related (e)books, including "The Well-Grounded Rubyist" (David Black, covering v1.9), "The Ruby Programming Language" (David Flanagan and Yukihiro Matsumoto), and "Metaprogramming Ruby: Program Like the Ruby Pros" (Paolo Perrotta). Just as in my experience with (ANSI) C, I feel Ruby "wears well as one's experience with it grows" (K&R, in the preface of "The C Programming Language"). The better I know Ruby, the more I enjoy using it.

I recently wrote two Ruby scripts for the analysis of molecular dynamics (MD) simulation trajectories using 3DNA. Honestly, I would not have bothered with Perl for the task (otherwise, it would have been done a long time ago), given the sideline nature of my support of 3DNA. Yet, writing and refining the Ruby scripts (with the help of git and rake) has turned out to be a pleasant experience. Another reason scripting in Ruby is fun is its large, active and friendly user community; there are many user-contributed libraries (gems) that serve common programming needs well. As an example, in the 3DNA-MD scripts, I took advantage of the elegant Trollop command-line option parser by William Morgan. I picked Trollop among many other choices because it is self-contained in a single file, simple to use, and "gets out of your way".

In the Ruby community, exciting new developments are happening all the time. Recently, I was drawn to thor, "a simple and efficient tool for building self-documenting command line utilities". Over the past couple of years, I have browsed Sinatra and Sequel – they also look brilliant! Of course, for bioinformatics, there is the BioRuby project.

Overall, in my experience, scripting in Ruby is fun and exciting. Are you a Rubyist yet?

Saturday, March 26, 2011

DNA fiber models ABC

Among the 55 fiber models available in 3DNA, the A-, B- and C-DNA types are the most generic – they can be built with bases A, C, G and T in any combination (see table below). Moreover, in addition to the well-known Arnott fiber models (#1, #4 and #7, all from calf thymus), there are newer variants from van Dam & Levitt (#46 and #47) and Premilat & Albiser (#53 to #55).
 #   Twist  Rise   Model   (twist in degrees, rise in Å)
 1   32.7   2.548  A-DNA (calf thymus)
 4   36.0   3.375  B-DNA (calf thymus)
 7   38.6   3.310  C-DNA (calf thymus)
46   36.0   3.38   B-DNA (BI-type nucleotides)
47   40.0   3.32   C-DNA (BII-type nucleotides)
53  -38.7   3.29   C-DNA (depreciated)
54   32.73  2.56   A-DNA [cf. #1]
55   36.0   3.39   B-DNA [cf. #4]
As shown in Figure 9 of the 3DNA 2003 NAR paper (linked below), the A-, B- and C-DNA fiber models are all right-handed regular straight helices, yet each has distinctive features.
While I could easily envision possible applications of the fiber models, especially in connection with analysis and rebuilding routines in 3DNA, it was still a nice surprise to see a recent article by Gossett and Harvey, titled "Computational Screening and Design of DNA-Linked Molecular Nanowires" [Nano Lett., 2011, 11 (2), pp 604–608]. The abstract is quoted below:
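To make the differences among the three Arnott models a little more concrete, here is a minimal Ruby sketch that derives the number of base pairs per helical turn (360/twist) and the helical pitch (rise times base pairs per turn) from the table. It assumes the two numeric columns are twist in degrees and rise in Å; the `FiberModel` struct and its method names are hypothetical, made up for this illustration, and are not part of 3DNA.

```ruby
# Hypothetical helper (not part of 3DNA): derive helix repeat and pitch
# from the twist (degrees) and rise (angstroms) of a regular fiber model.
FiberModel = Struct.new(:id, :twist, :rise, :label) do
  # Base pairs per full helical turn.
  def bp_per_turn
    360.0 / twist
  end

  # Helical pitch in angstroms: rise multiplied by base pairs per turn.
  def pitch
    rise * bp_per_turn.abs
  end
end

# Values copied from rows #1, #4 and #7 of the table above.
FIBER_MODELS = [
  FiberModel.new(1, 32.7, 2.548, "A-DNA (calf thymus)"),
  FiberModel.new(4, 36.0, 3.375, "B-DNA (calf thymus)"),
  FiberModel.new(7, 38.6, 3.310, "C-DNA (calf thymus)")
]

FIBER_MODELS.each do |m|
  printf("#%-2d %-20s %5.2f bp/turn  pitch %5.2f A\n",
         m.id, m.label, m.bp_per_turn, m.pitch)
end
```

For the B-DNA model, a twist of 36.0 degrees gives exactly 10 base pairs per turn, in line with the "360°/10" value commonly quoted for standard B-DNA.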
DNA can be used as a structural component in the process of making conductive polymers called nanowires. Accurate molecular models could lead to a better understanding of how to prepare these types of materials. Here we present a computational tool that allows potential DNA-linked polymer designs to be screened and evaluated. The approach involves an iterative procedure that adjusts the positions of DNA-linked monomers in order to obtain reasonable molecular geometry compatible with normal DNA conformations and with the properties of the polymer being formed. This procedure has been used to evaluate designs already reported experimentally, as well as to suggest a new design based on pyrrylene vinylene (PV) monomers.
In the article, 3DNA (the web interface version w3DNA) was cited as follows:
The selection of DNA structures is important because the DNA remains fixed throughout the procedure. To reduce the risk of an incorrect result, one should choose a subset of DNA structures that are in some sense representative of DNA conformational space. The DNA structures (A-, B-, and C-form DNA) were obtained using the Web 3DNA web server. We used a poly(dG)-poly(dC) sequence with ideal geometry for each DNA structure. A-DNA was constructed with rise = 2.548 Å and twist = 32.7˚ , B-DNA was constructed with rise = 3.375 Å and twist = 36.0 ˚, and C-DNA was constructed with rise = 3.310 Å and twist = 38.6 ˚.
Indeed, this is a novel application of fiber DNA ABC models!

Sunday, March 20, 2011

3DNA citations reach over 500

On Friday, June 5, 2009, I blogged on the topic titled "3DNA citations reach over 300". At that time, I wrote (towards the end):
I still remember that the number of citations to 3DNA was less than 150 nearly two years ago [~ summer 2007], when I started to wrote the first draft of our 2008 Nature Protocols paper. Now it is more than doubled! I would blog on this topic again when the number reaches 500.
When I checked Google Scholar for 3DNA citations just now, the citation number was already over 500 for the initial 2003 3DNA NAR paper alone. Combined with the two direct follow-ups – the 2008 Nature Protocols paper and the 2009 NAR web server paper – the three 3DNA publications have been cited a total of 550 times.

Again, as noted in that blog post,
In my opinion, some of 3DNA features are still (heavily) underused. Now that we have a sizable user community, 3DNA could only become better and would be more widely used. I have every reason to believe that in the not-so-distant-future, the citations to 3DNA would reach over 1000.
A decade after its initial humble release, 3DNA has been successfully applied to many real-world problems. As spare time permits, I have actively maintained and continuously refined 3DNA based largely on users' feedback. Over time, I have also come to see clearly that 3DNA can be moved to the next level in both functionality and usability, to enjoy an even larger and broader impact.

Now more than halfway there, it won't be long before citations to 3DNA reach 1000, and then go beyond.

Sunday, March 13, 2011

Review article on NMR analysis of protein–DNA interactions by Milon et al

Through Google Scholar, I became aware of a recent review article by Milon et al., titled "Nuclear magnetic resonance analysis of protein–DNA interactions" in the journal J. R. Soc. Interface:
This review focuses on the experimental strategies currently employed to solve structures of protein–DNA complexes and to analyse their dynamics. It highlights how these approaches can help in understanding detailed molecular mechanisms of target recognition.
I browsed through the text to get myself more familiar with the NMR methodology and its applications in protein-DNA recognition. I was surprised that 3DNA was cited in the article, especially with respect to its unique analyze/rebuild complementarity:
In addition, several software programs have been developed to model DNA bending such as the 3DNA program, which allows analysis of DNA structural parameters and enables it to be rebuilt with customized DNA models [76]. Several Web servers have been created recently and provide interesting tools to analyse and rebuild DNA models [77,78].
I only wish that 3DNA's neat features could be more widely recognized; hopefully I will have the opportunity to further refine 3DNA and move it to the next level.

Saturday, March 5, 2011

Retraction of scientific publications

Once in a while, I come across retraction notices of scientific publications in leading journals/magazines. Even for cases not directly related to my research areas, I normally browse through them.

In the March 3, 2011 issue of Nature, there is a retraction of the Letter "Mediation of pathogen resistance by exudation of antimicrobials from roots" [Nature 434, 217–221 (2005)]. I am intrigued by the first sentence of the note:
The authors wish to retract this Letter after a key reference by Walker et al. (ref. 9 in this Letter) was retracted from the scientific literature.

It turns out that the 2003 Walker et al. J. Agric. Food Chem. paper (withdrawn in October 2009) and the 2005 Nature Letter were from the same group. Overall, it took ~6 years each for the two papers to be retracted. As of today, they have been cited 76 and 84 times respectively, according to Google Scholar.

Sunday, February 27, 2011

Evidence for transient Hoogsteen base pairs in canonical DNA duplex

In the February 24, 2011 issue of Nature, there is an interesting article by Nikolova et al., titled "Transient Hoogsteen base pairs in canonical duplex DNA". Its main discovery is succinctly summarized in the abstract:
By using nuclear magnetic resonance relaxation dispersion spectroscopy in concert with steered molecular dynamics simulations, we have observed transient sequence-specific excursions away from Watson–Crick base-pairing at CA and TA steps inside canonical duplex DNA towards low-populated and short-lived A•T and G•C Hoogsteen base pairs. The observation of Hoogsteen base pairs in DNA duplexes specifically bound to transcription factors and in damaged DNA sites implies that the DNA double helix intrinsically codes for excited state Hoogsteen base pairs as a means of expanding its structural complexity beyond that which can be achieved based on Watson–Crick base-pairing.
Geometrically, the Hoogsteen base pair is related to the Watson-Crick base pair by a 180-degree rotation about the glycosidic bond (N9–C1'). While the A•T Hoogsteen base pair is classic, the similar G•C+ Hoogsteen pair (with protonation of cytosine N3) is equally possible. The A•T and G•C Hoogsteen base pairs have two perfect H-bonds, so they are energetically stable. As for their existence in DNA duplexes, the most direct evidence comes from the "trap" experiments (see Fig. 3 of the paper). In the News & Views section, Honig and Rohs provide a nice recap of the main point and implications of this work.

As also observed in another recent publication, "Replication infidelity via a mismatch with Watson–Crick geometry", the base sequence has a subtle role in influencing the base-pairing schemes, three-dimensional structures and biological functions of DNA. However, we should not forget that only the Watson-Crick base pairs, and to a lesser extent, the G-U wobble pair, have the correct symmetry to ensure a "regular" double helical structure.

Sunday, February 20, 2011

Canned responses in Gmail make it easy to send common messages

Through Gary Rosenzweig's MacMost Now video #509 "Gmail Labs" (January 28, 2011), I first heard of "Canned Responses" in Gmail Labs:
Email for the truly lazy. Save and then send your common messages using a button next to the compose form.
This is a truly handy feature that I have long been waiting for! Yet even though I am aware of Gmail Labs and enabled quite a few experimental features a while ago, I've not been searching Gmail Labs for new features ever since.

Over the past few weeks, I have found "Canned Responses" increasingly indispensable in my support of the 3DNA forum (as a sideline project):
  1. When I activate a new 3DNA forum registration, I've always included a "standard" message to "make the forum policy upfront and explicit, in order to avoid misunderstandings or surprises." Previously, I had to copy-and-paste, e.g., from a specifically created text file or elsewhere. Surely, this worked, but I had felt intuitively that there must be a better way to get the job done. Well, that's exactly where "Canned Responses" fit in!
  2. Over the past few months, I have been ever more bothered by spam registrations. So as a further filter, I have been sending the following enquiry message to each suspicious registration for activation:
    Thanks for your registration at the 3DNA forum. Please tell me a little bit about yourself and elaborate on how 3DNA could be useful to your project; we would like to make the forum spam-free.

    See also "Further notes on forum registration and posting" -- you may not need to register.
    Here once again, the "Canned Responses" feature makes my life much easier! Moreover, this step turns out to be extremely effective; a large percentage of registrations is filtered out at this final stage.
Are you using Gmail? If so, you may also want to give "Canned Responses" a try.

Sunday, February 13, 2011

Making data maximally available?

In the February 11, 2011 issue of Science (Vol. 331 no. 6018 p. 649), there is an editorial, titled "Making Data Maximally Available". Indeed, the issue contains a special section on "Dealing with Data".
Science is driven by data. New technologies have vastly increased the ease of data collection and consequently the amount of data collected, while also enabling data to be independently mined and reanalyzed by others ... It is obvious that making data widely available is an essential element of scientific research.
Especially, I like the following two (proposed) new policies:
  1. To extend the data access requirement "to include computer codes involved in the creation or analysis of data." If properly implemented/enforced, this policy could make published results significantly easier to reproduce and assess. In my experience, I have observed too many times that secrets are hidden in the seemingly "little" subtle details.
  2. "To produce a single list that combines references from the main paper and the SOM" (supporting online material) to "provide credit and reveal data sources more clearly". Potentially, this will also increase the citation of method papers.
Hopefully, other journals will follow Science's lead to make data maximally available, and to present data more transparently.

Sunday, February 6, 2011

A G-T mismatch with perfect Watson-Crick geometry

In the February 1, 2011 issue of PNAS [108(5)], there is an interesting article "Replication infidelity via a mismatch with Watson–Crick geometry" by Bebenek et al. They solved the Pol λ DL ternary complex (PDB id: 3PML) which has a G-T nascent mispair in "perfect" Watson-Crick geometry (see their Fig 4, linked below).

From an H-bonding (energetic) point of view, the G-T mispair (with three "H-bonds") can only be possible if G or T is in the rare enol tautomeric state, or is ionized. The pH dependence of single nucleotide disincorporation seems to be consistent with an ionized base pair. Note that the G-T mispair is different from a wobble pair, in which G and T have a relative sheared motion (see also Fig. 4C above).

I am glad to find that 3DNA was used in deriving the parameters. By design, 3DNA should be able to identify such "unusual" mispair as easily as for a normal Watson-Crick pair. As noted in our 2008 3DNA Nature Protocols paper,
By taking advantage of the standard base reference frame and selected geometric features, the find_pair program within 3DNA can identify all possible nucleic-acid base pairs, whether they are canonical Watson–Crick or noncanonical pairs and are made up of normal or modified bases, in any tautomeric form or protonation state. (p1217)
Moreover, 3DNA does notice and signify (with a * instead of the normal -) the atypical H-bonding feature of the G-T mispair to draw further attention.
5 T-*---g  [3]  O2 - N2  3.06  N3 * N1  2.97  O4 * O6  2.66
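For readers who want to pick out such flagged H-bonds programmatically, here is a minimal Ruby sketch that splits the line above into individual bonds and marks those tagged with `*` as atypical. The line layout is inferred solely from this one example, not from any 3DNA format specification, and `HBond` and `parse_hbonds` are hypothetical names introduced only for this illustration.

```ruby
# Sketch only: the format is inferred from the single example line above.
HBond = Struct.new(:atom1, :atom2, :distance, :atypical)

# Parse "atom1 <mark> atom2 distance" groups that follow the "[n]" count;
# a "*" mark (instead of the normal "-") is taken to mean atypical.
def parse_hbonds(line)
  details = line.split("]", 2).last.to_s
  details.scan(/(\S+)\s+([-*])\s+(\S+)\s+(\d+\.\d+)/).map do |a1, mark, a2, dist|
    HBond.new(a1, a2, dist.to_f, mark == "*")
  end
end

line = "5 T-*---g  [3]  O2 - N2  3.06  N3 * N1  2.97  O4 * O6  2.66"
parse_hbonds(line).each do |hb|
  flag = hb.atypical ? "atypical" : "normal"
  puts "#{hb.atom1}...#{hb.atom2}  #{hb.distance}  #{flag}"
end
```

On this line, the sketch reports O2...N2 as normal and both N3...N1 and O4...O6 as atypical, matching the `*` markers in the 3DNA output.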
Hopefully, this example helps illustrate some of 3DNA's unique features, which deserve to be more widely recognized and applied.

Sunday, January 30, 2011

Trial of scientific papers by blogs and tweets?

In the January 20, 2011 issue of Nature, there is an interesting News Feature article by Apoorva Mandavilli, titled "Peer review: Trial by Twitter":
Blogs and tweets are ripping papers apart within days of publication, leaving researchers unsure how to react.
Specifically, two widely publicized papers in Science are singled out: one is about the longevity genes identified through genome-wide association study (GWAS), published last July; another is a more recent one on arsenic bacteria, published last December. In each case, while "the popular media was trumpeting the finding, other researchers were taking to the web to criticize the paper’s methodology." Yet, the authors failed to hold up their claims in the papers.

With great interest, I've been following the story on arsenic bacteria. I first noticed this work through Science Podcast, and found the topic of "arsenic" life intriguing. So for general knowledge, I read the abstract carefully and browsed through the text. One week later came the Nature editorial "Response required", from which I followed the link to Rosie Redfield's blog post "Arsenic-associated bacteria (NASA's claims)". I read the post, and many of the comments therein; while I do not understand many of the technical details, I had no difficulty in following her argument. The lead author of the arsenic bacteria paper, Dr. Felisa Wolfe-Simon, did respond to comments on December 16, 2010. Science also published an interview with Wolfe-Simon, titled "Discoverer Asks for Time, Patience Over Arsenic Bacteria Controversy". On the same day, Redfield was quick to write another blog post "Comments on Dr. Wolfe-Simon's Response", which again has received many comments. So far, the story is still unfolding, and Science has promised to publish technical comments and responses in early 2011.

On a related topic, through the CCP4bb, I noticed the letter to the editor titled "Is too ‘creative’ language acceptable in crystallography?" by Alexander Wlodawer et al. I agree fully with the authors that "While figures of speech are often useful and even educational, flashy titles combined with hyperbolae and imprecise language can mislead or deceive nonspecialist readers and should therefore be avoided."

In the Internet Age, bloggers and tweeters clearly have an important role to play in the assessment of research findings. In scientific publications, what counts is not how much one claims, but to what extent one can hold up such claims. Solid work holds up over time and the scrutiny of peers.

Saturday, January 22, 2011

Three structural biology papers in the latest issue of NAR cite 3DNA

While browsing the latest 39(2) January 2011 issue of Nucleic Acids Research (NAR), I found, to my great surprise, three papers that cite 3DNA. These papers, all under the "structural biology" section, are of interest to me from their titles and abstracts, so I downloaded the PDF versions and read through each of them.

For this blog post, #100 by coincidence, it would be intriguing to look into the context to see how 3DNA is cited.

"Asymmetric DNA recognition by the OkrAI endonuclease, an isoschizomer of BamHI" by Vanamee et al. (Mount Sinai School of Medicine, and New England Biolabs):

Analysis of the stereochemical quality of the protein model and assignment of secondary structure were conducted with PROCHECK (13). DNA analysis was performed with 3DNA (14). Solvent-accessible surface areas were calculated in CNS with the algorithm of Lee and Richards employing a 1.4-Å probe(15). Figures were prepared using PyMOL ( [p713, from bottom left to middle right]

"DNA intercalation without flipping in the specific ThaI–DNA complex" by Firczuk et al. (Poland, Germany and UK):
An oligoduplex with the correct sequence in standard B-DNA geometry was generated with the program 3DNA (44), and manually adjusted to fit the highly distorted DNA in the structure. ... The programs COOT (45), REFMAC (46) and CNS (47) were used for refinement. [p747, top left]
Analysis with the 3DNA software (44) shows that the intercalation increases the rise between base pairs to about 7 Å or approximately twice its usual value (Figure 5B). Phosphorus–phosphorus (Pn–Pn+1) distances in the DNA backbone are only mildly altered (values range from 5.6 to 7.0 Å). Instead, the extra height of the two CG steps comes at the expense of the twist, which is reduced from its usual value of about 36° (360°/10) to between 10 and 15°. A view toward the major groove shows that the inner base pairs of the recognition sequence are strongly tilted (Figure 5). According to the 3DNA software (44), the first CG step has a negative tilt of about ~12°, which results in the oblique orientation of the following base pairs. The central GC step is characterized by a tilt close to 0°, reflecting the nearly parallel arrangement of the middle bases. Finally, the second CG step has a positive tilt of about 15° which restores the standard orientation of the downstream base pairs. A side view of the DNA indicates a bend at the center of the recognition sequence which is primarily due to the positive ~12° roll of the central GC step into the major groove (Table 1). The 3DNA program also indicates that the propeller twist is positive for the specifically recognized sequence, and (as expected for the standard B-DNA) negative for most of the flanking base pairs. [p749, top right]
Table 1. DNA distortion in complex with ThaI restriction endonuclease: all parameters were calculated with the 3DNA software (44). [p750, middle left]

"On the molecular basis of uracil recognition in DNA: comparative study of T-A versus U-A structure, dynamics and open base pair kinetics" by Fadda and Pomès (Ireland and Canada):
MD simulations were run with versions 3.3.3 up to 4.0.4 of the GROMACS software package (47,48).

Structural parameters were determined with the 3DNA software package (51,52). The pymol (www software package was used to generate figures. [p769, bottom right]

Established in 1974 and currently with an impact factor of 7.479, NAR has also been chosen by the Special Libraries Association as one of the top 100 most influential journals in medicine and biology over the last 100 years. The citations by the three papers in the latest issue of NAR illustrate unambiguously 3DNA's big impact in structural biology.

Wednesday, January 19, 2011

Ruby scripts for 3DNA analysis of molecular dynamics simulation trajectories

Over the years, I've been very pleased to see 3DNA's ever-increasing applications for the analysis of molecular dynamics (MD) simulation trajectories of nucleic acid structures. Among its other features, this illustrates that the command-line driven approach of 3DNA makes it easily integrable into the MD analysis pipeline (with some scripting, of course).

However, the lack of direct support of 3DNA to the ever more popular field of MD simulations has caused several obvious problems:
  • Repeated efforts – virtually every lab or even MD practitioner could come up with an ad hoc scripting solution.
  • Hindrance to 3DNA's even wider adoption – newcomers to the MD field, or bench scientists interested in dynamics simulations, would be scared off.
  • Known issues with existing approaches – most prominently the unnecessary repetitive run of find_pair to deduce base pairing information for each snapshot (model), which not only takes time but, more seriously, could miss some pairs due to melting or distortion along the trajectory.
I've been following 3DNA's citations for years and I am well aware of the above issues: in addition to answering relevant questions in the 3DNA forum, I have blogged specifically on the topic a few times:
Of course, I am in a unique position to help solve the problem. Indeed, for the past couple of years, I've been thinking of writing scripts to make life easier for MD practitioners who care to use 3DNA. However, due to my lack of experience in MD simulations, constraints of "spare" time (plus laziness), and the want of a suitable collaborator, I've never found the incentive to get the job done.

I finally decided to write some Ruby scripts to streamline the process of using 3DNA in MD simulations, after a recent question from Aneesh on "script for extracting data from 3DNA output file" in the 3DNA forum. After a few exchanges of views with Aneesh, and especially with Alpay's contribution of a Python script and a sample dataset, I've finished up two standalone yet connected Ruby scripts to analyze MD simulation trajectories with 3DNA and then extract various structural parameters. The details, including source code and test examples, are available in the 3DNA forum under "Ruby scripts for the analysis of MD simulation trajectories", in a newly created section titled "Molecular dynamics simulations".

The sample file ("sample_md0.pdb") distributed with the current v0.1 of the scripts contains 21 snapshots (models, 0..20), separated by MODEL/ENDMDL pairs. While the sample is based on a trajectory file from AMBER, output from any other MD simulation package, or an NMR ensemble, can be handled similarly.
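To illustrate what "separated by MODEL/ENDMDL pairs" means in practice, here is a minimal Ruby sketch that splits multi-model PDB text into one group of records per model. It is not taken from the actual 3DNA-MD scripts; `split_models` is a hypothetical helper, and the two-model sample below is a toy stand-in with made-up coordinates, not an excerpt from "sample_md0.pdb".

```ruby
# Sketch only (not the 3DNA-MD scripts): group PDB records by model number.
# Lines outside any MODEL/ENDMDL pair are ignored.
def split_models(pdb_text)
  models = {}
  current = nil
  pdb_text.each_line do |line|
    if line.start_with?("MODEL")
      current = line.split[1].to_i   # model serial number after the keyword
      models[current] = []
    elsif line.start_with?("ENDMDL")
      current = nil
    elsif current                    # note: 0 is truthy in Ruby
      models[current] << line.chomp
    end
  end
  models
end

# Toy input with made-up coordinates, for illustration only.
sample = <<~PDB
  MODEL        0
  ATOM      1  P     G A   1       0.000   0.000   0.000
  ENDMDL
  MODEL        1
  ATOM      1  P     G A   1       0.100   0.000   0.000
  ENDMDL
PDB

split_models(sample).each do |number, lines|
  puts "model #{number}: #{lines.size} record(s)"
end
```

Once the snapshots are separated this way, each one can be written to its own file and analyzed independently; how the real scripts handle this step is described in the forum post, not here.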

Now the ball is rolling. As time goes by, and with users' feedback, I will refine and expand the functionality of the scripts as necessary. I am confident that we will see more applications of 3DNA in the "dynamic" molecular simulation field.

Sunday, January 9, 2011

Open dictionary with command-control-d in Mac OS X

In a recent MacMost Newsletter, I came across the following handy trick in Mac OS X: while the cursor is over a word (not necessarily selected), one can press command-control-d to open Dictionary, which pops up a little window with the word's definition. This works in Safari and Mail, but not in Preview (unfortunately).

Previously, when I needed to check the definition of a word, I right-clicked on it and then followed the link "Look Up in Dictionary". This launches or pops up Dictionary with detailed information about the word (Dictionary/Thesaurus/Wikipedia).

The right-click method seems to integrate better with other Mac OS X applications. For example, it works with Preview as well. However, now that I know it, I sometimes prefer the command-control-d approach; it is quick and unobtrusive.

IUPAC nucleotide symbols and their complements

Recently, I was interested in knowing the complements of all the IUPAC nucleotide symbols. As blogged previously, I am quite familiar with the "meaning of nucleotide IUPAC codes" (namely A/C/G/T, and R/Y/N etc). However, when I first checked the Gene Infinity website on nucleotide symbols, it still took me a while to figure out the meaning of the DNA alphabet (with complements), as excerpted below:
A  C  G  T    M  R  W  S  Y  K    B  D  H  V    N
|  |  |  |    |  |  |  |  |  |    |  |  |  |    |
T  G  C  A    K  Y  W  S  R  M    V  H  D  B    N
For example, some degenerate IUPAC symbols are complemented to themselves (e.g., W–W and S–S), while others are seemingly "hard" to apprehend (e.g., B–V and D–H).

After thinking about it for a bit, things began to become clear. The complements are based on the complementarity of Watson-Crick base-pairs (A–T and G–C) and the meaning of each degenerate IUPAC nucleotide symbol. For example,
  • W represents A/T, meaning weak (with only two hydrogen-bonds). The complements of A/T are T/A respectively, which is W again.
  • B (not A) represents C/G/T, and their complements are G/C/A respectively, which is V (not U/T).
It is easy to verify that all other complementary pairs follow exactly the same basic principle.
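The verification can also be done mechanically. The following minimal Ruby sketch derives each complement from the principle described above: replace every base in a symbol's set with its Watson-Crick partner, then look up which symbol stands for the resulting set. The `IUPAC` table uses the standard meanings of the symbols; the `iupac_complement` method name is my own choice for this illustration.

```ruby
# IUPAC nucleotide symbols and the sets of bases they stand for
# (each set written in alphabetical order).
IUPAC = {
  "A" => "A", "C" => "C", "G" => "G", "T" => "T",
  "M" => "AC", "R" => "AG", "W" => "AT", "S" => "CG", "Y" => "CT", "K" => "GT",
  "B" => "CGT", "D" => "AGT", "H" => "ACT", "V" => "ACG",
  "N" => "ACGT"
}.freeze

# Complement every base in the symbol's set (A<->T, C<->G), sort the result,
# and find the symbol that names the complemented set.
def iupac_complement(symbol)
  bases = IUPAC.fetch(symbol)
  complemented = bases.tr("ACGT", "TGCA").chars.sort.join
  IUPAC.key(complemented)
end

puts IUPAC.keys.join("  ")
puts IUPAC.keys.map { |s| iupac_complement(s) }.join("  ")
```

The two printed rows reproduce the top and bottom rows of the alphabet shown above, including the self-complementary W and S and the B–V and D–H pairs.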