Xiang-Jun's Corner

Sunday, January 30, 2011

Trial of scientific papers by blogs and tweets?

In the January 20, 2011 issue of Nature, there is an interesting News Feature article by Apoorva Mandavilli, titled "Peer review: Trial by Twitter":

Blogs and tweets are ripping papers apart within days of publication, leaving researchers unsure how to react.

Specifically, two widely publicized papers in Science are singled out: one, published last July, is about longevity genes identified through a genome-wide association study (GWAS); the other, published last December, is about arsenic bacteria. In each case, while "the popular media was trumpeting the finding, other researchers were taking to the web to criticize the paper’s methodology." Yet the authors failed to back up the claims made in their papers.

I've been following the story on arsenic bacteria with great interest. I first noticed this work through the Science Podcast, and found the topic of "arsenic" life intriguing. So for general knowledge, I carefully read the abstract and browsed through the text. One week later came the Nature editorial "Response required", from which I followed the link to Rosie Redfield's blog post "Arsenic-associated bacteria (NASA's claims)". I read the post and many of the comments therein; while I did not understand many of the technical details, I had no difficulty in following her argument. The lead author of the arsenic bacteria paper, Dr. Felisa Wolfe-Simon, did respond to comments on December 16, 2010. Science also published an interview with Wolfe-Simon, titled "Discoverer Asks for Time, Patience Over Arsenic Bacteria Controversy". On the same day, Redfield was quick to write another blog post, "Comments on Dr. Wolfe-Simon's Response", which again received many comments. The story is still unfolding, and Science has promised to publish technical comments and responses in early 2011.

On a related topic, through the CCP4bb, I noticed the letter to the editor titled "Is too ‘creative’ language acceptable in crystallography?" by Alexander Wlodawer et al. I agree fully with the authors that "While figures of speech are often useful and even educational, flashy titles combined with hyperbolae and imprecise language can mislead or deceive nonspecialist readers and should therefore be avoided."

In the Internet Age, bloggers and tweeters clearly have an important role to play in the assessment of research findings. In scientific publications, what counts is not how much one claims, but to what extent one can hold up such claims. Solid work holds up over time and the scrutiny of peers.

Saturday, January 22, 2011

Three structural biology papers in the latest issue of NAR cite 3DNA

While browsing the latest 39(2) January 2011 issue of Nucleic Acids Research (NAR), I found, to my great surprise, three papers that cite 3DNA. These papers, all under the "structural biology" section, are of interest to me from their titles and abstracts, so I downloaded the PDF versions and read through each of them.

For this blog post, #100 by coincidence, it would be intriguing to look into the context to see how 3DNA is cited.

"Asymmetric DNA recognition by the OkrAI endonuclease, an isoschizomer of BamHI" by Vanamee et al. (Mount Sinai School of Medicine, and New England Biolabs):

Analysis of the stereochemical quality of the protein model and assignment of secondary structure were conducted with PROCHECK (13). DNA analysis was performed with 3DNA (14). Solvent-accessible surface areas were calculated in CNS with the algorithm of Lee and Richards employing a 1.4-Å probe (15). Figures were prepared using PyMOL (www.pymol.org). [p713, from bottom left to middle right]

"DNA intercalation without flipping in the specific ThaI–DNA complex" by Firczuk et al. (Poland, Germany and UK):

An oligoduplex with the correct sequence in standard B-DNA geometry was generated with the program 3DNA (44), and manually adjusted to fit the highly distorted DNA in the structure. ... The programs COOT (45), REFMAC (46) and CNS (47) were used for refinement. [p747, top left]

Analysis with the 3DNA software (44) shows that the intercalation increases the rise between base pairs to about 7 Å or approximately twice its usual value (Figure 5B). Phosphorus–phosphorus (Pn–Pn+1) distances in the DNA backbone are only mildly altered (values range from 5.6 to 7.0 Å). Instead, the extra height of the two CG steps comes at the expense of the twist, which is reduced from its usual value of about 36° (360°/10) to between 10 and 15°. A view toward the major groove shows that the inner base pairs of the recognition sequence are strongly tilted (Figure 5). According to the 3DNA software (44), the first CG step has a negative tilt of about ~12°, which results in the oblique orientation of the following base pairs. The central GC step is characterized by a tilt close to 0°, reflecting the nearly parallel arrangement of the middle bases. Finally, the second CG step has a positive tilt of about 15° which restores the standard orientation of the downstream base pairs. A side view of the DNA indicates a bend at the center of the recognition sequence which is primarily due to the positive ~12° roll of the central GC step into the major groove (Table 1). The 3DNA program also indicates that the propeller twist is positive for the specifically recognized sequence, and (as expected for the standard B-DNA) negative for most of the flanking base pairs. [p749, top right]

Table 1. DNA distortion in complex with ThaI restriction endonuclease: all parameters were calculated with the 3DNA software (44). [p750, middle left]

"On the molecular basis of uracil recognition in DNA: comparative study of T-A versus U-A structure, dynamics and open base pair kinetics" by Fadda and Pomès (Ireland and Canada):

MD simulations were run with versions 3.3.3 up to 4.0.4 of the GROMACS software package (47,48).

Structural parameters were determined with the 3DNA software package (51,52). The pymol (www.pymol.org) software package was used to generate figures. [p769, bottom right]

Established in 1974 and currently with an impact factor of 7.479, NAR has also been chosen by the Special Libraries Association as one of the top 100 most influential journals in medicine and biology over the last 100 years. The citations by the three papers in the latest issue of NAR illustrate unambiguously 3DNA's big impact in structural biology.

Wednesday, January 19, 2011

Ruby scripts for 3DNA analysis of molecular dynamics simulation trajectories

Over the years, I've been very pleased to see 3DNA's ever-increasing applications for the analysis of molecular dynamics (MD) simulation trajectories of nucleic acid structures. Among its other features, this illustrates that the command-line driven approach of 3DNA makes it easily integrable into the MD analysis pipeline (with some scripting, of course).

However, 3DNA's lack of direct support for the ever more popular field of MD simulations has caused several obvious problems:

Repeated efforts – virtually every lab, or even every MD practitioner, has to come up with an ad hoc scripting solution.
Hindrance to 3DNA's even wider adoption – newcomers to the MD field, or bench scientists interested in dynamics simulations, could be scared off.
Known issues with existing approaches – most prominently the unnecessary, repeated run of find_pair to deduce base-pairing information for each snapshot (model), which not only takes time but, more seriously, can miss some pairs because of melting or distortion along the trajectory.

I've been following 3DNA's citations for years and I am well aware of the above issues: in addition to answering relevant questions in the 3DNA forum, I have blogged specifically on the topic a few times:

3DNA for the analysis of molecular dynamics simulations [Saturday, July 24, 2010]
3DNA in the PCCP nucleic acid simulations themed issue [Sunday, December 6, 2009]
3DNA in molecular dynamics simulations [Sunday, October 4, 2009]

Of course, I am in a unique position to help solve the problem. Indeed, for the past couple of years, I've been thinking of writing scripts to make life easier for MD practitioners who care to use 3DNA. However, due to my lack of experience in MD simulations, constraints on "spare" time (plus laziness), and the want of a suitable collaborator, I never found the incentive to get the job done.

I finally decided to write some Ruby scripts to streamline the process of using 3DNA in MD simulations, after a recent question from Aneesh on "script for extracting data from 3DNA output file" in the 3DNA forum. After a few exchanges of views with Aneesh, and especially with Alpay's contribution of a Python script and a sample dataset, I've finished two standalone yet connected Ruby scripts: one analyzes MD simulation trajectories with 3DNA, and the other extracts various structural parameters from the results. The details, including source code and test examples, are available in the 3DNA forum under "Ruby scripts for the analysis of MD simulation trajectories", in a newly created section titled "Molecular dynamics simulations".

The sample file ("sample_md0.pdb") distributed with the current v0.1 of the scripts contains 21 snapshots (models, 0..20), each delimited by a MODEL/ENDMDL pair. While the sample is based on a trajectory file from AMBER, output from any MD simulation package, or an NMR ensemble, can be handled in the same way.
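To make the MODEL/ENDMDL layout concrete, here is a minimal Python sketch of how a multi-model PDB file could be split into individual snapshots. It is not the Ruby code posted in the forum; the function name iter_models and the two-model SAMPLE text are made up for illustration, and the sketch assumes each MODEL record carries an integer serial number.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_models(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (model_number, records) for each MODEL/ENDMDL block.

    Hypothetical helper: assumes every MODEL record has an integer serial.
    Lines outside any MODEL/ENDMDL pair (headers, END, etc.) are ignored.
    """
    model_id = None
    block: List[str] = []
    for line in lines:
        record = line[:6].strip()  # PDB record name sits in columns 1-6
        if record == "MODEL":
            model_id = int(line[6:].split()[0])
            block = []  # start a fresh list for this snapshot
        elif record == "ENDMDL" and model_id is not None:
            yield model_id, block
            model_id = None
        elif model_id is not None:
            block.append(line.rstrip("\n"))


# Tiny made-up sample with two snapshots (not real trajectory data).
SAMPLE = """\
MODEL        0
ATOM      1  P     G A   1       0.000   0.000   0.000
ENDMDL
MODEL        1
ATOM      1  P     G A   1       0.100   0.000   0.000
ENDMDL
"""

snapshots = list(iter_models(SAMPLE.splitlines()))
print([model_id for model_id, _ in snapshots])
```

Each snapshot can then be written to its own file or passed on for analysis. Whatever is done per model, the base-pairing information is best derived once and reused, for the reason given in the list of known issues above.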

Now the ball is rolling. As time goes by, and with users' feedback, I will refine and expand the functionality of the scripts as necessary. I am confident that we will see more applications of 3DNA in the "dynamic" molecular simulation field.

Sunday, January 9, 2011

Open dictionary with command-control-d in Mac OS X

In a recent MacMost Newsletter, I came across the following handy trick in Mac OS X: while the cursor is over a word (not necessarily selected), one can press command-control-d to open Dictionary, which pops up a little window with the word's definition. This works in Safari and Mail, but unfortunately not in Preview.

Previously, when I needed to check the definition of a word, I right-clicked on it and then followed the link "Look Up in Dictionary". This launches or pops up Dictionary with detailed information about the word (Dictionary/Thesaurus/Wikipedia).

The right-click method seems to integrate better with other Mac OS X applications. For example, it works with Preview as well. However, now that I know it, I sometimes prefer the command-control-d approach; it is quick and unobtrusive.

IUPAC nucleotide symbols and their complements

Recently, I was interested in knowing the complements of all the IUPAC nucleotide symbols. As blogged previously, I am quite familiar with the "meaning of nucleotide IUPAC codes" (namely A/C/G/T, and R/Y/N etc). However, when I first checked the Gene Infinity website on nucleotide symbols, it still took me a while to figure out the meaning of the DNA alphabet (with complements), as excerpted below:

A  C  G  T    M  R  W  S  Y  K    B  D  H  V    N
|  |  |  |    |  |  |  |  |  |    |  |  |  |    |
T  G  C  A    K  Y  W  S  R  M    V  H  D  B    N

For example, some degenerate IUPAC symbols are their own complements (e.g., W–W and S–S), while others are seemingly "hard" to grasp (e.g., B–V and D–H).

After thinking about it for a bit, things became clear. The pairings are based on the complementarity of Watson-Crick base-pairs (A–T and G–C) and the meaning of each degenerate IUPAC nucleotide symbol. For example,

W represents A/T, meaning weak (with only two hydrogen-bonds). The complements of A/T are T/A respectively, which is W again.
B (not A) represents C/G/T, and their complements are G/C/A respectively, which is V (not U/T).

It is easy to verify that all other complementary pairs follow exactly the same basic principle.
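The principle can be checked mechanically with a short Python sketch: expand each symbol into the bases it stands for, complement each base by Watson-Crick pairing, and look up the symbol for the resulting set. The names IUPAC, WATSON_CRICK, complement and reverse_complement are chosen here for illustration only; they do not come from any particular library.

```python
# Bases represented by each IUPAC nucleotide symbol.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Watson-Crick complements of the four standard bases.
WATSON_CRICK = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Reverse lookup: set of bases -> IUPAC symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUPAC.items()}


def complement(symbol: str) -> str:
    """Complement one IUPAC symbol by complementing each base it represents."""
    bases = IUPAC[symbol.upper()]
    return SYMBOL_FOR[frozenset(WATSON_CRICK[base] for base in bases)]


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a sequence that may contain degenerate symbols."""
    return "".join(complement(symbol) for symbol in reversed(sequence))


top_row = "ACGTMRWSYKBDHVN"
bottom_row = "".join(complement(symbol) for symbol in top_row)
print(top_row)
print(bottom_row)
```

The second printed line reproduces the bottom row of the excerpt above, so the whole table follows from the two Watson-Crick pairs alone.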

Monday, December 13, 2010

Extract images from PDF files using 'pdfimages'

Once in a while, I need to extract an image (or a portion thereof) from a PDF file. Usually, I open the PDF file using Preview or Adobe Acrobat Reader, and take a screenshot, which I then crop to the desired region. This manual method "works", although it is a bit tedious.

Recently, I came across the handy command-line utility program 'pdfimages' which allows for automatic extraction of all images from a PDF file. The basic usage is simply:

pdfimages   yourfile.pdf   prefix

This will extract all images contained in yourfile.pdf to files prefix-001.ppm, prefix-002.ppm etc. With option "-j", images in DCT format are saved as JPEG files.
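For batch use, the same command line can be assembled from a script. The Python sketch below only builds and runs the invocation described above; the helper names pdfimages_command and extract_images are hypothetical, and it assumes that pdfimages is installed and on the PATH.

```python
import subprocess
from typing import List


def pdfimages_command(pdf_path: str, prefix: str, jpeg: bool = False) -> List[str]:
    """Build the argument list for: pdfimages [-j] yourfile.pdf prefix."""
    command = ["pdfimages"]
    if jpeg:
        command.append("-j")  # save DCT-format images as JPEG files
    command += [pdf_path, prefix]
    return command


def extract_images(pdf_path: str, prefix: str, jpeg: bool = False) -> None:
    """Run pdfimages; raises CalledProcessError if the command fails."""
    subprocess.run(pdfimages_command(pdf_path, prefix, jpeg), check=True)


# Show the command that would be run, without executing it.
print(" ".join(pdfimages_command("yourfile.pdf", "prefix", jpeg=True)))
```

Calling extract_images("yourfile.pdf", "prefix", jpeg=True) would then run the command for one file; a loop over several PDF files is a natural extension.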

Sunday, December 5, 2010

Nice blog on English writing by Lynn Gaertner-Johnston

A while back, I was curious to figure out the difference between "entitled" and "titled". A colleague referred me to a blog post on this topic by Lynn Gaertner-Johnston. After reading the post, it became clear that over the years I had made the same mistake numerous times. I then corrected nearly all occurrences of "entitled" to "titled" in my blog and 3DNA forum posts. Ever since, I have been following Lynn Gaertner-Johnston's blog, titled "Business Writing", on a regular basis, and have found it helpful for improving my English writing skills.

Sunday, November 28, 2010

Under-citation of method papers?

Recently in the CCP4BB, there was an interesting thread with extensive discussions on "Citations in supplementary material". The original poster refers to the recent Acta Cryst D editorial with the same title, in which the authors highlight the issue of under-citation of papers published in the International Union of Crystallography (IUCr) journals.

The main point is that method papers are more likely to be cited in the supplementary materials only, which are not indexed by PubMed, Scopus, Web of Science or Google Scholar etc. As a result, they are statistically undercounted, and "Journals and scientists that focus on publishing methodologically oriented papers are particularly affected." Specifically, through a survey of articles on protein or nucleic acid structure determination published in Cell, Nature, Science, and PNAS in 2009, the authors found that "almost half of all references to publications in IUCr journals end up being published in the supplementary material only."

The findings of the editorial resonate with my observations, and I cannot agree more with the authors that "in the end, methods need to be continuously developed and refined in order to ensure progress."

At the other end of the spectrum, some highly influential method papers are heavily cited. As an extreme case, the large number of citations to the 2008 paper "A short history of SHELX" by George Sheldrick helped rocket the impact factor of Acta Crystallographica A up 20-fold to 49.9 this year!

Sunday, November 21, 2010

Belorussian translation of 3DNA webpages

Recently, I communicated with Paul Bukhovko on the translation of the 3DNA webpages http://rutchem.rutgers.edu/~xiangjun/3DNA/ into Belorussian. As the author of the original website, referred to as http://rutchem.rutgers.edu/~olson/3DNA/ (which is simply a soft link to the above URL) in the 2003 3DNA NAR paper, I was pleasantly surprised when Paul asked for permission to perform the translation, which I gladly granted.

Regarding the process, Paul commented:

Was a pleasure to translate this page! It's kinda fresh and related to my professional interests, so I thought - why not, if the author allows to do so.

The translated page is at URL: http://www.movavi.com/opensource/3DNA-be. Interestingly, when I used Google Translate to convert the Belorussian version back to English, the outcome was pretty readable. In contrast, when the original English version was directly translated to Chinese, the result was beyond recognition!