Xiang-Jun's Corner

Sunday, May 15, 2011

Posts in the 3DNA forum reach 600

As of May 6, the total number of posts in the 3DNA forum has reached 600. Created in March 2007, with my debut post titled "Welcome message from Xiang-Jun Lu", the forum is now over four years old. Overall, the forum has served its purpose pretty well. In answering questions, I've been increasingly referring to the posts in the forum. As a concrete example, see the thread of a recent question "Base pair step parameters with a missing base pair".

At less than three posts (about one question) per week on average, I've not felt too much stress in supporting the forum (and maintaining 3DNA) in my spare time. For the most part, I've enjoyed interacting with 3DNA users from all over the world and with diverse backgrounds. Following the Unix philosophy ("Write programs that do one thing and do it well. Write programs to work together."), 3DNA has proved to be robust and flexible in serving its ever-growing user community. As a matter of fact, few of the questions I received a couple of years ago were beyond what I had originally considered while writing the code. It is this intimate knowledge of all the underlying algorithms and every bit of their implementation that allows me to answer users' questions quickly and concretely.

As time passes, however, it has become evident to me that 3DNA needs to be further refined and extended to meet the ever-changing needs of its user community. For example, over the past few months, several questions asked in the 3DNA forum have been directly relevant to, but clearly beyond, 3DNA's current capabilities. While I'd be interested in implementing some of the requested functionality that makes sense to me, doing so is certainly beyond what my spare time allows. On the other hand, my increased understanding of nucleic acid structures and accumulated software expertise make it simply an issue of time and effort to move 3DNA to the next level, far beyond its current application scope and impact.

With posts in the 3DNA forum reaching 600, and citations to 3DNA articles over 600 (Google Scholar), I am hopeful that something good will happen to the 3DNA project. After all, 6 is a lucky number in traditional Chinese culture.

Fifty years of operon

In the latest issue of Science, there is a one-page editorial titled "The Birth of the Operon" by François Jacob, who won the Nobel Prize in Physiology or Medicine in 1965:

What is the operon, whose 50th anniversary is being celebrated this week? The word heralded the discovery of how genes are turned on and off, and it launched the now-immense field of gene regulation. ... we cannot presume to know how new ideas will arise and where scientific research will lead.

In the next three paragraphs, Jacob provides an insightful and vivid description of his research related to the discovery of the "operon" – a structural gene-regulatory gene ensemble. In consonance with his comment on scientific discovery, he concludes:

Our breakthrough was the result of “night science”: a stumbling, wandering exploration of the natural world that relies on intuition as much as it does on the cold, orderly logic of “day science.” In today’s vastly expanded scientific enterprise, obsessed with impact factors and competition, we will need much more night science to unveil the many mysteries that remain about the workings of organisms.

It is worth noting that the Journal of Molecular Biology (JMB) has recently published a special issue [Volume 409, Issue 1, Pages 1-88 (27 May 2011)], titled "The Operon Model and its Impact on Modern Molecular Biology", with historical accounts and reviews to celebrate the operon's 50th anniversary. It is this event that motivated me to read the Jacob and Monod 1961 JMB review article "Genetic regulatory mechanisms in the synthesis of proteins" – I have come across this paper so many times before, and should definitely have read it long ago!

Curves+ web server

Through Google Scholar, I became aware of an article published online in Nucleic Acids Research (NAR), titled "CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures" by Richard Lavery's group:

Curves+, a revised version of the Curves software for analyzing the conformation of nucleic acid structures, is now available as a web server. This version, which can be freely accessed at http://gbio-pbil.ibcp.fr/cgi/Curves_plus/, allows the user to upload a nucleic acid structure file, choose the nucleotides to be analyzed and after optionally setting a number of input variables, view the numerical and graphic results online or download files containing a set of helical, backbone and groove parameters that fully describe the structure. PDB format files are also provided for offline visualization of the helical axis and groove geometry.

The website looks quite streamlined, with required input information all in a single page, and the test page also ran smoothly. In less than two years following the publication of Curves+, it is nice to see the Curves+ web server version available, making this analysis tool more readily available to the nucleic acids community.

Nowadays, it seems safe (to the best of my knowledge) to say that only 3DNA and Curves+ conform to the 1999 Tsukuba convention for the description of nucleic acid base-pair geometry, and each of them provides a web interface: web 3DNA and web Curves+.

Sunday, May 1, 2011

Scientific journals on nucleic acids

To my knowledge, Nucleic Acids Research (NAR) is a highly respected scientific journal with a broad impact in the field of nucleic acids. Over the years, I have been browsing the NAR website on a regular basis to keep myself up to date with the latest developments in this area. It is thus no surprise that the initial 3DNA paper was submitted to and published in NAR in 2003. Among the 500+ citations to that 3DNA paper, over one fifth (100+) are from articles in NAR itself (as an example, please see my January 22, 2011 blog post titled "Three structural biology papers in the latest issue of NAR cite 3DNA"). My latest contribution to NAR is the GpU story, which was actually selected as a featured article.

Another related journal I am quite familiar with is RNA, a publication of the RNA Society. As the "About" section of its webpage succinctly summarizes,

RNA serves as an international forum for publishing original reports on RNA research in the broadest sense. The journal aims to unify this field by cutting across established disciplinary lines and focusing on "RNA-centered" science.

RNA currently has an impact factor (IF) of 5.198 (2009), slightly lower than NAR's 7.479. It is, nevertheless, a very decent journal in RNA-related research, and I frequently visit its website. As a side note, the GpU paper was initially submitted to RNA for its RNA-specific content and as a way to diversify my publication spectrum (as mentioned above, 3DNA was initially published in NAR). Unfortunately, the GpU paper was rejected by the RNA journal after two rounds of review, spanning over 6 months.

Another journal closely related to RNA (name-wise) is RNA Biology, which even has a slightly higher IF of 5.56. Admittedly, I was not familiar with this journal at all. Browsing through its website, I was interested to see the journal's explicit policy to reconsider papers "rejected by high impact journals [CNS] for reasons of novelty and impact, rather than the importance of the study or the integrity of the data." By enclosing "the reviewers’ and/or editorial comments" from these high impact journals, "it is possible the article might be accepted [by RNA Biology] based on its previous review. This will allow the urgent and competitive research to be published on the day of submission."

I became aware of the journal DNA Research quite recently through an email. From its website, "DNA Research is an internationally peer-reviewed journal which aims at publishing papers of highest quality in broad aspects of DNA and genome-related research." The journal currently has an IF of 4.917. Browsing a couple of its online issues, I sense that the journal is more on genome- than structure-related research.

While following up on 3DNA citations recently, I noticed the paper titled "Insights into the Structures of DNA Damaged by Hydroxyl Radical: Crystal Structures of DNA Duplexes Containing 5-Formyluracil" by Tsunoda and Takénaka. It was published in the Journal of Nucleic Acids, which I had never (but probably should have) heard of before. From its website, "Journal of Nucleic Acids is a peer-reviewed, open access journal that publishes original research articles as well as review articles in all areas of nucleic acids." By virtue of this structure paper and its citation to 3DNA, I think the journal is surely of personal interest, and I have added it to my watch list.

To sum up, the scientific journals (that I know of) devoted to nucleic acids are the ones discussed above: Nucleic Acids Research, RNA, RNA Biology, DNA Research and Journal of Nucleic Acids.

Am I still missing something? Please leave your suggestions in the comment area.

[revised on May 17, 2011 by adding RNA Biology]

Saturday, April 23, 2011

Ebook "Gregory Petsko in Genome Biology: The first 10 years"

Over the years, I have read some of Gregory Petsko's monthly columns in Genome Biology while browsing the journal online, and I like his sensible and entertaining columns quite a bit. Recently, I became aware of the ebook from BioMed Central, "Gregory Petsko in Genome Biology: The first 10 years":

Structural biologist Gregory Petsko has contributed a thought-provoking and entertaining monthly column to the scientific journal Genome Biology every month since its launch in 2000. To mark the 10th anniversary of Genome Biology this eBook brings together 10 years of Petsko's columns.

I downloaded the epub version of the book, and googled around, trying to find a corresponding ebook reader for my MacBook Pro (Snow Leopard) – even though I have some ebooks in the generic PDF format, I am not that familiar with epub or mobi. I finally settled on NOOK for Mac from B&N. It turns out that reading ebooks with specifically designed apps such as NOOK is quite a different, and more enjoyable, experience than reading them through a PDF reader.

Now the ebook has become the top item on my casual reading list. I am reading it from the very beginning, one column at a time, to gain a historical perspective. So far I have found the columns indeed very "thought-provoking and entertaining".

Friday, April 8, 2011

Tips and tricks from "The Geek Stuff"

As a devoted command line user, I am always interested in learning new tricks to make my life more enjoyable. Recently, I came across Ramesh Natarajan’s blog “The Geek Stuff” which is full of “instruction guides, how-to, troubleshooting tips and tricks on Linux, database, hardware, security and web” to help solve practical problems.

For example, in the section “Best of the Blog”, I recently benefitted quite a bit by reading the following posts:

There are many other helpful tips/tricks as well; since I have bookmarked the site, I will surely come back!

Sunday, April 3, 2011

Scripting in Ruby is fun

Over the years, I have played around with various scripting languages, including awk, bash, Perl, Python and Ruby. By far, I have enjoyed Ruby the most; nowadays, I write scripts nearly exclusively in Ruby.

Created by Yukihiro "Matz" Matsumoto in Japan during the mid-1990s, Ruby became popular worldwide in the mid-2000s, with the Rails web application framework. Indeed, I first dug into Ruby through Rails, and by reading David Black's book "Ruby for Rails: Ruby Techniques for Rails Developers". As an exercise, I implemented the current 3DNA v2.0 website with Rails v1.x. Then I quickly realized that keeping up with the rapidly evolving Rails framework was beyond my time and interest. However, I did begin to appreciate Ruby's simplicity, consistency and expressiveness. Over the past few years, I have collected over a dozen Ruby-related (e)books, including "The Well-Grounded Rubyist" (David Black, covering v1.9), "The Ruby Programming Language" (David Flanagan and Yukihiro Matsumoto), and "Metaprogramming Ruby: Program Like the Ruby Pros" (Paolo Perrotta). Just as in my experience with (ANSI) C, I feel Ruby "wears well as one's experience with it grows" (K&R, in the preface of "The C Programming Language"). The better I know Ruby, the more I enjoy using it.

I recently wrote two Ruby scripts for the analysis of molecular dynamics (MD) simulation trajectories using 3DNA. Honestly, I would not have bothered with Perl for the task (otherwise, it would have been done a long time ago), given the sideline nature of my support of 3DNA. Yet, writing and refining the Ruby scripts (with the help of git and rake) has turned out to be a pleasant experience. Another reason scripting in Ruby is fun is its large, active and friendly user community; there are many user-contributed libraries (gems) that serve common programming needs well. As an example, in the 3DNA-MD scripts, I took advantage of the elegant Trollop command-line option parser by William Morgan. I picked Trollop among many other choices because it is self-contained in a single file, simple to use, and "gets out of your way".

In the Ruby community, exciting new developments are happening all the time. Recently, I was drawn to thor, "a simple and efficient tool for building self-documenting command line utilities". Over the past couple of years, I have browsed Sinatra and Sequel – they also look brilliant! Of course, for bioinformatics, there is the BioRuby project.

Overall, in my experience, scripting in Ruby is fun and exciting. Are you a Rubyist yet?

Saturday, March 26, 2011

DNA fiber models ABC

Among the 55 fiber models available in 3DNA, the A-, B- and C-DNA types are the most generic – they can be built with bases A, C, G and T in any combination (see table below). Moreover, in addition to the well-known Arnott fiber models (#1, #4 and #7, all from calf thymus), there are newer variants from van Dam & Levitt (#46 and #47) and Premilat & Albiser (#53 to #55).

 #  Twist   Rise   Description (twist in degrees, rise in Å)
 1   32.7   2.548  A-DNA (calf thymus)
 4   36.0   3.375  B-DNA (calf thymus)
 7   38.6   3.310  C-DNA (calf thymus)
46   36.0   3.38   B-DNA (BI-type nucleotides)
47   40.0   3.32   C-DNA (BII-type nucleotides)
53  -38.7   3.29   C-DNA (depreciated)
54   32.73  2.56   A-DNA [cf. #1]
55   36.0   3.39   B-DNA [cf. #4]

As shown in Figure 9 of the 3DNA 2003 NAR paper (linked below), the A-, B- and C-DNA fiber models are all right-handed regular straight helices, yet each has distinctive features.
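To make the twist and rise values in the table above a bit more tangible, here is a minimal Python sketch (not part of 3DNA) that converts them into the number of base pairs per helical turn and the helical pitch, assuming an ideal regular helix where every step has the same twist and rise. The dictionary `FIBER_MODELS` and the function `helix_geometry` are names made up for this illustration; the numbers are the three Arnott calf thymus models (#1, #4 and #7) from the table.

```python
# Twist (degrees) and rise (angstroms) per base-pair step,
# copied from the fiber model table for the Arnott models.
FIBER_MODELS = {
    1: ("A-DNA (calf thymus)", 32.7, 2.548),
    4: ("B-DNA (calf thymus)", 36.0, 3.375),
    7: ("C-DNA (calf thymus)", 38.6, 3.310),
}


def helix_geometry(twist_deg, rise_angstrom):
    """Return (base pairs per turn, pitch in angstroms) for a regular helix.

    One full turn is 360 degrees, so the number of steps per turn is
    360 / |twist|; the pitch is that number multiplied by the rise.
    """
    bp_per_turn = 360.0 / abs(twist_deg)
    pitch = bp_per_turn * rise_angstrom
    return bp_per_turn, pitch


for number, (name, twist, rise) in sorted(FIBER_MODELS.items()):
    bp_per_turn, pitch = helix_geometry(twist, rise)
    print(f"#{number:<2d} {name:<20s} {bp_per_turn:5.2f} bp/turn  pitch {pitch:6.2f} A")
```

For the B-DNA model (#4), for instance, a twist of 36.0 degrees gives exactly 10 base pairs per turn and a pitch of 33.75 Å. The absolute value of the twist is used so that a model with a negative twist (such as #53) would still yield a positive repeat.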

While I could easily envision possible applications of the fiber models, especially in connection with the analysis and rebuilding routines in 3DNA, it was still a nice surprise to see a recent article by Gossett and Harvey, titled "Computational Screening and Design of DNA-Linked Molecular Nanowires" [Nano Lett., 2011, 11 (2), pp 604–608]. The abstract is quoted below:

DNA can be used as a structural component in the process of making conductive polymers called nanowires. Accurate molecular models could lead to a better understanding of how to prepare these types of materials. Here we present a computational tool that allows potential DNA-linked polymer designs to be screened and evaluated. The approach involves an iterative procedure that adjusts the positions of DNA-linked monomers in order to obtain reasonable molecular geometry compatible with normal DNA conformations and with the properties of the polymer being formed. This procedure has been used to evaluate designs already reported experimentally, as well as to suggest a new design based on pyrrylene vinylene (PV) monomers.

In the article, 3DNA (the web interface version w3DNA) was cited as follows:

The selection of DNA structures is important because the DNA remains fixed throughout the procedure. To reduce the risk of an incorrect result, one should choose a subset of DNA structures that are in some sense representative of DNA conformational space. The DNA structures (A-, B-, and C-form DNA) were obtained using the Web 3DNA web server. We used a poly(dG)-poly(dC) sequence with ideal geometry for each DNA structure. A-DNA was constructed with rise = 2.548 Å and twist = 32.7˚ , B-DNA was constructed with rise = 3.375 Å and twist = 36.0 ˚, and C-DNA was constructed with rise = 3.310 Å and twist = 38.6 ˚.

Indeed, this is a novel application of fiber DNA ABC models!

Sunday, March 20, 2011

3DNA citations reach over 500

On Friday, June 5, 2009, I blogged on the topic titled "3DNA citations reach over 300". At that time, I wrote (towards the end):

I still remember that the number of citations to 3DNA was less than 150 nearly two years ago [~ summer 2007], when I started to wrote the first draft of our 2008 Nature Protocols paper. Now it is more than doubled! I would blog on this topic again when the number reaches 500.

When I checked Google Scholar for 3DNA citations just now, the citation number was already over 500 for the initial 2003 3DNA NAR paper alone. Combined with the two direct follow-ups – the 2008 Nature Protocols paper and the 2009 NAR web server paper – the three 3DNA publications have been cited a total of 550 times.

Again, as noted in that blog post,

In my opinion, some of 3DNA features are still (heavily) underused. Now that we have a sizable user community, 3DNA could only become better and would be more widely used. I have every reason to believe that in the not-so-distant-future, the citations to 3DNA would reach over 1000.

A decade after its initial humble release, 3DNA has been successfully applied to many real-world problems. As spare time permits, I have actively maintained and continuously refined 3DNA based largely on users' feedback. Over time, I have also come to see clearly that 3DNA can be moved to the next level in both functionality and usability, to enjoy an even larger and broader impact.

Now more than halfway there, it won't be long before citations to 3DNA reach 1000, and then go beyond.