Saturday, June 6, 2009

ANSI C program memory check with valgrind

Over the years, ANSI C has been my language of choice for scientific computing, from SCHNAaP/SCHNArP and 3DNA to my current research projects. As K&R note in the preface to "The C Programming Language" (2nd edition), C "wears well as one's experience with it grows". To me, ANSI C is a language I love more the better I know it.

Using gcc with the strict compiler options -ansi -pedantic -Werror -Wall -W -Wshadow, my ANSI C programs (e.g., 3DNA) compile cleanly, without any changes, on every OS I have access to. However, any C programmer should be well aware of memory leaks, a class of bugs that is notoriously difficult to track down because gcc does not check for it. That is where valgrind comes in, and I have been using it regularly for 3DNA and my current projects.
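To illustrate why strict compiler flags alone are not enough, here is a minimal sketch (a hypothetical helper, not taken from 3DNA) of a common leak pattern: an early-return error path that forgets to free a buffer. It should compile without a single warning under gcc -ansi -pedantic -Werror -Wall -W -Wshadow, because none of those options track whether every malloc() is matched by a free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string into a freshly allocated buffer and
   return its length, or -1 if the string is empty (-2 if malloc fails).
   It is valid ANSI C and raises no compiler warnings, yet the n == 0 path
   returns without freeing buf, which is a memory leak. */
long copy_length(const char *s)
{
    size_t n = strlen(s);
    char *buf = malloc(n + 1);
    long len;

    if (buf == NULL)
        return -2;
    if (n == 0)
        return -1;              /* leak: buf is not freed on this path */
    memcpy(buf, s, n + 1);      /* copy including the terminating '\0' */
    len = (long) strlen(buf);
    free(buf);                  /* the normal path does clean up */
    return len;
}
```

Each call with an empty string loses a small block; in a program that runs once and exits, the OS reclaims it, so the bug can go unnoticed for years. A run-time checker such as valgrind is what exposes it.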

The usefulness of the valgrind toolset became obvious when a SCHNArP user posted a question about some possible memory leaks. Traditionally, SCHNArP runs interactively from the command line and builds one model (DNA structure) at a time. When the program exits, any leaked memory is reclaimed by the system, so such bugs never show up. At the time SCHNAaP/SCHNArP was developed, I was not aware of valgrind (perhaps it did not even exist yet!). The user had been calling some SCHNArP functions to build many structures automatically, so even a small leak per call accumulated until the program aborted with an out-of-memory error. As a test case, the user supplied me with the following reproducible example:
    while (1) {
        GLH_build();
    }
Without knowing about valgrind and actually applying it in this situation, fixing such bugs would have taken far more time than I could afford. With valgrind's help, however, solving this puzzle did not take long at all. Specifically, recompiling the sample program with the -g option (to include debug information) and running it prefixed with valgrind --leak-check=full immediately revealed a couple of locations with memory leaks. Once I knew where the problems were, fixing them was straightforward. Now SCHNArP is more robust.
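The following is a minimal sketch of the same situation, assuming a hypothetical build_model_leaky() as a stand-in for a routine like GLH_build() (the real SCHNArP code is not reproduced here). Each call leaks a scratch buffer; build_model_fixed() does the same work but frees it. Because valgrind prints its leak summary only when the program exits, run_many() (also hypothetical) replaces the user's endless while (1) loop with a bounded one.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a model-building routine such as GLH_build().
   Leaky version: the scratch buffer is never freed, so every call loses
   n * sizeof(double) bytes. A single run hides this; a loop accumulates it. */
double build_model_leaky(size_t n)
{
    double *scratch = malloc(n * sizeof *scratch);
    double sum = 0.0;
    size_t i;

    if (scratch == NULL)
        return -1.0;
    for (i = 0; i < n; i++) {
        scratch[i] = (double) i;
        sum += scratch[i];
    }
    return sum;                 /* leak: scratch is never freed */
}

/* Fixed version: identical work, but the buffer is released before return. */
double build_model_fixed(size_t n)
{
    double *scratch = malloc(n * sizeof *scratch);
    double sum = 0.0;
    size_t i;

    if (scratch == NULL)
        return -1.0;
    for (i = 0; i < n; i++) {
        scratch[i] = (double) i;
        sum += scratch[i];
    }
    free(scratch);              /* every malloc() now has a matching free() */
    return sum;
}

/* Bounded version of the user's while (1) { GLH_build(); } test, so the
   program terminates and valgrind can report what was lost. */
double run_many(int times, int leaky)
{
    double total = 0.0;
    int k;

    for (k = 0; k < times; k++)
        total += leaky ? build_model_leaky(100) : build_model_fixed(100);
    return total;
}
```

For example, with a main() that calls run_many(100, 1), compiling with gcc -g -ansi -pedantic -Wall -W and running the binary prefixed with valgrind --leak-check=full should report roughly 80,000 bytes in 100 blocks as "definitely lost" (assuming an 8-byte double), with a stack trace pointing at the malloc() call in build_model_leaky(). Switching to run_many(100, 0) should leave valgrind with no lost blocks to report.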

Overall, valgrind is an essential tool for any serious C programmer.

Friday, June 5, 2009

3DNA citations reach over 300

When checking Web of Science today, I found, to my great satisfaction, that citations to the 3DNA 2003 NAR paper have passed 300. While the number is still not comparable to that of true classics, it is nevertheless significant. It is worth noting that these 300+ citations span 77 scientific journals across the broad biomedical and chemical research fields, including Cell, Nature and Science (well, if CNS really means anything). The top six journals (each with a double-digit citation count) are:
  • 49 -- NUCLEIC ACIDS RESEARCH
  • 27 -- JOURNAL OF MOLECULAR BIOLOGY
  • 23 -- JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
  • 19 -- BIOCHEMISTRY
  • 13 -- JOURNAL OF PHYSICAL CHEMISTRY B
  • 11 -- BIOPHYSICAL JOURNAL
In retrospect, the wide acceptance of 3DNA by the community is certainly no accident. Listed below are some of the major factors:
  1. 3DNA is built upon the best features selected from a thorough understanding of seven popular DNA structural analysis programs, including Curves, NewHelix/FreeHelix, CEHS, CompDNA, and RNA (Running Nucleic Acids by Babcock and Olson). I summarized this back in June 2003, six years ago, in an email I sent to a group of experts before a workshop at the 13th Conversation at Albany (NY):
    Looking back, 3DNA has never been intended to set up a new standard for the exact definition of various parameters. We selected what made best sense to us from what were already available. This was made possible by knowing the exact technical details of how other programs work. In the meantime, 3DNA also provides a program called "cehs" which gives the original CEHS/SCHNAaP parameters, to which Freehelix parameters would be similar. Believe it or not, "cehs -r" would give the authentic RNA parameters. We also see Curves as unique in defining global parameters, bending analysis and groove dimensions.
  2. Active support through email and the 3DNA forum. Over the years, I have always strived to respond to each 3DNA-related question quickly and concretely. To my recollection, I have never ignored a user's question. Instead, I have taken each and every question as a chance to improve 3DNA, and refined the code accordingly.
  3. 3DNA's integrated approach, which combines structural analysis, model building, and visualization in a single software suite. Our 2008 Nature Protocols paper provides some examples.
  4. 3DNA has been extensively checked against all NDB entries before each major release, so I can confidently say that it works in the real world, not just on a few contrived example cases.
I still remember that 3DNA had fewer than 150 citations nearly two years ago, when I started writing the first draft of our 2008 Nature Protocols paper. Now the number has more than doubled! I will blog on this topic again when it reaches 500.

In my opinion, some of 3DNA's features are still (heavily) underused. With a sizable user community now in place, 3DNA can only get better and become more widely used. I have every reason to believe that in the not-so-distant future, citations to 3DNA will exceed 1000.

For the current status of 3DNA citations, check Google Scholar.