Monday, May 25, 2009

Examples make it clear

When reading scientific publications, I quickly get confused by long, complicated equations. Initially, I thought this was because of my lack of domain knowledge: once I became more familiar with a field, things would become easier. Of course, that is true to some extent. However, it is not exactly the point, in the following two senses:
  1. Having been in the field of nucleic acid structures for more than ten years, I can claim I know the field well enough. Yet I can still easily get lost in the long stretches of mathematical formulae in an article. An example is the Babcock and Olson 1994 JMB paper titled "Nucleic acid structure analysis. Mathematics for local Cartesian and helical structure parameters that are truly comparable between structures". Overall, it is a solid piece of work, but I doubt that many people in this field really understand it above the conceptual level.
  2. When I get to the bottom of something, things do become truly simple. But even then the equations still feel complicated. In resolving the discrepancies among nucleic acid conformational analyses, I went through the source code to understand, from the bottom up, how the Babcock and Olson method works. It is actually not that complicated, and the implementation for finding the local helical axis and calculating a set of helical parameters can be greatly simplified using a geometric approach, as in SCHNAaP/3DNA. Moreover, some details in the Babcock and Olson 1994 JMB paper are actually inaccurate.
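To make the geometric idea in item 2 concrete, here is a minimal sketch in Python. It assumes the construction in which the local helical axis is taken as the normalized cross product of the difference vectors (x2 - x1) and (y2 - y1) of two successive base-pair reference frames: a rotation about an axis leaves the component along that axis unchanged, so both difference vectors are perpendicular to it. The function names, the toy frames, and the way twist is measured here are illustrative assumptions, not a transcription of the SCHNAaP or 3DNA source code.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(dot(a, a))
    return [c / n for c in a]

def rotate(v, axis, angle_deg):
    """Rotate v about a unit axis by angle_deg (Rodrigues' formula)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    kxv, kv = cross(axis, v), dot(axis, v)
    return [v[i] * c + kxv[i] * s + axis[i] * kv * (1 - c) for i in range(3)]

def local_helical_axis(x1, y1, x2, y2):
    """Axis perpendicular to both (x2 - x1) and (y2 - y1).

    Degenerate (zero cross product) when the two frames have the
    same orientation, i.e. when there is no rotation at all.
    """
    return unit(cross(sub(x2, x1), sub(y2, y1)))

def helical_rise(o1, o2, h):
    """Displacement of the frame origins projected onto the axis h."""
    return dot(sub(o2, o1), h)

def helical_twist(x1, x2, h):
    """Signed angle (degrees) about h between the projections of x1 and x2."""
    p1 = unit(sub(x1, [dot(x1, h) * c for c in h]))
    p2 = unit(sub(x2, [dot(x2, h) * c for c in h]))
    return math.degrees(math.atan2(dot(cross(p1, p2), h), dot(p1, p2)))

# Toy example (made-up numbers): frame 1 is the standard axes at the origin;
# frame 2 is frame 1 rotated by 36 degrees about a known axis and shifted
# 3.4 units along that axis.
axis = unit([0.1, 0.2, 1.0])
x1, y1, o1 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]
x2, y2 = rotate(x1, axis, 36.0), rotate(y1, axis, 36.0)
o2 = [3.4 * c for c in axis]

h = local_helical_axis(x1, y1, x2, y2)   # should recover `axis`
rise = helical_rise(o1, o2, h)           # should recover the 3.4 shift
twist = helical_twist(x1, x2, h)         # should recover the 36-degree rotation
print(h, rise, twist)
```

Working through a toy case like this, where the answer is known in advance, is exactly the kind of check that makes a formula easier to trust than reading its derivation alone.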
Another wonderful case I have had is with the CEHS scheme (due to Calladine and El Hassan) for calculating a set of local base-pair parameters. It was only after checking an example that I uncovered a subtle discrepancy between the simple CEHS formula (in its original form) and its description in the draft version of Dr. El Hassan's PhD thesis. This experience led to the joint SCHNAaP/SCHNArP papers with El Hassan, and later inspired me to work through NewHelix, Curves, CompDNA and other DNA structure analysis programs to reveal "the reasons why the different algorithms come to conflicting structural interpretations".

So, if you are a beginner in a research field, do not be scared off by fancy mathematical formulae. If you really want to understand something, work through an example and/or check the source code of an implementation. As for authors: if you would like to get your message across, provide step-by-step examples of the key points you want to make, so that others can follow them.