Some underlying considerations
- Whenever possible/applicable, I always try to get a thorough understanding of related software available. By thorough I mean at the source code level to see how an algorithm is implemented. The benefits of doing this are several folds:
- Not to recreate the wheel, but to build upon previous work.
- It is the most effective way to learn — many math formulae or text descriptions of algorithms, while useful for getting general ideas, are lack in details or vague in nature. In bioinformatics, most of the fundamental mathematics are not new. It is more about a novel combination of various known-parts, applying to a specific problem. Reading the source code is the only way to see unambiguously the implementation details.
- With a clear understanding of how the parameters are actually calculated, one can make better use of a software tool, know its limitations and thus avoid misinterpretations.
- To make objective, convincing (even to the original authors) comparisons/comments on existing tools, and importantly, to create something better if needed.
- Specially, 3DNA has benefited the most from my SCHNAaP/SCHNArP programs: the underlying algorithms in 3DNA for calculating the various structural parameters (propeller, buckle, shear, roll, slide, x-displacement, inclination etc) are exactly the same as there — analyze and rebuild are direct derivatives from SCHNAaP (a for analysis) and SCHNArP (r for rebuilding/reconstruction), respectively; the ideas of standard stacking diagrams, the Zp parameter, building atomic models with sugar-phosphate backbone, and the base/bp rectangular block representations are also from SCHNAaP/SCHNArP. I also borrowed ideas from Babcock's RNA, Bansal's NUPARM, Dickerson's NewHelix/FreeHelix etc., which will be mentioned explicitly in following sessions.
- Code wise, 3DNA was created from scratch in strict ANSI C (over 25K lines). In addition to implementing a unique combination of existing algorithms, I have added many significant novel methods including the find_pair program. However, feature-rich has never been my goal. Instead, I only consider to add new functions that I understand clearly and find useful. On the other hand, I am always quick to fix bugs.
- It is hard to understand other people's implementations of algorithms since it needs to read between the lines. This is especially the case with a unfamiliar computer language; undocumented, or bad code. It is easier to create one's own program than to understand and modify third-party software.
- For publication purpose, it is more attractive to work on something "new" instead of building on others.
In the following sessions, I will provide details on how 3DNA calculates the two sets of parameters to quantify the relative geometry of a dinucleotide step.
No comments:
Post a Comment
You are welcome to make a comment. Just remember to be specific and follow common-sense etiquette.