Xiang-Jun's Corner

Friday, February 26, 2010

Mac OS X Snow Leopard -- I'm loving it (mostly)!

Recently, when it was time for a new laptop, I decided to buy a MacBook Pro (Intel-based with Mac OS X 10.6.2 -- Snow Leopard). Over the past few days, I have been playing around with it, migrating files from my Ubuntu Linux box. So far, things have gone smoothly and, by and large, I am enjoying my new Mac.

Over the years, I have been using Ubuntu Linux and I have been very happy with it, especially for software development. LyX and OpenOffice are handy for writing technical documents. However, I have realized that when it comes to writing a manuscript for publication, and to communicating effectively with non-Linux collaborators, MS Word (with EndNote) is the standard. So I set up a Windows XP virtual machine via VirtualBox on my Ubuntu Linux box, which avoids the problem of dual booting and allows for easy file sharing between Linux and Windows.

Mac OS X is Unix-based but has native support for MS Office and the Adobe suite of programs, so it seems an ideal choice for a new laptop. Mac OS X 10.6 (Snow Leopard) is claimed to be "The world's most advanced operating system. Finely tuned." Other things aside, I do appreciate the fact that 10.6 (Snow Leopard) is a refinement of 10.5 (Leopard) from installation to shutdown -- "In ways big and small, Mac OS X Snow Leopard makes your Mac faster, more reliable, and easier to use."

So far, I have configured Mail to access my Columbia emails. I must say that Mail is way better than Columbia's CubMail web interface, and I like Mail's native integration with iCal and Address Book. Safari still needs some getting used to, given my mostly Firefox experience. However, it is nice to find that some websites that work in IE but not in Firefox display properly in Safari. Preview appears to be powerful for viewing and manipulating PDFs and images. I have installed Xcode, and may explore it more, if only to see what an IDE has to offer. Of course, it is nice to have direct access to MS Office (mostly for Word and PowerPoint, so no need to play around with OpenOffice), EndNote, Adobe (Acrobat, Photoshop, Illustrator), etc.

Some nuisances up to this point:

Keyboard missing numeric keypad and Home/End/PgUp/PgDn
The Ctrl-C/V etc. keyboard shortcuts I am used to are now Command-C/V etc.
File and directory names are not case-sensitive -- most surprising!
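For anyone migrating files from a Linux box, the last point can be checked directly. The following is a minimal Python sketch, not an official tool: the function name `is_case_sensitive` and the probe file name are my own illustrative choices. It writes a mixed-case file into a temporary directory and tests whether the all-lowercase spelling resolves to the same file.

```python
import os
import tempfile


def is_case_sensitive(directory=None):
    """Return True if the filesystem holding `directory` distinguishes 'A' from 'a'.

    `directory` is a hypothetical parameter: pass a path on the volume to test,
    or leave it as None to probe the default temporary location.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        probe = os.path.join(tmp, "CaseProbe.txt")
        with open(probe, "w") as handle:
            handle.write("probe")
        # On a case-insensitive volume the lowercase spelling finds the same file.
        return not os.path.exists(os.path.join(tmp, "caseprobe.txt"))


if __name__ == "__main__":
    print("case-sensitive:", is_case_sensitive())
```

Running it on both machines before copying a source tree shows whether two files whose names differ only in case could collide on the destination.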

Overall, my new MacBook Pro is a very nice toy to play with. As I become more familiar with it, I hope to like it even more.

Saturday, February 6, 2010

NSMB editorial: "Scientific writing 101"

In the February 2010 issue of Nature Structural & Molecular Biology [NSMB, 17(2), p.139], there is a nice Editorial titled "Scientific writing 101". This short one-page essay is a good example of (scientific) writing that is "a pleasure to read".

"Less is more when it comes to writing a good scientific paper. Tell a story in clear, simple language and keep in mind the importance of the ‘big picture’."

Specifically, the editorial makes the following points:

Tell a story. A scientific paper is not a chronology; the data should be presented and interpreted in context.
Be clear. "Clear, simple language allows the data and their interpretation to come through."
Provide an informative title and abstract. "Make the abstract clear and try to get the ‘big picture’ across."
Make the introduction short and concise.
Clearly distinguish Results from Discussion. "Discussion should put those results in a broader context." It "should be an interpretation of those results..."
The cover letter is important. You should spell check your manuscript, number the pages, etc.

In this blog post, I am just recapping the key points of the editorial, and taking the opportunity to re-read it. Following the simple principles outlined in the editorial would be beneficial to everyone in the scientific community.

Saturday, January 30, 2010

Chinese new year greeting card

Recently, I received the following nice new year greeting card from Dr. Pascal Auffinger (IBMC-CNRS, France). Although we have never met in person, we have exchanged numerous emails: over the years, we have discussed 3DNA-related topics, and nucleic acid structures in general, extensively. Pascal is a scientist I greatly respect, and his greeting card makes my support of 3DNA so gratifying.

Somehow, the picture reminds me of Mount Tai, which I climbed while at college, and the famous poem "会当凌绝顶，一览众山小".

According to the Chinese Zodiac, 2010 is the Year of the Tiger (Feb. 14, 2010 to Feb. 2, 2011). Interestingly, this Chinese New Year's Day coincides with Valentine's Day, and it falls on a Sunday.

Saturday, January 23, 2010

Chemical diagram of Watson-Crick base-pairs

Once in a while, I need to refresh my memory about the chemical identities of the most common nucleobases: A, C, G, T/U. Sometimes, it is also necessary to explain to non-(bio)chemists the concepts of H-bond donor vs. acceptor, and the major vs. minor groove of the DNA double helix. In such cases, I use the following (customized) chemical diagram of Watson-Crick base-pairs (WC-bp):

Before taking the effort to create my own version of the Watson-Crick base-pair diagram, I googled around and found many illustrations (like the one in Wikipedia). However, none of them suits my needs perfectly:

Trained as a chemist, I would like to see chemical bond types (double vs. single bond);
Working extensively with PDB format, I want to have the atom numbering information as well.

So I ended up (re)creating my own version of the WC-bp diagram: I used Chemtool to sketch the framework, and Xfig to fine-tune it. Overall, the diagram serves my purpose quite well, and hopefully others will find it useful too.

Thursday, January 7, 2010

Requests for SCHNAaP/SCHNArP source code

Recently, I received several requests for the source code of SCHNAaP/SCHNArP, a software package for the analysis and rebuilding of double helical nucleic acid structures. This suite of programs was developed ten years ago during my PhD work on DNA base-stacking interactions with Dr. Chris Hunter at the University of Sheffield, England.

Users become interested in SCHNAaP/SCHNArP mostly because of 3DNA, which can be regarded as its successor and more popular version. Due to Rutgers' policy of not releasing the source code of 3DNA, users who would like to understand details of the underlying algorithms turn back to SCHNAaP/SCHNArP. The interface is a bit aged, but the mathematics is still valid: it could serve well as a starting point for those who really want to get into the world of nucleic acid structures.

Overall, though, I have mixed feelings about this situation. On one hand, I am happy to see people becoming interested in my (previous) work. On the other hand, it has also become clear that Rutgers' current licensing policy has blocked 3DNA's further circulation and adoption by the scientific community. Given the current trend of open-source software development, I see no reason to continue keeping 3DNA closed source. Making 3DNA open source (under a proper license, of course) would allow interested users to get more directly involved in the project, and thus help move the software to the next level.

Saturday, December 19, 2009

ORCID -- an international research identification system?

From the Nature news article titled "Credit where credit is due" (462:7275, p. 825, December 17, 2009), I came across the ORCID initiative:

Name ambiguity and attribution are persistent, critical problems imbedded in the scholarly research ecosystem. The ORCID Initiative represents a community effort to establish an open, independent registry that is adopted and embraced as the industry’s de facto standard. Our mission is to resolve the systemic name ambiguity, by means of assigning unique identifiers linkable to an individual's research output, to enhance the scientific discovery process and improve the efficiency of funding and collaboration.

Overall, I think it is a good idea. If properly implemented and widely adopted, ORCID could help solve lots of issues associated with the various ways of spelling a person's name due to, e.g., cultural differences. For example, in the Chinese convention, one's family name comes before one's given name, just the opposite of the western convention. Additionally, when a given name has two characters (quite common), there could be a space, a hyphen (as I normally put in Xiang-Jun), or nothing in between. Combined with possible first name initials, there are already many ways to spell out a Chinese name.
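To get a feel for how quickly the spellings multiply, here is a small Python sketch that enumerates them. It is only an illustration under stated assumptions: the function `name_variants` is my own, the family name "Wang" is a placeholder, and the three initial styles are just the ones I happen to see most often, not an exhaustive list.

```python
def name_variants(family, given_parts):
    """List plausible romanized spellings of a two-character Chinese given name.

    `family` is the family name; `given_parts` holds the two syllables of the
    given name, e.g. ("Xiang", "Jun"). Both orders (western and Chinese) are produced.
    """
    first, second = given_parts
    given_forms = [
        f"{first}-{second}",            # hyphenated: Xiang-Jun
        f"{first} {second}",            # space-separated: Xiang Jun
        f"{first}{second.lower()}",     # run together: Xiangjun
        f"{first[0]}.-{second[0]}.",    # hyphenated initials: X.-J.
        f"{first[0]}. {second[0]}.",    # spaced initials: X. J.
        f"{first[0]}.",                 # single initial: X.
    ]
    variants = set()
    for given in given_forms:
        variants.add(f"{given} {family}")   # western order: given name first
        variants.add(f"{family} {given}")   # Chinese order: family name first
    return sorted(variants)


if __name__ == "__main__":
    # "Wang" is a placeholder family name used only for illustration.
    for spelling in name_variants("Wang", ("Xiang", "Jun")):
        print(spelling)
```

With six forms of the given name and two name orders, a single person already has twelve candidate spellings, which is exactly the kind of ambiguity a unique identifier is meant to remove.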

The above Nature article, “Credit Where Credit is Due”, helps introduce the ORCID Initiative. As a specific example, it points to another article on page 843, where Nature profiles a research group trying to "complete the reference human genome sequence, which is still full of errors nearly a decade after the first draft was announced in 2000." Nature acknowledges that "It is essential work", "But it is also work that offers few academic rewards beyond the satisfaction of a job well done -- it is unlikely to result in a high profile publication." Hopefully, by adopting the ORCID system, contributions of such types (e.g., software support and maintenance) would be more properly acknowledged by the scientific community.

Given the high profile of the founding parties, I am hopeful that the ORCID initiative will move forward as promised. I will keep an eye on it and see how it evolves.

Friday, December 18, 2009

Ribosomal structure: it helps to know some background information

This year's Nobel Prize in Chemistry has been awarded to Venki Ramakrishnan, Tom Steitz, and Ada Yonath "for studies of the structure and function of the ribosome."

My connection with ribosomal structure began with the 50S large subunit of Haloarcula marismortui solved by Tom Steitz's group. Ever since the fully refined crystal structure at 2.4 Å resolution was published in 2001 (PDB entry: 1jj2; NDB code: rr0033), I have been using it to check 3DNA's applicability. In the two 3DNA papers (2003 NAR and 2008 NP), 1jj2 was used as an example to illustrate how find_pair can identify higher-order base associations in complicated RNA-containing structures. At the time, though, my understanding of the ribosomal RNA structure was purely geometrical: for quite a while, I was overwhelmed by the biological terminology, including the various S-es: the 50S large ribosomal subunit vs. the 23S and 5S rRNA; and of course, the 30S small subunit vs. the 16S rRNA.

Over the past year or so, I have become more interested in RNA structures. After reading a lot of related articles, I gradually feel that things are becoming clearer than before. Nevertheless, something was still missing, since my focus has (mostly) been on recent X-ray crystal structure-related work. My understanding of the ribosomal structure was finally put into context, thanks to the following two recent publications:

One in Cell by James Williamson, titled "The Ribosome at Atomic Resolution".
Another one in Mol. Cell (in parallel and at the same time) by Joseph Puglisi, titled "Resolving the Elegant Architecture of the Ribosome".

These two papers not only summarized the significance of the work of the three Nobel laureates -- "the atomic resolution structures of the ribosomal subunits provide an extraordinary context for understanding one of the most fundamental aspects of cellular function: protein synthesis" -- but also provided background information on decades of work from other players, including Harry Noller, Peter Moore, and Joachim Frank. Solving the ribosomal structure serves as a good example of how scientific research is both cooperative and competitive in nature.

Friday, December 11, 2009

Not all PDB entries are reliable; some could be plain fake

With interest, I have browsed the recent thread on the PDB mailing list (pdb-l), "Retraction of 12 Structures", posted by Michael Sadowski and followed up by Kevin Karplus et al. The story is about Krishna Murthy, a former scientist at the University of Alabama at Birmingham (UAB), who has been alleged to have fabricated protein structures and published papers on them. Here is an informative comment by firebug36 from the above link:

I am a protein crystallographer myself, so just trust me - the results this gentleman [Murthy] published were falsified, and not in a smart way. The structures [for C3b] deposited in the Protein Data Bank made no physical sense.
Allegations against UAB group were first brought to light by several prominent people in the field, and not UAB officials:

http://www.nature.com/nature/journal/v448/n7154/full/nature06102.html

According to the post by Kevin Karplus, "several of the PDB files by Krishna Murthy's group were identified as problematic in the RosettaHoles paper". Naturally, then, comes the question, "should we remove ALL the PDB files from Krishna Murthy's group as suspect?"

The way Murthy's case came into the spotlight may represent an exception rather than the norm. Had he not published his C3b structure in Nature, where it caught the attention of leading crystallographers (Bert Janssen, Randy Read, Axel Brünger and Piet Gros), Murthy might still be publishing protein structures today. In a sense, it is hard to believe that Murthy could falsify 12 protein structures and publish 9 papers in prestigious journals (including Nature, Cell, PNAS, JMB, Biochemistry, JBC, etc.), which have been cited 449 times.

The PDB contains the state-of-the-art experimental data on bio-macromolecular structures. Yet, the archive certainly contains inconsistencies and errors of various types. It would be helpful to know how many PDB entries are largely or partially wrong, and which can be taken as the "gold standard" as far as data quality is concerned.

This case offers an excellent lesson for those performing data mining on macromolecular structures. Nowadays, PDB structures are numerous and increasing rapidly, but they are clearly of varying quality. Structural bioinformatics is about solving biological problems using informatics tools. Thus, knowing the caveats of your data (how reliable are they?) and tools (what are their limitations?) is a prerequisite for drawing sound scientific conclusions.

Sunday, December 6, 2009

3DNA in the PCCP nucleic acid simulations themed issue

While checking 3DNA-related citations through Web of Science for this past week, I found a total of nine new citations, as follows:

Five times to the 3DNA 2003 NAR paper
Once to the 3DNA 2008 NP paper
Three times to the 2001 standard base reference frame paper

Most interestingly, all the citations are from the same nucleic acid simulations themed issue of Physical Chemistry Chemical Physics 11 (45). Honestly, I was quite (pleasantly) surprised by this, so I browsed the articles online. Edited by Charles Laughton and Modesto Orozco, the 2009 PCCP "themed issue exemplifies the rich diversity of cutting-edge research in the field of nucleic acids simulation." Indeed, quite a few well-known experts are among the authors of the two perspectives and 16 papers.

While not an "energetic" person myself, over the years I have been keeping an eye on MD simulations and MM calculations of nucleic acid structures. It is my pleasure to see that 3DNA is being widely used (certainly more than I originally expected) by the nucleic acid simulations community. Given time, and with a suitable collaborator, I am open to adapting 3DNA to work with currently available MD simulation packages to make life easier for practitioners in this "dynamic" field.