Saturday, October 30, 2010

Publication of scientific programming code

Recently on the Nature website, I read with great interest a news article, titled "Publish your computer code: it is good enough", by Nick Barnes, a professional software engineer:
Freely provided working code — whatever its quality — improves programming and enables others to engage with your research
Clearly the author knows the "trade secret" in scientific programming. He lists several common reasons why scientists are reluctant to share their source code, and then provides his responses:
  1. The code is low quality — "software in all trades is written to be good enough for the job intended". All software has bugs. Sharing code would help improve the code itself and advance the research field.
  2. Not a common practice — this is going to change or is already changing.
  3. Demand for support — "Nobody is entitled to demand technical support for freely provided code."
  4. Intellectual property issue — The most valuable part "lies in your expertise"; code not backed by skilled experts is called abandonware. (I cannot agree more with this point.)
  5. Polishing code takes time/effort — there is no need to polish it; just supply the original code used in your publication as supplementary material on a website.
As is evident from the many comments, this essay resonates with the community. As an active computational scientist for over a decade, I share most of these opinions. Essentially, making source code transparent ensures that published scientific results are repeatable. In the field of computational biology (bioinformatics), it is virtually impossible to reproduce a published figure or table exactly without direct access to the details, including the source code.