Open-source science software

Thursday 21st July, 2022 - Bruce Sterling

*Generally speaking, if nobody’s hired to maintain it, nobody maintains it.  This article is from 2019.

On 10 April, astrophysicists announced that they had captured the first ever image of a black hole. This was exhilarating news, but none of the giddy headlines mentioned that the image would have been impossible without open-source software. The image was created using Matplotlib, a Python library for graphing data, as well as other components of the open-source Python ecosystem. Just five days later, the US National Science Foundation (NSF) rejected a grant proposal to support that ecosystem, saying that the software lacked sufficient impact.

It’s a familiar problem: open-source software is widely acknowledged as crucially important in science, yet it is funded non-sustainably. Support work is often handled ad hoc by overworked graduate students and postdocs, and can lead to burnout. “It’s sort of the difference between having insurance and having a GoFundMe when their grandma goes to the hospital,” says Anne Carpenter, a computational biologist at the Broad Institute of Harvard and MIT in Cambridge, Massachusetts, whose lab developed the image-analysis tool CellProfiler. “It’s just not a nice way to live.”

Scientists writing open-source software often lack formal training in software engineering, which means that they might never have learnt best practices for code documentation and testing. But poorly maintained software can waste time and effort, and hinder reproducibility. Biologists who use computational tools routinely spend “hours and hours” trying to get other researchers’ code to run, says Adam Siepel, a computational biologist at Cold Spring Harbor Laboratory in New York, and a maintainer of PHAST, a tool used for comparative and evolutionary genomics. “They try to find it and there’s no website, or the link is broken, or it no longer compiles, or crashes when they’ve tried to run it on their data.”

But there are resources that can help, and models to emulate. If your research group is planning to release open-source software, you can prepare for the support work and the questions that will arise as others begin to use it. It isn’t easy, but the effort can yield citations and name recognition for the developers, and improve efficiency in the field, says Wolfgang Huber, a computational biologist at the European Molecular Biology Laboratory in Heidelberg, Germany. Plus, he adds, “I think it’s fun.”

Have a plan

For developers of scientific software, release day isn’t the end of the labour, but often the beginning. Tim Hopper, a data scientist at Cylance in Raleigh, North Carolina, says on Twitter, “Give a man a fish and you feed him for a day. Write a program to fish for him and you maintain it for a lifetime.” Carpenter hired a full-time software engineer to handle maintenance for CellProfiler, which logs about 700 questions and 100 bug reports or feature requests per year, or about 15 per week. But most open-source software maintenance is done on a volunteer basis. “I did this myself, like after midnight,” says Siepel of his tech-support efforts on PHAST….