Accountable research software

2012 August 27

Iddo Friedberg asks the seemingly reasonable question “Can we make accountable research software?” on his blog Byte Size Biology. As he points out, most research software is built by rapid prototyping methods, rather than careful software development methods, because we usually have no idea what algorithms and data structures are going to work when we start writing the code. The point of the research is often to discover new methods for analyzing data, which means that there are a lot of false starts and dead ends in the process.

The result, however, is that research code is often incredibly difficult to distribute or maintain. Like some others in the bioinformatics community, he feels that the solution is for code to be rewritten and carefully tested before publication of results. He is aware of at least one of the reasons this is not currently done—it is damned expensive and funding agencies have shown almost no willingness to support rewriting research code into distributable code (I know, as I’ve tried to get funding for that).

The rapid prototyping skills needed for research programming and the careful specification, error checking, and testing needed for software engineering are almost completely disjoint. Some people would even argue that the thinking styles needed for the two types of programming are incompatible. I wouldn’t go quite that far, but they are certainly very different modes of programming. It will often be the case that different programmers need to be hired for developing research code and for converting it into distributable code.

The “solution” that Iddo proposes, passed on from Ben Temperton, is the Bioinformatics Testing Consortium, a volunteer group of researchers who do some of the quality assurance (QA) steps of software development (code review and testing) for each other. Quite frankly, I don’t see this as much of a solution. First, the software has to be in a nearly finished, polished state before the QA steps they propose make much sense, and getting the code to that state is 90% of the problem. Second, the volunteer nature of the consortium could easily result in a “tragedy of the commons”, where everyone wants to take more out of the system than they put in. This is already happening in peer review of papers, where people write more papers than they review, with the result that editors are finding it harder and harder to get competent reviewers. Third, the people involved are going to be either careful software developers (who are not the main source of undistributable research code) or rapid prototypers who lack the patience and methodical approach of professional testers.

Note: I think that the Bioinformatics Testing Consortium is a good idea. Like many other volunteer projects, it is addressing a real need, though only a small part of the need and with inadequate resources.

I do worry a little about one of the justifications given for distributing research code: the need to replicate experiments. A proper replication for a computational method is not running the same code over again (and thus making the same mistakes), but re-implementing the method independently. Having access to the original code is then useful for tracking down discrepancies, as it is often the case that the good results of a method are due to something quite different from what the original researchers thought. I fear that the push to have highly polished distributable code for all publications will result in a lot less scientific validation of methods by reimplementation, and more “ritual magic” invocation of code that no one understands. I’ve seen this already with code like DSSP, which almost all protein structure people use for identifying protein secondary structure with almost no understanding of what DSSP really does or exactly how it defines H-bonds. It does a good enough job of identifying secondary structure, so no one thinks about the problems.

I fear that the push for polished code from researchers is an attempt to replace computational researchers with software publishing teams. The notion is that the product of the research is not the ideas and the papers, but just free code for others to use. It treats bioinformaticians as servants of “real” researchers, rather than as researchers in their own right. It’s like demanding that no papers on possible drug leads be published until Phase III trials have been completed (though not quite that expensive), and then that the drug be distributed for free.

Certainly there is a place for bioinformatics as a service: the UCSC genome browser is a good example of such a service, and the team of developers, QA people, and IT people needed to build and maintain it is big and expensive (more expensive than the researchers involved in the effort). There are enough uses and enough users for that service to justify the price, but if we hold all bioinformatics researchers to that level of code quality, we’ll stifle a lot of new ideas.

Requiring that code be turnkey software before publication is not a desirable goal for bioinformatics as a research community.

Comments (7)

Why not adopt the test-driven development paradigm? As long as you know WHAT you want your software to do, you can build the test cases in advance. It isn’t a substitute for real QA, but would still improve things. The other thing you can do is to adopt agile development methods, such as constant refactoring, to improve the situation.

The situation is similar in industry. I’ve worked with a number of large systems that were horrific, utterly impossible to change or maintain, that were really just prototypes that management rushed into production because “the customers loved the demo so much”.

Comment by Bonnie — 2012 August 30 @ 06:04
- The main point of research software is that you *don’t* know what you want the software to do in advance. Test-driven development is a form of extremely explicit specification. It is appropriate for well-understood applications, where the goal is a better implementation of something already well understood, but not very applicable to exploratory “I wonder what this will do” research coding. Almost all research coding is rapid prototyping, but people expect code that is robust, easily installed, and user friendly. Such code rarely results from rapid prototyping; it requires reimplementation from the ground up using a totally different development process. No one is willing to pay for that (often expensive) redevelopment.
  
  Comment by gasstationwithoutpumps — 2012 August 30 @ 09:28
You can do test-driven development in environments where you don’t know exactly where your software is heading. In fact, that was largely the point of agile processes: to handle software development in environments where the requirements are changing rapidly. Test-driven development is a central tenet of agile. The idea is not that you write macro tests, which would imply that you know exactly what the overall system should be doing, but that you write micro tests, and as your software evolves, you scrap tests and add tests. At any given point, you may not know where you are heading overall, but you know that RIGHT NOW you need some code that will do a certain data conversion. So you first write the test, then you write the method (function, procedure, whatever), and then you run the test. Later on, you may decide that you didn’t need that data conversion after all, so you scrap the code along with its test; or you may decide that you need a different conversion, in which case you rewrite both the test and the code. It may seem to slow down prototyping, but you will have more trust in your final results and a test suite that you can run whenever you make a change. The overall system will be more robust (though perhaps not to the level of a fully QA’ed system), and you will be closer to being able to build a full-scale system should the need arise.
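As a rough illustration of that write-the-test-first cycle, here is a minimal Python sketch. The particular conversion (turning a FASTQ quality string in Phred+33 encoding into integer scores) and the name `phred33_to_scores` are hypothetical choices made for illustration, not code from anyone in this discussion. The test is written first, then just enough code to make it pass.

```python
from typing import List


# Step 1: write the micro test first, for the one conversion needed right now.
# `phred33_to_scores` is a hypothetical name used only for this illustration.
def test_phred33_to_scores() -> None:
    assert phred33_to_scores("!I") == [0, 40]  # '!' is ASCII 33, 'I' is ASCII 73
    assert phred33_to_scores("") == []         # empty input gives no scores


# Step 2: write just enough code to make the test pass.
def phred33_to_scores(quality: str) -> List[int]:
    """Convert a Phred+33-encoded quality string to integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


# Step 3: run the test. If the conversion is later abandoned, the function
# and its test are deleted together; if it changes, both are rewritten.
if __name__ == "__main__":
    test_phred33_to_scores()
    print("ok")
```

No test framework is needed at this scale; plain `assert` statements keep the overhead close to zero while still giving a check that can be rerun after every change.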

I’ve seen a number of systems that grew out of researchy prototypes written by scientists, and quite frankly, I can’t see how anyone can trust those systems. How do you know the system is really correct? Better testing gives you a higher degree of trust. Some of the worst offenders are in the financial world, where quants with PhDs in physics come up with financial algorithms and then write brain-dump code to prototype them. Invariably the prototype goes into production. Those systems are RIFE with errors. A lot of those high-speed trading systems are built like that, and one day an error in one of those systems is going to take down our financial system.

Hmm, I wonder if there is any computer science research potential in developing software engineering methods that really work for scientists who need to prototype? My software engineering research brain is perking up.

Comment by Bonnie — 2012 August 31 @ 13:48
- I do sometimes devise unit tests for code before writing it, but all the software I’ve seen for doing unit testing has pretty large learning curves and much too clunky interfaces for rapid code development. If the code is simpler and cleaner than the test, it is hard to see what extra value the test is providing. Maintaining the tests as the code changes is even more tedious than maintaining the in-program documentation, which I have a hard enough time getting students to do.
  
  Quite frankly, software engineering practices are not tuned for throwaway code, which is what over 90% of research code is. Unfortunately, the 10% that doesn’t get thrown away probably should have been, with a complete redesign from the ground up. Almost no one is willing to pay for that reimplementation, and the buggy throwaway code gets used repeatedly.
  
  More is likely to be gained by reimplementing moderately successful code than by trying to raise all research code to software engineering standards.
  
  Comment by gasstationwithoutpumps — 2012 August 31 @ 14:41
[…] and software quality are heating up. A recent salvo on Gas Stations Without Pumps is titled “Accountable research software”, and one statement in particular caught my eye: The rapid prototyping skills needed for […]

Pingback by Software Carpentry » Not Really Disjoint — 2012 September 3 @ 18:19
[…] few days ago, I posted a response to a post by Iddo about accountable research […]

Pingback by Iddo responds « Gas station without pumps — 2012 September 4 @ 21:37
[…] An anonymous blog responds to Iddo, saying that the goal of science is not software, and therefore systematic testing is not justified. […]

Pingback by Doit-on montrer le code informatique des scientifiques même s’il est moche ? (Should scientists’ code be shown even if it is ugly?) | Tout se passe comme si — 2013 February 8 @ 00:36

Gas station without pumps