Iddo has a new post that contains some thoughtful reflections on my suggestions: Should research code be released as part of the peer review process?
I think I’ve convinced Iddo that not all research code should be carefully tested, released code, ready for users to apply blindly. He and I agree that some code should be polished in that manner, though I remain less convinced than he is that the Bioinformatics Testing Consortium will do much to further that aim.
There have been some other responses to Iddo’s original post.
Deepak Singh takes the view that scientists don’t publish their code because they are horrible, lazy programmers, and that if they would just work harder and get better training, the problem would go away. If only it were that simple! He correctly points out that a lot of industrial code is also hacked together for rapid prototyping and used past the point where it should be. He believes, however, that industrial programmers are better at programming than scientists are (true in some cases, and perhaps even on average, but there are certainly exceptions).
I remain unconvinced that simply training scientists to be better programmers would make much difference. I’m a pretty good, highly trained programmer (PhD in computer science, over 40 years of programming experience in various languages), but a lot of my research code is not in a state where I’d be willing to distribute it. Most of the bigger programs need major refactoring, and I have neither the time, the resources, nor the incentive to do it. I do try to write clean code with decent documentation, but a lot of the bigger projects involved student work that I did not have time to clean up.
Even where the code is all mine, after 5 or 10 years of intermittent work on a project the code has often drifted a long way from the original design, and design decisions that were good ones at the time they were made are no longer good choices—hence the need for major refactoring.
There is some discussion of the need for open source code triggered by Iddo’s article, but the discussion there doesn’t seem to go anywhere. One commenter advocates open-notebook science, a position I partly agree with, but I’m often constrained by the fact that the data I’m analyzing is not my data, so I need the data owner’s permission before saying anything about it. Because my programming is mostly driven by the data I’m analyzing (looking for patterns and anomalies in data), releasing my notes about what I’m programming and why would release the part of the data that the data owner regards as most precious (the interpretation), even if the data itself is kept hidden. The commenter would probably claim that the data should be released as soon as it is collected, but that is almost never my decision to make.