Much of my research is based on computer code. My code has bugs.
Does that mean that my research has errors? Quite possibly.
Are you, my fellow scientist, in the same boat? Probably.
What can we do about it?
We must adopt practices that are known to lead to more correct code. We must be willing to devote the time and energy that those practices require. Most of the time, this is neither fun nor glorious. We don’t try to cut corners when developing a mathematical proof or preparing an experiment – we know that leads to error. Writing computer code is no different; if anything, it requires that we take an even more methodical approach. It is time for computational scientists to own up to this. If we want to have confidence in our results, and if we want others to be able to build on them, there is no other choice.
The bonus is that adopting these practices can lead to cleaner, simpler code that is easier to understand, maintain, and debug. In the long term, I believe that these practices save time overall and leave more of it for using our computational tools to perform research.
In short, what I’m about to show you will allow you to:
So here it is, my 12-step program for writing scientific code that you can believe in:
I know what you’re thinking: that sounds like a lot of work. It is. But my goal in this series is to make each of these steps as straightforward and painless as possible. Thanks to a number of recently developed tools, most of these tasks can be done very quickly, at least for small and simple projects.
There is nothing magical about this precise set of steps; I could have broken them down in a different way or included others. What’s essential are the underlying principles:
You could casually read through this series of posts and think about implementing these changes someday. But don’t. Pick a current scientific project with code, and commit yourself now to getting it in order by following these steps. Then go through the posts and apply the instructions at each step to the project you’ve chosen. When you’re done, you’ll have one rock-solid code project and the know-how to run all of your projects that way. The next one will be even easier.
I’ve tried to make these posts very simple, but there is a minimal amount of know-how I assume. In particular, you must be able to:
If you’ve taken a Software Carpentry course, you’re more than prepared.
I’m not going to tell you how to actually design better code. That’s much harder! For that, you should read a book on programming style, like The Pragmatic Programmer or Code Complete.
I’m not going to make you an expert in the topics listed above. My goal is just to get you over the initial psychological bump, from “I have no idea what that is” to “yeah, I can basically do that”. This is the 20% effort that gives 80% of the benefit.
Yes! In fact, I suggest starting with a small and simple project – the smaller the better. To demonstrate, I’ve written a tiny package that does LU factorization (Gaussian elimination) for square matrices.
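To give a sense of how small such a project can be, here is a minimal sketch of LU factorization by Gaussian elimination. It is not the code from the package mentioned above: the function names `lu_no_pivot` and `matmul` are hypothetical, the matrices are plain Python lists of lists, and the sketch assumes no row exchanges are needed (it raises an error on a zero pivot instead of pivoting).

```python
def lu_no_pivot(A):
    """Factor a square matrix A (list of lists) as A = L U without pivoting.

    L is unit lower triangular and U is upper triangular.
    Raises ValueError if A is not square or a zero pivot is encountered.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("matrix must be square")

    # Work on copies so the caller's matrix is left untouched.
    U = [[float(x) for x in row] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n - 1):
        if U[k][k] == 0.0:
            # Assumption of this sketch: no row exchanges are performed.
            raise ValueError("zero pivot encountered; pivoting would be required")
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k).
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def matmul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For the matrix `[[4, 3], [6, 3]]`, this returns `L = [[1.0, 0.0], [1.5, 1.0]]` and `U = [[4.0, 3.0], [0.0, -1.5]]`, and `matmul(L, U)` reproduces the original entries. A routine meant for real use would normally add partial pivoting for numerical stability; even so, a function this short already leaves room for tests, documentation, and version control.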
Useful tutorials need to be concrete. The steps are applicable to any language, but the best tools for achieving them vary by language. Python is a popular language, and is what I use most.
If you’re asking this, go read Donoho et al.’s paper on reproducible computing, from which I quote:
“Error is ubiquitous in scientific computing, and one needs to work very diligently and energetically to eliminate it.”
Okay, what are you waiting for? Go read step one!