Research workflow

A good research workflow and Research code should enable Reproducible research and facilitate Collaboration and Automation.

The combination of Git and GitHub is probably the best way to manage computation-heavy research projects.

Books and tutorials

Tools

Environment

An important part of Reproducible research is managing and sharing environments. The exact same code may not work on another machine.

say it works on my machine one more time

To make the code reproducible and replicable on someone else’s machine, we need to make sure that the essential computing environment, the part that directly affects how the code behaves, is replicated across machines.

What is important to understand is that there is a whole spectrum of strategies, each with its own trade-offs. For instance, the “light” version would be sharing the list of required packages (e.g., via requirements.txt in Python). The other extreme would be developing inside a fully isolated environment, such as a container, and shipping the whole environment as an image via tools like Docker.
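As a rough illustration of the “light” end of the spectrum, the sketch below lists every package installed in the current Python environment with its exact version and writes the result to a plain text file. The function names `pinned_requirements` and `write_requirements` are made up for this example. In practice `pip freeze > requirements.txt` does a similar job; this is only meant to show how little is being shared.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions whose metadata is incomplete or broken.
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the pinned list to a plain text file and return the lines."""
    lines = pinned_requirements()
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    for line in write_requirements():
        print(line)
```

A collaborator could then try to recreate the package set with `pip install -r requirements.txt`. Nothing about the OS or system libraries is captured this way.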

That is how Docker was born

The stricter you go, the more likely the code is to run the same way across machines. However, strictness is heavy: a Docker image can easily be more than a GB. On the other hand, just sharing the list of required packages (and their versions) can be easier, potentially just a bunch of plain text files and source code. But the code is then less likely to work, because of subtle differences in OS types and versions, and other dependency problems.
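One cheap middle ground, sketched here under assumptions, is to record a small “fingerprint” of the machine next to the package list, so that OS or interpreter differences are at least visible when something breaks. The helper names `environment_fingerprint` and `diff_fingerprints` and the choice of fields are illustrative, not an established standard.

```python
import platform


def environment_fingerprint():
    """Collect a few OS and interpreter details that often explain breakage."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
    }


def diff_fingerprints(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    mine = environment_fingerprint()
    # Hypothetical fingerprint received from a collaborator's machine.
    theirs = dict(mine, os="HypotheticalOS")
    print(diff_fingerprints(mine, theirs))
```

This does not make the code run elsewhere. It only narrows down where two machines differ, which is the information a plain package list leaves out.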

Standards

Statistics

Articles

Talks