Research code workflow
This page is designed to be a quick reference for the stages in building a research coding project. Please refer back to the main text for detailed guidance for any of these steps or stages.
If you select this tab, the instructions below will show information relevant to organising your project in a single project folder: this folder will contain all of your code, application, analysis, figures, and notes.
If you select this tab, the instructions below will show information relevant to organising your project in two project folders: one folder for your core code as an installable package, the other containing the application of this code, analysis, notes, figures etc.
Part One: building the core code
1. Brainstorm and gather requirements
- Functional requirements: what must the software do?
- Non-functional requirements: how should it work/behave?
- Constraints: what are the limitations or assumptions?
2. Create your project directory structure in a repository
3. Create a development environment
conda env create -f environment.yml
conda env update --file environment.yaml --prune
4. Write your pseudocode, comments, and code
- Write code for yourself in a year: name variables and functions sensibly, and use comments to add context
5. Write a test suite
Create your tests in a folder called test
. Call the file test_<YOUR-PYTHON-MODULE-NAME>.py
, and define the functions inside it as def test_<YOUR-FUNCTION-NAME>():
.
Use this format:
def test_example():
'''Test for the example function'''
# Arrange
= 0
test_variable_1 = 1
test_variable_2 = 7
expected_output
# Act
= your_function(test_variable_1, test_variable_2)
output
# Assert
assert output == expected_output
# No cleanup needed
All functions in your core code package should be tested.
6. Write documentation
You’ll want to add:
- Module and function level docstrings
- A README.md file
- A
pyproject.toml
file to document your core Python package - Possibly some example notebooks running the code
In your core code/package repository:
- Module and function level docstrings
- A README.md file
- A
pyproject.toml
file to document your core Python package
In your analysis/application file:
- Module and function level docstrings of code
- Example notebooks
- Instructions in your README.md on how to use the code in the package repository.
Part two: using the core code
1. Setting up your folder structure
You’ve already sorted this back in Step 2
This is more flexible, but it’s generally still a good idea to keep things organised. This folder might look like this:
pallasite-parent-body-evolution/ The project git repository
├── LICENSE
├── README.md
├── environment.yml The libraries I need for analysis (including the package from our package repository)
├── data I usually load in large data from storage elsewhere
│ ├── interim But sometimes do keep small summary datafiles in the repository
│ ├── processed
│ └── raw
├── docs Notes on analysis, process etc.
├── notebooks Jupyter notebooks used for analysis
├── reports For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│ └── figures Figures for the manuscript or reports
├── src Source code for this project
│ ├── data Scripts and programs to process data
│ ├── tools Any helper scripts go here
│ └── visualization Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related.
└── tests Test code for this project, benchmarking, comparison to analytical models
2. Creating a research environment
We have already done this back in Step 3.
You have a few options:
Just keep using your development environment from Step 3
As you have installed your package, you can continue to use your development environment from Step 3 across your system, in other folders.
Create a new environment and hardlink local package
Take the development environment you created in Step 3, and copy it into the analysis folder, and replace the following lines:
# keep this to install your Python package locally
- pip
- pip:
- --editable .
with:
# keep this to install your Python package locally
- pip
- pip:
- --editable /absolute/path/to/package-repo
This is still not easily reusable, but at least it’s now more obvious what environment you’re using, and you can still easily make changes to the package as you work.
Install your package via GitHub
Take the development environment you created in Step 3, and copy it into the analysis folder, and replace the following lines:
# keep this to install your Python package locally
- pip
- pip:
- --editable .
with:
# keep this to install your Python package locally
- pip
- pip:
- <PACKAGE_NAME>@git+https://github.com/<USER_NAME>/<REPO_NAME>
Replacing the variables <In angled brackets>
with the appropriate values.
This can be a bit tricky when you’re in the early stages of the project and need to frequently update your core code package (as you’ll need to push the package changes to GitHub and then update your environment file every time); I often leave this until my package code is fairly stable. You’ll definitely want to do this before releasing your code.
3. Do your research!
Reloading libraries
When working with an editable install, you may need to force reload the module after changing it to ensure the changes carry through.
Let’s say I have updated my package amazing_project
, and it’s installed in my current Conda environment as an editable install. I don’t need to update my environment, but if you’re using a notebook, you will have to force reload the module. This is very easy using importlib
(this is included in the core Python library, so doesn’t need to be added to your environment):
import amazing_project as ap
import importlib
reload(ap) importlib.
Using notebooks
Jupyter notebooks can be tricky when it comes to version control: they are filled html formatting that updates when you rerun cells, and this can obscure actual code changes. There are a few different options if you are keep to use notebooks:
- Jupytext: creating Jupyter notebooks in plain
.py
files - Marimo notebooks: an alternative notebook framework, again in plain
.py
.
4. Export a record of your environment
When you’ve created a “batch” of results that you’re happy with, you should record your exact dependencies at that point:
conda env export > env-record.yml # from inside the activated env
When you’ve created a “batch” of results that you’re happy with, you should record your exact dependencies in the environment used to produce those results at that point (so your research environment from Step 2), and save this in your analysis folder:
conda env export > env-record.yml # from inside the activated env
It’s often a good idea to create a release of your package (see Step 5 below) and install it properly using the version number and the GitHub url in your research environment, and then do your final analysis and export your research environment.
5. Create a release, synced with zenodo
Add your DOI to your citation file!