Making your research truly reproducible and reusable
Reproducibility, replicability and reusability are all interlinked.
Prevents the future panic of being unable to find the source files behind a result, or to reproduce a particular figure.
Ensures your contributions (and those of your collaborators) are recorded: this can be very useful when you submit your thesis and need to show which work is yours and where collaborators supported you.
conda env export command): reproducibility
It can often seem nearly impossible to implement any of the tools we have discussed on a messy codebase.
We use the DeReLiCT acronym to patch up falling-down code:
If there’s no sign of a requirements.txt, an environment.yml, or anything:
Scan through the Python scripts in the project and start to create a list of all dependencies manually
Use a little Bash script to find dependencies:
Install a Python package such as isort in your environment and then run:
Install a Python package such as pipreqs in your environment and then run:
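Separately from the tools listed above, the manual scan through the project's Python scripts can be sketched in Python itself. This is a minimal sketch, not a replacement for isort or pipreqs: the function names `find_imports` and `third_party_imports` are hypothetical, and it assumes Python 3.10 or later so that `sys.stdlib_module_names` is available for filtering out the standard library.

```python
import ast
import sys
from pathlib import Path


def find_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" counts as a dependency on "os"
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import (from . import x), i.e. project code
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def third_party_imports(project_dir: str) -> list[str]:
    """Collect imports from every .py file under project_dir, minus the standard library."""
    found = set()
    for path in Path(project_dir).rglob("*.py"):
        try:
            found |= find_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            # Skip files that cannot be read or parsed rather than stopping the scan
            continue
    # Assumption: Python 3.10+; on older versions nothing is filtered out
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(found - set(stdlib))
```

Calling `third_party_imports(".")` from the project root gives a starting list only. Import names do not always match package names (for example, `sklearn` is installed as `scikit-learn`), and the project's own modules will also appear in the list, so the result still needs a manual check.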
If you have been using a Conda environment, but have no record of it (no reusable environment.yml):
Create an exact export to record the state of the project now:
Create a reusable Conda environment file using the snippet below.
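# Extract the pip section of the full export and drop version pins
# (cut keeps only the text before the first "=" on each line)
pip_packages=$(conda env export | grep -A9999 ".*- pip:" | grep -v "^prefix: " | cut -f1 -d"=")
# Export only the Conda packages that were explicitly requested, without the machine-specific prefix line
conda env export --from-history | grep -v "^prefix: " > new-environment.yml
# Append the unpinned pip section to the end of the dependencies list
echo "$pip_packages" >> new-environment.yml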
This allows us to export all the Conda (and any pip) dependencies, without pinned versions (unless they were intentionally pinned on creation of the environment). This code snippet has been edited based on an answer posted by the GitHub user ekiwi111.
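To make the first pipeline in the snippet less opaque, the same filtering can be sketched in Python on the text of an export. This is an illustrative sketch only: the function name `unpinned_pip_section` is hypothetical, and it mirrors the `grep` and `cut` steps line by line rather than parsing the YAML properly.

```python
def unpinned_pip_section(export_text: str) -> list[str]:
    """Mirror the grep/cut pipeline: keep the '- pip:' block and drop version pins."""
    kept = []
    in_pip_block = False
    for line in export_text.splitlines():
        if line.strip() == "- pip:":
            # grep -A9999 keeps this line and everything after it
            in_pip_block = True
            kept.append(line)
            continue
        if not in_pip_block:
            continue
        if line.startswith("prefix: "):
            # grep -v "^prefix: " drops the machine-specific install path
            continue
        # cut -f1 -d"=" keeps only the text before the first "="
        kept.append(line.split("=")[0])
    return kept
```

Given an export whose pip block lists `requests==2.31.0`, the function returns the indented lines `  - pip:` and `    - requests`, which is the text that the shell snippet appends to new-environment.yml.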
git init
git add FILENAME
main branch working correctly
src/, and inside this put the folder <PACKAGE_NAME>/.
src/<PACKAGE_NAME>/__init__.py
pyproject.toml file using the template provided in the course notes.
It is worthwhile implementing testing even if you can’t test everything!
test_<PYTHON_FILE_NAME>.py for the file you are tackling
test_<FUNCTION_NAME>
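The naming pattern above might look like the following in practice. This is a sketch under stated assumptions: the module `stats.py` and the function `rescale` are hypothetical examples rather than part of the course material, and it assumes a test runner such as pytest, which collects files named `test_*.py` and functions named `test_*`.

```python
# Hypothetical contents of src/<PACKAGE_NAME>/stats.py
def rescale(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("cannot rescale constant data")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical contents of tests/test_stats.py
def test_rescale():
    # Known input with an answer that is easy to check by hand
    result = rescale([2.0, 4.0, 6.0])
    assert result == [0.0, 0.5, 1.0]


def test_rescale_rejects_constant_data():
    # The failure case is worth testing too: constant data has no range
    try:
        rescale([3.0, 3.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant data")
```

In a real project the two parts would live in separate files, with the test file importing `rescale` from the package. Even one or two tests like these on the function you are currently editing give some protection against accidental changes in behaviour.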
Remember the test blueprint:
pip install <PACKAGE_NAME> without requiring the GitHub link)
pyproject.toml file in the repository)