Making your research truly reproducible and reusable
Reproducibility, replicability and reusability are all interlinked.
Prevents future panic about not being able to find the source files behind your results, or not being able to reproduce certain figures.
Ensures your contributions (and those of your collaborators) are recorded: this can be very useful when it comes to submitting your thesis and demonstrating which work is yours and where collaborators contributed!
conda env export command): reproducibility
It can often seem nearly impossible to implement any of the tools we have discussed on a messy codebase.
We use the DeReLiCT acronym to patch up falling-down code:
If there’s no sign of a requirements.txt, an environment.yml, or anything similar:
Scan through the Python scripts in the project and start to create a list of all dependencies manually
Use a little Bash script to find dependencies:
Install a Python package such as isort in your environment and then run:
Install a Python package such as pipreqs in your environment and then run:
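For the Bash option mentioned above, here is a minimal sketch of what such a script might look like. It is not the course's own script: the function name find_imports is hypothetical, and it only does a rough text search for lines that start with import or from in every .py file under a directory, printing each top-level module name once.

```bash
#!/usr/bin/env bash
# find_imports DIR  (hypothetical helper name)
#   Prints each top-level module that appears after "import" or "from"
#   at the start of a line in any .py file under DIR, sorted and de-duplicated.
#   Relative imports (from . import x) are skipped because the name after
#   "from" must start with a letter or underscore.
#   The output also includes standard-library modules such as os, and import
#   names can differ from install names (e.g. sklearn vs scikit-learn),
#   so review the list by hand.
find_imports() {
    local project_dir="${1:-.}"
    grep -rhoE --include='*.py' \
        '^[[:space:]]*(import|from)[[:space:]]+[A-Za-z_][A-Za-z0-9_]*' \
        "$project_dir" \
        | awk '{print $2}' \
        | sort -u
}

# Example, run from the project root:
# find_imports .
```

A line such as import os, sys only reports the first module, so treat the output as a starting point for the manual dependency list rather than a finished requirements file.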
If you have been using a Conda environment but have no record of it (no reusable environment.yml):
Create an exact export to record the state of the project now:
Create a reusable Conda environment file using the snippet below.
# Extract installed pip packages
pip_packages=$(conda env export | grep -A9999 ".*- pip:" | grep -v "^prefix: " | cut -f1 -d"=")
# Export conda environment without builds, and append pip packages
conda env export --from-history | grep -v "^prefix: " > new-environment.yml
echo "$pip_packages" >> new-environment.yml
This exports the Conda packages you explicitly installed (pinned only where you pinned them yourself) plus any pip packages with their version numbers stripped, rather than every transitive dependency with its exact build. This code snippet is adapted from an answer posted by the GitHub user ekiwi111.
Initialise a Git repository: git init
Stage files to be committed: git add FILENAME
Keep the main branch working correctly
Create a folder src/, and inside this put the folder <PACKAGE_NAME>/.
Add the file src/<PACKAGE_NAME>/__init__.py
Create a pyproject.toml file using the template provided in the course notes.
It is worthwhile implementing testing even if you can’t test everything!
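The src layout steps listed above can also be scripted. The following is a minimal sketch under stated assumptions: the function name scaffold_src_layout and the package name mypackage are hypothetical, and the check that the name is a valid Python identifier is an addition of ours (a folder like my-package could not be imported by name). It does not create the pyproject.toml, which should still come from the course template.

```bash
#!/usr/bin/env bash
# scaffold_src_layout PROJECT_DIR PACKAGE_NAME  (hypothetical helper name)
#   Creates PROJECT_DIR/src/PACKAGE_NAME/__init__.py, i.e. the src layout.
scaffold_src_layout() {
    local project_dir="$1"
    local package_name="$2"
    # Refuse names Python could not import (e.g. containing "-")
    if ! printf '%s' "$package_name" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'; then
        echo "invalid package name: '$package_name'" >&2
        return 1
    fi
    # src/ holds the importable code; the package folder sits inside it
    mkdir -p "$project_dir/src/$package_name"
    # An empty __init__.py marks the folder as a regular Python package
    touch "$project_dir/src/$package_name/__init__.py"
}

# Example, run from the project root (mypackage is a placeholder):
# scaffold_src_layout . mypackage
```

Once the files exist, they can be staged with git add like any other file in the project.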
Create a test file named test_<PYTHON_FILE_NAME>.py for the file you are tackling
Name each test function test_<FUNCTION_NAME>
Remember the test blueprint:
pip install <PACKAGE_NAME> without requiring the GitHub link)
pyproject.toml file in the repository)