Dependencies
Dependencies are the versions of the different packages/modules that your code depends on: for example, the version of Python you are using, and any libraries you have to import, like matplotlib, scipy, tensorflow, etc.
Dependencies are an important thing to keep track of when building scientific code. How many different external libraries does your code depend on? What versions of these libraries does it need? How do you install and update these different libraries?
Browse through this quick presentation to learn more.
Package management for Python
In Python, there are lots of different ways to install and manage packages and dependencies. These tools generally use virtual environments to keep the dependencies for different projects separate and tidy. Package installation and management tools include conda, pixi, and uv, among others.
You can read more about Python package management tool recommendations here. The package management tool you use will vary depending on whether you want to build your code into a package itself, or are relying primarily on external libraries. Some of these package managers include entire workflows for building and publishing Python packages, while others focus on organising pre-existing packages.
Package management for other coding languages
Note that I do not have as extensive experience managing projects, dependencies, and packages in the following languages, so please proceed with caution.
Package management in R
conda for R: you can install conda via Miniforge as linked above, and then install R packages by following these instructions for R with conda.
renv: the reproducible environment package for R has some very nice introductory documentation.
Package management in Julia
Pkg: Pkg is Julia’s built-in package manager.
Dependencies: step-by-step for existing projects (Python)
Ok, so we’ve looked at the basics of what dependency management is behind the scenes, and some of the different options available. But how do you retroactively apply dependency management to an existing, messy code project? While we can’t record things that we’ve done in the past, we can start from now.
Step 0: Pick your package manager
While I’ve mentioned a whole host of options for Python package managers above, I’m going to work through some basic instructions for just three options: conda (installed via miniforge), pixi, and uv.
If you have never used a package management system before, or you work in science, conda might be the best choice for you. See this conda blogpost (Murphy Quinlan 2024) for useful links to installation guides, and an in-depth use guide. Conda is very widely used and recognised, especially amongst researchers in science and medical fields.
Pixi is great if you are using a lot of conda and PyPI packages together (which can get messy); it can also work with a pyproject.toml file if you plan on packaging your code at some point. It is very fast.
Have a read through this blog post on testing pixi (Ma 2024).
Step 1: Manually record what libraries you use
Scroll through all the scripts you use in your project, and record all the packages that you import across these different Python files and Jupyter notebooks (*.py and *.ipynb files).
For example, I have a series of Python files in my project folder with the following first few lines:
# file1.ipynb
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# file2.py
import numpy as np
import pandas as pd
My list of jotted-down dependencies is then: numpy, matplotlib, seaborn, and pandas.
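If your project has many files, scrolling through them manually can be error-prone. Here is a minimal sketch (a helper of my own, not part of any tool) that uses Python's built-in ast module to list the top-level imports in every *.py file below a folder; notebook (*.ipynb) files would need their code cells extracted first:

```python
import ast
from pathlib import Path

def find_imports(source):
    """Return the set of top-level module names imported in a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

# Scan every .py file below the current folder and print its imports
for path in Path(".").glob("**/*.py"):
    try:
        print(path, sorted(find_imports(path.read_text())))
    except (SyntaxError, UnicodeDecodeError):
        print(path, "could not be parsed")
```

Remember that import names don’t always match the names you install (for example, you import sklearn but install scikit-learn), so double-check the list before using it.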
If until now you’ve been running your Python programs directly using your system’s Python (so you have never set up an environment), let’s just see what versions of packages your system is using.
First, check the version of Python by running the following from the command line:
python --version
From the command line, run the following (replacing numpy with each of your dependencies in turn):
python -c "import numpy; print(numpy.__version__)"
This gives you an idea of what version of each of these dependencies your system has been using. Copy these down.
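If you’d rather check all of your jotted-down dependencies in one go, here is a small sketch using the standard-library importlib.metadata module (the dependency list is just my example from Step 1; the function name is my own):

```python
import importlib.metadata

def get_versions(deps):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for dep in deps:
        try:
            versions[dep] = importlib.metadata.version(dep)
        except importlib.metadata.PackageNotFoundError:
            versions[dep] = None
    return versions

# My example dependency list from Step 1 -- replace with your own:
print(get_versions(["numpy", "matplotlib", "pandas", "seaborn"]))
```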
If you have been using an environment but it’s become messy or broken and you want to start over, there are a few different options for you.
Depending on the package management software you used to build the environment, the method to export the environment will be different. Search your package manager software name and “export dependencies” to see how to do this automatically.
Alternatively, if you’ve already manually collected the libraries used, and you know there’s a lot of bloat in your existing environment (lots of unused packages), you can instead activate the environment and then run the following from the command line (replacing numpy with each of your dependencies in turn):
python -c "import numpy; print(numpy.__version__)"
Also check the version of Python by running the following from the command line (again, with the environment active):
python --version
This gives you an idea of what version of each of these dependencies your system has been using. Copy these down.
Step 2: Create a new environment
Now that you know what packages you want to include in your environment, you can create a new environment. In the last step, we recorded the versions of different libraries we were using: right now, we’re not going to worry about pinning our versions to match our previous set-up unless something goes wrong. We’ll keep our manually recorded version numbers to hand just-in-case.
To create a new conda environment, you need to create an environment.yml file. This will contain a list of your dependencies, like this:
name: my-env-name
dependencies:
  - python=3.12
  - numpy
  - matplotlib
  - pandas
  - seaborn
  - jupyter
Put this in your project folder. I’ve just pinned the Python version as an example of how to pin a specific version. Then, from the command line (within this folder), run:
conda env create -f environment.yml
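If you’d rather generate the file than type it out, a short script like this assembles the same environment.yml from your jotted-down list (the function name and the pinned Python version are my own choices, not a conda convention):

```python
def build_environment_yml(name, deps, python="3.12"):
    """Assemble the text of a minimal conda environment.yml file."""
    lines = [f"name: {name}", "dependencies:", f"  - python={python}"]
    lines += [f"  - {dep}" for dep in deps]
    return "\n".join(lines) + "\n"

# Write the file using the example dependency list from Step 1:
yml = build_environment_yml("my-env-name",
                            ["numpy", "matplotlib", "pandas", "seaborn", "jupyter"])
with open("environment.yml", "w") as f:
    f.write(yml)
print(yml)
```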
If you need to add pip dependencies, then your environment.yml will look like this:
name: my-env-name
dependencies:
  - python=3.12
  - numpy
  - matplotlib
  - pandas
  - seaborn
  - jupyter
  - pip
  - pip:
      - black
Note: mixing conda and pip can cause issues; please read this post on mixing conda and pip (Murphy Quinlan 2024).
To create a new pixi environment for your pre-existing project, from inside the project directory run:
pixi init
This will create a file called pixi.toml that will look something like this:
[project]
authors = [""]
channels = ["conda-forge"]
description = "Add a short description here"
name = "folder-name"
platforms = ["linux-64"]
version = "0.1.0"
[tasks]
[dependencies]
We can add pinned and unpinned dependencies from the command line:
pixi add python=3.12 numpy matplotlib pandas seaborn jupyter
This will fill in the dependencies section of our pixi.toml file with some automatically assigned version restrictions (given our pinned Python version):
[dependencies]
python = "3.12.*"
numpy = ">=2.2.1,<3"
matplotlib = ">=3.10.0,<4"
pandas = ">=2.2.3,<3"
seaborn = ">=0.13.2,<0.14"
jupyter = ">=1.1.1,<2"
Alternatively, we can fill in our dependencies without pinning any versions yet (except for Python, kept pinned as an example):
[dependencies]
python = "3.12.*"
numpy = "*"
matplotlib = "*"
pandas = "*"
seaborn = "*"
jupyter = "*"
If you need any pip/PyPI dependencies, then simply add this section to the file:
[pypi-dependencies]
black = "*"
Alternatively, run this from the command line:
pixi add --pypi black
which will add the following to your pixi.toml:
[pypi-dependencies]
black = ">=24.10.0, <25"
Save any changes to your pixi.toml file, then, back in the command line in the folder containing your pixi.toml, run the following:
pixi install
This will install the listed packages and create a pixi.lock file.
Read the Pixi docs on lockfiles.
Step 3: Activate the environment
To activate your conda environment, from the command line run:
conda activate my-env-name
and then either launch your Jupyter notebook or run your Python script.
From the project folder, run:
pixi shell
and then either launch your Jupyter notebook or run your Python script.
Step 4: Export your environment
Exporting and recording your environment is an important step in ensuring reproducibility and reusability of your code.
There are a few different options when it comes to exporting your conda environment. Read more information here on the different ways to export.
To export a detailed record of your environment for reproducibility, use the following command:
conda env export > env-record.yml
Note: this might not be installable on a different machine due to build dependencies - see this post for more details on exporting.
With pixi, your environment is already recorded: the pixi.toml and pixi.lock files created in Step 2 describe your dependencies and the exact versions that were installed. Keep both files with your project (ideally under version control) so that others can recreate the environment by running pixi install.