Dependencies
Dependencies are the versions of the different packages/modules that your code depends on: for example, the version of Python you are using, and any libraries you have to import, like matplotlib, scipy, tensorflow, etc.
Dependencies are an important thing to keep track of when building scientific code. How many different external libraries does your code depend on? What versions of these libraries does it need? How do you install and update these different libraries?
Browse through this quick presentation to learn more.
Package management for Python
In Python, there are lots of different ways to install and manage packages and dependencies. These tools generally use virtual environments to keep the dependencies for different projects separate and tidy. Package installation and management tools include conda, pixi, and uv, among others.
You can read more about Python package management tool recommendations here. The package management tool you use will vary depending on whether you want to build your code into a package itself, or are relying primarily on external libraries. Some of these package managers include entire workflows for building and publishing Python packages, while others focus on organising pre-existing packages.
Package management for other coding languages
Note that I do not have as extensive experience managing projects, dependencies, and packages in the following languages, so please proceed with caution.
Package management in R
conda for R: you can install conda via Miniforge as linked above, and then install R packages by following these instructions for R with conda.
renv: the reproducible environment package for R has some very nice introductory documentation.
Package management in Julia
Pkg: Pkg is Julia’s built-in package manager.
Dependencies: step-by-step for existing projects (Python)
Ok, so we’ve looked at the basics of what dependency management is behind the scenes, and some of the different options available. But how do you retroactively apply dependency management to an existing, messy code project? While we can’t record things that we’ve done in the past, we can start from now.
Step 0: Pick your package manager
While I’ve mentioned a whole host of options for Python package managers above, I’m going to work through some basic instructions for just three options: conda (installed via miniforge), pixi, and uv.
If you have never used a package management system before, or you work in science, conda might be the best choice for you. See this conda blogpost (Murphy Quinlan 2024) for useful links to installation guides, and an in-depth use guide. Conda is very widely used and recognised, especially amongst researchers in science and medical fields.
Pixi is great if you are using a lot of conda and PyPI packages together (which can get messy); it can also work with a pyproject.toml file if you plan on packaging your code at some point. It is very fast.
Have a read through this blog post on testing pixi (Ma 2024).
Step 1: Manually record what libraries you use
Scroll through all the scripts you use in your project, and record all the packages that you import across these different Python files and Jupyter notebooks (*.py and *.ipynb files).
For example, I have a series of Python files in my project folder with the following first few lines:
# file1.ipynb
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# file2.py
import numpy as np
import pandas as pd
My list of jotted-down dependencies is then: numpy, matplotlib, seaborn, and pandas.
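If your project has many files, scrolling through them manually can be error-prone. Here is a minimal sketch (a helper of my own, not part of any tool) that uses Python's built-in ast module to list the top-level imports in every *.py file below a folder; notebook (*.ipynb) files would need their code cells extracted first:

```python
import ast
from pathlib import Path

def find_imports(source):
    """Return the set of top-level module names imported in a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

# Scan every .py file below the current folder and print its imports
for path in Path(".").glob("**/*.py"):
    try:
        print(path, sorted(find_imports(path.read_text())))
    except (SyntaxError, UnicodeDecodeError):
        print(path, "could not be parsed")
```

Remember that import names don’t always match the names you install (for example, you import sklearn but install scikit-learn), so double-check the list before using it.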
If until now you’ve been running your Python programs directly using your system’s Python (so you have never set up an environment), let’s just see what versions of packages your system is using.
First, check the version of Python by running the following from the command line:
python --version
From the command line, run the following (replacing numpy with each of your dependencies in turn):
python -c "import numpy; print(numpy.__version__)"
This gives you an idea of what version of each of these dependencies your system has been using. Copy these down.
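If you’d rather check all of your jotted-down dependencies in one go, here is a small sketch using the standard-library importlib.metadata module (the dependency list is just my example from Step 1; the function name is my own):

```python
import importlib.metadata

def get_versions(deps):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for dep in deps:
        try:
            versions[dep] = importlib.metadata.version(dep)
        except importlib.metadata.PackageNotFoundError:
            versions[dep] = None
    return versions

# My example dependency list from Step 1 -- replace with your own:
print(get_versions(["numpy", "matplotlib", "pandas", "seaborn"]))
```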
If you have been using an environment but it’s become messy or broken and you want to start over, there are a few different options for you.
Depending on the package management software you used to build the environment, the method to export the environment will be different. Search your package manager software name and “export dependencies” to see how to do this automatically.
Alternatively, if you’ve already manually collected the libraries used, and you know there’s a lot of bloat in your existing environment (lots of unused packages), you can instead activate the environment and then run the following from the command line (replacing numpy with each of your dependencies in turn):
python -c "import numpy; print(numpy.__version__)"
Also check the version of Python by running the following from the command line (again, with the environment active):
python --version
This gives you an idea of what version of each of these dependencies your system has been using. Copy these down.
Step 2: Create a new environment
Now that you know what packages you want to include in your environment, you can create a new environment. In the last step, we recorded the versions of different libraries we were using: right now, we’re not going to worry about pinning our versions to match our previous set-up unless something goes wrong. We’ll keep our manually recorded version numbers to hand just-in-case.
To create a new conda environment, you need to create an environment.yml file. This will contain a list of your dependencies, like this:
name: my-env-name
dependencies:
  - python=3.12
  - numpy
  - matplotlib
  - pandas
  - seaborn
  - jupyter
Put this in your project folder. I’ve just pinned the Python version as an example of how to pin a specific version. Then, from the command line (within this folder), run:
conda env create -f environment.yml
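If you’d rather generate the file than type it out, a short script like this assembles the same environment.yml from your jotted-down list (the function name and the pinned Python version are my own choices, not a conda convention):

```python
def build_environment_yml(name, deps, python="3.12"):
    """Assemble the text of a minimal conda environment.yml file."""
    lines = [f"name: {name}", "dependencies:", f"  - python={python}"]
    lines += [f"  - {dep}" for dep in deps]
    return "\n".join(lines) + "\n"

# Write the file using the example dependency list from Step 1:
yml = build_environment_yml("my-env-name",
                            ["numpy", "matplotlib", "pandas", "seaborn", "jupyter"])
with open("environment.yml", "w") as f:
    f.write(yml)
print(yml)
```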
If you need to add pip dependencies, then your environment.yml will look like this:
name: my-env-name
dependencies:
  - python=3.12
  - numpy
  - matplotlib
  - pandas
  - seaborn
  - jupyter
  - pip
  - pip:
      - black
Note: mixing conda and pip can cause issues; please read this post on mixing conda and pip (Murphy Quinlan 2024).
To create a new pixi environment for your pre-existing project, from inside the project directory run:
pixi init
This will create a file called pixi.toml that will look something like this:
[project]
authors = [""]
channels = ["conda-forge"]
description = "Add a short description here"
name = "folder-name"
platforms = ["linux-64"]
version = "0.1.0"
[tasks]
[dependencies]
We can add pinned and unpinned dependencies from the command line:
pixi add python=3.12 numpy matplotlib pandas seaborn jupyter
This will fill in the dependencies section of our pixi.toml file with some automatically assigned version restrictions (given our pinned Python version):
[dependencies]
python = "3.12.*"
numpy = ">=2.2.1,<3"
matplotlib = ">=3.10.0,<4"
pandas = ">=2.2.3,<3"
seaborn = ">=0.13.2,<0.14"
jupyter = ">=1.1.1,<2"
Alternatively, we can fill in our dependencies without pinning any versions yet (except for Python, kept pinned as an example):
[dependencies]
python = "3.12.*"
numpy = "*"
matplotlib = "*"
pandas = "*"
seaborn = "*"
jupyter = "*"
If you need any pip/PyPI dependencies, then simply add this section to the file:
[pypi-dependencies]
black = "*"
Alternatively, run this from the command line:
pixi add --pypi black
which will add the following to your pixi.toml:
[pypi-dependencies]
black = ">=24.10.0, <25"
Save any changes to your pixi.toml file, then, back in the command line in the folder containing your pixi.toml, run the following:
pixi install
This will install the listed packages and create a pixi.lock file.
Read the Pixi docs on lockfiles.
Step 3: Activate the environment
To activate your conda environment, from the command line run:
conda activate my-env-name
and then either launch your Jupyter notebook or run your Python script.
From the project folder, run:
pixi shell
and then either launch your Jupyter notebook or run your Python script.
Step 4: Export your environment
Exporting and recording your environment is an important step in ensuring reproducibility and reusability of your code.
There are a few different options when it comes to exporting your conda environment. Read more information here on the different ways to export.
To export a detailed record of your environment for reproducibility, use the following command:
conda env export > env-record.yml
Note: this might not be installable on a different machine due to build dependencies - see this post for more details on exporting.
With pixi, your environment is already recorded: the pixi.toml and pixi.lock files created in Step 2 describe your dependencies and the exact versions that were installed. Keep both files with your project (ideally under version control) so that others can recreate the environment by running pixi install.