Research Software Development in Python

Workshop notes introducing reproducible, replicable and reusable software workflows for research coding.

Author
Affiliation

Maeve Murphy Quinlan

Published

June 24, 2025

This resource is intended to support researchers in embedding good software engineering practices into their workflows to make their computational research easier to manage, more reproducible, more reusable and sharable, and more robust.

This course uses Python to build an example project, in order to discuss and try out dependency management, version control, and testing. The concepts will be useful and applicable even if you use another language; however, if you do not know Python, it may be more useful for you to use these notes to read through the concepts in your own time rather than attending the course.

Course Pre-requisites

For this course, it is recommended that you have some basic familiarity with Bash/the Linux command line.

If you haven’t, please work through these course materials before attending the course to get the most out of the session. These course notes should take approximately 2 hour to work through.

You can attend without this experience; however, you may find it difficult to keep up with the pace of the course.

This resource is designed to be used in the following contexts:

There is no one answer

One of the great things about research computing is that it’s flexible, experimental, and should support you doing research in your areas of expertise.

It’s really important to keep these points in mind when working through this material:

  • You are the domain expert: you are the expert in your area, who knows the intricacies of how things are done and why they are done in a certain way. The information here may not be best approach for every discipline; these notes (or the facilitator guiding you through them) is not the font of all knowledge.
    • If there is something that you feel could do with extra context, or if you’d like to clairfy how things work differently in a certain field, I would really appreciate your contributions to these materials! You can see the page Contributing to see how you can get involved.
  • There are many different ways of doing things: In these notes, I present one or two methods or ideas for how to solve common research issues. These are not the only ways to approach these, or even the best ways: they are simply ways that we have found useful.

This course will use Python as an example language, but the general concepts are language-agnostic. If you create or work through examples in a different language, please feel free to contribute notes on these!

Course Objectives

The main objective of this specific iteration of the course is to introduce or showcase a selection of different tools and methods that can help to make your research more reproducible, and then (more importantly) provide you with some ideas and workflows to use these tools together.

Lots of documentation exists online for each individual tool we introduce, but something that’s missing is an idea of how all these different things fit together in a way that is easy to manage, doesn’t take huge amounts of time, and that makes sense in the framework of the research process.

This resource will provide you with a start-to-finish project template that you can use as-is, or you can pick and choose parts of this that are relevant to your work.

This material touches on:

  • Version control: your computational lab notebook;
  • Dependency management and project organisation: keeping your digital lab clean and tidy;
  • Writing pseudo-code: translating research problems into computer-speak;
  • Testing and automated workflows: preventing your research from going up in flames;
  • Documentation: helping future you understand your work;
  • Creating releases: archiving your research set-up to ensure reproducibility.

Getting the most out of this course

Everyone learns in different styles and in different ways; everyone brings different experience to the table when using research computing! This course is probably best used when moving at your own pace, following along with the step-by-step example project.

  • If solutions are hidden in dropdowns, it’s a good idea to try to predict the answer before opening them to check;
  • Learning to read and predict the outcome of code is one of the best skills to practise to improve your programming and computational skills: before you ever run a line of code, try to predict the outcome!
  • To give you a safe sandbox to work in, there are links to a virtual cloud Linux machine for you to experiment in to your heart’s content.

Course Content

This course introduces a reproducible project workflow that you can take and use for any future coding project. The below provides suggested durations for each session, with a timetable provided for synchronous courses.

Note that this is an intensive course that covers a lot of content. If you are attending a synchronous course, please revise the material between sessions. If sessions take longer on Day 1 of the course and not all the scheduled material is covered, please read through the presentations ahead of the next session. They will be put into use in the first practical so there will be a chance to review.

Session Approximate duration Key objectives Start time (taking breaks into account)
Day 1
Presentation: Introduction 15 mins Introduce reproducibility, replicability, and reusability in the context of research code. 10 am
Practical 1 15 mins Define requirements for a small example project as a group. 10.15 am
Presentation: Project organisation 20 mins Discuss ways of managing data files, code, scripts, results, and notes in a research project. 10.30 am (followed by 10 min break)
Practical 2 15 mins Set up a directory structure for the example project 11 am
Presentation: Version control 30 mins Introduce version control with git 11.15 am (followed by 5 min break)
Practical 3 30 mins Explore drafting pseudocode, notes, and comments, while using version control 11.50
Presentation: Dependency management 30 mins Introduce dependency management with Conda 12.20 pm
Q&A or headstart on Practical 4 10 mins Last ten minutes of day 1 12.50 pm
Note: If sessions take longer on this day, please complete outstanding practicals as homework
Day 2
Practical 4 40 mins Brief review of dependencies. Set up Python environment and begin drafting code 10 am
Presentation: Testing 20 mins Introduce testing workflows 10.20 am (followed by 5 min break)
Practical 5 30 mins Write a basic testing suite 10.45 am
Presentation: Releasing and archiving your code 30 mins Discuss the importance of creating releases and minting DOIs for your code; discuss exporting record of environment alongside results 11.15 am (followed by 10 min break)
Practical 6 30 mins Releasing and archiving your code for reproducibility 11.55 am
Summary and Q&A 35 mins Reviewing material covered 12.25 pm

This is a challenging course to deliver, as there are many tools involved. The following notes have been compiled over the iterations of deliveries of previous versions:

  • In advance of the course, run through the practical material to ensure everything still works. Updates etc. can lead to unexpected errors
    • Don’t make things more difficult for yourself by improvising and trying out new things in front of a classroom for the first time…
    • But equally, it’s perfectly valid (and a good idea) to model not knowing things: it’s better to offer to find an answer and post it somewhere shared at a later point, rather than making people sit through you live debugging.
  • Prioritise delivering the presentations and answering questions and discussion points; it’s ok to ask attendees to complete a practical after the session.
    • The practicals build on each other so you can’t skip an earlier practical and then do a later one and expect this to work. HIghlight this to students when assigning “homework”.
  • Live coding does not work:
    • It’s inaccessible, ineffective, and stressful for both the person delivering and the audience.
    • It’s ok to run through examples live on the screen as a recap or summary, or in response to a specific question, or when demonstrating steps in the practical, but it’s important to note:
      • Your typing on the screen should never be the only information source for important or key steps in the process: anything you do on screen should be also documented in the notes.
      • The attendees should be given time to work through the practical sessions while you provide support, rather than sitting transcribing/copying what you are doing on screen.
  • Note any errors in an issue on the GitHub repository for this website immediately after the course: you will likely forget if you leave it until you have time to update the materials.

Author’s Note

This material is an evolution of the original SWD3: Software development practices for Research course, written by Dr Patricia Ternes and run by Research Computing at the University of Leeds, but has been extensively reworked and updated.

This material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License, and is copyrighted by the University of Leeds.

Contribution guidelines

Please feel free to open issues or suggest edits to this material via our GitHub repository (see links in the sidebar). Note that when the changes you have made have been merged with the main branch, you may not see an update to the live website until a new release is made.

Please read the GitHub community guidelines before contributing.

Contributors

Below is a list of contributors (in chronological order of addition) with associated Contributor Role Taxonomy (CRediT) roles. Please see the CRediT website for definitions of roles.

  • Maeve Murphy Quinlan: conceptualization, writing (original draft), writing (review and editing), data curation, methodology.
    • Creation of website, Python workflows, GitHub actions.
    • Redesign of course.

Citation

If you use or reference this material, please cite it.

Citation

BibTeX citation:
@online{murphy_quinlan2025,
  author = {Murphy Quinlan, Maeve},
  title = {Research {Software} {Development} in {Python}},
  date = {2025-06-24},
  url = {https://murphyqm.github.io/research-software-development/},
  langid = {en}
}
For attribution, please cite this work as:
Murphy Quinlan, Maeve. 2025. “Research Software Development in Python.” June 24, 2025. https://murphyqm.github.io/research-software-development/.