Research Software Development in Python
Workshop notes introducing reproducible, replicable and reusable software workflows for research coding.
This resource is intended to support researchers in embedding good software engineering practices into their workflows to make their computational research easier to manage, more reproducible, more reusable and sharable, and more robust.
This course uses Python to build an example project, in order to discuss and try out dependency management, version control, and testing. The concepts will be useful and applicable even if you use another language; however, if you do not know Python, it may be more useful for you to use these notes to read through the concepts in your own time rather than attending the course.
For this course, it is recommended that you have some basic familiarity with Bash/the Linux command line.
If you haven’t, please work through these course materials before attending the course to get the most out of the session. These course notes should take approximately 2 hour to work through.
You can attend without this experience; however, you may find it difficult to keep up with the pace of the course.
This resource is designed to be used in the following contexts:
- As course notes for a guided, synchronous course (SWD3);
- As independent course notes for a self-directed course;
- As documentation to dip in and out of as needed.
One of the great things about research computing is that it’s flexible, experimental, and should support you doing research in your areas of expertise.
It’s really important to keep these points in mind when working through this material:
- You are the domain expert: you are the expert in your area, who knows the intricacies of how things are done and why they are done in a certain way. The information here may not be best approach for every discipline; these notes (or the facilitator guiding you through them) is not the font of all knowledge.
- If there is something that you feel could do with extra context, or if you’d like to clairfy how things work differently in a certain field, I would really appreciate your contributions to these materials! You can see the page Contributing to see how you can get involved.
- There are many different ways of doing things: In these notes, I present one or two methods or ideas for how to solve common research issues. These are not the only ways to approach these, or even the best ways: they are simply ways that we have found useful.
This course will use Python as an example language, but the general concepts are language-agnostic. If you create or work through examples in a different language, please feel free to contribute notes on these!
Course Objectives
The main objective of this specific iteration of the course is to introduce or showcase a selection of different tools and methods that can help to make your research more reproducible, and then (more importantly) provide you with some ideas and workflows to use these tools together.
Lots of documentation exists online for each individual tool we introduce, but something that’s missing is an idea of how all these different things fit together in a way that is easy to manage, doesn’t take huge amounts of time, and that makes sense in the framework of the research process.
This resource will provide you with a start-to-finish project template that you can use as-is, or you can pick and choose parts of this that are relevant to your work.
This material touches on:
- Version control: your computational lab notebook;
- Dependency management and project organisation: keeping your digital lab clean and tidy;
- Writing pseudo-code: translating research problems into computer-speak;
- Testing and automated workflows: preventing your research from going up in flames;
- Documentation: helping future you understand your work;
- Creating releases: archiving your research set-up to ensure reproducibility.
Getting the most out of this course
Everyone learns in different styles and in different ways; everyone brings different experience to the table when using research computing! This course is probably best used when moving at your own pace, following along with the step-by-step example project.
- If solutions are hidden in dropdowns, it’s a good idea to try to predict the answer before opening them to check;
- Learning to read and predict the outcome of code is one of the best skills to practise to improve your programming and computational skills: before you ever run a line of code, try to predict the outcome!
- To give you a safe sandbox to work in, there are links to a virtual cloud Linux machine for you to experiment in to your heart’s content.
Course Content
This course introduces a reproducible project workflow that you can take and use for any future coding project. The below provides suggested durations for each session, with a timetable provided for synchronous courses.
Note that this is an intensive course that covers a lot of content. If you are attending a synchronous course, please revise the material between sessions. If sessions take longer on Day 1 of the course and not all the scheduled material is covered, please read through the presentations ahead of the next session. They will be put into use in the first practical so there will be a chance to review.
Session | Approximate duration | Key objectives | Start time (taking breaks into account) |
---|---|---|---|
Day 1 | |||
Presentation: Introduction | 15 mins | Introduce reproducibility, replicability, and reusability in the context of research code. | 10 am |
Practical 1 | 15 mins | Define requirements for a small example project as a group. | 10.15 am |
Presentation: Project organisation | 20 mins | Discuss ways of managing data files, code, scripts, results, and notes in a research project. | 10.30 am (followed by 10 min break) |
Practical 2 | 15 mins | Set up a directory structure for the example project | 11 am |
Presentation: Version control | 30 mins | Introduce version control with git | 11.15 am (followed by 5 min break) |
Practical 3 | 30 mins | Explore drafting pseudocode, notes, and comments, while using version control | 11.50 |
Presentation: Dependency management | 30 mins | Introduce dependency management with Conda | 12.20 pm |
Q&A or headstart on Practical 4 | 10 mins | Last ten minutes of day 1 | 12.50 pm |
Note: If sessions take longer on this day, please complete outstanding practicals as homework | |||
Day 2 | |||
Practical 4 | 40 mins | Brief review of dependencies. Set up Python environment and begin drafting code | 10 am |
Presentation: Testing | 20 mins | Introduce testing workflows | 10.20 am (followed by 5 min break) |
Practical 5 | 30 mins | Write a basic testing suite | 10.45 am |
Presentation: Releasing and archiving your code | 30 mins | Discuss the importance of creating releases and minting DOIs for your code; discuss exporting record of environment alongside results | 11.15 am (followed by 10 min break) |
Practical 6 | 30 mins | Releasing and archiving your code for reproducibility | 11.55 am |
Summary and Q&A | 35 mins | Reviewing material covered | 12.25 pm |
This is a challenging course to deliver, as there are many tools involved. The following notes have been compiled over the iterations of deliveries of previous versions:
- In advance of the course, run through the practical material to ensure everything still works. Updates etc. can lead to unexpected errors
- Don’t make things more difficult for yourself by improvising and trying out new things in front of a classroom for the first time…
- But equally, it’s perfectly valid (and a good idea) to model not knowing things: it’s better to offer to find an answer and post it somewhere shared at a later point, rather than making people sit through you live debugging.
- Prioritise delivering the presentations and answering questions and discussion points; it’s ok to ask attendees to complete a practical after the session.
- The practicals build on each other so you can’t skip an earlier practical and then do a later one and expect this to work. HIghlight this to students when assigning “homework”.
- Live coding does not work:
- It’s inaccessible, ineffective, and stressful for both the person delivering and the audience.
- It’s ok to run through examples live on the screen as a recap or summary, or in response to a specific question, or when demonstrating steps in the practical, but it’s important to note:
- Your typing on the screen should never be the only information source for important or key steps in the process: anything you do on screen should be also documented in the notes.
- The attendees should be given time to work through the practical sessions while you provide support, rather than sitting transcribing/copying what you are doing on screen.
- Note any errors in an issue on the GitHub repository for this website immediately after the course: you will likely forget if you leave it until you have time to update the materials.
Citation
@online{murphy_quinlan2025,
author = {Murphy Quinlan, Maeve},
title = {Research {Software} {Development} in {Python}},
date = {2025-06-24},
url = {https://murphyqm.github.io/research-software-development/},
langid = {en}
}