RSE: How to

These module notes aim to provide you with a friendly introduction to Research Software Engineering (RSE) or Research Software Development. This is an evolution of the course notes Sustainable Software Development (Murphy Quinlan 2024), SWD3 - Software Development Practices in Python (Murphy Quinlan & Ternes, 2024), and SWD3: Software development practices for Research (Ternes, 2023) intended to be more flexible and modular for learners with different experience and knowledge. These notes and the course structure are experimental, and will be incorporated into our general Advanced Research Computing training materials following feedback from this course.

The first half of this course has been extensively updated with new material covering git, bash, and conda (these were previously pre-requisites of SWD3), while the remainder follows Sustainable Software Development (Murphy Quinlan 2024); we will be using the notes linked above for this second part of the course.

This course covers a lot of information in a short time: treat the synchronous portion of this course as a primer, and then explore the material in more depth after the course. Also, look through previous iterations of the course for some slightly different materials that goes more in-depth in some areas!

Why do software engineering skills matter to researchers?

Research computing competency and specifically programming skills are becoming ever more important in our data-driven world: Jacobs et al. (2016) argues that “we are rapidly approaching a point where innovations [in research] will primarily come from those who are able to translate an idea into an algorithm, and then into computer code.”

Research software engineering skills are the lab tools and competencies of the computational research world; much like practical lab safety training, researchers often pick up bad habits (ask me about my fear of HF acid). Also similarly to lab processes, procedures, and best practises, software engineering is awash with bizarre jargon and is very difficult to navigate without the support of a jaded old PhD student who has made every mistake imaginable (hello, it’s me, I escaped my PhD over a year ago now).

This course does not provide the holy grail, the singular correct way of doing things; instead, it should hopefully make the process of writing good, robust, reproducible code a little bit more straighforward.

This course consists of a series of different topics and notes, usually including a brief presentation and some exercises for you to follow along with. Where relevant, I’ve cited various articles, and have usually linked to a variety of different resources.

Current course

The most recent/current iteration of this course is:

Research Software Engineering: SOEE5261M

Note that the order of delivery/course contents may vary depending on pre-course surveys etc.

You will need a GitHub account for the interactive portions of this course.

There is some homework and follow-on work for this course - this is not mandatory, but will help you to embed the tools and workflows you have learned in your research going forward!

How to use this resource

There’s a lot of material here to absorb, so here are some guidelines to help you make the most of it. This material is also covered in the presentation linked below.

Launch fullscreen presentation ⤢

1. Use the PRIMM method where possible

The PRIMM method is a method to specifically learn a programming language (Sentance, Waite, and Kallia 2019), but is generally useful for learning computational commands etc. It’s helpful for you as a learner to understand the PRIMM structure so you can apply it while working through this course. Not every step will be relevant or used at every stage of the course!

PRIMM is a pedagogical method specifically aimed at teaching text-based programming. While research into adult programming learners is very limited (especially in terms of demographics; many key studies that are cited have overwhelmingly homogenous test groups), the PRIMM method has a few key benefits:

It supports learners with different ability levels and who learn at different speeds;
It can be applied by learners even if the course materials are not specifically built with it in mind;
It can be applied to asynchronous learning materials (for example, if you are using these notes online on your own).

The P in PRIMM stands for predict:

When you first see a command, script, or piece of code, before running it, predict what you think it will do. It’s ok to get this wrong: the important thing is to get into the habit of predicting! This helps to keep you actively engaged and focused, and begins to build an intuitive sense about the structure of commands.

What do you think the code is going to do generally?
What do you think the output in your terminal is going to look like?

The R in PRIMM stands for run:

Run the code or program;
How does the output/effect compare to your prediction?
- What did you get right?
- What did you misinterpret?
Do you understand what happened?

The I in PRIMM stands for investigate:

Let’s dig a little deeper into the structure of code you’ve used.

What options or arguments did you use, and what effect did they have?
Can you find some documentation on the command you used?
- Does the description match how you would describe the code?
  - If no, why does your understanding of it diverge?
- What other options or features are available?

The first M in PRIMM stands for modify:

Try running the code with different options:
- Change only a small thing at a time;
- Always predict what you think the output will be!
- Compare the actual output with your prediction;
- Compare your understanding to the available documentation.

This stage helps you to gradually increase the difficultly of the tasks you are doing!

The second M in PRIMM stands for make:

This stage is about making the code your own.

At this stage, you can try implementing snippets of code you’ve already learned, but to solve a new or different problem;
Again, use the previous stages when you are writing your code: predict what you think will happen, run the code and compare the output to your predictions, and investigate the structure of it, especially if it does not behave how you intended!

2. Kick back, relax, and do something else when you know something

Too frequently, I have attended broad-level, apparently introductory courses that have catered to the fastest in the class.

The aim of this course is to demystify aspects of research software engineering, like dependency management, version control, and project directory structure. These topics are basic skills needed for pursuing research in the computational field. Certain demographics are less likely to learn these skills in an informal setting, meaning that classes like this are their main access to these topics.

I am interested in helping the students who do not already know everything about these topics, and will not be pacing the course for the fastest in the group who have pre-existing knowledge. With that said, I know there is nothing more boring than sitting through a class where you already know all the material - please feel free to use this time to do extra reading, work on projects, do various administrative work; stick on headphones if you like, and join back in when something relevant comes up.

3. Learn in a way that suits you

Read before you write - research has proven repeatedly the importance of reading and predicting the output of code as a method of learning, over just getting straight into it and writing code.

Novice programmers need to acquire accuracy in tracing code before they can program independently
Trying to write code first leads to frustration and confusion

However, with that said learn in a way that suits you - if that is copying and pasting commands from the slides instead of trying to keep up with typing, that’s ok! Also, if that means taking handwritten notes and not typing, that’s also fine!

Please feel free to:

Get up and stretch when you need, leave for breaks, and stretch your eyes by looking away from the screen;
Wear headphones for background music if it helps you focus;
Kick off your shoes and sit comfortably;
Do whatever else helps you learn!

References

Jacobs, Christian T, Gerard J Gorman, Huw E Rees, and Lorraine E Craig. 2016. “Experiences with efficient methodologies for teaching computer programming to geoscientists.” J. Geosci. Educ. 64 (3): 183–98. https://doi.org/10.5408/15-101.1.

Murphy Quinlan, Maeve. 2024. “Sustainable Software Dev.” Zenodo. https://doi.org/10.5281/zenodo.13868947.

Sentance, Sue, Jane Waite, and Maria Kallia. 2019. “Teaching computer programming with PRIMM: a sociocultural perspective.” Comput. Sci. Educ. 29 (2-3): 136–76. https://doi.org/10.1080/08993408.2019.1608781.