---
title: Example Git action loop
---
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#9fe1ff',
      'primaryTextColor': '#470044',
      'primaryBorderColor': '#000000',
      'lineColor': '#9158A2',
      'secondaryColor': '#e79aff',
      'tertiaryColor': '#fffc58'
    }
  }
}%%
flowchart TD
    Untracked -->|**git add**| Staged
    Staged -->|**git commit**| Committed
    Committed -.->|*edit files*| Untracked
Learning git: why is it so difficult?
Recently I gave a talk at the Edinburgh Winter School “Code for whom?”. You can see all the amazing talks and presentations here. I’d really recommend watching the talks: all of them were interesting and useful.
One of the lightning talks, by John Wilson, was about difficulties in learning git. The main points raised in this talk were:
- git is not particularly user-friendly, and oftentimes we build an incorrect mental picture of how it works (especially if taught in a simplified way - for example, if we are told “it’s just like track changes”);
- as teachers, we are often not totally confident in our own understanding of git, and can pass on our nervousness to students;
- there are multiple barriers to use, from installation pains to added complications and confusion between git and services like Bitbucket, GitLab, GitHub etc.
We had a brief chat at the coffee break during the conference about how to approach this, and I’ve been thinking about how my own mental model of git might be incorrect, and reflecting on difficulties I’ve had using it.
I’ve come up with a few questions I want to explore:
- How do we build a simple but accurate mental model of git? How do we illustrate this to learners?
- What level of complexity and usefulness do researchers etc. actually need to use git effectively?
- How do we separate understanding of git vs. remote cloud-hosted services?
- How do we make the onboarding process less painful?
- What’s the best way to install for different users?
- Is GUI vs. CLI easier for new users (in terms of building a mental model)?
This blog post is very much me just exploring some of these ideas, and absolutely does not contain all (or even any!) of the answers. If you’re interested, it would be really useful if you added your thoughts to this page via the Hypothes.is comments section (you should see a pop-out banner on the right-hand side of this webpage; you’ll need a Hypothes.is account to comment). I’m looking for good resources, interesting tutorials or metaphors to teach git, and thoughts on the level of complexity needed for learners.
Part 1: Building a mental model of git
One of the points John brought up in his talk was that by using inaccurate comparisons (such as comparing git to “track changes” as available in Microsoft Word - which I absolutely have done in the past!), we end up building up an inaccurate mental model of git that then causes confusion for us down the line.
I’ve been thinking about my mental model of git, and whether it’s flawed. I found this interesting post describing one possible mental model for git.
This article lays out two main questions that git tries to address:
- How can teams of people work on the same documents without overwriting each other’s work and things getting messy?
- How can we create a working environment where we can edit and experiment with files without accidentally breaking everything? How do we build in a robust “undo” button?
I’d also like to add the following question I think git addresses:
- How do we record our work accurately, showing who built it and when?
I found point 1 a bit irrelevant for my initial use of git: as a PhD student researcher, my main collaboration was bringing figures and results to my supervisory meetings, and changing my code based on discussions about my results. My supervisors never edited my code. While there was more hands-on collaboration with writing projects, I had a large team of supervisors, some of whom were comfortable with git (though most were not), and who wanted writing in different formats for editing (e.g., a PDF to either print and annotate by hand, or digitally comment on; Word documents to edit and build on). While I now use git largely as a co-operative tool to work in parallel with my team, for a long time this just wasn’t part of my workflow, and so it stayed very much a theoretical concept in my mental model.
Point 2 was the main draw for me to git: I was able to create a branch and experiment, run a bunch of tests and benchmarks to see if things were working as intended, and then merge the branch back into the main branch if I was happy with the changes. My response to failed and broken branches was pretty haphazard: sometimes I would just abandon them; sometimes I would delete and re-clone the repository from the remote (if I hadn’t pushed any changes); and sometimes I would reach for stash, without much of a plan, if something had gone a bit awry.
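As a minimal sketch of that branch-and-experiment workflow (not a recommended recipe), the commands below build a throwaway repository in a temporary directory, merge one experiment that worked, and delete one that did not. The file name, branch names and the placeholder identity are all made up for illustration.

```shell
set -eu

# Throwaway repository in a temporary directory.
# The identity below is a placeholder, stored only in this repository's config.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Learner"
git config user.email "learner@example.com"

# One commit on the default branch, so there is something to branch from.
echo "baseline analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add baseline analysis"
main_branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on configuration

# Experiment on a separate branch; the default branch is untouched.
git checkout -q -b experiment
echo "risky new method" >> analysis.txt
git commit -q -am "Try a risky new method"

# Outcome 1: the experiment worked, so merge it back.
git checkout -q "$main_branch"
git merge -q experiment

# Outcome 2: a second experiment fails and is simply deleted.
git checkout -q -b dead-end
echo "this broke everything" >> analysis.txt
git commit -q -am "Attempt that did not work"
git checkout -q "$main_branch"
git branch -q -D dead-end    # -D forces deletion of a branch that was never merged

git log --oneline
```

Nothing here needs a remote or a re-clone: the failed experiment never touched the default branch, so deleting the branch is the whole clean-up.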
Point 3 became ever more important to me when I went on to publish scientific papers that used the results of my code: git (and GitHub) allowed me to reference specific commits that were used to generate results for different iterations of the paper; I was able to record and version-control my code environment alongside this (see my conda blog post for more information on recording coding environments). Git and associated cloud-hosted remote repositories also make the labour involved in developing code visible (D’Ignazio and Klein 2020). This can tie back into point 1 if you are working with a team, ensuring your contributions are recorded.
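One way this kind of record-keeping can look in practice is sketched below: the commit hash identifies the exact version of the code, and an annotated tag gives it a readable name. This is an illustration under assumptions, not the workflow used for any particular paper; the script name, tag name and identity are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Researcher"        # placeholder identity
git config user.email "researcher@example.com"

echo "print('figure 1')" > make_figures.py
git add make_figures.py
git commit -q -m "Generate figures for first submission"

# Full hash of the commit that produced the results; this can be quoted in a paper.
submitted_commit=$(git rev-parse HEAD)

# An annotated tag gives the same commit a human-readable name (made-up tag name).
git tag -a paper-v1-submitted -m "Code used for the first submission"

# Work continues after submission...
echo "print('figure 2')" >> make_figures.py
git commit -q -am "Add figure requested by reviewers"

# ...but the tag still resolves to the submitted version.
tagged_commit=$(git rev-parse "paper-v1-submitted^{commit}")
git show "paper-v1-submitted:make_figures.py"

# Who made the tagged commit, and when.
git log -1 --date=short --format='%h  %an  %ad  %s' paper-v1-submitted
```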
Illustrating the mental model
While I’m still brainstorming graphical ways to present the early/conceptual stages of the git process (for example, the “people at different tables” visual analogy provided in this post describing one possible mental model for git), I’ve also been thinking about what git processes are useful to represent using a graph (without metaphor or simile). I’ve found a graphical representation of git to be really useful when I’m trying to deal with a snarl of weird branches. I’ve previously used the GitGraph extension for VSCode for personal development, and for teaching. I’ve also been using Mermaid to draw git graphs for a while in presentations etc., and definitely think they can be useful:
However, learners can sometimes become confused by the git add, git commit (and sometimes git push) cycle that is happening at the same time. I really like the way this is presented in the post linked above, as a cycle. I’ve recreated it below using Mermaid again:
I like the idea of possibly linking this loop to the graph in Figure 1; I’ll definitely be using this illustrative git loop in future presentations on this topic, perhaps noting that each node in Figure 1 represents the loop in Figure 2.
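The loop can also be watched from the command line. The sketch below walks one file around the cycle in a throwaway repository and captures what git status --short reports at each step; the file name and identity are placeholders.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Learner"       # placeholder identity
git config user.email "learner@example.com"

echo "first draft" > notes.txt
untracked=$(git status --short)     # "?? notes.txt": git sees the file but is not tracking it

git add notes.txt
staged=$(git status --short)        # "A  notes.txt": added to the staging area

git commit -q -m "Add notes"
committed=$(git status --short)     # empty: nothing differs from the last commit

echo "second draft" >> notes.txt
edited=$(git status --short)        # " M notes.txt": changed since the last commit

printf '%s\n' "$untracked" "$staged" "[clean]" "$edited"
```

One detail worth flagging to learners: after the first commit, git reports an edited file as modified rather than untracked, but it still needs git add before the next commit, so the loop has the same shape.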
Part 2: What level of complexity do we need to convey to learners?
For the most part, my use of git could be described with repeated use of the following commands, with appropriate flags/arguments:
git clone
git add
git commit
git push
git pull
git branch
git checkout
git status
As I said above, my response to things going wrong has varied between a fully-nuclear response of deleting the local repository and just cloning it again, and variously successful attempts to remove mistakenly added files, reset branches, and use stash in a confused way. One key thing highlighted in John’s talk was that our nervousness about more complex git actions can bleed into our students, which is why I’m committing (get it?) to improving my confidence with git, specifically when trying to undo and fix tangled messes. I think I’ll be in a better position to judge what is useful for a learner to know when I’ve gained more skill and confidence in my own abilities.
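For reference, here is a small sketch of three gentler alternatives to delete-and-re-clone, run in a throwaway repository. It assumes Git 2.23 or newer for git restore (older versions can use git reset and git checkout for the same jobs); the file names and identity are made up.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Learner"       # placeholder identity
git config user.email "learner@example.com"

echo "working analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add working analysis"

# 1. A file was staged by mistake: unstage it, leaving the file itself on disk.
echo "password123" > secrets.txt
git add secrets.txt
git restore --staged secrets.txt
after_unstage=$(git status --short)          # "?? secrets.txt"

# 2. A tracked file was edited badly: return it to the last committed version.
#    Careful: this throws the uncommitted edit away for good.
echo "broken change" >> analysis.txt
git restore analysis.txt
after_restore=$(cat analysis.txt)            # "working analysis"

# 3. Half-finished work is in the way: shelve it, then bring it back later.
echo "half-finished idea" >> analysis.txt
git stash push -q
during_stash=$(cat analysis.txt)             # back to the committed content
git stash pop -q
after_pop=$(tail -n 1 analysis.txt)          # the shelved edit has returned

git status --short
```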
Part 3: Local git vs. remote cloud-hosted services
I’ve rarely merged a branch locally from the command line: I always push changes to my remote branch, then open up my cloud software and open a pull request there. I find the graphical interface of the online system much more straightforward when it comes to comparing branches; but I also think that with new learners, this can introduce confusion when it comes to separating out the idea of git vs. GitHub or another cloud provider. Also, I do like some of the locally-available ways to visualise git branches/diffs etc. from the command line and through GUI systems (see my discussion of GitGraph for VSCode above).
In order to make this division clearer, should we instead stick purely to the command line for the merging process? Maybe this is the point where a local git GUI can step in to help; I’m going to investigate some of the options available in the coming weeks (I’ve used GitHub Desktop and found it to be severely lacking). Also, for some researchers (for example, those working in an air-gapped secure research environment where they can’t access the internet or use a cloud host for their remote), working through complex merges etc. must all happen locally.
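To make the purely local route concrete, here is a rough sketch of comparing and merging two branches without any hosting service involved. It shows roughly the information a pull-request page would summarise, using only git itself; the branch names, file names and identity are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Learner"       # placeholder identity
git config user.email "learner@example.com"

echo "intro" > paper.txt
git add paper.txt
git commit -q -m "Start paper"
main_branch=$(git symbolic-ref --short HEAD)

# Work on a branch...
git checkout -q -b methods
echo "methods section" > methods.txt
git add methods.txt
git commit -q -m "Draft methods"

# ...while the default branch also moves on.
git checkout -q "$main_branch"
echo "abstract" > abstract.txt
git add abstract.txt
git commit -q -m "Draft abstract"

# Commits on the branch that the default branch does not have yet.
git log --oneline "$main_branch..methods"

# Files the branch changed since it split off from the default branch.
git diff --stat "$main_branch...methods"

# Merge locally; --no-ff records an explicit merge commit, and -m avoids opening an editor.
git merge -q --no-ff -m "Merge methods draft" methods

# A text-only view of the resulting history.
git log --graph --oneline --all
```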
Is this perhaps something to divide up and tailor to specific groups? For example, for many researchers who intend on using GitHub, is it ok if there is a blurry line between these services? Does it matter if their mental model combines these, as long as their workflow works for them? One thing I do think is that GitHub Actions/Pages/workflows etc. should probably be avoided in a first introductory git class, since these veer far beyond the initial bounds of what git can do, and are specific to the GitHub hosting website. Perhaps they belong in an introduction to GitHub for researchers course, instead of overloading the already busy mental map of commits, staging, branches and merging.
I’m going to be looking a bit closer at some tools available such as:
- GitGraph with VSCode as a full local git management system (I previously have just used it as a visualising tool; free, available on Windows, Mac and Linux);
- Sourcetree (free, available on Windows and Mac);
- Lazygit (free, multiple different install options).
Any recommendations or suggestions with regards to tools for teaching, and CLI/local GUI/web-based remote are much appreciated.
Part 4: Painless (or at least less-painful) onboarding
There are a few different things that can make learning git for the first time difficult; I’ve listed a few here that spring to mind under two broad categories, but again please feel free to add your thoughts/things you found difficult.
Pain 1: installation
This is common to many pieces of scientific software: the installation of tools that integrate with a wide range of other tools (git, programming languages, environment and package managers) is quite a bit more involved and complicated than using the App Store. The multiple steps involved in installation of git (depending on the operating system/method of installation you are using) and esoteric-sounding questions about configuration can be very daunting if your experience has mainly been using user-friendly proprietary software, or computers managed by your employer/institution. Add to the confusion the fact that in a classroom where people bring their own machines/are joining online, you’ll be juggling support for multiple different operating systems and versions of those OSs.
Some possible solutions:
- Detailed, user-friendly instructions including screenshots: the documentation of various fantastic pieces of software can leave a lot to be desired when it comes to being beginner-friendly and actually useful for a novice. You should test out the installation process of software you are suggesting for a course and note down likely sticking points. Of course, you won’t be able to test every operating system etc., but maybe you can find helpful documentation online (outside of the official material).
- Providing drop-in sessions ahead of actual tutorials to solve set-up problems. This can be really useful for making sure the actual training/tutorial doesn’t get completely derailed into installation support.
- Using a virtual machine/cloud provider. We can get into further discussion of the whole “blurring the line between git and GitHub” here with this one, but I’ve found GH Codespaces a useful environment for introducing people to git, especially because they are a bit more comfortable experimenting without being afraid of affecting any actual work on their machine. The sandbox-like environment lets them build some confidence.
Pain 2: So many new things at once
Oftentimes, when we teach an introduction to git course, we unintentionally also teach:
- introduction to CLI applications;
- introduction to bash;
- introduction to CLI text editors like nano;
- introduction to line ending differences between systems;
- introduction to SSH keys or Personal Access Tokens;
- introduction to GitHub;
- introduction to file types, file endings, data types.
The list goes on and on, but to complete the process of edit file → git add file → git commit file -m "commit message here" → git push origin branchname → enter username/PAT or ssh key, we actually need to cycle through a whole load of skills that we can’t assume our learners will know. In fact, we shouldn’t assume they know any of the above; if we do, we are self-selecting for people who already have a foot in the door and don’t need our help as much.
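The whole edit, add, commit, push sequence can be rehearsed without any account, token or SSH key by letting a bare repository on the local disk stand in for the hosted remote. The sketch below does that; the directory names ("laptop", "desktop"), file names and identities are invented for illustration, and a real hosted remote would add the authentication step that this version deliberately skips.

```shell
set -eu

workdir=$(mktemp -d)

# A bare repository plays the role of the hosted remote (no GitHub/GitLab involved).
git init -q --bare "$workdir/remote.git"

# First working copy. Git warns that the clone is empty, which is expected here.
git clone -q "$workdir/remote.git" "$workdir/laptop"
cd "$workdir/laptop"
git config user.name "Example Learner"       # placeholder identity
git config user.email "learner@example.com"

echo "results" > results.txt                 # edit file
git add results.txt                          # git add file
git commit -q -m "Add first results"         # git commit
branch=$(git symbolic-ref --short HEAD)      # "main" or "master", depending on configuration
git push -q origin "$branch"                 # git push origin branchname

# Second working copy, standing in for a collaborator or another machine.
git clone -q "$workdir/remote.git" "$workdir/desktop"

# New work on the first copy...
echo "more results" >> results.txt
git commit -q -am "Add more results"
git push -q origin "$branch"

# ...arrives in the second copy with git pull.
cd "$workdir/desktop"
git pull -q origin "$branch"
cat results.txt
```

Because the "remote" is just a folder, this version separates the idea of a remote from any particular hosting website, which may help when the two are being taught in separate sessions.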
I don’t know the answer to this, but some options are:
- Embedding git courses in an introduction to bash/Linux course with plenty of time for absorbing the material;
- Leaning more heavily on GUI use for introducing the concept of git, so that you can focus on the process and workflow instead of remembering commands;
- Dividing the course into two parts and leaving discussion of remote repositories to part two, to keep it clearer in the learners’ minds;
- Building useful cheat-sheets and reference guides that account for all of this new knowledge (so without assumptions of prior knowledge).
Conclusion
This is already very long and rambling, and sadly lacking in concrete answers to the above questions. I think the main point is that git is an incredibly useful and powerful tool that helps build reproducible and equitable research, but we need to widen access to it; the current teaching model clearly is not fit for purpose!
References
Citation
@online{murphy_quinlan2025,
author = {Murphy Quinlan, Maeve},
title = {Learning Git: Why Is It so Difficult?},
date = {2025-01-15},
url = {https://murphyqm.github.io/posts/2025-01-15-learning-git},
langid = {en}
}