Install Slurm on WSL: test environment
Step-by-step install and setup of Slurm on WSL2 for testing purposes. Mainly followed this guide: Slurm for Dummies with some modifications for a single-node WSL2 mini set-up.
Obviously this isn’t going to let you test all config changes, but will allow messing with user accounts/QoS permissions etc.
I’ve just copied the commands I used below in order; WSL2 was Ubuntu 24.05.5 LTS. Obviously, again, not matching the OS of the cluster.
sudo apt update
sudo apt upgrade
sudo apt install munge libmunge2 libmunge-dev
# to test if it worked
munge -n | unmunge | grep STATUS
# fix permissions
sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/
sudo chmod 0755 /run/munge/
sudo chmod 0700 /etc/munge/munge.key
sudo chown -R munge: /etc/munge/munge.key
# so that it runs on startup
systemctl enable munge
systemctl restart munge
# checking if it works
systemctl status munge
# Install slurm
sudo apt install slurm-wlm
# To find config file
cd /usr/share/doc/slurmctld/
explorer.exe .
# launch the html file configure template
# to figure out name for ControlMachine, etc.:
hostname
# for cores
slurmd -C
# for memory, - approx 512 MB for safety
free -mOutput from slurmd -C:
NodeName=<hostname-output> CPUs=20 Boards=1 SocketsPerBoard=1 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=15707
UpTime=0-08:37:17Config settings:
- ClusterName: airetest
- SlurmctldHost:
- SlurmUser: slurm
- NodeName:
- RealMemory: 14500
See other output from slurmd -C above for CPUs etc.
- ProctrackType: proctrack/linuxproc
- TaskPlugin: task/none
- SelectType: select/cons_tres
- SelectTypeParameters: CR_Core_Memory
- JobAcctGatherType: jobacct_gather/linux (mimicking current set-up, also avoids fragile WSL cgroup shenanigans)
Other settings left default.
Click submit, and it generates a config file for you, copy and paste this into the following file:
sudo nano /etc/slurm/slurm.conf
systemctl enable slurmctld
systemctl restart slurmctld
systemctl status slurmctld
sinfoCan then submit a little test job to see what happens:
#!/bin/bash
sleep 30
env# 1) See node state/reason
sinfo -N -l
scontrol show node <hostname-output> | egrep "State=|Reason=|CfgTRES|RealMemory|CPUTot"
# 2) Check daemons/auth
systemctl is-active munge slurmctld slurmd
# 3) If any are not active:
sudo systemctl restart munge slurmctld slurmd
# 4) Clear DOWN/DRAIN
sudo scontrol update NodeName=<hostname-output> State=RESUME
# 5) Recheck
sinfo -N -l
squeueEnded up having to also run:
sudo systemctl enable --now munge slurmctld slurmd
sudo systemctl status slurmd --no-pager -l
sudo journalctl -u slurmd -n 80 --no-pager
sudo scontrol update NodeName=<hostname-output> State=RESUMEJobs can be submitted, queue and run. I’ll have to set up sacctmgr to properly test accounts/QoS/priority behaviour etc., but that’s for another day.