TDM 30200: Project 5 — 2023

Motivation: In this project we will slowly get familiar with SLURM, the job scheduler installed on Anvil.

Context: This is the first in a series of 3 projects focused on parallel computing using SLURM and Python.

Scope: SLURM, unix, Python

Learning Objectives
  • Use basic SLURM commands to submit jobs, check job status, kill jobs, and more.

Make sure to read about, and use, the template found here, as well as the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/coco/unlabeled2017/*.jpg

Questions

Interested in being a TA? Please apply: purdue.ca1.qualtrics.com/jfe/form/SV_08IIpwh19umLvbE

Question 1

This project (and the next) will have different types of deliverables. Each question will result in an entry in a Jupyter notebook, and/or 1 or more additional Python and/or Bash scripts.

To properly save screenshots in your Jupyter notebook, follow the guidelines here. Images that don’t appear in your notebook in Gradescope will not get credit.

BEFORE you start your JupyterLab session this week, please change "Processor cores requested" from 1 to 4. We will use 4 processing cores this week.

Most of the supercomputers at Purdue, including Anvil, have one or more frontend nodes. Users log in to a frontend and submit jobs that run on one or more backend (compute) nodes. To submit a job, users use SLURM.

SLURM is a job scheduler found on about 60% of the top 500 supercomputers.[1] In this project (and the next) we will learn ways to schedule jobs with SLURM, and learn the tools used to do so.

Let’s get started by using a program called salloc. In brief, salloc allocates some resources (think memory and cpus) and runs the commands specified by the user. If the user doesn’t specify any commands, it opens the user’s default shell (bash, zsh, fish, etc.) on the allocated resources.

Open a terminal and give it a try.

salloc -A cis220051 -p shared -n 3 -c 1 -t 00:05:00 --mem-per-cpu=1918

After some output, you should notice that your shell has changed. Type hostname followed by enter to see that your host has changed from loginXX.anvil.rcac.purdue.edu to aXXX.anvil.rcac.purdue.edu. You are on a different system! Very cool!

To find out what the other options do, read slurm.schedmd.com/salloc.html. The options we used are described below; a version of the command using only the long-form options is shown after the list.

  • The -A cis220051 option could have also been written --account=cis220051. This indicates which account to use when allocating the resources (memory and cpus). You can also think of this as a "queue" or "the datamine queue". Jobs submitted using this option will use the resources we pay for. Only users with permissions can submit to our queue.

  • The -p shared option could have also been written --partition=shared. This selects the partition (the set of nodes) the job runs on; in this case, Anvil's shared partition.

  • The -n 3 option could have also been written --ntasks=3. This indicates how many tasks we may need for the job.

  • The -c 1 option could have also been written --cpus-per-task=1. This indicates the number of cores per task.

  • The -t 00:05:00 option could have also been written --time=00:05:00. This indicates how long the job may run for. If the job exceeds the time limit, it is killed.

  • The --mem-per-cpu=1918 option indicates how much memory (in MB) we may need for each cpu in the job.
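
For reference, here is the same request written as a single command using only the long-form options described above (a sketch, equivalent to the command we ran earlier):

salloc --account=cis220051 --partition=shared --ntasks=3 --cpus-per-task=1 --time=00:05:00 --mem-per-cpu=1918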

To confirm, use the following script to see how much memory and how many cpus are available to us in this salloc session. Copy and paste the contents of the script into a file called get_info.py in your $HOME directory. After saving it, make it executable by running the following command.

chmod +x $HOME/get_info.py

get_info.py
#!/usr/bin/env python3

import socket
import os
from pathlib import Path
from datetime import datetime
import time

def main():

    time.sleep(5)

    print(f'Hostname: {socket.gethostname()}')

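    # /proc/self/cgroup lists the cgroups this process belongs to; find the cpuset and memory entries (cgroup v1)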
    with open("/proc/self/cgroup") as file:
        for line in file:
            if 'cpuset' in line:
                cpu_loc = "cpuset" + line.split(":")[2].strip()

            if 'memory' in line:
                mem_loc = "memory" + line.split(":")[2].strip()

    base_loc = Path("/sys/fs/cgroup/")
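    # cpuset.cpus lists the CPUs we may use as comma-separated ranges (e.g. 0-3,7); count them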
    with open(base_loc / cpu_loc / "cpuset.cpus") as file:
        num_cpu_sets = file.read().strip().split(",")
        num_cpus = 0
        for s in num_cpu_sets:
            if len(s.split("-")) == 1:
                num_cpus += 1
            else:
                num_cpus += (int(s.split("-")[1]) - int(s.split("-")[0]) + 1)

        print(f"CPUs: {num_cpus}")

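    # memory.limit_in_bytes is the hard memory limit (in bytes) imposed by the memory cgroup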
    with open(base_loc / mem_loc / "memory.limit_in_bytes") as file:
        mem_in_bytes = int(file.read().strip())
        print(f"Memory: {mem_in_bytes/1024**2} MB")

if __name__ == "__main__":
    print(f'started at: {datetime.now()}')
    main()
    print(f'finished at: {datetime.now()}')

To use it.

~/get_info.py
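
Putting Question 1 together, a typical terminal session might look like the following sketch (the hostname and resource numbers you see will differ):

# on a frontend (login) node
salloc -A cis220051 -p shared -n 3 -c 1 -t 00:05:00 --mem-per-cpu=1918

# now inside the allocation, on a backend node
hostname
~/get_info.py

# release the allocation when you are done
exit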

For this question, add a screenshot of running hostname in the salloc session, as well as of running ~/get_info.py, to your notebook.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

salloc can be useful, but most of the time we want to run a job.

Before we get started, read the answer to this stackoverflow post. In many instances, it is easiest to use 1 cpu per task and let SLURM distribute the tasks. In this course, we will use this simplified model.

So what is the difference between srun and sbatch? This stackoverflow post does a pretty great job explaining. You can think of sbatch as the tool for submitting a job script to the queue for later execution, and srun as the tool for launching a job (or a job step within an existing job) and waiting for it to finish. We will test out both!

In the previous question, we used salloc to get the resources, hop onto the system, and run hostname along with our get_info.py script.

Use srun to run our get_info.py script, to better understand how the various options work. Try and guess the results of the script for each configuration.

Be sure to give your get_info.py script execute permissions if you haven’t already.

chmod +x get_info.py

When inside a SLURM job, a variety of environment variables are set that alter how srun behaves. If you open a terminal from within Jupyter Lab and run the following, you will see them.

env | grep -i slurm

These variables alter the behavior of srun. We can, however, unset these variables, and srun will revert to its default behavior. In your terminal, run the following.

for i in $(env | awk -F= '/SLURM/ {print $1}'); do unset $i; done;

Confirm that the environment variables are unset by running the following.

env | grep -i slurm

You must repeat this process in each new terminal you’d like to use within Jupyter Lab. This means that if you work on this project for a while and reopen it the next day, you will need to rerun the bash command to remove the SLURM environment variables.

Great! Now, we can work in our nice Jupyter Lab environment without any concern that SLURM environment variables are changing any behaviors. Let’s test it out with something actually predictable.

first set of configurations to try
srun -A cis220051 -p shared -n 2 -c 1 -t 00:00:05 $HOME/get_info.py
srun -A cis220051 -p shared -n 1 -c 2 -t 00:00:05 $HOME/get_info.py

Note that when using -n 2 -c 1, srun creates 2 tasks that each run $HOME/get_info.py. Since there is 1 cpu per task, we get 2 cpus in total. If we use -n 1 -c 2, we get 1 task that runs $HOME/get_info.py, and since we requested 2 cpus per task, we again get 2 cpus.

second set of configurations to try
srun -A cis220051 -p shared -n 1 -c 2 --mem=1918 -t 00:00:05 $HOME/get_info.py
srun -A cis220051 -p shared -n 1 -c 2 --mem-per-cpu=1918 -t 00:00:05 $HOME/get_info.py
srun -A cis220051 -p shared -n 2 -c 1 --mem-per-cpu=1918 -t 00:00:05 $HOME/get_info.py

Note how --mem=1918 requests a total of 1918 MB of memory for the job, whereas --mem-per-cpu=1918 requests 1918 MB for each cpu. We end up getting the expected amount of memory for the last two srun commands as well: with 2 cpus in total and 1918 MB per cpu, that is 2*1918 = 3836 MB.

third set of configurations to try
srun -A cis220051 -p shared -n 1 -c 2 --mem-per-cpu=1918 -t 00:00:05 $HOME/get_info.py
srun -A cis220051 -p shared -n 1 -c 2 --mem-per-cpu=1919 -t 00:00:05 $HOME/get_info.py

Here, take careful note that when we increase our memory per cpu from 1918 to 1919, something important happens: we are granted double the CPUs we requested! This is because SLURM on Anvil is configured to give us at most 1918 MB of memory per CPU. If you request more memory per CPU than that, you will be granted additional CPUs to cover the request. This is why ondemand.anvil.rcac.purdue.edu was recently configured to ask for only the number of cores you want: if you requested 1 core but 4 GB of memory, you would get 3 cores but only 4 GB of memory, when you could have received 1918*3 = 5754 MB of memory instead of just 4 GB.

fourth set of configurations to try
srun -A cis220051 -p shared -n 3 -c 1 -t 00:00:05 $HOME/get_info.py > $SCRATCH/get_info.out

Check out the get_info.py script. SLURM on Anvil uses cgroups to manage resources. Some of the more typical commands used to find the number of cpus and amount of memory don’t work accurately when "within" a cgroup. This script figures out which cgroups you are "in" and parses the appropriate files to get your resource limitations.
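
If you are curious, you can peek at the same information the script parses. This is only a sketch; the exact cgroup paths depend on your particular job:

# list the cgroups this shell belongs to; get_info.py reads this same file
cat /proc/self/cgroup

# the script then reads cpuset.cpus and memory.limit_in_bytes under /sys/fs/cgroup/ using the paths found above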

Reading the explanation on SLURM’s website is likely not enough to understand this; running the configurations will help your understanding. If you have simple, parallel processes that don’t need any shared state, you can use a single srun per task, each with --mem-per-cpu (so memory availability is more predictable), -n 1, and -c 1, followed by & (a reminder: & at the end of a bash command puts the process in the background).
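
For example, inside a job allocation the pattern might look like this minimal sketch (task_a.sh and task_b.sh are hypothetical stand-ins for your own scripts); the Question 3 job script below uses the same pattern:

srun -n 1 -c 1 --mem-per-cpu=1918 ./task_a.sh &
srun -n 1 -c 1 --mem-per-cpu=1918 ./task_b.sh &
wait    # wait for both background steps to finish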

Finally, take note of the last configuration. What is the $SCRATCH environment variable?

For the answer to this question:

  1. Add a screenshot of the results of some (not all) of the srun commands running the get_info.py script.

  2. Write 1-2 sentences about any observations.

  3. Include what the $SCRATCH environment variable is.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

The following is a solid template for a job script.

my_job.sh
#!/bin/bash
#SBATCH --account=cis220051 (1)
#SBATCH --partition=shared (2)
#SBATCH --job-name=serial_job_test (3)
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL) (4)
#SBATCH --mail-user=username@purdue.edu     # Where to send mail (5)
#SBATCH --ntasks=3                    # Number of tasks (total) (6)
#SBATCH --cpus-per-task=1             # Number of CPUs per task (7)
#SBATCH -o /dev/null                  # Output to dev null (8)
#SBATCH -e /dev/null                  # Error to dev null (9)

srun -n 1 -c 1 --mem-per-cpu=1918 --exact -t 00:00:05 $HOME/get_info.py > first.out & (10)
srun -n 1 -c 2 --mem-per-cpu=1918 --exact -t 00:00:05 $HOME/get_info.py > second.out & (11)
srun -n 1 -c 3 --mem-per-cpu=1918 --exact -t 00:00:05 $HOME/get_info.py > third.out & (12)

wait (13)
1 Sets the account to use for billing — in this case our account is cis220051.
2 Sets the partition to use — in this case we are using the shared partition.
3 Give your job a unique name so you can identify it in the queue.
4 Specify when you want to receive emails about your job. We have it set to notify us when the job ends or fails.
5 Specify the email address to send the emails to.
6 Specify the number of tasks, in total, to run within this job.
7 Specify the number of cores to use for each task.
8 Redirect the output of the job to /dev/null. This is a special file that discards all output. You could change this to $HOME/output.txt and the contents would be written to that file.
9 Redirect the error output of the job to /dev/null. This is a special file that discards all output. You could change this to $HOME/error.txt and the contents would be written to that file.
10 The first step of the job. This step contains a single task, that uses a single core.
11 The second step of the job. This step contains a single task, that uses two cores.
12 The third step of the job. This step contains a single task, that uses three cores.
13 Wait for all steps to complete. Very important to include.

Update the template to give your job a unique name, and to set the email to your Purdue email address.

To submit a job, run the following.

sbatch my_job.sh
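
If the submission succeeds, sbatch prints the id assigned to your job; the output looks something like the following (your number will differ):

Submitted batch job 1234567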

Run the following experiments by tweaking my_job.sh, submitting the job using sbatch, and then checking the output of first.out, second.out, and third.out (a quick way to compare finish times is sketched after the list below).

  1. Run the original job script and note the time each of the steps finished relative to the other steps.

  2. Change the job script --cpus-per-task from 1 to 2. What happens to the finish times?

  3. Remove --exact from each of the job steps. What happens to the finish times?
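
As mentioned above, one quick way to compare finish times after each run is to pull the timestamps that get_info.py prints out of the output files. A sketch, assuming the file names used in the template:

grep 'finished at' first.out second.out third.out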

In addition, please feel free to experiment with the various values, and see how they affect the finish times and/or output of our get_info.py script. Can you determine how things work? Write 1-2 sentences about your observations. Please do take the time to iterate on this question over and over until you get a good feel for how things work.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

Make your job script run for at least 20 seconds. You can do this by adding more steps, reducing cpus, or modifying the time.sleep call in the get_info.py script. Submit the job using sbatch. Immediately after submitting the job, use the built-in squeue command, in combination with grep, to find the job id of your job.

What is squeue? Here are the docs.
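
As a starting point, squeue accepts a -u flag to list only a particular user's jobs. A sketch (combining it with grep on your job name is left to you):

squeue -u $USER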

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output, before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.