Add job array sections

Jim Madge 3 years ago
parent
commit
c685dcd5cb
1 changed files with 85 additions and 2 deletions

+ 85 - 2
docs/hpc.md

@@ -33,8 +33,9 @@ display
 - Processor and memory clock speeds in MHz
 - PCIe throughput input (Rx) and output (Tx) in MB/s
 
-Every 300 seconds this information will be saved to a file named using the SLURM
-array job and task IDs as discussed in [the SLURM section](#slurm)
+Every 300 seconds this information will be saved to a file named using the
+SLURM array job and task IDs as discussed in [the SLURM
+section](#parametrising-job-arrays)
 
 This job is sent to the background and stopped after the `$command` has run.
 
@@ -72,3 +73,85 @@ In particular interest to users are
 This section does not aim to be a comprehensive guide to Slurm, or even a brief
 introduction. Instead, it is intended to provide suggestions and a template for
 running this project's workflows on a cluster with Slurm.
+
+### Requesting GPUs
+
+### Benchmarking
+
+### Using scratch space
+
+### Repeated runs (job arrays)
+
+If you are assessing a system's performance you will likely want to repeat the
+same calculation a number of times until you are satisfied with your estimate of
+mean performance. It would be possible to simply repeatedly submit the same job
+and many people are tempted to engineer their own scripts to do so. However,
+Slurm provides a way to submit groups of jobs that you will most likely find
+more convenient.
+
+When submitting a job with `sbatch` you can specify the size of your job array
+with the `--array=` flag using a range of numbers *e.g.* `0-9` or a comma
+separated list *e.g.* `1,2,3`. You can use `:` with a range to specify a stride,
+for example `1-5:2` is equivalent to `1,3,5`. You may also specify the maximum
+number of jobs from an array that may run simultaneously using `%` *e.g.*
+`0-31%4`.
+
+Here are some examples:
+
+```bash
+# Submit 10 jobs with indices 1,2,3,..,10
+sbatch --array=1-10 script.sh
+
+# Submit 4 jobs with indices 1, 5, 9, 13 and at most two of these running
+# simultaneously
+sbatch --array=1-16:4%2 script.sh
+```
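
Stride ranges are easy to miscount, so it can help to list the indices before submitting. The following is a minimal sketch of a helper that expands a single range with an optional stride and `%` limit. The function name `expand_array_spec` is hypothetical and is not part of Slurm, and comma separated lists are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): print the task indices that a
# single --array range such as "1-16:4" or "0-31%4" describes.
expand_array_spec() {
    local spec="${1%%\%*}"      # drop any "%N" concurrency limit
    local range="${spec%%:*}"   # the "first-last" part
    local step=1
    if [[ "$spec" == *:* ]]; then
        step="${spec##*:}"      # the stride after the ":"
    fi
    local first="${range%-*}"
    local last="${range#*-}"

    local i out=()
    for ((i = first; i <= last; i += step)); do
        out+=("$i")
    done
    echo "${out[*]}"
}

expand_array_spec "1-16:4%2"   # prints: 1 5 9 13
expand_array_spec "1-5:2"      # prints: 1 3 5
```

The `%` limit only affects how many tasks run at once, so the helper discards it before expanding the range.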
+
+### Parametrising job arrays
+
+One particularly powerful way to use job arrays is through parametrising the
+individual tasks. For example, this could be used to sweep over a set of input
+parameters or data sets. As with using job arrays for repeating jobs, this will
+likely be more convenient than implementing your own solution.
+
+Within your batch script you will have access to the following environment
+variables
+
+| environment variable      | value                    |
+|---------------------------|--------------------------|
+| `SLURM_ARRAY_JOB_ID`      | job id of the first task |
+| `SLURM_ARRAY_TASK_ID`     | current task index       |
+| `SLURM_ARRAY_TASK_COUNT`  | total number of tasks    |
+| `SLURM_ARRAY_TASK_MAX`    | the highest index value  |
+| `SLURM_ARRAY_TASK_MIN`    | the lowest index value   |
+
+For example, if you submitted a job array with the command
+
+```bash
+$ sbatch --array=0-12:4 script.sh
+Submitted batch job 42
+```
+
+then the job id of the first task is `42` and the four jobs will have
+`SLURM_ARRAY_JOB_ID`, `SLURM_ARRAY_TASK_ID` pairs of
+
+- 42, 0
+- 42, 4
+- 42, 8
+- 42, 12
+
+The environment variables can be used in your commands. For example
+
+```bash
+my_program -n $SLURM_ARRAY_TASK_ID -o output_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
+```
+
+with the same submission command would execute the following commands (one in
+each job)
+
+- `my_program -n 0 -o output_42_0`
+- `my_program -n 4 -o output_42_4`
+- `my_program -n 8 -o output_42_8`
+- `my_program -n 12 -o output_42_12`
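
When the parameter is not the task index itself, the index can be used to look a value up. The following is a minimal sketch, assuming a sweep over four values. The array `batch_sizes` and the function `param_for_task` are hypothetical names chosen for illustration, and the fallback to index `0` is only there so the script can be tried outside Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical sweep values; replace them with your own parameters.
batch_sizes=(16 32 64 128)

# Print the parameter for a given task index, or fail if it is out of range.
param_for_task() {
    local index="$1"
    if (( index < 0 || index >= ${#batch_sizes[@]} )); then
        echo "no parameter for task index ${index}" >&2
        return 1
    fi
    echo "${batch_sizes[$index]}"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside an array task; default to 0 otherwise.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
batch_size="$(param_for_task "$task_id")" || exit 1
echo "task ${task_id}: batch size ${batch_size}"
```

Submitting a script like this with `--array=0-3` would give each of the four values to exactly one task. Checking the range explicitly avoids silently running with an empty parameter if the array is submitted with more indices than there are values.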
+
+### Template