What are job arrays?
A resource manager or scheduler that supports job arrays typically exposes a task identifier to the job script as an environment variable. This is simply a number from the range specified when the job is submitted.
For the resource managers and schedulers supported by atools, that would
be PBS_ARRAYID for PBS Torque, MOAB_JOBARRAYINDEX for Adaptive's Moab,
SGE_TASKID for Sun Grid Engine (SGE), and SLURM_ARRAY_TASK_ID for the
Slurm workload manager.
Typically, this task identifier is then used to determine, e.g., the specific input file for this task in the Slurm job script:
...
INPUT_FILE="input-${SLURM_ARRAY_TASK_ID}.csv"
...
Similarly, for a PBS Torque job script:
...
INPUT_FILE="input-${PBS_ARRAYID}.csv"
...
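A job script that has to run under more than one scheduler could resolve the identifier from whichever of the variables listed above happens to be set. The following is a minimal sketch under that assumption; get_task_id is a hypothetical helper introduced here for illustration and is not part of atools or of any scheduler.

```shell
# Sketch: resolve the task identifier from whichever scheduler variable is set.
# get_task_id is a hypothetical helper, not part of atools or any scheduler.
get_task_id() {
    # Nested defaults: the first non-empty variable in this order wins.
    task_id="${SLURM_ARRAY_TASK_ID:-${PBS_ARRAYID:-${MOAB_JOBARRAYINDEX:-${SGE_TASKID:-}}}}"
    if [ -z "$task_id" ]; then
        echo "no array task identifier found in the environment" >&2
        return 1
    fi
    printf '%s\n' "$task_id"
}

# Only derive the input file name when the script runs as part of a job array.
if TASK_ID="$(get_task_id)"; then
    INPUT_FILE="input-${TASK_ID}.csv"
    echo "task ${TASK_ID} reads ${INPUT_FILE}"
fi
```

Outside a job array none of the variables is set, so the helper reports an error and the script skips the input file selection instead of building a name such as input-.csv.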
Submitting array jobs is quite simple. For each of the supported queue
systems and schedulers, one simply adds a range option to the submission
command: -t <int-range> for qsub (PBS Torque and Sun Grid Engine) and for
msub (Moab), and --array=<int-range> for sbatch (Slurm), e.g., for Slurm:
$ sbatch --array=1-250 jobscript.slurm
Similarly, for PBS Torque:
$ qsub -t 1-250 jobscript.pbs
Either submission command above would create a job array of 250 tasks, and
for each task the SLURM_ARRAY_TASK_ID or the PBS_ARRAYID environment
variable, respectively, would be assigned a unique value between 1 and 250,
inclusive.
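Before submitting, it can help to check locally what each task would do. The sketch below emulates the array on a workstation by setting SLURM_ARRAY_TASK_ID to every value in the range in turn. Both run_task (a stand-in for the body of the job script) and emulate_array are hypothetical names used only for this illustration; no scheduler is involved.

```shell
# Sketch: emulate the tasks of a job array without a scheduler.
# run_task is a hypothetical stand-in for the body of jobscript.slurm.
run_task() {
    INPUT_FILE="input-${SLURM_ARRAY_TASK_ID}.csv"
    echo "task ${SLURM_ARRAY_TASK_ID}: ${INPUT_FILE}"
}

# emulate_array FIRST LAST: run run_task once per identifier in FIRST..LAST.
emulate_array() {
    i="$1"
    last="$2"
    while [ "$i" -le "$last" ]; do
        # A subshell keeps the identifier local to this one emulated task.
        ( SLURM_ARRAY_TASK_ID="$i"; export SLURM_ARRAY_TASK_ID; run_task )
        i=$((i + 1))
    done
}

# Same range as in the submission commands above; show only the last task.
emulate_array 1 250 | tail -n 1   # prints: task 250: input-250.csv
```

Unlike a real job array, this runs the tasks one after another in a single process tree, so it is only useful for checking file names and similar per-task logic.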
Although job arrays provide sufficient features for simple scenarios, they
quickly become a nuisance for more sophisticated problems, especially
parameter exploration computations. atools aims to eliminate as much as
possible of the boilerplate code you would otherwise have to write over
and over again.
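To give an impression of the kind of hand-written boilerplate meant here, the sketch below maps a task identifier to one row of a parameter file. It does not use atools. The file name params.csv, its header row, and the column names temperature and pressure are all assumptions made up for this example, as is the helper get_params_line.

```shell
# Sketch of hand-written boilerplate: map a task identifier to one parameter set.
# params.csv and its columns (temperature,pressure) are hypothetical.
get_params_line() {
    # Task N uses data row N, which is line N+1 because line 1 is the header.
    sed -n "$(( $1 + 1 ))p" "$2"
}

TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"   # default to task 1 for local testing
if [ -f params.csv ]; then
    # Split the selected row on commas into one shell variable per column.
    IFS=, read -r temperature pressure <<EOF
$(get_params_line "$TASK_ID" params.csv)
EOF
    echo "task ${TASK_ID}: temperature=${temperature} pressure=${pressure}"
fi
```

Every job script in a parameter exploration tends to repeat a variation of this row lookup and splitting, and it has to be adapted whenever a column is added or the scheduler changes.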