What are job arrays?
A resource manager or scheduler that supports job arrays typically exposes a task identifier to the job script as an environment variable. This is simply a number out of a range specified when the job is submitted.
For the resource managers and schedulers supported by atools, that would be
* PBS_ARRAYID for PBS torque,
* MOAB_JOBARRAYINDEX for Adaptive's Moab,
* SGE_TASKID for SUN Grid Engine (SGE), and
* SLURM_ARRAY_TASK_ID for Slurm workload manager.
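A job script that has to run under more than one of these systems can look the identifier up once. The following is a minimal sketch, not part of atools: the function name get_task_id is hypothetical, and the variable names are simply those listed above.

```shell
# Print the array task identifier, whichever scheduler set it.
# get_task_id is a hypothetical helper, not an atools command.
get_task_id() {
    for name in PBS_ARRAYID MOAB_JOBARRAYINDEX SGE_TASKID SLURM_ARRAY_TASK_ID; do
        # Indirect lookup: read the variable whose name is stored in $name.
        # eval is safe here because the names come from the fixed list above.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    echo "no array task identifier set; not running in a job array?" >&2
    return 1
}
```

In a job script this could be used as TASK_ID="$(get_task_id)" || exit 1, so that a run outside a job array fails early instead of silently using an empty value.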
Typically, this task identifier is then used to determine, e.g., the specific input file for this task in the job script:
...
INPUT_FILE="input-${PBS_ARRAYID}.csv"
...
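The same mapping can be written slightly more defensively. This sketch assumes the input-<id>.csv naming shown above; the helper input_file_for_task is hypothetical and only rejects identifiers that are empty or not purely numeric.

```shell
# Map a task identifier to its input file name, e.g. 42 -> input-42.csv.
# input_file_for_task is a hypothetical helper for illustration only.
input_file_for_task() {
    task_id="$1"
    case "$task_id" in
        ''|*[!0-9]*)
            # Empty, or contains a character that is not a digit.
            echo "invalid task identifier: '$task_id'" >&2
            return 1
            ;;
    esac
    printf 'input-%s.csv\n' "$task_id"
}

# Inside a PBS torque job script this might be used as:
#   INPUT_FILE="$(input_file_for_task "${PBS_ARRAYID}")" || exit 1
```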
Submitting array jobs is quite simple. For PBS torque, SUN Grid Engine, and
Moab, one simply adds the -t <int-range> option to the submission command
(qsub for PBS torque and SUN Grid Engine, msub for Moab); Slurm's sbatch
takes --array=<int-range> instead. For example, for PBS torque:
$ qsub -t 1-250 bootstrap.pbs
The submission command above would create a job array of 250 tasks, and
for each task the PBS_ARRAYID environment variable would be assigned a
unique value between 1 and 250, inclusive.
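To preview locally which identifiers a range such as 1-250 produces, the range can be expanded by hand. This is only a sketch: expand_range is a hypothetical helper, and it handles just the plain START-END form (or a single number), not the richer range syntax a scheduler may accept.

```shell
# Print every task identifier in an inclusive START-END range, one per line.
# expand_range is a hypothetical helper; only "START-END" or "N" is handled.
expand_range() {
    first="${1%-*}"   # text before the dash
    last="${1#*-}"    # text after the dash
    i="$first"
    while [ "$i" -le "$last" ]; do
        printf '%s\n' "$i"
        i=$((i + 1))
    done
}

# Example: list the input files the 250 tasks above would read.
#   expand_range 1-250 | while read -r id; do echo "input-${id}.csv"; done
```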
Although job arrays provide sufficient features for simple scenarios, they
quickly become a nuisance for more sophisticated problems, especially in
parameter exploration type computations. atools aims to eliminate as much
as possible of the boilerplate code you have to write over and over again.
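As an illustration of the kind of boilerplate meant here, a hand-written parameter lookup might look like the sketch below. It is not how atools works internally; the helper params_for_task, the file name data.csv, and the column names are all hypothetical, and the CSV file is assumed to have exactly one header line.

```shell
# Print the CSV data row that belongs to a given task identifier.
# Line 1 of the file is assumed to be a header, so task N is line N + 1.
# params_for_task is a hypothetical helper for illustration only.
params_for_task() {
    csv_file="$1"
    task_id="$2"
    sed -n "$((task_id + 1))p" "$csv_file"
}

# Possible use in a PBS torque job script (hypothetical columns):
#   IFS=, read -r temperature pressure <<EOF
#   $(params_for_task data.csv "${PBS_ARRAYID}")
#   EOF
```

Every job script in a parameter sweep tends to repeat some variant of this lookup, which is the repetition described above.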