Welcome to the atools (Job Array Tools) documentation
atools has been designed to conveniently deal with job arrays, a feature
supported by many queue systems and schedulers. A job array consists of
a (potentially large) number of individual tasks that can be run in
parallel, independent of one another.
Typically, these tasks arise in scenarios such as:
- performing the same computation on many input files, or
- running an algorithm with many different parameter sets.
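To make the second scenario concrete, here is a minimal Python sketch of how a single task in a job array might pick its own parameter set. It is an illustration only, not how atools itself works: the parameter table `PARAMS` and the helpers `current_task_id` and `params_for_task` are hypothetical names. The one scheduler-specific assumption is that the task index is exposed through an environment variable (`SLURM_ARRAY_TASK_ID` under Slurm, `PBS_ARRAYID` under Torque).

```python
import csv
import io
import os

# Hypothetical parameter table: one row per task in the job array.
PARAMS = "alpha,beta\n0.1,10\n0.2,20\n0.3,30\n"


def current_task_id(env=os.environ):
    """Read the task index set by the scheduler (assumed variable names)."""
    for name in ("SLURM_ARRAY_TASK_ID", "PBS_ARRAYID"):
        if name in env:
            return int(env[name])
    raise KeyError("no job array task ID found in the environment")


def params_for_task(csv_text, task_id):
    """Return the parameter row for a 1-based task ID as a dict of strings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not 1 <= task_id <= len(rows):
        raise IndexError(f"task ID {task_id} outside 1..{len(rows)}")
    return rows[task_id - 1]
```

Each task would call `params_for_task(PARAMS, current_task_id())` and run the computation with the values it gets back, so the same job script serves every task in the array.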
atools, in combination with a queue system or scheduler, lets you
handle such MapReduce-style scenarios conveniently, without the overhead,
in terms of both computation and setup, of systems such as Hadoop or
Spark.
Currently, atools supports PBS Torque, Adaptive Computing Moab, Sun Grid
Engine, and the Slurm workload manager, but extending the list to other
resource managers and schedulers should be easy if they support a feature
similar in spirit to job arrays.
This documentation provides a walkthrough of the features and serves as a reference for the more arcane ones. Topics:
- adding atools features using templates (using acreate),
- instantiating parameter values per task (using aenv),
- logging task start and completion information (using alog),
- resuming computations if not all tasks were completed (using arange),
- aggregating output generated by the tasks (using areduce),
- analyzing task run times and load balance (using aload).
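The idea behind resuming an incomplete job array can be sketched in a few lines of Python. This is not the implementation of arange, and it does not read atools log files: the function `remaining_ranges` and its inputs are hypothetical. Given the full set of task IDs and the IDs known to have completed, it produces a compact specification of the tasks still to be run, in the comma-and-dash notation that schedulers such as Slurm and Torque accept for array indices.

```python
def remaining_ranges(all_ids, completed_ids):
    """Return the task IDs not yet completed as a string such as '4,6-7'."""
    todo = sorted(set(all_ids) - set(completed_ids))
    parts = []
    start = prev = None
    for task_id in todo:
        if start is None:
            # First ID opens the first run.
            start = prev = task_id
        elif task_id == prev + 1:
            # Consecutive ID extends the current run.
            prev = task_id
        else:
            # Gap found: close the current run and open a new one.
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = task_id
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```

For example, with ten tasks of which 1, 2, 3, 5 and 8 have completed, `remaining_ranges(range(1, 11), {1, 2, 3, 5, 8})` yields `4,6-7,9-10`, which could be passed as the array specification when resubmitting the job.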
atools is an open source project hosted on GitHub.