Welcome to the atools (Job Array Tools) documentation

atools has been designed to conveniently deal with job arrays, a feature supported by many queue systems and schedulers. A job array consists of a (potentially large) number of individual tasks that can be run in parallel, independent of one another.

Typically, these tasks originate from a few scenarios such as

  • performing the same computation on many input files, or
  • running an algorithm with many different parameter sets.
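Both scenarios come down to the same pattern: each task learns its own index from the scheduler and uses it to pick its share of the work. The sketch below illustrates that pattern only; it is not how atools itself is implemented. It assumes the scheduler exposes the index in one of the environment variables listed, and the helper names task_id and parameters_for_task, as well as the CSV layout (one header line, then one line per task, tasks numbered from 1), are hypothetical choices made for this example.

```python
import csv
import io
import os

# Environment variables in which common schedulers expose the array index
# (Torque, Slurm, Grid Engine and Moab, respectively).
ARRAY_ID_VARS = (
    "PBS_ARRAYID",
    "SLURM_ARRAY_TASK_ID",
    "SGE_TASK_ID",
    "MOAB_JOBARRAYINDEX",
)


def task_id(environ=os.environ):
    """Return the index of the current task, taken from the environment."""
    for name in ARRAY_ID_VARS:
        value = environ.get(name)
        # Skip unset variables and non-numeric placeholders.
        if value is not None and value.isdigit():
            return int(value)
    raise RuntimeError("not running inside a job array task")


def parameters_for_task(csv_text, tid):
    """Return the parameter row for task `tid` (hypothetical 1-based layout)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not 1 <= tid <= len(rows):
        raise IndexError(f"no parameter row for task {tid}")
    return rows[tid - 1]


if __name__ == "__main__":
    data = "alpha,beta\n0.1,10\n0.2,20\n"
    # Pretend the scheduler started us as task 2 of the array.
    tid = task_id({"SLURM_ARRAY_TASK_ID": "2"})
    print(parameters_for_task(data, tid))
```

All values come back as strings, so a real task would still have to convert them to the types its computation expects.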

Combined with a queue system or scheduler, atools lets you handle such MapReduce-style scenarios conveniently, without the overhead, in terms of both computation and setup, of systems such as Hadoop or Spark.

Currently, atools supports PBS Torque, Adaptive Computing Moab, Sun Grid Engine, and the Slurm workload manager. Extending the list to other resource managers and schedulers should be easy, provided they support a feature similar in spirit to job arrays.

This documentation provides a walkthrough of the main features and serves as a reference for the more arcane ones. Topics:

  • adding atools features using templates (using acreate),
  • instantiating parameter values per task (using aenv),
  • logging task start and completion information (using alog),
  • resuming computations if not all tasks were completed (using arange),
  • aggregating output generated by the tasks (using areduce),
  • analyzing task run times and load balance (using aload).
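To make the idea of resuming a computation concrete, the following sketch works out which tasks of an array still need to run and writes them as a compact range list of the kind schedulers typically accept for array indices. It is an illustration under stated assumptions, not the arange implementation: the function names remaining_tasks and to_range_string are hypothetical, and the set of completed task IDs is simply passed in rather than read from a log.

```python
def remaining_tasks(all_ids, completed_ids):
    """Return the sorted task IDs that have not completed yet."""
    return sorted(set(all_ids) - set(completed_ids))


def to_range_string(ids):
    """Format task IDs compactly, e.g. [4, 6, 7] becomes '4,6-7'."""
    parts = []
    start = prev = None
    for i in sorted(set(ids)):
        if start is None:
            # First ID opens the first run.
            start = prev = i
        elif i == prev + 1:
            # Consecutive ID extends the current run.
            prev = i
        else:
            # Gap found: close the current run and open a new one.
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = i
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical array of 10 tasks, of which 5 finished.
    todo = remaining_tasks(range(1, 11), [1, 2, 3, 5, 8])
    print(to_range_string(todo))
```

The resulting string can then be used as the array specification when the job is resubmitted, so that only the unfinished tasks run again.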

atools is an open source project hosted on GitHub.