Submit Jobs

Here is what a typical submission of a shell script looks like from the command line:

qsub -cwd -pe smp 4 -l mem_free=2G -l scratch=50G -l h_rt=00:20:00

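This command submits the job to the job scheduler, which will eventually launch it on one of the compute nodes that can meet the job's resource needs. What each option does is explained in the sections below. In summary, the above requests that the job run in the current working directory (-cwd) with four slots on a single machine (-pe smp 4), 2 GiB of memory per slot (-l mem_free=2G), 50 GiB of local scratch space (-l scratch=50G), and a maximum run time of 20 minutes (-l h_rt=00:20:00).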

Sample submit script

Before you can submit jobs to the compute nodes, you should prepare a script like the one below. Split your jobs into smaller tasks varying only in input parameters. You can then submit the jobs from a login node or a dev node. (Note: do not include the #-- comments in your script - that won’t work.)

#!/bin/bash                        #-- what is the language of this shell
#                                  #-- Any line that starts with #$ is an instruction to SGE
#$ -S /bin/bash                    #-- the shell for the job
#$ -o [dir]                        #-- output directory (fill in)
#$ -e [dir]                        #-- error directory (fill in)
#$ -cwd                            #-- tell the job that it should start in your working directory
#$ -r y                            #-- tell the system that if a job crashes, it should be restarted
#$ -j y                            #-- tell the system that the STDERR and STDOUT should be joined
#$ -l mem_free=1G                  #-- submits on nodes with enough free memory (required)
#$ -l scratch=1G                   #-- SGE resources (home and scratch disks)
#$ -l h_rt=24:00:00                #-- runtime limit (see above; this requests 24 hours)
##$ -t 1-10                        #-- remove first '#' to specify the number of
                                   #-- tasks if desired (see Tips section on this page)

# Anything under here can be a bash script

# If you used the -t option above, this same script will be run for each task,
# but with $SGE_TASK_ID set to a different value each time (1-10 in this case).
# The commands below are one way to select a different input (PDB codes in
# this example) for each task.  Note that the bash arrays are indexed from 0,
# while task IDs start at 1, so the first entry in the tasks array variable
# is simply a placeholder

#tasks=(0 1bac 2xyz 3ijk 4abc 5def 6ghi 7jkl 8mno 9pqr 1stu )


## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"          # This is useful for debugging and usage purposes,
                                                  # e.g. "did my job exceed its memory request?"
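
The task-selection idea described in the script comments can be sketched as follows. This is only an illustration under stated assumptions: the function name select_input is hypothetical, the PDB codes are the placeholder values from the commented-out tasks line, and the sketch falls back to task 1 when SGE_TASK_ID is not set (for example, when testing outside the scheduler).

```shell
#!/usr/bin/env bash
# Index 0 is a placeholder because bash arrays are 0-based,
# while SGE task IDs start at 1.
tasks=(0 1bac 2xyz 3ijk 4abc 5def 6ghi 7jkl 8mno 9pqr 1stu)

# Hypothetical helper: map a 1-based task ID to its input name.
select_input() {
  local id="${1:-1}"
  # Reject IDs outside the range 1..10 (the real entries of the array)
  if (( id < 1 || id >= ${#tasks[@]} )); then
    echo "task ID out of range: $id" >&2
    return 1
  fi
  echo "${tasks[$id]}"
}

# Fall back to task 1 when not running as an array job
input="$(select_input "${SGE_TASK_ID:-1}")"
echo "Task ${SGE_TASK_ID:-1} processes input: $input"
```

Each task then works on its own value of input, so the same script can be submitted once with -t 1-10 instead of ten times by hand.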

Submit a script to run in the current working directory

To submit a shell script to the scheduler such that it will run in the current working directory (-cwd), use:

qsub -cwd

The scheduler will assign your job a unique (numeric) job ID.

Specifying (maximum) memory usage

Unless specified otherwise, the maximum amount of memory a job may use at any time is 1 GiB per slot (-l mem_free=1G). A job that needs more memory must request it when submitted. For example, a job that needs (at most) 10 GiB of memory should be submitted as:

qsub -cwd -l mem_free=10G

The scheduler will launch this job on the first available compute node with that amount of memory available.

TIP: Add qstat -j $JOB_ID to the end of your script to find out how much memory and CPU time your job needed. See the Job Summary page for more details.

Specifying (maximum) run time

Specifying how long each job will take helps the scheduler manage resources and allocate jobs to different nodes. It also decreases the average time the job waits in the queue before being launched on a compute node. You can specify the maximum run time (= wall time, not CPU time) for a job using the option -l h_rt=HH:MM:SS, where HH:MM:SS specifies the number of hours (HH), minutes (MM), and seconds (SS); all parts must be specified. For instance, the following job is expected to run for at most 3 minutes (180 seconds):

qsub -cwd -l mem_free=2G -l h_rt=00:03:00
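
Since all three parts of HH:MM:SS must be given, it can help to generate the value from a duration in seconds. The following is a minimal sketch; the function name to_h_rt is hypothetical and not part of SGE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format a duration in seconds as HH:MM:SS,
# the form expected by -l h_rt.
to_h_rt() {
  local total="$1"
  printf '%02d:%02d:%02d\n' \
    $(( total / 3600 )) \
    $(( (total % 3600) / 60 )) \
    $(( total % 60 ))
}

to_h_rt 180      # 3 minutes
to_h_rt 86400    # 24 hours
```

Run as written, this prints 00:03:00 and 24:00:00, matching the values used in the examples on this page.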

Using local scratch storage

Each compute node has 0.1-1.8 TiB of local scratch storage, which is fast and ideal for temporary, intermediate data files that are only needed for the length of a job. This scratch storage is unique to each machine and shared among all users and jobs running on the same machine. To minimize the risk of launching a job on a node that has little scratch space left, specify the -l scratch=size resource. For instance, if your job requires 200 GiB of local /scratch space, submit the job using:

qsub -cwd -l scratch=200G

Your job is only guaranteed the amount of available scratch space that you request when it is launched. For more information and best practices, see Using Local /scratch on Compute Nodes.
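
Inside the job script, one way to keep intermediate files together and clean them up afterwards is to create a private working directory and remove it on exit. This is a sketch, not the site's recommended recipe: SCRATCH_ROOT is a hypothetical variable that you would point at the node's local scratch location, and the fallback to /tmp is only there so the sketch can be tried off-cluster.

```shell
#!/usr/bin/env bash
# Assumption: SCRATCH_ROOT is a hypothetical variable naming the local
# scratch location; it falls back to /tmp when unset.
scratch_root="${SCRATCH_ROOT:-/tmp}"

# Create a private working directory for this job
workdir="$(mktemp -d "${scratch_root}/job-${JOB_ID:-local}.XXXXXX")"

# Remove the directory when the script exits, whether it succeeds or fails
cleanup() { rm -rf "$workdir"; }
trap cleanup EXIT

echo "Intermediate files go in: $workdir"
# ... write temporary files under "$workdir", then copy final results
# to permanent storage before the script ends ...
```

Because the scratch space is shared among all jobs on the node, removing your own files at the end of the job leaves more room for others.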

If your job would benefit from extra-fast local scratch storage, you can request a node with either an SSD or an NVMe scratch drive via the following flag:

qsub -l ssd_scratch=1

Parallel processing (on a single machine)

By default, the scheduler will allocate a single core for your job. To allow the job to use multiple slots, request the number of slots needed when you submit the job. For instance, to request four slots (NSLOTS=4), each with 2 GiB of RAM, for a total of 8 GiB RAM, use:

qsub -pe smp 4 -l mem_free=2G

The scheduler will make sure your job is launched on a node with at least four slots available.
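
Because mem_free is requested per slot, a job with a known total memory need has to divide that total by the number of slots. The sketch below does the arithmetic, rounding up to whole GiB; the function name mem_per_slot is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-slot memory request (in GiB, rounded up)
# for a given total memory need and number of slots.
mem_per_slot() {
  local total_gib="$1" slots="$2"
  echo $(( (total_gib + slots - 1) / slots ))
}

slots=4
per_slot="$(mem_per_slot 8 "$slots")"
# Print the options one would pass to qsub for this request
echo "-pe smp ${slots} -l mem_free=${per_slot}G"
```

For 8 GiB spread over four slots this yields 2 GiB per slot, the same request as in the example above.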

Note: when writing your script, use the SGE environment variable NSLOTS, which is set to the number of cores that your job was allocated. This way you don’t have to update your script if you request a different number of cores. For instance, if your script runs the BWA alignment, have it specify the number of parallel threads as:

bwa aln -t "${NSLOTS:-1}" ...

By using ${NSLOTS:-1}, instead of just ${NSLOTS}, this script will fall back to use a single thread if NSLOTS is not set, e.g. when running the script on your local computer.

Comment: PE stands for ‘Parallel environment’. SMP stands for ‘Symmetric multiprocessing’ and indicates that the job will run on a single machine using one or more cores.

Minimum network speed (1 Gbps, 10 Gbps, 40 Gbps)

The majority of the compute nodes have 1 Gbps or 10 Gbps network cards, while a few have 40 Gbps cards. A job that requires 10-40 Gbps network speed can request this by specifying the eth_speed=10 (sic!) resource, e.g.

qsub -cwd -l eth_speed=10

A job requesting eth_speed=40 will end up on a 40 Gbps node, and a job requesting eth_speed=1 (default) will end up on any node.

Passing arguments to script

You can pass arguments to a job script in the same way as you pass arguments to a script executed on the command line, e.g.

qsub -cwd -l mem_free=1G --first=2 --second=true --third='"some value"' --debug

Arguments are then passed as if you called the script as --first=2 --second=true --third="some value" --debug. Note that you need an extra layer of single quotes around "some value"; otherwise the script will see --third=some value as two independent arguments (--third=some and value).
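
On the receiving side, the job script has to interpret these arguments itself. The following is one possible sketch of such parsing; the function name parse_args and the variable names are hypothetical and simply mirror the options in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the options used in the example above.
parse_args() {
  first="" second="" third="" debug=false
  local arg
  for arg in "$@"; do
    case "$arg" in
      --first=*)  first="${arg#--first=}" ;;    # strip the option name
      --second=*) second="${arg#--second=}" ;;
      --third=*)  third="${arg#--third=}" ;;
      --debug)    debug=true ;;
      *) echo "unknown argument: $arg" >&2; return 1 ;;
    esac
  done
}

parse_args "$@"
echo "first=$first second=$second third=$third debug=$debug"
```

Quoting "$@" keeps an argument such as --third="some value" intact as a single word inside the script.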

Interactive jobs

It is currently not possible to request interactive jobs (aka qlogin). Instead, there are dedicated development nodes that can be used for short-term interactive development needs, such as building software and prototyping scripts before submitting them to the scheduler.

MPI: Parallel processing via Hybrid MPI (multi-threaded multi-node MPI jobs)

Wynton HPC provides a special MPI parallel environment (PE) called mpi-8 that allocates exactly eight (8) slots per node across one or more compute nodes. For instance, to request a Hybrid MPI job with forty slots in total (NSLOTS=40), submit it as:

qsub -pe mpi-8 40

and make sure that the script exports OMP_NUM_THREADS=8 (the eight slots per node) and then launches the MPI application using mpirun -np $NHOSTS /path/to/the_app, where NHOSTS is automatically set by SGE (here NHOSTS=5):

#! /usr/bin/env bash
#$ -cwd   ## SGE directive to run in the current working directory

module load mpi/openmpi-x86_64
export OMP_NUM_THREADS=8            ## eight threads per node, one per slot
mpirun -np $NHOSTS /path/to/the_app

Note: When working with MPI, it is important to use the exact same version that was used to build the software using MPI. Because of this, we always specify the full mpi/<version> path.
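
The relationship between the requested slots and the number of hosts in the mpi-8 example can be checked with a little arithmetic. This is a sketch only: the function name hosts_for_slots is hypothetical, and rejecting slot counts that are not a multiple of eight is an assumption made here for illustration, not documented scheduler behavior.

```shell
#!/usr/bin/env bash
# The mpi-8 parallel environment allocates eight slots per node.
slots_per_node=8

# Hypothetical helper: number of hosts for a given total slot count.
hosts_for_slots() {
  local slots="$1"
  # Assumption: only multiples of eight are meaningful for mpi-8
  if (( slots % slots_per_node != 0 )); then
    echo "slot count $slots is not a multiple of $slots_per_node" >&2
    return 1
  fi
  echo $(( slots / slots_per_node ))
}

hosts_for_slots 40
```

For the forty-slot request above this prints 5, which matches the NHOSTS=5 value set by SGE in that example.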

Comment: MPI stands for ‘Message Passing Interface’.


See also

For further options and advanced usage, see Advanced Usage of the scheduler.