Here is what a typical job submission of a shell script script.sh would look like from the command line:
qsub -cwd -pe smp 4 -l mem_free=2G -l scratch=50G -l h_rt=00:20:00 script.sh
This job submission will submit script.sh to the job scheduler, which will eventually launch the job on one of the compute nodes that can meet the job's resource needs. Exactly what these options do is explained in the sections below, but in summary, the above will result in:
-cwd: the working directory will be set to the directory from which the job was submitted
-pe smp 4: the job will be allotted four slots (“cores”) on a single machine
-l mem_free=2G: the job will be allotted 2 GiB of RAM per slot, i.e. 8 GiB in total
-l scratch=50G: the job will be launched on a compute node with at least 50 GiB of local scratch storage available
-l h_rt=00:20:00: the scheduler knows that the job will run for no longer than 20 minutes, allowing it to be scheduled much sooner than if no run time were specified
script.sh: the shell script to be run
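For reference, here is a minimal sketch of what such a script.sh might contain; the #$ lines are SGE directives embedded in the script (equivalent to the corresponding command-line options), and the work step is a hypothetical placeholder:
#! /usr/bin/env bash
#$ -cwd    ## SGE directive: run in the current working directory
## Hypothetical placeholder for the actual work
echo "Hello from $(hostname)"
sleep 30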
To submit a shell script to the scheduler such that it will run in the current working directory (option -cwd), use:
qsub -cwd script.sh
The scheduler will assign your job a unique (numeric) job ID.
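For example (the job ID below is hypothetical; qsub prints the assigned ID upon submission), you can then query the job's status via qstat:
qsub -cwd script.sh
qstat -j 3028076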
Unless specified, the maximum amount of memory used at any time is 1 GiB per slot (-l mem_free=1G). A job that needs to use more memory must request that when submitted. For example, a job that needs (at most) 10 GiB of memory should be submitted as:
qsub -cwd -l mem_free=10G script.sh
The scheduler will launch this job on the first available compute node with that amount of memory available.
Tip: Add qstat -j $JOB_ID to the end of your script to find out how much memory and CPU time your job needed. See the Job Summary page for more details.
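That is, the tail of script.sh would include something like this ($JOB_ID is set by SGE in the job's environment):
## Last line of script.sh: report this job's resource usage
qstat -j "$JOB_ID"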
Note that -l mem_free=size specifies memory per slot, not per job.
The better the scheduler knows how long each job will take, the better it can manage resources and allocate jobs to different nodes. This also decreases the average time a job sits in the queue before being launched on a compute node. You can specify the maximum run time (= wall time, not CPU time) for a job using the option -l h_rt=HH:MM:SS, where HH:MM:SS specifies the number of hours (HH), minutes (MM), and seconds (SS); all parts must be specified. For instance, the following job is expected to run for at most 3 minutes (180 seconds):
qsub -cwd -l mem_free=2G -l h_rt=00:03:00 script.sh
Each compute node has 0.1-1.8 TiB of local scratch storage, which is fast and ideal for temporary, intermediate data files that are only needed for the duration of a job. This scratch storage is unique to each machine and shared among all users and jobs running on the same machine. To minimize the risk of launching a job on a node that has little scratch space left, specify the -l scratch=size resource. For instance, if your job requires 200 GiB of local /scratch space, submit the job using:
qsub -cwd -l scratch=200G script.sh
Your job is only guaranteed the amount of available scratch space that you request when it is launched. For more information and best practices, see Using Local /scratch on Compute Nodes.
Please specify -l scratch=size whenever your job uses local /scratch, and clean up afterward. This maximizes the chance of compute nodes having enough available space, reduces queuing times, and minimizes the risk of running out of local scratch.
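As a sketch of such a cleanup pattern (the /scratch path layout is an assumption; adapt it to the cluster's conventions):
#! /usr/bin/env bash
#$ -cwd
#$ -l scratch=200G

## Create a job-specific directory on the node's local scratch drive
## (assumes /scratch is writable by the user; path layout is hypothetical)
SCRATCH_DIR=$(mktemp -d "/scratch/${USER}_job${JOB_ID}_XXXXXX")

## Remove the scratch directory when the job exits, even on failure
trap 'rm -rf "$SCRATCH_DIR"' EXIT

## ... write temporary, intermediate files to "$SCRATCH_DIR" here ...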
Note that -l scratch=size specifies space per job, not per slot.
If your job would benefit from extra-fast local scratch storage, then you can request a node with either an SSD or an NVMe scratch drive via the following flag:
qsub -l ssd_scratch=1
By default, the scheduler allocates a single core to your job. To allow the job to use multiple slots, request the number of slots needed when you submit the job. For instance, to request four slots (NSLOTS=4), each with 2 GiB of RAM, for a total of 8 GiB of RAM, use:
qsub -pe smp 4 -l mem_free=2G script.sh
The scheduler will make sure your job is launched on a node with at least four slots available.
Note: when writing your script, use the SGE environment variable NSLOTS, which is set to the number of cores that your job was allocated. This way you don't have to update your script if you request a different number of cores. For instance, if your script runs a BWA alignment, have it specify the number of parallel threads as:
bwa aln -t $NSLOTS ...
Comment: PE stands for ‘Parallel environment’. SMP stands for ‘Symmetric multiprocessing’ and indicates that the job will run on a single machine using one or more cores.
Using NSLOTS avoids the problem of hardcoding the number of cores in the script. Another problem is software that by default uses all of the machine's cores; make sure to control for this, e.g. via a dedicated command-line option or environment variable for that software.
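For example, for software that honors the common OpenMP convention (an assumption; check the documentation for your specific software), you can tie its thread count to the job's allocation:
## Constrain multi-threaded software to the slots allocated by SGE
export OMP_NUM_THREADS=$NSLOTS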
The majority of the compute nodes have 1 Gbps or 10 Gbps network cards, while a few have 40 Gbps cards. A job that requires 10-40 Gbps network speed can request this by specifying the -l eth_speed=10 resource, e.g.
qsub -cwd -l eth_speed=10 script.sh
A job requesting eth_speed=40 will end up on a 40 Gbps node, and a job requesting eth_speed=1 (the default) will end up on any node.
You can pass arguments to a job script similarly to how one passes arguments to a script executed on the command line, e.g.
qsub -cwd -l mem_free=1G script.sh --first=2 --second=true --third='"some value"' --debug
Arguments are then passed as if you called the script as script.sh --first=2 --second=true --third="some value" --debug. Note how you have to add an extra layer of single quotes around "some value"; otherwise script.sh will see --third=some value as two independent arguments (--third=some and value).
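For illustration, here is a minimal sketch of how script.sh could parse these arguments in Bash (the option names match the example above; the parsing logic itself is hypothetical):
#! /usr/bin/env bash
#$ -cwd
first=; second=; third=; debug=false
## Parse the arguments passed through by qsub
while [[ $# -gt 0 ]]; do
  case "$1" in
    --first=*)  first=${1#--first=};;
    --second=*) second=${1#--second=};;
    --third=*)  third=${1#--third=};;
    --debug)    debug=true;;
    *) echo "Unknown argument: $1" >&2; exit 1;;
  esac
  shift
done
echo "first=$first, second=$second, third=$third, debug=$debug"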
It is currently not possible to request interactive jobs (aka qlogin). Instead, there are dedicated development nodes that can be used for short-term interactive development needs such as building software and prototyping scripts before submitting them to the scheduler.
Wynton provides a special MPI parallel environment (PE) called mpi-8 that allocates exactly eight (8) slots per node across one or more compute nodes. For instance, to request a Hybrid MPI job with forty slots in total (NSLOTS=40), submit it as:
qsub -pe mpi-8 40 hybrid_mpi.sh
and make sure that the script (here hybrid_mpi.sh) sets OMP_NUM_THREADS=8 (the eight slots per node) and then launches the MPI application using mpirun -np $NHOSTS /path/to/the_app, where NHOSTS is automatically set by SGE (here NHOSTS = 40/8 = 5):
#! /usr/bin/env bash
#$ -cwd   ## SGE directive to run in the current working directory
export OMP_NUM_THREADS=8
mpirun -np $NHOSTS /path/to/the_app
Note that if the requested number of slots (NSLOTS) is not a multiple of eight, then the job will be stuck in the queue forever and never run.
Comment: MPI stands for ‘Message Passing Interface’.
For further options and advanced usage, see Advanced Usage of the scheduler.