Q. My submitted job is still in the queue. Why is it not running?
A. There could be several reasons why your job is not running:
Your job requests more resources than any compute node can provide, e.g. -pe smp 256. If so, your job will never run. Either lower the job’s resource needs using qalter, or, alternatively, remove the job (qdel) and submit (qsub) a new one with adjusted resources.
qstat -j <job_id> will provide details on why a particular job is not running.
qstat -u '*' will show all jobs and their priority scores in the queue.
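You can check for the first case above, a request that no single node can satisfy, before submitting. The following is a minimal sketch and not an official Wynton tool. The function names max_ncpu and fits_on_a_node are hypothetical. The sketch assumes the default qhost output layout: a header line, a separator line, then one row per host with NCPU in the third column.

```shell
# Hypothetical helper: read `qhost` output on stdin and print the largest
# NCPU value. Assumes the default layout (header line, separator line,
# then one row per host with NCPU in column 3). Rows whose third column
# is not a number, such as the "global" row, are skipped.
max_ncpu() {
  awk 'BEGIN { max = 0 }
       NR > 2 && $3 ~ /^[0-9]+$/ { if ($3 + 0 > max) max = $3 + 0 }
       END { print max }'
}

# Hypothetical helper: succeed only if N slots on a single node
# (as requested with -pe smp N) fit within the largest NCPU value
# found in the `qhost` output on stdin.
fits_on_a_node() {
  local requested=$1
  local largest
  largest=$(max_ncpu)
  [ "$requested" -le "$largest" ]
}
```

For example, qhost | fits_on_a_node 256 || echo "no single node has 256 cores" would flag a -pe smp 256 request. The sketch only compares core counts and does not check memory or other resource requests.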
A. [For QB3 migrants] If you are coming from the QB3 cluster, please make sure that your job script does not specify any of the QB3-specific SGE resources below. A job submitted with one or more of these will sit in the queue forever.
-l arch=linux-x64: this architecture does not exist on Wynton HPC. This specification can safely be dropped on Wynton HPC. (You could specify -l arch=lx-amd64, but that is not needed, since all compute nodes now have the same architecture.)
-l database=<size>: this storage resource does not exist on Wynton HPC.
-l netapp=<size>: this storage resource does not exist on Wynton HPC. This specification can safely be dropped on Wynton HPC.
-l scrapp=<size>: this storage resource does not exist on Wynton HPC.
-l scrapp2=<size>: this storage resource does not exist on Wynton HPC.
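A plain text search is enough to find these settings in an existing job script. The sketch below is one way to do it. The function name find_qb3_resources is hypothetical. It searches the whole file, so it will also flag matching text in comments or echo statements. It also catches resources combined in one request, such as -l mem_free=2G,netapp=5G.

```shell
# Hypothetical helper: print (with line numbers) any lines in a job script
# that mention the QB3-only resources arch=linux-x64, database=, netapp=,
# scrapp= or scrapp2=.
# Returns 0 if none are found, 1 if some are found, 2 if grep fails
# (for example, when the file cannot be read).
find_qb3_resources() {
  local script=$1
  local rc=0
  # The resource name must start the line or follow a non-word character
  # (space, comma, ...), so e.g. "mydatabase=" is not matched.
  grep -nE '(arch=linux-x64|(^|[^[:alnum:]_])(database|netapp|scrapp2?)=)' "$script" || rc=$?
  case $rc in
    0) return 1 ;;  # QB3-specific requests found (printed above)
    1) return 0 ;;  # clean
    *) return 2 ;;  # grep error
  esac
}
```

Usage: find_qb3_resources my_job.sh || echo "please remove the lines listed above", where my_job.sh stands for your own script.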
Q. I tried to delete some jobs, and now they’re stuck in the state “dr”. How can I get rid of them?
A. The most likely cause is that the node (or nodes) running your jobs crashed. Since the node(s) cannot report back to SGE and confirm the job deletion, the state of the jobs does not change. To force the deletion, use:
qdel -f $JOB_ID [-t $SGE_TASK_ID]
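If many jobs are stuck, you can apply the forced deletion to each of them in a loop. This is a sketch under several assumptions. The helper names list_dr_jobs and force_delete_dr_jobs are hypothetical. The parsing assumes the default qstat column layout: state in the fifth column and, for array jobs, ja-task-ID in the tenth.

```shell
# Hypothetical helper: read `qstat` output on stdin and print "JOB_ID" or
# "JOB_ID TASK_ID" for every job whose state is exactly "dr".
# Assumes the default columns: job-ID, prior, name, user, state,
# submit/start date, time, queue, slots, and (array jobs only) ja-task-ID.
list_dr_jobs() {
  awk 'NR > 2 && $5 == "dr" {
         if ($10 != "") print $1, $10; else print $1
       }'
}

# Hypothetical wrapper: force-delete all of your own jobs stuck in "dr".
force_delete_dr_jobs() {
  qstat -u "$USER" | list_dr_jobs | while read -r job_id task_id; do
    if [ -n "$task_id" ]; then
      qdel -f "$job_id" -t "$task_id"
    else
      qdel -f "$job_id"
    fi
  done
}
```

Running qstat -u "$USER" | list_dr_jobs on its own first is a cheap way to review which jobs would be affected before calling force_delete_dr_jobs.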
Q. I just started to get SSL-related errors when using qstat that I have never seen before:
error: commlib error: ssl connect error (SSL handshake error) ssl error (the used certificate is expired) unable to contact qmaster using port 6444 on host "q"
A. This error most likely means that your Wynton account has expired. If so, you should already have received an email from us with instructions on how to request a renewal. If you have already responded to that email, then it’s a mistake on our end (sorry); please drop us another email.
Q. Is it possible to have a common folder where our lab group members can share files and software?
A1. If you belong to a specific group, we can set up a /wynton/home/your_group/shared/ folder that group members (part of the same Unix group) have write access to. Any files in it count toward the disk quota of the user who owns them. Typically, one or more members then maintain subdirectories therein. If you need this, please drop us an email. Note that if the groups command reports lsd for you, then you do not belong to a specific group and unfortunately cannot get a group-specific folder.
A2. Labs that purchase additional storage will get a /wynton/group/your_group/ folder. Files written to that folder do not count toward users’ disk quotas.
Q. Our lab would like to contribute to Wynton in order to increase our priority. How can we do this?
A. We welcome donations of any size. In return, your lab will receive a number of slots in member.q equal to the number of cores your contribution would purchase at the price of a current Standard Node. As of May 2020, that cost is $170 per slot.
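As a worked example, the number of slots is the contribution divided by the per-slot price. The sketch below assumes whole-dollar amounts and rounds down to whole slots, which is an assumption and not a stated policy. The function name is hypothetical. The $170 default is the May 2020 figure quoted above, so pass the current price as the second argument.

```shell
# Hypothetical helper: member.q slots a contribution would buy.
# $1 = contribution in whole dollars
# $2 = price per slot in whole dollars (defaults to the May 2020 figure, 170)
# Integer division rounds down to whole slots (an assumption).
slots_for_contribution() {
  local dollars=$1
  local price_per_slot=${2:-170}
  echo $(( dollars / price_per_slot ))
}
```

For example, slots_for_contribution 10000 prints 58, since 58 * 170 = 9,860 and a 59th slot would need $10,030.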
Q. Our lab has some old nodes we’d like to contribute to Wynton in return for priority. Will you take them?
A. Please get in touch with the Wynton team.