Currently, the Wynton HPC cluster has a total of member.qtotal = 7023 slots available on the member.q queue. Jobs on the member.q queue will launch and finish sooner than jobs on the communal, lower-priority long.q queue. A member.q job also gets higher priority on the CPU than a long.q job when both run on the same compute node. Only contributing members have access to the member.q queue; non-contributing members only have access to queues such as long.q. Contributors get non-expiring, lifetime access to a number of these member.q slots in proportion to their hardware contribution to the cluster. A contribution can be monetary(*) or physical(*), and the number of member.q slots it adds is based on how much compute power it adds to the cluster. That compute power is determined by benchmarking(*), which results in a processing-unit (PU) score for the contribution. Currently, there is a total of PUtotal = 20186 contributed processing units on Wynton HPC.
As other labs contribute to the cluster, the total compute power (PUtotal) and the total number of member.q slots (member.qtotal) will increase over time. As a result, a lab’s relative compute share (PUlab / PUtotal) will decrease over time, while its number of member.q slots (member.qlab) will stay approximately(**) the same.
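The dilution effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that other labs add 2000 PUs and that the new hardware provides slots at the same slots-per-PU ratio as the existing cluster; the lab size of 6.4 PUs and the function name `share_and_slots` are hypothetical as well.

```python
def share_and_slots(pu_lab, pu_total, slots_total):
    """Return (relative share, fractional member.q slots) for one lab."""
    share = pu_lab / pu_total
    return share, share * slots_total

PU_TOTAL = 20186      # contributed PUs, from the text
SLOTS_TOTAL = 7023    # member.q slots, from the text
PU_LAB = 6.4          # hypothetical lab contribution

before = share_and_slots(PU_LAB, PU_TOTAL, SLOTS_TOTAL)

# Hypothetical growth: other labs add 2000 PUs, and slots grow
# at the same slots-per-PU ratio as the current cluster.
added_pu = 2000
added_slots = added_pu * SLOTS_TOTAL / PU_TOTAL
after = share_and_slots(PU_LAB, PU_TOTAL + added_pu, SLOTS_TOTAL + added_slots)

print(f"share: {before[0]:.4%} -> {after[0]:.4%}")
print(f"slots: {before[1]:.2f} -> {after[1]:.2f}")
```

Under this proportional-growth assumption the share shrinks while the slot count is unchanged; real contributions only approximately follow that ratio.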
Assume that the last addition was from the Charlie Lab, contributing 4 compute nodes. Each of these machines has a 12-core 2.2 GHz Opteron 6174 CPU and clocks in at 1.6 PUs based on the benchmarking, so the processing power added for this lab, and for the cluster as a whole, is 4 * 1.6 PUs = +6.4 PUs. In addition to increasing the total amount of contributed PUs, the lab’s contribution also increased the total number of member.q slots on the cluster by 4 * 12 = +48 slots.
If this was Charlie Lab’s first contribution to Wynton, their share on the member.q queue will be PUlab / PUtotal = 6.4 / 20186 = 0.032%. This PU share translates to member.qlab = (PUlab / PUtotal) * member.qtotal = 2 member.q slots (2.23 rounded off to the closest integer). If they instead had already contributed, say, a total of 16.3 PUs in the past, their computational share would have become PUlab / PUtotal = (16.3 + 6.4) / 20186 = 0.112%, which would correspond to 8 member.q slots (7.90 rounded off).
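The arithmetic above can be reproduced with a short sketch. The function name `member_q_slots` is hypothetical, and the use of Python's built-in `round` is an assumption about how "rounded off to the closest integer" is applied; the totals are the ones quoted in the text.

```python
def member_q_slots(pu_lab, pu_total, slots_total):
    """Return (share, fractional slots, rounded slots) for one lab."""
    share = pu_lab / pu_total
    exact = share * slots_total
    # Assumption: round to the nearest integer, as the text describes.
    return share, exact, round(exact)

PU_TOTAL = 20186     # PUtotal, from the text
SLOTS_TOTAL = 7023   # member.qtotal, from the text

# Charlie Lab's contribution: 4 nodes at 1.6 PUs each.
pu_added = 4 * 1.6

first_time = member_q_slots(pu_added, PU_TOTAL, SLOTS_TOTAL)
with_history = member_q_slots(16.3 + pu_added, PU_TOTAL, SLOTS_TOTAL)

for label, (share, exact, slots) in [("first contribution", first_time),
                                     ("with 16.3 PUs already", with_history)]:
    print(f"{label}: share={share:.3%}, slots={exact:.2f} -> {slots}")
```

With the totals quoted above, this gives 2 and 8 member.q slots for the two scenarios.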
The table below shows the current contributions in terms of processing units (PU) and the corresponding number of member.q slots per contributing lab.
Source: compute_shares.tsv. These data were compiled from the current SGE configuration (qconf -srqs member_queue_limits and qconf -sprj <project>). In SGE terms, a processing unit (PU) corresponds to a functional share (“fshare”).
(*) To be documented.
(**) The reason member.qlab does not remain exactly the same when PUlab does not change is that the compute power per core is greater for newer hardware than for older hardware. Because of this, a lab’s number of member.q slots is likely to decrease, ever so slightly, in the long run as the cluster keeps growing. But don’t worry: as the average compute power per member.q slot increases over time, your lab’s total compute power on the member.q queue remains constant by definition (unless your lab adds further contributions).
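A minimal sketch of this footnote follows. The new contribution (2000 PUs delivered on only 500 slots, i.e. more PUs per slot than the current cluster average) and the lab size of 6.4 PUs are hypothetical numbers chosen for illustration; only the two totals come from the text.

```python
PU_TOTAL = 20186     # PUtotal, from the text
SLOTS_TOTAL = 7023   # member.qtotal, from the text
PU_LAB = 6.4         # hypothetical lab contribution, unchanged over time

def lab_slots(pu_lab, pu_total, slots_total):
    """Fractional number of member.q slots for one lab."""
    return pu_lab / pu_total * slots_total

slots_before = lab_slots(PU_LAB, PU_TOTAL, SLOTS_TOTAL)

# Hypothetical newer hardware: more PUs per slot than the current average.
new_pu, new_slots = 2000, 500
pu_total_after = PU_TOTAL + new_pu
slots_total_after = SLOTS_TOTAL + new_slots

slots_after = lab_slots(PU_LAB, pu_total_after, slots_total_after)

# Average compute power per slot rises, so the lab's total stays the same.
pu_per_slot_after = pu_total_after / slots_total_after
power_after = slots_after * pu_per_slot_after

print(f"slots: {slots_before:.3f} -> {slots_after:.3f}")
print(f"lab compute power after growth: {power_after:.2f} PUs")
```

The lab ends up with slightly fewer fractional slots, but slots times average PUs per slot still equals its contributed PUs.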