How to find out which GPUs are installed and available?
To find out which machines have GPUs installed, run
condor_status -constraint 'PartitionableSlot && TotalGpus > 0' \
-af:h Machine TotalGPUs CUDADeviceName
This will output something akin to
Machine GPUs CUDADeviceName
a3102.atlas.local 2 undefined
a3104.atlas.local 1 GeForce GTX 1080 Ti
a3108.atlas.local 1 GeForce RTX 2080
a3112.atlas.local 1 GeForce GTX 1660 Ti
[...]
Here, undefined could mean either that a non-CUDA-capable card was found or, as in this case, that two different CUDA-capable cards were found (TITAN V and GV100).
Please note that we currently have three pools, and condor_status only reflects the situation within the connected pool!
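To check several pools from one host, condor_status accepts a -pool option naming the central manager to query. The following is only a minimal sketch: the function name gpus_in_pools and the host names in the usage comment are placeholders, not the actual central managers of our pools.

```shell
# Sketch: run the installed-GPU query against several pools in turn.
# Each argument is the host name of a pool's central manager.
gpus_in_pools() {
    for pool in "$@"; do
        echo "== ${pool} =="
        condor_status -pool "${pool}" \
            -constraint 'PartitionableSlot && TotalGpus > 0' \
            -af:h Machine TotalGpus CUDADeviceName
    done
}

# Usage (host names are hypothetical):
# gpus_in_pools cm1.example.org cm2.example.org cm3.example.org
```

Whether a given host may query another pool depends on that pool's security configuration, so some of the queries may be refused.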
To see which of these GPUs are currently available, run
condor_status -constraint 'PartitionableSlot && Gpus > 0' \
-af:h Machine TotalGpus GPUs CUDADeviceName Cpus Memory/1024
The difference is that TotalGpus indicates the number of physically installed GPUs, while Gpus indicates how many GPUs are currently available for new jobs.
For convenience, two additional columns are included: the number of currently available CPU cores and the available memory (in GiB).
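If only a total is needed, the table can be piped through awk. This is a minimal sketch, assuming the column order of the availability query above (Machine, TotalGpus, Gpus, ...) and a single header line; the function name sum_free_gpus is purely illustrative.

```shell
# Sketch: sum installed and currently free GPUs from the table on stdin.
# Assumes column 2 is TotalGpus and column 3 is Gpus, with one header line.
sum_free_gpus() {
    awk 'NR > 1 { total += $2; avail += $3 }
         END { printf "%d of %d GPUs free on the listed machines\n", avail, total }'
}

# Usage:
# condor_status -constraint 'PartitionableSlot && Gpus > 0' \
#     -af:h Machine TotalGpus GPUs CUDADeviceName Cpus Memory/1024 | sum_free_gpus
```

Because the constraint only selects machines with at least one free GPU, the second number counts the GPUs installed in those machines, not in the whole pool.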