Optimizing with gcc for compute node types
When compiling code with gcc, one usually uses -O2 or -O3 for
optimized builds and leaves it at that. But if your program should
squeeze the last bits of performance out of the machine it runs on,
you may find the -march=native option mentioned on the web. While this
may work nicely on your local machine, using it on Atlas will optimize
your code for the machine you build on (usually a submit host). The
resulting binary may be useless on most of the compute nodes, as it
may simply crash there when it hits an instruction the node's CPU does
not support.
Finding out what CPU models are online
Given that we only want to optimize for the nodes’ CPUs, we can first find out which CPUs we have online within Atlas by running
condor_status -const PartitionableSlot -af:V lscpu_model_name CpuModelNumber |\
sort | uniq -c | sort -n
1 "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz" 79
41 "Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz" 85
49 "Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz" 62
230 "Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz" 60
337 "AMD EPYC 7452 32-Core Processor" 49
546 "Intel(R) Xeon(R) CPU E5-2658 v4 @ 2.30GHz" 79
1581 "Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz" 60
The output shows our potential target CPUs, how many of each are online (first column) and which CPU model number they have (final column). The model number will be useful later on.
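For scripting, it can be handy to reduce this listing to just the distinct model numbers. The following is a minimal sketch, assuming the output format shown above (count first, quoted model name, model number last). The function name list_cpu_models is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read 'count "model name" number' lines on stdin
# and print each distinct CPU model number once, sorted numerically.
list_cpu_models() {
    awk 'NF { print $NF }' | sort -n -u
}

# Demo on three lines of the listing above (no Condor access needed).
list_cpu_models <<'EOF'
      1 "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz" 79
    337 "AMD EPYC 7452 32-Core Processor" 49
    546 "Intel(R) Xeon(R) CPU E5-2658 v4 @ 2.30GHz" 79
EOF
```

On a submit host, the condor_status command from above could be piped straight into list_cpu_models instead of the sample lines.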
Using gcc’s native compile option
When compiling with gcc for the local host, -march=native
automatically switches on a lot of extra flags, helping the compiler
to optimize for the cache sizes and extra instructions of this
particular CPU. Thanks to this answer on stackoverflow.com, you can
display all the flags via
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/lib/gcc/x86_64-linux-gnu/8/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu -
-march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
-msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4
-mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx
-mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed
-mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec
-mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
-mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx
-mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2
-mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri
-mno-movdir64b --param l1-cache-size=32 --param l1-cache-line-size=64
--param l2-cache-size=35840 -mtune=broadwell
(This is the result for an E5-2658 v4 with CPU model number 79.)
This can now be used to build your program optimized for one, some or
all of our compute nodes, but please ensure you mark the resulting
binaries accordingly. You could install everything under a directory
named build-for-cpu-79 or, if you just had one binary, you could
embed the CPU model number directly in its name, e.g. my_exec_model_79.
Obtaining flags for all architectures
Assuming you can log into the cluster nodes via ssh, the attached
script should find out which CPU models are currently online, log into
one machine of each type and create files in the current directory
named gcc_optim_model_XX, where XX is the CPU model number.
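The attached script is not reproduced here. As a rough idea of what such a script might do, the sketch below is NOT the attached script but an illustration under several assumptions: password-less ssh to the nodes, gcc installed there, and the Condor attribute Machine holding a host name that ssh can reach. All function names are made up for this example.

```shell
#!/bin/sh
# Sketch only, NOT the attached script (see assumptions in the text).

# Read "host model" pairs on stdin, keep the first host seen per model.
one_host_per_model() {
    awk 'NF == 2 && !seen[$2]++ { print $1, $2 }'
}

# Reduce gcc's verbose output to the -m... flags and --param settings
# found on the cc1 line. Quotes are stripped in case gcc prints them.
keep_gcc_flags() {
    grep cc1 | tr -d '"' | tr -s ' ' '\n' | awk '
        /^--param$/ { want = 1; next }    # value is the next word
        want        { out = out sep "--param " $0; sep = " "; want = 0; next }
        /^--param=/ { out = out sep $0; sep = " "; next }
        /^-m/       { out = out sep $0; sep = " " }
        END         { print out }'
}

# Log into one node per CPU model and write gcc_optim_model_XX files.
collect_gcc_flags() {
    condor_status -const PartitionableSlot -af Machine CpuModelNumber |
        one_host_per_model |
        while read -r host model; do
            # -n stops ssh from reading (and swallowing) the host list
            ssh -n "$host" 'gcc -march=native -E -v - </dev/null 2>&1' |
                keep_gcc_flags > "gcc_optim_model_$model"
        done
}
```

Nothing runs until collect_gcc_flags is called. The two helper functions only filter text, so they can be tried out on saved output first.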
The content of these files can now be used for the CFLAGS
environment variable for your builds or even directly, e.g.
gcc $(cat gcc_optim_model_79) -O3 program.c -o my_exec_model_79
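To build one binary per flag file, a small loop can generate the gcc command lines. The following is a minimal sketch that only prints the commands so they can be inspected first. The function name print_build_commands is made up for this example, and program.c stands in for your own source file. It also assumes that the gcc on the build host understands all flags collected on the nodes, which may not hold if the gcc versions differ.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one gcc command line per
# gcc_optim_model_XX file found in the given directory (default: ".").
print_build_commands() {
    dir=${1:-.}
    for f in "$dir"/gcc_optim_model_*; do
        [ -f "$f" ] || continue    # no flag files: nothing to print
        model=${f##*_}             # text after the last "_" is the model number
        echo "gcc $(cat "$f") -O3 program.c -o my_exec_model_$model"
    done
}
```

Once the printed commands look right, they could be executed with print_build_commands | sh.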
Setting up the submit file for Condor
In your submit file, you can reference any of Condor's machine
attributes for matching or other purposes. Assuming you created
binaries for the CPU models found on the largest numbers of our
compute nodes (49, 60 and 79) and named them my_exec_model_49,
my_exec_model_60 and my_exec_model_79, respectively, you can now use
those binaries on specifically targeted machines by adding an extra
requirement:
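Executable = my_exec_model_$$(CpuModelNumber)
Requirements = (CpuModelNumber == 49 || CpuModelNumber == 60 || CpuModelNumber == 79)
(Obviously, these are just the two lines of your submit file that deal with this problem; you will need a few more. Note the double dollar sign: $$(CpuModelNumber) is replaced by the attribute of the matched machine, whereas a single $(...) would refer to a macro defined in the submit file itself.)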
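If the set of binaries changes often, the Requirements line can be generated instead of typed by hand. The following is a minimal sketch. The function name requirements_for_models is made up for this example, and the model numbers are passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper: print a Requirements line that matches any of
# the CPU model numbers given as arguments.
requirements_for_models() {
    [ "$#" -gt 0 ] || return 1     # refuse to print an empty expression
    cond=""
    for model in "$@"; do
        # join the terms with " || ", without a leading separator
        cond="${cond:+$cond || }CpuModelNumber == $model"
    done
    printf 'Requirements = (%s)\n' "$cond"
}

requirements_for_models 49 60 79
# prints: Requirements = (CpuModelNumber == 49 || CpuModelNumber == 60 || CpuModelNumber == 79)
```

The printed line can be pasted or appended to the submit file next to the Executable line.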