When compiling code with gcc, one usually uses -O2 or -O3 for optimized builds and leaves it at that. But if your program should squeeze the last bits of performance out of the machine it runs on, you may find the -march=native option mentioned on the inter-webs. While this may work nicely on your local machine, using it on Atlas optimizes your code for the machine you build on, usually a submit host. The resulting binary may then be useless on most of the compute nodes, as it can simply crash there (typically with an "illegal instruction" error) when it uses instructions their CPUs do not support.

Finding out what CPU models are online

Given that we only want to optimize for the nodes’ CPUs, we can first find out which CPUs we have online within Atlas by running

 condor_status -const PartitionableSlot -af:V lscpu_model_name CpuModelNumber |\
  sort | uniq -c | sort -n
      1 "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz" 79
     41 "Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz" 85
     49 "Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz" 62
    230 "Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz" 60
    337 "AMD EPYC 7452 32-Core Processor" 49
    546 "Intel(R) Xeon(R) CPU E5-2658 v4 @ 2.30GHz" 79
   1581 "Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz" 60

The output shows our potential target CPUs, how many of each are online (first column) and their CPU model number (last column), information that will be useful later on.
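Several CPU names share a model number in this listing: both E3 v3 models report 60 and both E5 v4 models report 79. It can therefore help to add up the counts per model number. A minimal sketch, assuming the output of the command above is piped into it; the function name slots_per_model is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: sum the "uniq -c" counts per CPU model number.
# Input lines look like:  546 "Intel(R) Xeon(R) CPU E5-2658 v4 @ 2.30GHz" 79
# The count is the first field, the model number the last one.
slots_per_model() {
    awk '{ total[$NF] += $1 } END { for (m in total) print total[m], m }' |
        sort -n
}

# Usage on Atlas (requires condor_status):
#   condor_status -const PartitionableSlot -af:V lscpu_model_name CpuModelNumber |
#       sort | uniq -c | slots_per_model
```

Applied to the listing above, this ranks model 60 first, followed by 79 and 49.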

Using gcc’s native compile option

When compiling with gcc for the local host, -march=native automatically switches on a lot of extra flags that let the compiler optimize for this particular CPU's cache sizes and extra instructions. Thanks to this answer on stackoverflow.com, you can display all of these flags via

gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
 /usr/lib/gcc/x86_64-linux-gnu/8/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu -
 -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4
 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx
 -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed
 -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
 -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec
 -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
 -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx
 -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2
 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri
 -mno-movdir64b --param l1-cache-size=32 --param l1-cache-line-size=64
 --param l2-cache-size=35840 -mtune=broadwell

(this is the result for an E5-2658 v4 with CPU model number 79)
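Only the -m... options and the --param pairs in that cc1 line describe the target CPU; the cc1 path, -E, -quiet, -v, -imultiarch and the lone - (standard input) belong to this particular preprocessor run. A minimal sketch of a filter that keeps just the target options so they can be stored and reused; extract_flags is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical filter: read gcc's cc1 command line(s) on stdin and print only
# the target-specific options, i.e. every word starting with -m plus each
# "--param name=value" pair. Everything else (path, -E, -quiet, ...) is dropped.
extract_flags() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "--param" && i < NF) {
                out = out " " $i " " $(i + 1)   # keep the pair together
                i++
            } else if ($i ~ /^-m/) {
                out = out " " $i
            }
        }
        sub(/^ /, "", out)                      # drop the leading blank
        print out
    }'
}

# Usage:
#   gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | extract_flags
```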

You can now use these flags to build your program optimized for one, some or all of our compute node types, but please make sure to label the resulting binaries accordingly, since a binary built for one CPU model may crash on another. You could install everything under a directory named build-for-cpu-79 or, if you only have a single binary, embed the CPU model number in its name, e.g. my_exec_model_79.
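If you build directly on a compute node (for example inside a job), you need that machine's model number to name the binary or install directory. A minimal sketch, assuming Linux and assuming that Condor's CpuModelNumber equals the decimal model field of /proc/cpuinfo (an assumption worth checking against condor_status for one node); local_cpu_model is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the CPU model number from /proc/cpuinfo (or from
# the file given as $1). Matches the "model : 79" line, not "model name : ...".
local_cpu_model() {
    awk -F': *' '/^model[ \t]*:/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Usage, following the naming scheme above (paths are only examples):
#   model=$(local_cpu_model)
#   gcc -march=native -O3 program.c -o "my_exec_model_$model"
#   mkdir -p "$HOME/build-for-cpu-$model"
```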

Obtaining flags for all architectures

Assuming you can log into the cluster nodes via ssh, the attached script should find out which CPU models are currently online, log into one machine of each type and create files in the current directory named gcc_optim_model_XX where XX is the CPU model number.
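What such a script could look like is sketched below. This is a hypothetical illustration, not the attached script, and it assumes password-less ssh to the nodes, that the Machine attribute holds a host name ssh can reach, and that the nodes run the same gcc version as the machine you build on (otherwise gcc may not recognize some of the collected flags):

```shell
#!/bin/sh
# Hypothetical sketch: write gcc_optim_model_XX for every CPU model currently
# online, using one node per model.

# Read "CpuModelNumber Machine" lines, print only the first host of each model.
one_host_per_model() {
    awk '!seen[$1]++ { print $1, $2 }'
}

# Reduce a cc1 command line to its -m... options and "--param x=y" pairs.
keep_target_flags() {
    tr -s ' ' '\n' |
        awk '/^-m/           { printf "%s%s", sep, $0; sep = " " }
             $0 == "--param" { getline v; printf "%s--param %s", sep, v; sep = " " }
             END             { print "" }'
}

collect_all() {
    condor_status -const PartitionableSlot -af CpuModelNumber Machine |
        one_host_per_model |
        while read -r model host; do
            # -n keeps ssh from swallowing the host list on stdin.
            ssh -n "$host" 'gcc -march=native -E -v - </dev/null 2>&1 | grep cc1' |
                keep_target_flags > "gcc_optim_model_$model"
        done
}

# Usage: run collect_all in the directory where the files should be created.
```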

The content of these files can then be used in the CFLAGS environment variable of your builds or directly on the command line, e.g. gcc $(cat gcc_optim_model_79) -O3 program.c -o my_exec_model_79.
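With several flag files in place, building one binary per model can be a simple loop. A minimal sketch, assuming the gcc_optim_model_XX files are in the current directory and the source file is called program.c as in the example above; setting the hypothetical DRY_RUN variable only prints the commands:

```shell
#!/bin/sh
# Build my_exec_model_XX from program.c for every gcc_optim_model_XX file.
# Set DRY_RUN to any non-empty value to print the gcc commands instead.
build_all_models() {
    for flags_file in gcc_optim_model_*; do
        [ -e "$flags_file" ] || continue          # glob matched nothing
        model=${flags_file#gcc_optim_model_}
        # $(cat ...) is deliberately unquoted so the flags split into words.
        ${DRY_RUN:+echo} gcc $(cat "$flags_file") -O3 program.c -o "my_exec_model_$model"
    done
}
```

The unquoted $(cat ...) matters: quoting it would pass all flags to gcc as a single argument.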

Setting up the submit file for Condor

In your submit file, you can reference any of Condor’s machine attributes for matching or other purposes. Assuming you created binaries for the three most common CPU models among our compute nodes (49, 60 and 79) and named them my_exec_model_49, my_exec_model_60 and my_exec_model_79, respectively, you can now run each binary only on the machines it was built for by adding an extra requirement:

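Executable = my_exec_model_$$(CpuModelNumber)
Requirements = (CpuModelNumber == 49 || CpuModelNumber == 60 || CpuModelNumber == 79)

(Note the double dollar sign: $$(CpuModelNumber) is replaced with the CpuModelNumber value of the machine the job is matched to, whereas a single $(...) refers to a macro defined in the submit file itself. Obviously, these are just the two lines of your submit file that concern this problem; you will need a few more.)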
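Keeping the Requirements expression in sync with the binaries you actually built is easy to get wrong when the set of models changes. A small sketch that derives the line from the binary names; requirements_for is a hypothetical name, and it relies on the my_exec_model_XX naming scheme used above:

```shell
#!/bin/sh
# Hypothetical helper: print a Requirements line that matches exactly the
# models for which a my_exec_model_XX binary exists.
requirements_for() {
    cond=""
    for bin in "$@"; do
        model=${bin##*_model_}                    # my_exec_model_79 -> 79
        cond="${cond:+$cond || }CpuModelNumber == $model"
    done
    printf 'Requirements = (%s)\n' "$cond"
}

# Usage: requirements_for my_exec_model_*
```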