EPYC BIOS & OS Tuning

Introduction

Several BIOS and OS parameters can be tuned so that AMD EPYC server processors reach the maximum performance they are capable of. Tuning these parameters benefits both StorPool and the compute workload running on the host.

BIOS configuration

Two things must be configured in the BIOS to allow StorPool to take advantage of the features an EPYC CPU offers. The first is to raise the CPU's power limits, allowing it to reach its maximum frequency. The second is to enable CPU C-states, so that the CPU can go idle and not waste electricity when there is no meaningful work to be done.

Please note that the options of interest might be named differently in each vendor’s BIOS.

Please follow the steps below to configure the host’s BIOS:

SuperMicro

  1. Load the optimized default settings by navigating to Save & Exit and selecting Load optimized defaults.

  2. Save all changes.

  3. Reboot the host and enter the BIOS setup.

  4. Set the following settings:

    1. Advanced -> ACPI settings -> NUMA Layout [NPS1].

      Warning

      For single-socket systems only. This setting has not been tested on dual-socket systems and is not recommended for them.

      With a single NUMA node topology (NPS1), the CPU interleaves memory accesses across the available memory channels. NPS4 provides a small performance benefit, but it is considered too small to justify the inconvenience of having more than one NUMA node per host. However, if the compute workload is NUMA-aware and is found to benefit from a more detailed NUMA topology, setting this to NPS4 is not considered harmful.

    2. Advanced -> NB configuration -> Determinism slider [Manual], Determinism value [Power].

      This allows the CPU to use all available electrical power and thus maintain a higher frequency across all cores, at the cost of 15-20W of additional power consumption under heavy compute load.

    3. Advanced -> NB configuration -> cTDP [Manual], cTDP Value [the max of the CPU].

      The default cTDP value is lower than the maximum supported by the CPU. For example, on an EPYC 7543P the default value is 225W, while the maximum is 240W. Raising it to the maximum increases the maximum frequency at the expense of consuming 10-20W more. The exact increase in power consumption depends on the workload.

    4. Advanced -> NB configuration -> Preferred IO Bus [Enabled], IO Bus [the PCI bus of the storage NIC in hex].

      Note

      This setting is only available for EPYC Gen 2 and Gen 3 CPUs.

      Configuring this setting is optional and only possible if StorPool uses a single physical NIC. It allows the IO Hub in the IO die to synchronize its Link clock to the storage NIC, eliminating the need to buffer data because of differing clock speeds.

  5. Save all changes.

  6. Reboot the host.
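The Preferred IO Bus setting in step 4 asks for the PCI bus of the storage NIC in hex. The following is a minimal sketch of one way to look it up on a running Linux host, assuming the NIC is a physical PCI device, so that /sys/class/net/<interface>/device is a symlink to its PCI device directory. The function names are hypothetical, and the exact format the BIOS field expects (for example, with or without leading zeros) is an assumption that should be checked against the vendor's BIOS.

```python
import os
import re

# Matches "DDDD:BB:DD.F" or the short form "BB:DD.F"; group 1 is the bus.
_PCI_ADDRESS = re.compile(
    r"^(?:[0-9a-fA-F]{4,}:)?([0-9a-fA-F]{2}):[0-9a-fA-F]{2}\.[0-7]$"
)


def pci_bus_hex(address: str) -> str:
    """Return the bus part of a PCI address, e.g. '0000:41:00.0' -> '41'."""
    match = _PCI_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {address!r}")
    return match.group(1).lower()


def nic_pci_address(interface: str) -> str:
    """Resolve a network interface name to its PCI address via sysfs.

    Only meaningful for physical PCI NICs; bonds, bridges and other
    virtual interfaces have no 'device' link.
    """
    link = f"/sys/class/net/{interface}/device"
    return os.path.basename(os.path.realpath(link))
```

For example, if the storage interface is called eth0 (a placeholder name), pci_bus_hex(nic_pci_address("eth0")) returns the value to compare against the IO Bus field.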

HPE

  1. Load the optimized default settings by navigating to Save & Exit and selecting Load optimized defaults.

  2. Save all changes.

  3. Reboot the host and enter the BIOS setup.

  4. Set the following settings:

    1. Workload profile [Custom].

      This allows modifying settings individually.

    2. Processor options -> Determinism control [Manual], Performance determinism [Power deterministic].

      This allows the CPU to use all available electrical power and thus maintain a higher frequency across all cores, at the cost of 15-20W of additional power consumption under heavy compute load.

    3. Power and Performance options -> Power regulator [OS control mode], Collaborative power control [Enabled].

      This allows the Linux kernel to control the CPU frequency boost more precisely, in line with the actual load the system is experiencing.

    4. Power and Performance options -> Advanced power options -> Package power limit control [Manual], Package power limit value [max of the CPU model].

      The HPE BIOS has no setting that controls the CPU's cTDP directly. However, AMD recommends setting the cTDP and the Package power limit to the same value, so setting the Package power limit has the same effect as setting the cTDP value. For example, on an EPYC 9554 the maximum allowed cTDP value is 400W, so the Package power limit value should be set to 400.

  5. Save all changes.

  6. Reboot the host.

OS configuration

The amd_pstate=guided parameter must be added to the Linux kernel command line. This allows the kernel to tell the CPU precisely how much it must boost its frequency.
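After the host has been rebooted with the new command line, the parameter can be checked on the running kernel. The following is a minimal sketch that reads /proc/cmdline, which holds the command line the running kernel was booted with. The function name is hypothetical. How the parameter is added in the first place depends on the distribution and boot loader and is not covered here.

```python
from pathlib import Path


def amd_pstate_mode(cmdline: str):
    """Return the value of the first amd_pstate= token, or None if absent."""
    for token in cmdline.split():
        if token.startswith("amd_pstate="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    # Only attempt the check where /proc/cmdline is available (Linux).
    if proc_cmdline.exists():
        mode = amd_pstate_mode(proc_cmdline.read_text())
        if mode == "guided":
            print("amd_pstate=guided is set")
        else:
            print(f"amd_pstate is {mode!r}, expected 'guided'")
```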

Validating

Some settings can be validated by examining the output of various cpupower commands.

Power settings

Whether AMD CPPC is enabled can be verified with the cpupower frequency-info command. The essential part of the output is the driver: amd-pstate line:

[root@sof10 ~]# cpupower frequency-info
analyzing CPU 0:
  driver: amd-pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us
  hardware limits: 400 MHz - 2.18 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 400 MHz and 2.18 GHz.
               The governor "performance" may decide which speed to use
               within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.34 GHz (asserted by call to kernel)
  boost state support:
    Supported: no
    Active: no
    AMD PSTATE Highest Performance: 166. Maximum Frequency: 2.18 GHz.
    AMD PSTATE Nominal Performance: 190. Nominal Frequency: 2.50 GHz.
    AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
    AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
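The same check can be scripted. The following is a minimal sketch that extracts the driver line from output in the format shown above and compares it to amd-pstate. The function names are hypothetical, and the parsing assumes the output layout of the cpupower version shown here.

```python
import re
import shutil
import subprocess


def cpufreq_driver(frequency_info: str):
    """Extract the driver name from `cpupower frequency-info` output."""
    match = re.search(r"^\s*driver:\s*(\S+)", frequency_info, re.MULTILINE)
    return match.group(1) if match else None


def amd_pstate_in_use(frequency_info: str) -> bool:
    """True if the reported cpufreq driver is exactly 'amd-pstate'."""
    return cpufreq_driver(frequency_info) == "amd-pstate"


if __name__ == "__main__":
    # Run the real command only if cpupower is installed on this host.
    if shutil.which("cpupower"):
        result = subprocess.run(
            ["cpupower", "frequency-info"],
            capture_output=True, text=True, check=False,
        )
        print("driver:", cpufreq_driver(result.stdout))
```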

C-States

Whether C-states are enabled can be verified with the cpupower idle-info command. The essential part of the output is the C1 state, which must be present, must have a Latency value of 1 (microsecond), and must contain the INTEL MWAIT string in its Flags/Description field:

[root@sof10 ~]# cpupower idle-info
CPUidle driver: acpi_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 3
Available idle states: POLL C1 C2
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 61621088
Duration: 1063210860
C1:
Flags/Description: ACPI FFH INTEL MWAIT 0x0
Latency: 1
Usage: 9581519391
Duration: 623142966924
C2:
Flags/Description: ACPI IOPORT 0x814
Latency: 400
Usage: 1906
Duration: 2788004
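The C1 criteria above can also be checked programmatically. The following is a minimal sketch that parses text in the format shown above and tests the three conditions: C1 is present, its Latency is 1, and its Flags/Description contains INTEL MWAIT. The function names are hypothetical, and the parser assumes that each state block starts with a bare "NAME:" line, as in this output.

```python
def parse_idle_states(idle_info: str) -> dict:
    """Map each idle state name to its 'flags' and 'latency' fields."""
    states = {}
    current = None
    for raw in idle_info.splitlines():
        line = raw.strip()
        if line.startswith("Flags/Description:") and current:
            states[current]["flags"] = line.split(":", 1)[1].strip()
        elif line.startswith("Latency:") and current:
            states[current]["latency"] = int(line.split(":", 1)[1])
        elif line.endswith(":") and " " not in line:
            # A bare "NAME:" line opens a new state block, e.g. "C1:".
            current = line[:-1]
            states[current] = {}
    return states


def c1_is_ok(idle_info: str) -> bool:
    """Check that C1 exists, has latency 1 and uses INTEL MWAIT."""
    c1 = parse_idle_states(idle_info).get("C1")
    return (
        c1 is not None
        and c1.get("latency") == 1
        and "INTEL MWAIT" in c1.get("flags", "")
    )
```

Passing the captured output of cpupower idle-info to c1_is_ok returns True for the example output shown above.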

References

  1. AMD EPYC 7003 optimization guide: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/amd-epyc-7003-tg-workload-57011.pdf

  2. Linux amd-pstate driver documentation: https://docs.kernel.org/admin-guide/pm/amd-pstate.html