Control groups

This document gives an overview of the kernel control groups (cgroups) feature and describes how StorPool uses the cpuset and memory controllers to optimize performance and protect the storage system from unrelated events such as out-of-memory conditions.

Kernel control groups and StorPool

Cgroups

For a good overview of cgroups, check the Description and Control Groups Version 1 sections in the cgroups manual page. For more detailed information, see the kernel cgroups documentation.

Cgexec

All StorPool services are started via the cgexec utility. It runs a service and accounts its resources in the cgroups given as parameters. For example, cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test will run the test binary with its cpuset resources constrained by the cpuset_cg cgroup and its memory resources constrained by the memory_cg cgroup.
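
As a minimal sketch of the same flow, assuming the cpuset_cg and memory_cg groups do not exist yet and using the cgcreate and cgset utilities from the same libcgroup package (the CPU numbers and the memory limit below are placeholders):

# Create both cgroups
cgcreate -g cpuset:/cpuset_cg -g memory:/memory_cg
# A new cpuset group needs cpus and mems before tasks can be attached to it
cgset -r cpuset.cpus=0-1 cpuset_cg
cgset -r cpuset.mems=0 cpuset_cg
# Example memory limit
cgset -r memory.limit_in_bytes=1G memory_cg
# Run the test binary inside both cgroups
cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test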

Slices

A common practice is to create cgroups with the same name under different controllers. Take, for example, the cpuset and memory controllers. If one creates a test cgroup in the memory controller and also a test cgroup in the cpuset controller, the pair can be considered a slice. A more appropriate name for the two cgroups would then be test.slice.

Defining slices makes it easier to keep track of the resources used by a process. Thinking of it as ‘the process runs in the test.slice’ implies both the cpuset restrictions from cpuset:test.slice and the memory restrictions from memory:test.slice.

Machines that run virtual guests have a machine.slice, where all the virtual machines run, and a system.slice, where the system processes run. Machines running systemd also have a user.slice, where all user session processes run.
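
For example, such a pair of same-named cgroups can be created by hand with the cgcreate utility from libcgroup (a sketch, using the test.slice name from the example above):

cgcreate -g cpuset:/test.slice -g memory:/test.slice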

StorPool and Cgroups

Machines that run StorPool also have a storpool.slice, where all StorPool core services run. On a properly configured machine all slices have memory (and memory+swap) limits configured. This guarantees two things:

  • The kernel will have enough memory to run properly (explained later).

  • The storage system (StorPool) will not suffer from an out-of-memory (OOM) situation that it did not trigger itself.

The cpuset controller in the storpool.slice is used to:

  • Dedicate CPUs only for StorPool (that other slices do not have access to).

  • Map the dedicated CPUs to StorPool services in a specific manner to optimize performance.

Memory configuration

For the root cgroup, memory.use_hierarchy should be set to 1, so that a hierarchical memory model is used for the cgroups.

For all slices:

  • memory.move_charge_at_immigrate should be set to at least 1, and for the storpool.slice it should be set to 3.

  • memory.limit_in_bytes and memory.memsw.limit_in_bytes should be set to the same appropriate value.

For the storpool.slice, memory.swappiness should be set to 0.

Note

To ensure enough memory for the kernel, the sum of the memory limits of all slices should be at least 1G short of the total machine memory.

memory:storpool.slice has two memory subslices: common and alloc. The storpool.slice/alloc subslice is used to limit the memory usage of the mgmt, iscsi, and bridge services, while the storpool.slice/common subslice is for everything else. Their memory limits should also be configured, and their sum should be equal to the storpool.slice memory limit.
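
For illustration only, applying the settings above directly through the cgroup v1 filesystem could look like the following, assuming a hypothetical 4G limit for the storpool.slice split into 3G for common and 1G for alloc (in practice, the values are machine-specific and the configuration is generated by storpool_cg, described below):

# Hierarchical accounting for the memory controller (root cgroup)
echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
# storpool.slice: move charges on migration, equal memory and memory+swap limits, no swapping
echo 3 > /sys/fs/cgroup/memory/storpool.slice/memory.move_charge_at_immigrate
echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
echo 0 > /sys/fs/cgroup/memory/storpool.slice/memory.swappiness
# Subslices: their limits add up to the storpool.slice limit
echo 3G > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
echo 1G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes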

Cpuset configuration

The CPUs dedicated to StorPool should be set in cpuset.cpus of the storpool.slice. All other slices’ cpuset.cpus should be set to all the remaining CPUs. The cpuset.cpu_exclusive flag in the storpool.slice should be set to 1 to ensure that the other slices cannot use the CPUs dedicated to StorPool.

cpuset:storpool.slice should have a subslice for each StorPool service running on the machine. So, if you have two server instances plus beacon, mgmt, and block running, there should be storpool.slice/{server,server_1,beacon,mgmt,block} subslices. These are used to assign the services to specific CPUs. For example, setting cpuset.cpus of storpool.slice/beacon to 2 will restrict storpool_beacon to run on CPU#2.

Note that on machines that do not have hardware-accelerated network cards the storpool.slice will also need a CPU for the NIC, although there is no nic subslice. That CPU must not be assigned to any of the subslices (it should be left empty).

For the storpool.slice and each of its subslices, the cpuset.mems option should be set to all available NUMA nodes on the machine (for example, 0-3 on a machine with 4 NUMA nodes).
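
Again for illustration only, on a hypothetical machine with 8 CPUs and a single NUMA node, dedicating CPUs 2 and 3 to StorPool could be expressed like this (the real values are generated by storpool_cg):

# Restrict the other slices to the non-dedicated CPUs first
echo 0-1,4-7 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 0-1,4-7 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
# Dedicate CPUs 2-3 to StorPool and mark them exclusive
echo 2-3 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 1 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpu_exclusive
# All NUMA nodes (a single node here) for the storpool.slice and its subslices
echo 0 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.mems
echo 0 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.mems
# Pin a service subslice to a specific CPU, e.g. the beacon to CPU#2
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.cpus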

Cgroup configuration

To make cgroup configurations persistent, set them in configuration files in /etc/cgconfig.d/ and reboot. When the machine boots, the cgconfig service runs and applies the configuration using cgconfigparser.

Writing configuration files for cgconfig by hand is tedious and error-prone. It is recommended to generate them using the storpool_cg utility provided by StorPool.
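
For reference, a fragment in the cgconfig syntax might look like the sketch below; the values are illustrative, and the files generated by storpool_cg contain the full set of groups and parameters:

group storpool.slice {
    cpuset {
        cpuset.cpus = "2-3";
        cpuset.mems = "0";
        cpuset.cpu_exclusive = "1";
    }
    memory {
        memory.limit_in_bytes = "4G";
        memory.memsw.limit_in_bytes = "4G";
        memory.swappiness = "0";
        memory.move_charge_at_immigrate = "3";
    }
}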

Note

After you create the configuration files you will need to restart the cgconfig service, or parse the configuration files with cgconfigparser, so that the configuration is applied to the machine.
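
For example, assuming the generated files are in /etc/cgconfig.d/, they can be parsed and applied with:

$ cgconfigparser -L /etc/cgconfig.d/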

Warning

Restarting the cgconfig service on a machine that already has cgroups created and processes running in them will move those processes to the root cgroup! This is dangerous, and it is strongly advised not to do it.

Introduction to storpool_cg

Before you start

Before running storpool_cg, make sure that all needed services are installed and network interfaces for StorPool are properly configured in storpool.conf.

Format

The storpool_cg tool should be used as follows:

$ storpool_cg [command] [options]

For [command], you must use one of the conf, print, or check commands. You can also set options as needed. For details, see the following sections.

Viewing results before applying them

When using the conf command, it is advisable to first run storpool_cg with the -N (--noop) option, as shown in the example below:

$ storpool_cg conf -N
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
  core: 3 cpu: 3,23
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

This way you can see an overview of the configuration the tool can create for the machine. Note that the configuration is not written because the -N option was used. This gives you the opportunity to decide whether the configuration is appropriate for the machine:

  • Yes: You can apply the configuration by running storpool_cg without the -N option.

  • No: Keep using the -N option and add some of the options described below, until you get a suitable configuration.

Creating cgroups configurations for freshly installed machines

Hypervisors

Setting slice limits

If you think some of the slice limits should be different - for example, you want the system.slice limit to be 4G - you can do the following:

$ storpool_cg conf -N system_limit=4G
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120872M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 4G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
  core: 3 cpu: 3,23
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

In the same manner, you can pass machine_limit, user_limit, sp_common_limit, and sp_alloc_limit on the command line. Values in MB are also accepted, with the M suffix.
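
For example, several limits can be combined in a single invocation (the values below are purely illustrative):

$ storpool_cg conf -N system_limit=4G user_limit=4G machine_limit=100G sp_common_limit=1024M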

Setting number of CPUs

If you want to dedicate more CPUs to StorPool, you can run storpool_cg with the cores=<N> parameter:

$ storpool_cg conf -N cores=3
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 -
  core: 2 cpu: 2,22 <--- 2 - rdma; 22 -
  core: 3 cpu: 3,23 <--- 3 - block; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Note that on hyper-threaded machines one core will add two CPUs, while on machines without (or with disabled) hyper-threading one core will add one CPU.

The storpool_cg tool detects which StorPool services that need their own cpuset subslice are installed on the machine. It might happen (although it is unexpected) that not all services are installed yet.

Overriding services detection

You can override the services detection by specifying <service>=true or <service>=1. For example, to add a mgmt service to the above configuration:

$ storpool_cg conf -N cores=3 mgmt=1
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

The storpool_cg tool will also detect which driver the network card uses, and whether it can be used with hardware acceleration in the current StorPool installation.

Overriding hardware acceleration

You can override the hardware acceleration detection by specifying iface_acc=true/false on the command line. Here is an example of the above configuration with hardware acceleration enabled:

$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Note that by default storpool_cg will leave 1G of memory for the kernel.

Setting memory for the kernel

If you want to change the amount of memory for the kernel, you can specify the kernel_mem=<X> command line parameter. For example, reserving 3G for the kernel:

$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true kernel_mem=3G
########## START SUMMARY ##########
slice: machine limit: 118696M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Attention

storpool_cg will use CPUs for the storpool.slice from the local CPU list of the StorPool network interfaces.
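
To see which CPUs are local to a given interface, you can check the local_cpulist attribute of its PCI device in sysfs (the interface name below is a placeholder):

$ cat /sys/class/net/eth0/device/local_cpulist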

Dedicated storage and hyperconverged machines

All options described in the Hypervisors section can also be used on storage and hyperconverged machines.

Warning

Before running storpool_cg on a storage or hyperconverged machine, make sure it meets the following conditions:

  • All its disks are initialized for StorPool. For details, see 7. Storage devices.

  • If it has NVMe disks that will be used with the storpool_nvmed service, ensure that these drives are NOT unbound from the kernel nvme driver and are visible as block devices in /dev. For details, see 9. Background services.

Dedicated storage machines

Here is a sample output from storpool_cg on a dedicated storage machine, which has its disks configured to run in four storpool_server instances and will run the storpool_iscsi service:

$ storpool_cg conf -N
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
###################################
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048
########### END SUMMARY ###########

The first thing to notice is the SP_CACHE_SIZE{_X} variables at the bottom. By default, when run on a node with local disks, storpool_cg will set the cache sizes for the different storpool_server instances. These values will be written to /etc/storpool.conf.d/cache-size.conf.
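
For the example above, the generated file would contain the cache size variables in the usual key=value format of storpool.conf, roughly as follows (shown here only to illustrate the format):

# /etc/storpool.conf.d/cache-size.conf
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048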

Cache size

If you don’t want storpool_cg to set the server caches (perhaps you have already done it yourself), you can set the set_cache_size command line parameter to false:

$ storpool_cg conf -N set_cache_size=false
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
########### END SUMMARY ###########

As shown in the example above, the SP_CACHE_SIZE{_X} entries disappeared from the configuration summary, which means the cache sizes won’t be changed.

Number of servers

storpool_cg detects how many server instances will be running on the machine by reading the storpool_initdisk --list output. If you haven’t yet configured the right number of server instances on the machine, you can override this detection by specifying the servers command line parameter:

$ storpool_cg conf -N set_cache_size=false servers=2
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 6, 7, 8]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
########### END SUMMARY ###########

Hyperconverged machines

On hyperconverged machines storpool_cg should be run with the converged command line parameter set to true (or 1). There are two major differences compared to configuring storage-only nodes:

  • A machine.slice will be created, where the virtual machines will run.

  • The memory limit of the storpool.slice will be calculated carefully to be minimal, but sufficient for StorPool to run without problems.

$ storpool_cg conf -N converged=1
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
  subslice: storpool/common limit: 12806M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 2,22
  core: 2 cpu: 4,24
  core: 3 cpu: 6,26
  core: 4 cpu: 8,28
  core: 8 cpu:10,30
  core: 9 cpu:12,32
  core:10 cpu:14,34
  core:11 cpu:16,36
  core:12 cpu:18,38
socket:1
  core: 0 cpu: 1,21
  core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
  core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
  core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
  core: 4 cpu: 9,29
  core: 8 cpu:11,31
  core: 9 cpu:13,33
  core:10 cpu:15,35
  core:11 cpu:17,37
  core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########

Warning

If the machine does not boot with the kernel memsw cgroups feature enabled, you should specify that to storpool_cg conf by setting set_memsw to false (or 0).

Note that storpool_cg will use only CPUs from the network interface’s local cpulist, which is commonly restricted to one NUMA node. If you want to allow storpool_cg to use all CPUs on the machine, specify that to storpool_cg conf by setting numa_overflow to true (or 1).
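
For example, both overrides can be passed on the command line together with converged (shown only as an illustration):

$ storpool_cg conf -N converged=1 set_memsw=0 numa_overflow=1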

Configuring multiple similar machines

It may happen that you want to run storpool_cg with the same command line arguments on multiple machines. The easiest way to do this is to set the corresponding options in the configuration file of storpool_cg, as shown in the example below:

[cgtool]
CONVERGED=1

MGMT=0
SERVERS=2

SET_CACHE_SIZE=0
SYSTEM_LIMIT=4G
USER_LIMIT=4G
KERNEL_MEM=2G

IFACE_ACC=1

CORES=4

With these options in a file called example.conf, you can simply run storpool_cg by pointing it to the file:

storpool_cg conf -N -C example.conf

This is equivalent to setting the options on the command line, like this:

storpool_cg conf -N converged=1 mgmt=0 servers=2 set_cache_size=0 system_limit=4G user_limit=4G kernel_mem=2G iface_acc=true cores=4

Hint

For more information about the options you can set in the configuration file, check the example in /usr/share/doc/storpool/examples/cgtool/example.cfg, or run storpool_cg conf -h.

Saving a configuration as a file

You can use the -D option to save the configuration detected by storpool_cg to a file. For example, by running storpool_cg conf -N converged=1 -D my-cgconf.cfg the configuration will be saved in the my-cgconf.cfg file. Note that my-cgconf.cfg is a valid configuration file for storpool_cg. As shown below, you can create the file and check its contents:

$ storpool_cg conf -N converged=1 -D my-cgconf.cfg
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
  subslice: storpool/common limit: 12806M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 2,22
  core: 2 cpu: 4,24
  core: 3 cpu: 6,26
  core: 4 cpu: 8,28
  core: 8 cpu:10,30
  core: 9 cpu:12,32
  core:10 cpu:14,34
  core:11 cpu:16,36
  core:12 cpu:18,38
socket:1
  core: 0 cpu: 1,21
  core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
  core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
  core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
  core: 4 cpu: 9,29
  core: 8 cpu:11,31
  core: 9 cpu:13,33
  core:10 cpu:15,35
  core:11 cpu:17,37
  core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########

$ cat my-cgconf.cfg
##########START CONFIG##########
[cgtool]
CONVERGED=1
BLOCK=1
ISCSI=1
MGMT=1
BRIDGE=0
SERVERS=2

SET_CACHE_SIZE=1
SP_COMMON_LIMIT=12806M
SP_ALLOC_LIMIT=3328M
SYSTEM_LIMIT=2836M
KERNEL_MEM=1G
USER_LIMIT=2G
MACHINE_LIMIT=356G

IFACE=p4p1
IFACE_ACC=1

CORES=3

CONFDIR=/etc/cgconfig.d
CACHEDIR=/etc/storpool.conf.d

###########END CONFIG###########

Verifying machine cgroups state and configurations

storpool_cg print

storpool_cg print is a simple script that reads the cgroups filesystem and reports its current state in a readable, StorPool-oriented format; this is the same format storpool_cg uses when printing configurations. storpool_cg print is useful for familiarizing yourself with the machine configuration. Here is an example:

$ storpool_cg print
slice: storpool.slice limit: 26631M
  subslice: storpool.slice/alloc limit: 3328M
  subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --
  core:1 cpus:[ 2  3]  --  nic    | rdma
  core:2 cpus:[ 4  5]  --  server | server_1
  core:3 cpus:[ 6  7]  --  iscsi  | beacon,mgmt,block
socket:1
  core:0 cpus:[ 8  9]  --
  core:1 cpus:[10 11]  --
  core:2 cpus:[12 13]  --
  core:3 cpus:[14 15]  --

It can be used with the -N and -S options to display the NUMA nodes and cpuset slices for the CPUs:

$ storpool_cg print -N -S
slice: storpool.slice limit: 26631M
  subslice: storpool.slice/alloc limit: 3328M
  subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  numa:[0 0]  --  system user      | system user
  core:1 cpus:[ 2  3]  --  numa:[0 0]  --  storpool: nic    | storpool: rdma
  core:2 cpus:[ 4  5]  --  numa:[0 0]  --  storpool: server | storpool: server_1
  core:3 cpus:[ 6  7]  --  numa:[0 0]  --  storpool: iscsi  | storpool: beacon,mgmt,block
socket:1
  core:0 cpus:[ 8  9]  --  numa:[1 1]  --  system user      | system user
  core:1 cpus:[10 11]  --  numa:[1 1]  --  system user      | system user
  core:2 cpus:[12 13]  --  numa:[1 1]  --  system user      | system user
  core:3 cpus:[14 15]  --  numa:[1 1]  --  system user      | system user

The last option it accepts is -U/--usage. It shows a table with the memory usage of each memory slice it normally prints, as well as how much memory is left for the kernel.

$ storpool_cg print -U
slice                      usage    limit    perc    free
=========================================================
machine.slice               0.00 / 13.21G   0.00%  13.21G
storpool.slice              2.86 / 10.17G  28.09%   7.32G
  storpool.slice/alloc      0.20 /  4.38G   4.61%   4.17G
  storpool.slice/common     2.66 /  5.80G  45.81%   3.14G
system.slice                2.13 /  4.44G  47.84%   2.32G
user.slice                  0.65 /  2.00G  32.73%   1.35G
=========================================================
ALL SLICES                  5.64 / 29.82G  18.91%  24.19G

                        reserved    total    perc  kernel
=========================================================
NON KERNEL                 29.82 / 31.26G  95.40%   1.44G
=========================================================
cpus for StorPool: [1, 2, 3, 4, 5, 6, 7]
socket:0
  core:0 cpus:[ 0  1]  --         | bridge,mgmt
  core:1 cpus:[ 2  3]  --  nic    | rdma
  core:2 cpus:[ 4  5]  --  server | server_1
  core:3 cpus:[ 6  7]  --  iscsi  | beacon,block
socket:1
  core:0 cpus:[ 8  9]  --
  core:1 cpus:[10 11]  --
  core:2 cpus:[12 13]  --
  core:3 cpus:[14 15]  --

storpool_cg check

storpool_cg check will run a series of cgroup-related checks on the machine, and will report any errors or warnings it finds. It can be used to identify cgroup-related problems. Here is an example:

$ storpool_cg check
M: ==== cpuset ====
E: user.slice and machine.slice cpusets intersect
E: machine.slice and system.slice cpusets intersect
M: ==== memory ====
W: memory left for kernel is 0MB
E: sum of storpool.slice, user.slice, system.slice, machine.slice limits is 33549.0MB, while total memory is 31899.46875MB
M: Done.

storpool_process

storpool_process can find all StorPool processes running on the machine and report their cpuset and memory cgroups. It can be used to check in which cgroups the StorPool processes run, to quickly find problems (for example, StorPool processes left in the root cgroup).

To list all StorPool processes, run:

$ storpool_process list
[pid] [service]  [cpuset]              [memory]
1121  stat       system.slice          system.slice/storpool_stat.service
1181  stat       system.slice          system.slice/storpool_stat.service
1261  stat       system.slice          system.slice/storpool_stat.service
1262  stat       system.slice          system.slice/storpool_stat.service
1263  stat       system.slice          system.slice/storpool_stat.service
1266  stat       system.slice          system.slice/storpool_stat.service
5743  server     storpool.slice/server storpool.slice
14483 block      storpool.slice/block  storpool.slice
21327 stat       system.slice          system.slice/storpool_stat.service
27379 rdma       storpool.slice/rdma   storpool.slice
27380 rdma       storpool.slice/rdma   storpool.slice
27381 rdma       storpool.slice/rdma   storpool.slice
27382 rdma       storpool.slice/rdma   storpool.slice
27383 rdma       storpool.slice/rdma   storpool.slice
28940 mgmt       storpool.slice/mgmt   storpool.slice/alloc
29346 controller system.slice          system.slice
29358 controller system.slice          system.slice
29752 nvmed      storpool.slice/beacon storpool.slice
29764 nvmed      storpool.slice/beacon storpool.slice
30838 block      storpool.slice/block  storpool.slice
31055 server     storpool.slice/server storpool.slice
31086 mgmt       storpool.slice/mgmt   storpool.slice/alloc
31450 beacon     storpool.slice/beacon storpool.slice
31469 beacon     storpool.slice/beacon storpool.slice

By default, processes are sorted by pid. You can change the sort order by using the -S parameter:

$ storpool_process list -S service pid
[pid] [service]  [cpuset]              [memory]
31450 beacon     storpool.slice/beacon storpool.slice
31469 beacon     storpool.slice/beacon storpool.slice
14483 block      storpool.slice/block  storpool.slice
30838 block      storpool.slice/block  storpool.slice
29346 controller system.slice          system.slice
29358 controller system.slice          system.slice
28940 mgmt       storpool.slice/mgmt   storpool.slice/alloc
31086 mgmt       storpool.slice/mgmt   storpool.slice/alloc
29752 nvmed      storpool.slice/beacon storpool.slice
29764 nvmed      storpool.slice/beacon storpool.slice
27379 rdma       storpool.slice/rdma   storpool.slice
27380 rdma       storpool.slice/rdma   storpool.slice
27381 rdma       storpool.slice/rdma   storpool.slice
27382 rdma       storpool.slice/rdma   storpool.slice
27383 rdma       storpool.slice/rdma   storpool.slice
5743  server     storpool.slice/server storpool.slice
31055 server     storpool.slice/server storpool.slice
1121  stat       system.slice          system.slice/storpool_stat.service
1181  stat       system.slice          system.slice/storpool_stat.service
1261  stat       system.slice          system.slice/storpool_stat.service
1262  stat       system.slice          system.slice/storpool_stat.service
1263  stat       system.slice          system.slice/storpool_stat.service
1266  stat       system.slice          system.slice/storpool_stat.service
21327 stat       system.slice          system.slice/storpool_stat.service

You can also use the storpool_process tool to reclassify misplaced StorPool processes into their proper cgroups. If the proper cgroups are configured in storpool.conf, you can run storpool_process reclassify, and the tool will move each process to its right cpuset and memory cgroup. It is advisable to run storpool_process reclassify -N (or even storpool_process reclassify -N -v) first to see which processes are affected and where they will be moved.
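
A typical cautious workflow is a verbose dry run followed by the actual reclassification:

$ storpool_process reclassify -N -v   # show which processes would be moved, and where
$ storpool_process reclassify         # actually move the processes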

Updating already configured machines

Sometimes you might need to update machines that are already configured. One possible solution is to create a new configuration and reboot the machine. However, often you won’t be able to reboot, so storpool_cg offers a way to ‘live’-migrate machines. It is activated by the -M (--migrate) command line option.

When you have a configuration that you want to apply to a machine (that is, you know with what options you want to run storpool_cg), you have two options:

  • Run storpool_cg to create the cgconfig.d files, and then reboot.

  • Use storpool_cg conf with the same options plus the -M option, and let it try to apply the configuration live.

Attention

Note the following:

  • Before attempting a live migration, storpool_cg will run a series of checks to verify that it is safe to try the migration.

  • Migrating machines with storpool_cg is a pseudo-transactional procedure. If the migration process fails, a rollback procedure will be attempted to restore the initial machine state. The rollback operation is not guaranteed to succeed! However, some ‘extreme’ conditions must have occurred for the rollback to fail.

You can use the migration in the following cases:

  • Changing slice limits

  • Enabling hardware acceleration

  • Adding and removing services

  • Adding and removing disks

Migrating to a new-style configuration

Let’s look at the following machine:

$ storpool_cg print
slice: storpool.slice limit: 26G
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  nic      | rdma
  core:1 cpus:[ 2  3]  --  server   | block
  core:2 cpus:[ 4  5]  --  server_1 |
  core:3 cpus:[ 6  7]  --  server_2 | beacon
  core:4 cpus:[ 8  9]  --
  core:5 cpus:[10 11]  --
  core:6 cpus:[12 13]  --
  core:7 cpus:[14 15]  --

You can run a ‘fake’ migration with -NM to see the desired configuration and the steps the tool will take to achieve it. All other arguments of storpool_cg conf can be used with -M, so (for example) if you need to tweak the number of cores, you can still use cores=4.

$ storpool_cg conf -NM
W: NIC is expected to be on cpu 2
########## START SUMMARY ##########
slice: storpool limit: 26696M
  subslice: storpool/common limit: 26696M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [2, 3, 4, 5, 6, 7]
socket:0
  core: 0 cpu: 0, 1
  core: 1 cpu: 2, 3 <--- 2 - nic; 3 - rdma
  core: 2 cpu: 4, 5 <--- 4 - server; 5 - server_1
  core: 3 cpu: 6, 7 <--- 6 - server_2; 7 - block,beacon
  core: 4 cpu: 8, 9
  core: 5 cpu:10,11
  core: 6 cpu:12,13
  core: 7 cpu:14,15
########### END SUMMARY ###########
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 5 > /sys/fs/cgroup/cpuset/storpool.slice/server_1/cpuset.cpus
echo 2-7 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 7 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/common
echo 1 > /sys/fs/cgroup/memory/storpool.slice/common/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/common/memory.move_charge_at_immigrate
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/alloc
echo 1 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.move_charge_at_immigrate
echo 0G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes
echo 6143 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6682 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6692 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6913 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6926 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6977 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6987 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7174 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7185 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7585 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7604 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs

When you are happy with the configuration and want to migrate to it, run the migration without -N. Then check that everything is OK with storpool_cg print -S and storpool_cg check.

$ storpool_cg conf -M
W: NIC is expected to be on cpu 2

$ storpool_cg print -S
slice: storpool.slice limit: 26696M
  subslice: storpool.slice/alloc limit: 0G
  subslice: storpool.slice/common limit: 26696M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  system user        | system user
  core:1 cpus:[ 2  3]  --  storpool: nic      | storpool: rdma
  core:2 cpus:[ 4  5]  --  storpool: server   | storpool: server_1
  core:3 cpus:[ 6  7]  --  storpool: server_2 | storpool: beacon,block
  core:4 cpus:[ 8  9]  --  system user        | system user
  core:5 cpus:[10 11]  --  system user        | system user
  core:6 cpus:[12 13]  --  system user        | system user
  core:7 cpus:[14 15]  --  system user        | system user

$ storpool_cg check
M: ==== memory ====
W: memory:system.slice has more than 80% usage

Attention

One more thing you need to check after a migration is the output of storpool_process reclassify -N -v. If it suggests moving some processes to cgroups different from the ones they are currently running in, you can do that by running just storpool_process reclassify. Note that storpool_process makes its suggestions based on the current SP_X_CGROUPS variables in storpool.conf.