Control Groups

1. Kernel Control Groups and StorPool

1.1. Overview

This document gives an overview of the kernel control groups (cgroups) feature and how StorPool uses the cpuset and memory controllers to optimize performance and protect the storage system from random events such as out-of-memory (OOM) conditions.

1.2. Cgroups

A good overview of cgroups is given by the Description and Control Groups Version 1 sections of the cgroups manual page.

For more detailed information you can read the kernel cgroups documentation.

1.3. Cgexec

All StorPool services are started via the cgexec utility. It runs the service and accounts its resources in the cgroups given as parameters. For example, cgexec ./test -g cpuset:cpuset_cg -g memory:memory_cg will run the test binary, limiting its cpuset resources by the limits defined in the cpuset_cg cgroup and its memory resources by the limits defined in the memory_cg cgroup.

1.4. Slices

A common practice is to create cgroups with the same name under different controllers. Take, for example, the cpuset and memory controllers. If one creates a test cgroup in the memory controller and a test cgroup in the cpuset controller, we consider this a slice, so a more appropriate name for the two cgroups would be test.slice. Defining slices makes it easier to keep track of the resources used by a process: thinking of it as 'the process runs in the test.slice' implies both the cpuset restrictions from cpuset:test.slice and the memory restrictions from memory:test.slice.

Machines that run virtual guests have a machine.slice where all the virtual machines run and a system.slice where the system processes run. Machines running systemd also have a user.slice where all user session processes run.
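The slice a process runs in can be verified per controller from /proc/&lt;pid&gt;/cgroup. Below is a minimal sketch of parsing that cgroup-v1 format; the sample content and paths are illustrative, not taken from a real machine:

```python
# Parse the cgroup-v1 /proc/<pid>/cgroup format ("id:controllers:path").
# The sample content is illustrative; on a real machine read the file itself.
SAMPLE = """\
4:memory:/storpool.slice
3:cpuset:/storpool.slice/beacon
1:name=systemd:/system.slice/storpool_beacon.service
"""

def parse_cgroup_file(text):
    """Return a mapping of controller name -> cgroup path."""
    result = {}
    for line in text.splitlines():
        _hier_id, controllers, path = line.split(":", 2)
        for ctrl in controllers.split(","):
            result[ctrl] = path
    return result

membership = parse_cgroup_file(SAMPLE)
print(membership["cpuset"])  # /storpool.slice/beacon
print(membership["memory"])  # /storpool.slice
```

A process "runs in the storpool.slice" when both its cpuset and memory paths point under that slice.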

1.5. StorPool and Cgroups

Machines that run StorPool also have a storpool.slice where all StorPool core services run. On a properly configured machine all slices have configured memory (and memory+swap) limits. This guarantees two things:

  1. the kernel will have enough memory to run properly (explained later)

  2. the storage system (StorPool) will not suffer from an OOM situation that it did not trigger itself

The cpuset controller in the storpool.slice is used to:

  1. Dedicate cpus only for StorPool (that other slices do not have access to)

  2. Map the dedicated cpus to StorPool services in a specific manner to optimize performance

1.5.1. Memory configuration

For the root cgroup, memory.use_hierarchy should be set to 1, so that a hierarchical memory model is used for the cgroups.

For all slices, memory.move_charge_at_immigrate should be set to at least 1; for the storpool.slice it should be set to 3 (the value is a bitmask: bit 0 moves anonymous pages and bit 1 moves file-backed pages when a task migrates between cgroups).

For all slices, memory.limit_in_bytes and memory.memsw.limit_in_bytes should be set to the same appropriate value.

For the storpool.slice, memory.swappiness should be set to 0.

To ensure enough memory for the kernel, the sum of the memory limits of all slices should be at least 1G short of the total machine memory.

The memory:storpool.slice has two memory subslices: common and alloc. The storpool.slice/alloc subslice is used to limit the memory usage of the mgmt, iscsi, and bridge services, while the storpool.slice/common subslice is for everything else. Their memory limits should also be configured, and their sum should be equal to the storpool.slice memory limit.
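The memory rules above (the slice limits must leave at least 1G for the kernel, and the common and alloc limits must add up to the storpool.slice limit) can be sanity-checked with a short script. This is an illustrative sketch, not part of the StorPool tooling, and all numbers are hypothetical:

```python
# Sanity-check the memory limit rules described above.
# All values are in megabytes and purely illustrative.
KERNEL_MEM = 1024  # at least 1G must be left for the kernel

def check_limits(total_mb, slice_limits, sp_common, sp_alloc):
    """slice_limits maps slice name -> memory limit in MB."""
    problems = []
    if sum(slice_limits.values()) > total_mb - KERNEL_MEM:
        problems.append("less than 1G left for the kernel")
    if sp_common + sp_alloc != slice_limits["storpool"]:
        problems.append("common + alloc != storpool.slice limit")
    return problems

# Numbers resembling the hypervisor example in this document:
slices = {"machine": 122920, "storpool": 692, "system": 2048, "user": 2048}
print(check_limits(131072, slices, sp_common=692, sp_alloc=0))  # []
```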

1.5.2. Cpuset configuration

Dedicated cpus for StorPool should be set in the storpool.slice’s cpuset.cpus. All other slices’ cpuset.cpus should be set to have all the remaining cpus. The cpuset.cpu_exclusive flag in the storpool.slice should be set to 1 to ensure that the other slices cannot use the cpus dedicated for StorPool.

The cpuset:storpool.slice should have a subslice for each StorPool service running on the machine. So, if you have two server instances, beacon, mgmt, and block running, there should be storpool.slice/{server,server_1,beacon,mgmt,block} subslices. These are used to assign the services to specific cpus. For example, setting the cpuset.cpus of storpool.slice/beacon to 2 will restrict storpool_beacon to run on cpu#2.

Note that on machines without hardware-accelerated network cards the storpool.slice will also need a cpu for the NIC, but there is no nic subslice; that cpu must simply be left out of all of the subslices.

For the storpool.slice and each of its subslices, the cpuset.mems option should be set to all available NUMA nodes on the machine (e.g. 0-3 on a machine with 4 NUMA nodes).
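As an illustration of the exclusivity requirement, cpuset.cpus lists can be expanded and checked for overlap. This sketch is not part of the StorPool tooling, and the cpu lists are hypothetical:

```python
# Expand cpuset.cpus-style lists (e.g. "0-1,8-15") and verify that the
# storpool.slice cpus do not overlap with the other slices' cpus.
def parse_cpulist(spec):
    """Expand a string like "0,3-5" into a set of cpu numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

storpool = parse_cpulist("1-2,21-22")   # hypothetical storpool.slice cpus
others = parse_cpulist("0,3-20,23-39")  # hypothetical system/user/machine cpus
print(sorted(storpool))         # [1, 2, 21, 22]
print(not (storpool & others))  # True -> exclusivity holds
```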

1.6. Cgconfig

Cgroup configurations are made persistent across reboots by configuration files placed in /etc/cgconfig.d/. When the machine boots, the cgconfig service runs and applies the configuration using cgconfigparser.
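For illustration, a cgconfig.d file for the storpool.slice could look like the following (libcgroup cgconfig syntax; all values here are hypothetical):

```
group storpool.slice {
    cpuset {
        cpuset.cpus = "1-2,21-22";
        cpuset.mems = "0-1";
        cpuset.cpu_exclusive = "1";
    }
    memory {
        memory.limit_in_bytes = "692M";
        memory.memsw.limit_in_bytes = "692M";
        memory.swappiness = "0";
        memory.move_charge_at_immigrate = "3";
    }
}
```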

Writing cgconfig files by hand is a tedious and error-prone job, so it is advisable to use the cgtool to generate them.

Note

After you create the config files you will need to restart the cgconfig service or parse the configuration files with cgconfigparser so that the configuration is applied to the machine.

Warning

Restarting the cgconfig service on a machine that already has cgroups with processes running in them will move those processes to the root cgroup! This is dangerous, and you are strongly advised not to do it!

2. Using the cgtool to create Cgroups configurations for freshly installed machines

2.1. Hypervisors

Attention

Before running the cgtool make sure that all needed services are installed and network interfaces for StorPool are properly configured in the storpool.conf.

It is always advisable to run the cgtool with the -N (--noop) option first. Here is a sample output:

$ storpool_cg conf -N
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
  core: 3 cpu: 3,23
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

What you can see is an overview of the configuration the tool will create for the machine. Note that because of the -N option, the configuration is not written. If you think the configuration is appropriate for the machine, just run the cgtool without the -N option and the configuration will be created.

If the configuration does not seem right, here are several things you can change:

  1. If you think some of the slice limits should be different, let’s say that you want the system.slice limit to be 4G, you can do:

$ storpool_cg conf -N system_limit=4G
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120872M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 4G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
  core: 3 cpu: 3,23
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

In the same manner, you can pass machine_limit, user_limit, sp_common_limit, and sp_alloc_limit on the command line. Values in MB are also accepted, with the M suffix.

  2. If you want to dedicate more cpus to StorPool, you can run the cgtool with the cores=<N> parameter.

$ storpool_cg conf -N cores=3
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 -
  core: 2 cpu: 2,22 <--- 2 - rdma; 22 -
  core: 3 cpu: 3,23 <--- 3 - block; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Please note that on hyperthreaded machines one core will add two cpus, while on machines without (or with disabled) hyperthreading one core will add one cpu.

The cgtool detects which of the StorPool services that need their own cpuset subslice are installed on the machine. It might happen (but it is unexpected!) that you do not have all services installed yet.

  3. You can override the cgtool's service detection by specifying <service>=true or <service>=1. For example, to add the mgmt service to the above configuration, run:

$ storpool_cg conf -N cores=3 mgmt=1
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

The cgtool will also detect which driver the network card uses and whether it can be used with hardware acceleration by the current StorPool installation.

  4. The hardware acceleration detection can be overridden by specifying iface_acc=true/false on the command line. Here is the above configuration with hardware acceleration enabled:

$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Note that by default the cgtool will leave 1G of memory for the kernel.

  5. If you want to change the amount of memory reserved for the kernel, you can specify the kernel_mem=<X> command line parameter. For example, to reserve 3G for the kernel:

$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true kernel_mem=3G
########## START SUMMARY ##########
slice: machine limit: 118696M
slice: storpool limit: 2868M
  subslice: storpool/common limit: 692M
  subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
  core: 2 cpu: 2,22 <--- 2 - block; 22 -
  core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
  core: 4 cpu: 4,24
  core: 8 cpu: 5,25
  core: 9 cpu: 6,26
  core:10 cpu: 7,27
  core:11 cpu: 8,28
  core:12 cpu: 9,29
socket:1
  core: 0 cpu:10,30
  core: 1 cpu:11,31
  core: 2 cpu:12,32
  core: 3 cpu:13,33
  core: 4 cpu:14,34
  core: 8 cpu:15,35
  core: 9 cpu:16,36
  core:10 cpu:17,37
  core:11 cpu:18,38
  core:12 cpu:19,39
###################################

########### END SUMMARY ###########

Attention

The cgtool will use cpus for the storpool.slice from the local cpus list of the StorPool network interfaces.

2.2. Dedicated storage and Hyperconverged machines

All options described in the Hypervisors section can also be used on storage and hyperconverged machines.

Attention

Before running the cgtool on storage and hyperconverged machines, you should have initialized all disks for StorPool.

Warning

If the machine has NVMe disks that will be used with the storpool_nvme service, please make sure that these drives are NOT unbound from the kernel nvme driver and are visible as block devices in /dev.

2.2.1. Dedicated storage machines

Here is sample output from the cgtool on a dedicated storage machine whose disks are configured to run in four storpool_server instances and which will also run the storpool_iscsi service:

$ storpool_cg conf -N
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
###################################
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048
########### END SUMMARY ###########

The first thing to notice is the SP_CACHE_SIZE{_X} variables at the bottom. By default, when run on a node with local disks, the cgtool sets the cache sizes for the different storpool_server instances. These values will be written to /etc/storpool.conf.d/cache-size.conf. If you don't want the cgtool to set the server caches (maybe you have already done it yourself), you can set the set_cache_size command line parameter to false:

$ storpool_cg conf -N set_cache_size=false
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
########### END SUMMARY ###########

The SP_CACHE_SIZE{_X} variables disappeared from the config summary, which means they won't be changed.

The cgtool detects how many server instances will run on the machine by reading the storpool_initdisk --list output. If you haven't configured the right number of servers on the machine yet, you can override this detection by specifying the servers command line parameter:

$ storpool_cg conf -N set_cache_size=false servers=2
########## START SUMMARY ##########
slice: storpool limit: 26382M
  subslice: storpool/common limit: 23054M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 6, 7, 8]
socket:0
  core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
  core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
  core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
  core: 3 cpu: 3, 9
  core: 4 cpu: 4,10
  core: 5 cpu: 5,11
########### END SUMMARY ###########

2.2.2. Hyperconverged machines

On hyperconverged machines the cgtool should be run with the converged command line parameter set to true/1. This has two major differences compared to configuring storage-only nodes:

  1. A machine.slice will be created for the machine.

  2. The memory limit of the storpool.slice will be carefully calculated to be minimal, but sufficient for StorPool to run without problems.

$ storpool_cg conf -N converged=1
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
  subslice: storpool/common limit: 12806M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 2,22
  core: 2 cpu: 4,24
  core: 3 cpu: 6,26
  core: 4 cpu: 8,28
  core: 8 cpu:10,30
  core: 9 cpu:12,32
  core:10 cpu:14,34
  core:11 cpu:16,36
  core:12 cpu:18,38
socket:1
  core: 0 cpu: 1,21
  core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
  core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
  core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
  core: 4 cpu: 9,29
  core: 8 cpu:11,31
  core: 9 cpu:13,33
  core:10 cpu:15,35
  core:11 cpu:17,37
  core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########

Warning

If the machine will not be booted with the kernel memsw cgroups feature enabled, you should specify that to storpool_cg conf by setting set_memsw to false/0.

Note

The cgtool will use only cpus from the network interface's local cpulist, which is commonly restricted to one NUMA node. If you want to allow the cgtool to use all cpus on the machine, specify that to storpool_cg conf by setting numa_overflow to true/1.

3. Using the cgtool to configure multiple similar machines

It may happen that you want to run the cgtool with the same command line arguments on multiple machines. The cgtool provides a configuration file option. A sample configuration file looks like the following:

[cgtool]
CONVERGED=1

MGMT=0
SERVERS=2

SET_CACHE_SIZE=0
SYSTEM_LIMIT=4G
USER_LIMIT=4G
KERNEL_MEM=2G

IFACE_ACC=1

CORES=4

Running the cgtool as storpool_cg conf -N -C example.conf is equivalent to running storpool_cg conf -N converged=1 mgmt=0 servers=2 set_cache_size=0 system_limit=4G user_limit=4G kernel_mem=2G iface_acc=true cores=4.
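The config file keys are simply the upper-case forms of the command line parameters. That correspondence can be sketched as follows; parse_cgtool_conf is a hypothetical helper written for illustration, not part of storpool_cg itself:

```python
# Illustrate how [cgtool] config file keys map to command line parameters.
# parse_cgtool_conf is a hypothetical helper, not part of storpool_cg itself.
import configparser

def parse_cgtool_conf(text):
    """Return the cgtool options as lower-case key=value pairs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["cgtool"])  # configparser lower-cases keys by default

CONF = """\
[cgtool]
CONVERGED=1
SERVERS=2
SYSTEM_LIMIT=4G
"""
opts = parse_cgtool_conf(CONF)
print(" ".join(f"{k}={v}" for k, v in sorted(opts.items())))
# converged=1 servers=2 system_limit=4G
```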

Hint

You can find the example.cfg file in the cgtool directory, and explore the storpool_cg conf -h output.

3.1. Dumping a configuration

You can 'ask' the cgtool what its configuration looks like after it has done all of its autodetection. This is achieved by running storpool_cg conf -N converged=1 -D dumped.cfg, which will dump the cgtool configuration into dumped.cfg. Note that dumped.cfg is a valid configuration file for the cgtool. For example:

$ storpool_cg conf -N converged=1 -D dumped.cfg
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
  subslice: storpool/common limit: 12806M
  subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
  core: 0 cpu: 0,20
  core: 1 cpu: 2,22
  core: 2 cpu: 4,24
  core: 3 cpu: 6,26
  core: 4 cpu: 8,28
  core: 8 cpu:10,30
  core: 9 cpu:12,32
  core:10 cpu:14,34
  core:11 cpu:16,36
  core:12 cpu:18,38
socket:1
  core: 0 cpu: 1,21
  core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
  core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
  core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
  core: 4 cpu: 9,29
  core: 8 cpu:11,31
  core: 9 cpu:13,33
  core:10 cpu:15,35
  core:11 cpu:17,37
  core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########

$ cat dumped.cfg
##########START CONFIG##########
[cgtool]
CONVERGED=1
BLOCK=1
ISCSI=1
MGMT=1
BRIDGE=0
SERVERS=2

SET_CACHE_SIZE=1
SP_COMMON_LIMIT=12806M
SP_ALLOC_LIMIT=3328M
SYSTEM_LIMIT=2836M
KERNEL_MEM=1G
USER_LIMIT=2G
MACHINE_LIMIT=356G

IFACE=p4p1
IFACE_ACC=1

CORES=3

CONFDIR=/etc/cgconfig.d
CACHEDIR=/etc/storpool.conf.d

###########END CONFIG###########

4. Verifying machine cgroups state and configurations

4.1. storpool_cg print

storpool_cg print is a simple script that reads the cgroups filesystem and reports its current state in a StorPool-friendly, readable format. It is the same format used by the cgtool for printing configurations. storpool_cg print is useful for familiarizing yourself with the machine's configuration. Here is an example:

$ storpool_cg print
slice: storpool.slice limit: 26631M
  subslice: storpool.slice/alloc limit: 3328M
  subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --
  core:1 cpus:[ 2  3]  --  nic    | rdma
  core:2 cpus:[ 4  5]  --  server | server_1
  core:3 cpus:[ 6  7]  --  iscsi  | beacon,mgmt,block
socket:1
  core:0 cpus:[ 8  9]  --
  core:1 cpus:[10 11]  --
  core:2 cpus:[12 13]  --
  core:3 cpus:[14 15]  --

It can be used with the -N and -S options to display the NUMA nodes and cpuset slices for the cpus:

$ storpool_cg print -N -S
slice: storpool.slice limit: 26631M
  subslice: storpool.slice/alloc limit: 3328M
  subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  numa:[0 0]  --  system user      | system user
  core:1 cpus:[ 2  3]  --  numa:[0 0]  --  storpool: nic    | storpool: rdma
  core:2 cpus:[ 4  5]  --  numa:[0 0]  --  storpool: server | storpool: server_1
  core:3 cpus:[ 6  7]  --  numa:[0 0]  --  storpool: iscsi  | storpool: beacon,mgmt,block
socket:1
  core:0 cpus:[ 8  9]  --  numa:[1 1]  --  system user      | system user
  core:1 cpus:[10 11]  --  numa:[1 1]  --  system user      | system user
  core:2 cpus:[12 13]  --  numa:[1 1]  --  system user      | system user
  core:3 cpus:[14 15]  --  numa:[1 1]  --  system user      | system user

The last option it accepts is -U/--usage. It shows a table with the memory usage of each memory slice it usually prints, as well as the memory left for the kernel.

$ storpool_cg print -U
slice                      usage    limit    perc    free
=========================================================
machine.slice               0.00 / 13.21G   0.00%  13.21G
storpool.slice              2.86 / 10.17G  28.09%   7.32G
  storpool.slice/alloc      0.20 /  4.38G   4.61%   4.17G
  storpool.slice/common     2.66 /  5.80G  45.81%   3.14G
system.slice                2.13 /  4.44G  47.84%   2.32G
user.slice                  0.65 /  2.00G  32.73%   1.35G
=========================================================
ALL SLICES                  5.64 / 29.82G  18.91%  24.19G

                        reserved    total    perc  kernel
=========================================================
NON KERNEL                 29.82 / 31.26G  95.40%   1.44G
=========================================================
cpus for StorPool: [1, 2, 3, 4, 5, 6, 7]
socket:0
  core:0 cpus:[ 0  1]  --         | bridge,mgmt
  core:1 cpus:[ 2  3]  --  nic    | rdma
  core:2 cpus:[ 4  5]  --  server | server_1
  core:3 cpus:[ 6  7]  --  iscsi  | beacon,block
socket:1
  core:0 cpus:[ 8  9]  --
  core:1 cpus:[10 11]  --
  core:2 cpus:[12 13]  --
  core:3 cpus:[14 15]  --

4.2. storpool_cg check

storpool_cg check runs a series of control-group-related checks on the machine and reports any errors or warnings it finds. It can be used to identify cgroup-related problems. Here is an example output:

$ storpool_cg check
M: ==== cpuset ====
E: user.slice and machine.slice cpusets intersect
E: machine.slice and system.slice cpusets intersect
M: ==== memory ====
W: memory left for kernel is 0MB
E: sum of storpool.slice, user.slice, system.slice, machine.slice limits is 33549.0MB, while total memory is 31899.46875MB
M: Done.

4.3. storpool_process

storpool_process can find all StorPool processes running on the machine and report their cpuset and memory cgroups. It can be used to check in which cgroups the StorPool processes run and to quickly find problems (e.g. StorPool processes in the root cgroup). To list all StorPool processes, run:

$ storpool_process list
[pid] [service]  [cpuset]              [memory]
1121  stat       system.slice          system.slice/storpool_stat.service
1181  stat       system.slice          system.slice/storpool_stat.service
1261  stat       system.slice          system.slice/storpool_stat.service
1262  stat       system.slice          system.slice/storpool_stat.service
1263  stat       system.slice          system.slice/storpool_stat.service
1266  stat       system.slice          system.slice/storpool_stat.service
5743  server     storpool.slice/server storpool.slice
14483 block      storpool.slice/block  storpool.slice
21327 stat       system.slice          system.slice/storpool_stat.service
27379 rdma       storpool.slice/rdma   storpool.slice
27380 rdma       storpool.slice/rdma   storpool.slice
27381 rdma       storpool.slice/rdma   storpool.slice
27382 rdma       storpool.slice/rdma   storpool.slice
27383 rdma       storpool.slice/rdma   storpool.slice
28940 mgmt       storpool.slice/mgmt   storpool.slice/alloc
29346 controller system.slice          system.slice
29358 controller system.slice          system.slice
29752 nvmed      storpool.slice/beacon storpool.slice
29764 nvmed      storpool.slice/beacon storpool.slice
30838 block      storpool.slice/block  storpool.slice
31055 server     storpool.slice/server storpool.slice
31086 mgmt       storpool.slice/mgmt   storpool.slice/alloc
31450 beacon     storpool.slice/beacon storpool.slice
31469 beacon     storpool.slice/beacon storpool.slice

By default, processes are sorted by pid. You can change the sort order with the -S parameter:

$ storpool_process list -S service pid
[pid] [service]  [cpuset]              [memory]
31450 beacon     storpool.slice/beacon storpool.slice
31469 beacon     storpool.slice/beacon storpool.slice
14483 block      storpool.slice/block  storpool.slice
30838 block      storpool.slice/block  storpool.slice
29346 controller system.slice          system.slice
29358 controller system.slice          system.slice
28940 mgmt       storpool.slice/mgmt   storpool.slice/alloc
31086 mgmt       storpool.slice/mgmt   storpool.slice/alloc
29752 nvmed      storpool.slice/beacon storpool.slice
29764 nvmed      storpool.slice/beacon storpool.slice
27379 rdma       storpool.slice/rdma   storpool.slice
27380 rdma       storpool.slice/rdma   storpool.slice
27381 rdma       storpool.slice/rdma   storpool.slice
27382 rdma       storpool.slice/rdma   storpool.slice
27383 rdma       storpool.slice/rdma   storpool.slice
5743  server     storpool.slice/server storpool.slice
31055 server     storpool.slice/server storpool.slice
1121  stat       system.slice          system.slice/storpool_stat.service
1181  stat       system.slice          system.slice/storpool_stat.service
1261  stat       system.slice          system.slice/storpool_stat.service
1262  stat       system.slice          system.slice/storpool_stat.service
1263  stat       system.slice          system.slice/storpool_stat.service
1266  stat       system.slice          system.slice/storpool_stat.service
21327 stat       system.slice          system.slice/storpool_stat.service

You can also use the storpool_process tool to reclassify misplaced StorPool processes into their correct cgroups. If the proper cgroups are configured in the storpool.conf, you can run storpool_process reclassify and the tool will move each process to its correct cpuset and memory cgroups. It is advisable to run storpool_process reclassify -N (or even storpool_process reclassify -N -v) first, to see which processes will be moved where.
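The reclassify decision boils down to comparing each process's current cgroups with the expected ones. A sketch of that logic, with a hypothetical expected mapping (the real one is derived from storpool.conf):

```python
# Sketch of the reclassify decision: a process should be moved when its
# current cgroups differ from the expected ones. The expected mapping
# below is hypothetical; the real one is derived from storpool.conf.
EXPECTED = {
    "beacon": ("storpool.slice/beacon", "storpool.slice"),
    "mgmt":   ("storpool.slice/mgmt",   "storpool.slice/alloc"),
}

def needs_reclassify(service, cpuset_cg, memory_cg):
    """Return True when the process is not in its expected cgroups."""
    return (cpuset_cg, memory_cg) != EXPECTED[service]

print(needs_reclassify("mgmt", "storpool.slice/mgmt", "storpool.slice/alloc"))  # False
print(needs_reclassify("beacon", "/", "/"))  # True -> e.g. left in the root cgroup
```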

5. Using the cgtool to update already configured machines

When you need to change the cgroup configuration of an already configured machine, a possible solution is to create a new configuration and reboot the machine. However, often you won't be able to reboot, so the cgtool offers a solution for 'live'-migrating machines. It is activated by the -M (--migrate) command line option.

When you have a configuration that you want to apply to a machine (i.e. you know what options you want to run the cgtool with), you have two options: run the cgtool to create the cgconfig.d files and reboot, or run storpool_cg conf with the same options plus -M and let it try to apply the configuration live.

Attention

Before attempting a live migration, the cgtool will run a series of checks to verify that it is safe to try the migration.

Attention

Migrating machines with the cgtool is a pseudo-transactional procedure. If the migration process fails, a rollback procedure will be attempted to restore the initial machine state. The rollback operation is not guaranteed to succeed! Some 'extreme' conditions must have occurred for the rollback to fail, though.

You can use the migration when:

  1. Changing slice limits

  2. Enabling hardware acceleration

  3. Adding and removing services

  4. Adding and removing disks

5.1. Migrating to new-style configs

Let’s look at the following machine:

$ storpool_cg print
slice: storpool.slice limit: 26G
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  nic      | rdma
  core:1 cpus:[ 2  3]  --  server   | block
  core:2 cpus:[ 4  5]  --  server_1 |
  core:3 cpus:[ 6  7]  --  server_2 | beacon
  core:4 cpus:[ 8  9]  --
  core:5 cpus:[10 11]  --
  core:6 cpus:[12 13]  --
  core:7 cpus:[14 15]  --

You can run a 'fake' migration with -NM to see the desired configuration and the steps the tool will take to achieve it. All other arguments of storpool_cg conf can be used with -M, so if you need to tweak the number of cores, for example, you can still use cores=4.

$ storpool_cg conf -NM
W: NIC is expected to be on cpu 2
########## START SUMMARY ##########
slice: storpool limit: 26696M
  subslice: storpool/common limit: 26696M
  subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [2, 3, 4, 5, 6, 7]
socket:0
  core: 0 cpu: 0, 1
  core: 1 cpu: 2, 3 <--- 2 - nic; 3 - rdma
  core: 2 cpu: 4, 5 <--- 4 - server; 5 - server_1
  core: 3 cpu: 6, 7 <--- 6 - server_2; 7 - block,beacon
  core: 4 cpu: 8, 9
  core: 5 cpu:10,11
  core: 6 cpu:12,13
  core: 7 cpu:14,15
########### END SUMMARY ###########
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 5 > /sys/fs/cgroup/cpuset/storpool.slice/server_1/cpuset.cpus
echo 2-7 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 7 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/common
echo 1 > /sys/fs/cgroup/memory/storpool.slice/common/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/common/memory.move_charge_at_immigrate
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/alloc
echo 1 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.move_charge_at_immigrate
echo 0G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes
echo 6143 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6682 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6692 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6913 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6926 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6977 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6987 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7174 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7185 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7585 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7604 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs

When you are happy with the configuration and want to migrate to it, run the migration without -N, then check that everything is OK with storpool_cg print -S and storpool_cg check.

$ storpool_cg conf -M
W: NIC is expected to be on cpu 2

$ storpool_cg print -S
slice: storpool.slice limit: 26696M
  subslice: storpool.slice/alloc limit: 0G
  subslice: storpool.slice/common limit: 26696M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
  core:0 cpus:[ 0  1]  --  system user        | system user
  core:1 cpus:[ 2  3]  --  storpool: nic      | storpool: rdma
  core:2 cpus:[ 4  5]  --  storpool: server   | storpool: server_1
  core:3 cpus:[ 6  7]  --  storpool: server_2 | storpool: beacon,block
  core:4 cpus:[ 8  9]  --  system user        | system user
  core:5 cpus:[10 11]  --  system user        | system user
  core:6 cpus:[12 13]  --  system user        | system user
  core:7 cpus:[14 15]  --  system user        | system user

$ storpool_cg check
M: ==== memory ====
W: memory:system.slice has more than 80% usage

Attention

One more thing you need to check after a migration is the output of storpool_process reclassify -N -v. If it suggests moving some processes to cgroups different from the ones they are currently running in, you can do so by running just storpool_process reclassify. Note that storpool_process makes its suggestions based on the current SP_X_CGROUPS variables in storpool.conf.