Control Groups
1. Kernel Control Groups and StorPool
1.1. Overview
This document gives an overview of the kernel control groups (cgroups) feature and how StorPool uses the cpuset and memory controllers to optimize performance and protect the storage system from events like out-of-memory (OOM) situations.
1.2. Cgroups
A good overview of cgroups is given by the Description and Control Groups Version 1 sections of the cgroups manual page.
For more detailed information, you can read the kernel cgroups documentation.
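For example, assuming the man-pages package is installed, the manual page can be opened with:
$ man 7 cgroups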
1.3. Cgexec
All StorPool services are started via the cgexec utility. It runs the service and accounts its resources in the cgroups given as parameters.
For example, cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test will run the test binary with its cpuset resources restricted by the limitations defined in the cpuset_cg cgroup and its memory resources restricted by the limitations defined in the memory_cg cgroup.
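A quick way to verify where a process ended up is to inspect its /proc/<pid>/cgroup file. A minimal sketch, assuming the cpuset_cg and memory_cg cgroups already exist:
$ cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test &
$ grep -E 'cpuset|memory' /proc/$!/cgroup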
1.4. Slices
A common practice is to create cgroups with the same name under different controllers. Take, for example, the cpuset and memory controllers. If you create a test cgroup in the memory controller and a test cgroup in the cpuset controller, we consider the two a slice, so a more appropriate name for the two cgroups would be test.slice.
Defining slices makes it easier to keep track of the resources used by a process. If you think of it as ‘the process runs in the test.slice’, that implies both the cpuset restrictions from cpuset:test.slice and the memory restrictions from memory:test.slice.
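As a rough sketch (assuming the usual cgroup v1 mounts under /sys/fs/cgroup), such a slice can be created either directly through the filesystem or with the cgcreate utility:
$ mkdir /sys/fs/cgroup/cpuset/test.slice
$ mkdir /sys/fs/cgroup/memory/test.slice
# or, equivalently:
$ cgcreate -g cpuset,memory:test.slice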
Machines that run virtual guests have a machine.slice, where all the virtual machines run, and a system.slice, where the system processes run.
Machines running systemd also have a user.slice, where all user session processes run.
1.5. StorPool and Cgroups
Machines that run StorPool will also have a storpool.slice, where all StorPool core services run.
On a properly configured machine all slices will have configured memory (and memory+swap) limits. This is done to guarantee two things:
the kernel will have enough memory to run properly (explained later)
the storage system (StorPool) will not suffer from an OOM situation that was not triggered by it
The cpuset controller in the storpool.slice is used to:
Dedicate cpus only for StorPool (that other slices do not have access to)
Map the dedicated cpus to StorPool services in a specific manner to optimize performance
1.5.1. Memory configuration
For the root cgroup, memory.use_hierarchy should be set to 1, so that a hierarchical memory model is used for the cgroups.
For all slices, memory.move_charge_at_immigrate should be set to at least 1, and for the storpool.slice it should be set to 3.
For all slices, memory.limit_in_bytes and memory.memsw.limit_in_bytes should be set to the same appropriate value.
For the storpool.slice, memory.swappiness should be set to 0.
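As an illustration only (the cgtool normally generates and applies these settings; the 4G limit below is an arbitrary example value), the settings above correspond to the following cgroup v1 knobs:
$ echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
$ echo 3 > /sys/fs/cgroup/memory/storpool.slice/memory.move_charge_at_immigrate
$ echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
$ echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
$ echo 0 > /sys/fs/cgroup/memory/storpool.slice/memory.swappiness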
To ensure enough memory for the kernel, the sum of the memory limits of all slices should be at least 1G short of the total machine memory.
The memory:storpool.slice has two memory subslices - common and alloc. The storpool.slice/alloc subslice is used to limit the memory usage of the mgmt, iscsi, and bridge services, while the storpool.slice/common subslice is for everything else. Their memory limits should also be configured, and their sum should be equal to the storpool.slice memory limit.
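Continuing the sketch above (the limits are illustrative; 3G + 1G matches the hypothetical 4G storpool.slice limit):
$ mkdir /sys/fs/cgroup/memory/storpool.slice/common
$ mkdir /sys/fs/cgroup/memory/storpool.slice/alloc
$ echo 3G > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
$ echo 1G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes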
1.5.2. Cpuset configuration
Dedicated cpus for StorPool should be set in the storpool.slice’s cpuset.cpus. All other slices’ cpuset.cpus should be set to all the remaining cpus. The cpuset.cpu_exclusive flag in the storpool.slice should be set to 1 to ensure that the other slices cannot use the cpus dedicated to StorPool.
The cpuset:storpool.slice should have a subslice for each StorPool service running on the machine. So, if you have two servers, beacon, mgmt, and block running, there should be storpool.slice/{server,server_1,beacon,mgmt,block} subslices. These are used to assign the services to specific cpus. This is achieved by, for example, setting the cpuset.cpus of the storpool.slice/beacon subslice to 2, which restricts storpool_beacon to run only on cpu 2.
Note that on machines that do not have hardware-accelerated network cards the storpool.slice will also need a cpu for the nic, but there is no nic subslice; that cpu must not be in any of the subslices (it should be left free).
For the storpool.slice and each of its subslices, the cpuset.mems option should be set to all available NUMA nodes on the machine (e.g. 0-3 on a machine with 4 NUMA nodes).
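As an illustration only (the cpu and NUMA node numbers below are arbitrary examples for a hypothetical 40-cpu, 2-NUMA-node machine; the cgtool generates the real values):
$ echo 2-3,22-23 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
$ echo 0-1 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.mems
$ echo 0-1,4-21,24-39 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
$ echo 0-1,4-21,24-39 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
# cpu_exclusive can only be enabled once no sibling slice overlaps the StorPool cpus:
$ echo 1 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpu_exclusive
$ echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.cpus
$ echo 0-1 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.mems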
1.6. Cgconfig
Cgroup configurations are made persistent across reboots by configuration files placed in /etc/cgconfig.d/. When the machine boots, the cgconfig service runs and applies the configuration using cgconfigparser.
Writing cgconfig files by hand is a tedious and error-prone job, so it is advisable to use the cgtool to generate them.
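The generated files use the standard cgconfig syntax. The fragment below is only a rough sketch of that syntax with illustrative values, not the exact output of the cgtool:
group storpool.slice {
    cpuset {
        cpuset.cpus = "2-3,22-23";
        cpuset.mems = "0-1";
        cpuset.cpu_exclusive = "1";
    }
    memory {
        memory.limit_in_bytes = "4G";
        memory.memsw.limit_in_bytes = "4G";
        memory.swappiness = "0";
        memory.move_charge_at_immigrate = "3";
    }
}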
Note
After you create the config files you will need to restart the cgconfig service or parse the configuration files with cgconfigparser so that the configuration is applied to the machine.
Warning
Restarting the cgconfig service on machines that already have cgroups with processes running in them will move those processes to the root cgroup! This is dangerous, and you are strongly advised NOT to do it!
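On a freshly installed machine, where no processes run in the slices yet, the configuration can be applied by parsing the files directly instead of restarting the service, for example (assuming the generated files live in /etc/cgconfig.d/):
$ cgconfigparser -L /etc/cgconfig.d/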
2. Using the cgtool to create Cgroups configurations for freshly installed machines
2.1. Hypervisors
Attention
Before running the cgtool make sure that all needed services are installed and network interfaces for StorPool are properly configured in the storpool.conf.
It is always advisable to run the cgtool with the -N (--noop) option first. Here is a sample output:
$ storpool_cg conf -N
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
core: 3 cpu: 3,23
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
What you can see is an overview of the configuration the tool will create for the machine. Note that because of the -N option, the configuration is not written.
If you think the configuration is appropriate for the machine, just run the cgtool without the -N option and the configuration will be created.
If the configuration does not seem right, here are several things you can change:
If you think some of the slice limits should be different, let’s say that you want the system.slice limit to be 4G, you can do:
$ storpool_cg conf -N system_limit=4G
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120872M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 4G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
core: 3 cpu: 3,23
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
In the same manner, you can pass the machine_limit, user_limit, sp_common_limit, and sp_alloc_limit parameters on the command line. Values in MB are also accepted with the M suffix.
If you want to dedicate more cpus to StorPool, you can run the cgtool with the cores=<N> parameter.
$ storpool_cg conf -N cores=3
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 -
core: 2 cpu: 2,22 <--- 2 - rdma; 22 -
core: 3 cpu: 3,23 <--- 3 - block; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Please note that on hyperthreaded machines one core will add two cpus, while on machines without (or with disabled) hyperthreading one core will add one cpu.
The cgtool detects which of the StorPool services that need their own cpuset subslice are installed on the machine. It might happen (though it would be unexpected) that you do not have all services installed yet.
You can override the cgtool services detection by specifying <service>=true or <service>=1. For example, to add the mgmt service to the above configuration, run:
$ storpool_cg conf -N cores=3 mgmt=1
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
The cgtool will also detect which driver the network card uses and whether it can be used with hardware acceleration by the current StorPool installation.
The hardware acceleration detection can be overridden by specifying iface_acc=true/false on the command line. Here is an example of the above configuration with hardware acceleration enabled:
$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Note that the cgtool will leave 1G of memory for the kernel.
If you want to change the amount of memory reserved for the kernel, you can specify the kernel_mem=<X> command line parameter. For example, reserving 3G for the kernel:
$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true kernel_mem=3G
########## START SUMMARY ##########
slice: machine limit: 118696M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Attention
The cgtool will use cpus for the storpool.slice from the local cpus list of the StorPool network interfaces.
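To see which cpus are local to a given interface, you can check the device's local_cpulist in sysfs (eth0 below is just a placeholder for the actual StorPool interface):
$ cat /sys/class/net/eth0/device/local_cpulist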
2.2. Dedicated storage and Hyperconverged machines
All options described in the Hypervisors section can also be used on storage and hyperconverged machines.
Attention
Before running the cgtool on storage and hyperconverged machines, you should have initialized all disks for StorPool.
Warning
If the machine has NVMe disks that will be used with the storpool_nvme service, please make sure that these drives are NOT unbound from the kernel nvme driver and are still visible as block devices in /dev.
2.2.1. Dedicated storage machines
Sample output from the cgtool on a dedicated storage machine which has its disks configured to run in four storpool_server instances and will run the storpool_iscsi service:
$ storpool_cg conf -N
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
core: 4 cpu: 4,10
core: 5 cpu: 5,11
###################################
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048
########### END SUMMARY ###########
The first thing to notice is the SP_CACHE_SIZE{_X} variables at the bottom. By default, when run on a node with local disks, the cgtool will set the cache sizes for the different storpool_server instances. These values will be written to /etc/storpool.conf.d/cache-size.conf. If you don’t want the cgtool to set the server caches (maybe you have already done it yourself), you can set the set_cache_size command line parameter to false:
$ storpool_cg conf -N set_cache_size=false
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
core: 4 cpu: 4,10
core: 5 cpu: 5,11
########### END SUMMARY ###########
The SP_CACHE_SIZE{_X} variables disappeared from the config summary, which means they won’t be changed.
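For reference, with set_cache_size left at its default, the file written for the configuration above would contain roughly the values shown in the first summary:
$ cat /etc/storpool.conf.d/cache-size.conf
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048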
The cgtool detects how many server instances will be running on the machine by reading the storpool_initdisk --list output. If you haven’t configured the right number of servers on the machine, you can override this detection by specifying the servers command line parameter:
$ storpool_cg conf -N set_cache_size=false servers=2
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 6, 7, 8]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9
core: 4 cpu: 4,10
core: 5 cpu: 5,11
########### END SUMMARY ###########
2.2.2. Hyperconverged machines
On hyperconverged machines the cgtool should be run with the converged command line parameter set to true/1. This has two major differences compared to configuring storage-only nodes:
A machine.slice will be created for the machine.
The memory limit of the storpool.slice will be carefully calculated to be minimal, but sufficient for StorPool to run without problems.
$ storpool_cg conf -N converged=1
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
subslice: storpool/common limit: 12806M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 2,22
core: 2 cpu: 4,24
core: 3 cpu: 6,26
core: 4 cpu: 8,28
core: 8 cpu:10,30
core: 9 cpu:12,32
core:10 cpu:14,34
core:11 cpu:16,36
core:12 cpu:18,38
socket:1
core: 0 cpu: 1,21
core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
core: 4 cpu: 9,29
core: 8 cpu:11,31
core: 9 cpu:13,33
core:10 cpu:15,35
core:11 cpu:17,37
core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########
Warning
If the machine will not boot with the kernel memsw cgroups feature enabled, you should specify that to storpool_cg conf by setting set_memsw to false/0.
Note
The cgtool will use only cpus from the network interface local cpulist, which are commonly restricted to one NUMA node. If you want to allow the cgtool to use all cpus on the machine, specify that to storpool_cg conf by setting numa_overflow to true/1.
3. Using the cgtool to configure multiple similar machines
It may happen that you want to run the cgtool with the same command line arguments on multiple machines. The cgtool provides a configuration file option. A sample configuration file looks like the following:
[cgtool]
CONVERGED=1
MGMT=0
SERVERS=2
SET_CACHE_SIZE=0
SYSTEM_LIMIT=4G
USER_LIMIT=4G
KERNEL_MEM=2G
IFACE_ACC=1
CORES=4
Running the cgtool with storpool_cg conf -N -C example.conf is equivalent to storpool_cg conf -N converged=1 mgmt=0 servers=2 set_cache_size=0 system_limit=4G user_limit=4G kernel_mem=2G iface_acc=true cores=4.
Hint
You can see the example.cfg file in the cgtool directory and explore storpool_cg conf -h.
3.1. Dumping a configuration
You can ‘ask’ the cgtool what its configuration looks like after it has done all of its autodetection. This can be achieved by running storpool_cg conf -N converged=1 -D dumped.cfg, which will dump the cgtool configuration into dumped.cfg. Note that dumped.cfg is a valid configuration file for the cgtool.
For example:
$ storpool_cg conf -N converged=1 -D dumped.cfg
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
subslice: storpool/common limit: 12806M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 2,22
core: 2 cpu: 4,24
core: 3 cpu: 6,26
core: 4 cpu: 8,28
core: 8 cpu:10,30
core: 9 cpu:12,32
core:10 cpu:14,34
core:11 cpu:16,36
core:12 cpu:18,38
socket:1
core: 0 cpu: 1,21
core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
core: 4 cpu: 9,29
core: 8 cpu:11,31
core: 9 cpu:13,33
core:10 cpu:15,35
core:11 cpu:17,37
core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########
$ cat dumped.cfg
##########START CONFIG##########
[cgtool]
CONVERGED=1
BLOCK=1
ISCSI=1
MGMT=1
BRIDGE=0
SERVERS=2
SET_CACHE_SIZE=1
SP_COMMON_LIMIT=12806M
SP_ALLOC_LIMIT=3328M
SYSTEM_LIMIT=2836M
KERNEL_MEM=1G
USER_LIMIT=2G
MACHINE_LIMIT=356G
IFACE=p4p1
IFACE_ACC=1
CORES=3
CONFDIR=/etc/cgconfig.d
CACHEDIR=/etc/storpool.conf.d
###########END CONFIG###########
4. Verifying machine cgroups state and configurations
4.1. storpool_cg print
storpool_cg print is a simple script that reads the cgroups filesystem and reports its current state in a StorPool-friendly readable format. It is the same format used by the cgtool for printing configurations. storpool_cg print is useful for making yourself familiar with the machine configuration.
Here is an example:
$ storpool_cg print
slice: storpool.slice limit: 26631M
subslice: storpool.slice/alloc limit: 3328M
subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] --
core:1 cpus:[ 2 3] -- nic | rdma
core:2 cpus:[ 4 5] -- server | server_1
core:3 cpus:[ 6 7] -- iscsi | beacon,mgmt,block
socket:1
core:0 cpus:[ 8 9] --
core:1 cpus:[10 11] --
core:2 cpus:[12 13] --
core:3 cpus:[14 15] --
It can be used with the -N and -S options to display the numa nodes and cpuset slices for the cpus:
$ storpool_cg print -N -S
slice: storpool.slice limit: 26631M
subslice: storpool.slice/alloc limit: 3328M
subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- numa:[0 0] -- system user | system user
core:1 cpus:[ 2 3] -- numa:[0 0] -- storpool: nic | storpool: rdma
core:2 cpus:[ 4 5] -- numa:[0 0] -- storpool: server | storpool: server_1
core:3 cpus:[ 6 7] -- numa:[0 0] -- storpool: iscsi | storpool: beacon,mgmt,block
socket:1
core:0 cpus:[ 8 9] -- numa:[1 1] -- system user | system user
core:1 cpus:[10 11] -- numa:[1 1] -- system user | system user
core:2 cpus:[12 13] -- numa:[1 1] -- system user | system user
core:3 cpus:[14 15] -- numa:[1 1] -- system user | system user
The last option it accepts is -U/--usage. It shows a table with the memory usage of each memory slice it usually prints, as well as how much memory is left for the kernel.
$ storpool_cg print -U
slice usage limit perc free
=========================================================
machine.slice 0.00 / 13.21G 0.00% 13.21G
storpool.slice 2.86 / 10.17G 28.09% 7.32G
storpool.slice/alloc 0.20 / 4.38G 4.61% 4.17G
storpool.slice/common 2.66 / 5.80G 45.81% 3.14G
system.slice 2.13 / 4.44G 47.84% 2.32G
user.slice 0.65 / 2.00G 32.73% 1.35G
=========================================================
ALL SLICES 5.64 / 29.82G 18.91% 24.19G
reserved total perc kernel
=========================================================
NON KERNEL 29.82 / 31.26G 95.40% 1.44G
=========================================================
cpus for StorPool: [1, 2, 3, 4, 5, 6, 7]
socket:0
core:0 cpus:[ 0 1] -- | bridge,mgmt
core:1 cpus:[ 2 3] -- nic | rdma
core:2 cpus:[ 4 5] -- server | server_1
core:3 cpus:[ 6 7] -- iscsi | beacon,block
socket:1
core:0 cpus:[ 8 9] --
core:1 cpus:[10 11] --
core:2 cpus:[12 13] --
core:3 cpus:[14 15] --
4.2. storpool_cg check
storpool_cg check will run a series of control group related checks on the machine and report any errors or warnings it finds. It can be used to identify cgroup-related problems.
Here is an example output:
$ storpool_cg check
M: ==== cpuset ====
E: user.slice and machine.slice cpusets intersect
E: machine.slice and system.slice cpusets intersect
M: ==== memory ====
W: memory left for kernel is 0MB
E: sum of storpool.slice, user.slice, system.slice, machine.slice limits is 33549.0MB, while total memory is 31899.46875MB
M: Done.
4.3. storpool_process
storpool_process can find all StorPool processes running on the machine and report their cpuset and memory cgroups. It can be used to check in which cgroups the StorPool processes run and to quickly find problems (e.g. StorPool processes in the root cgroup).
To list all StorPool processes, run:
$ storpool_process list
[pid] [service] [cpuset] [memory]
1121 stat system.slice system.slice/storpool_stat.service
1181 stat system.slice system.slice/storpool_stat.service
1261 stat system.slice system.slice/storpool_stat.service
1262 stat system.slice system.slice/storpool_stat.service
1263 stat system.slice system.slice/storpool_stat.service
1266 stat system.slice system.slice/storpool_stat.service
5743 server storpool.slice/server storpool.slice
14483 block storpool.slice/block storpool.slice
21327 stat system.slice system.slice/storpool_stat.service
27379 rdma storpool.slice/rdma storpool.slice
27380 rdma storpool.slice/rdma storpool.slice
27381 rdma storpool.slice/rdma storpool.slice
27382 rdma storpool.slice/rdma storpool.slice
27383 rdma storpool.slice/rdma storpool.slice
28940 mgmt storpool.slice/mgmt storpool.slice/alloc
29346 controller system.slice system.slice
29358 controller system.slice system.slice
29752 nvmed storpool.slice/beacon storpool.slice
29764 nvmed storpool.slice/beacon storpool.slice
30838 block storpool.slice/block storpool.slice
31055 server storpool.slice/server storpool.slice
31086 mgmt storpool.slice/mgmt storpool.slice/alloc
31450 beacon storpool.slice/beacon storpool.slice
31469 beacon storpool.slice/beacon storpool.slice
By default, processes are sorted by pid. You can specify the sorting by using the -S parameter:
$ storpool_process list -S service pid
[pid] [service] [cpuset] [memory]
31450 beacon storpool.slice/beacon storpool.slice
31469 beacon storpool.slice/beacon storpool.slice
14483 block storpool.slice/block storpool.slice
30838 block storpool.slice/block storpool.slice
29346 controller system.slice system.slice
29358 controller system.slice system.slice
28940 mgmt storpool.slice/mgmt storpool.slice/alloc
31086 mgmt storpool.slice/mgmt storpool.slice/alloc
29752 nvmed storpool.slice/beacon storpool.slice
29764 nvmed storpool.slice/beacon storpool.slice
27379 rdma storpool.slice/rdma storpool.slice
27380 rdma storpool.slice/rdma storpool.slice
27381 rdma storpool.slice/rdma storpool.slice
27382 rdma storpool.slice/rdma storpool.slice
27383 rdma storpool.slice/rdma storpool.slice
5743 server storpool.slice/server storpool.slice
31055 server storpool.slice/server storpool.slice
1121 stat system.slice system.slice/storpool_stat.service
1181 stat system.slice system.slice/storpool_stat.service
1261 stat system.slice system.slice/storpool_stat.service
1262 stat system.slice system.slice/storpool_stat.service
1263 stat system.slice system.slice/storpool_stat.service
1266 stat system.slice system.slice/storpool_stat.service
21327 stat system.slice system.slice/storpool_stat.service
You can also use the storpool_process tool to reclassify misplaced StorPool processes into their proper cgroups. If the proper cgroups are configured in the storpool.conf, you can run storpool_process reclassify and the tool will move each process to its correct cpuset and memory cgroups. It is advisable to run storpool_process reclassify -N (or even storpool_process reclassify -N -v) first to see which processes will be moved where.
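A typical sequence would be:
# Dry run - show which processes would be moved and where:
$ storpool_process reclassify -N -v
# If the suggested moves look correct, apply them:
$ storpool_process reclassify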
5. Using the cgtool to update already configured machines
A possible solution for this scenario is to create a new configuration and reboot the machine. However, often you won’t be able to reboot, so the cgtool offers a solution for ‘live’-migrating machines. It is activated by the -M (--migrate) command line option.
When you have a configuration that you want to apply to a machine (you know with what options you want to run the cgtool), you have two options: run the cgtool to create cgconfig.d files and reboot, or use storpool_cg conf with the same options plus the -M option and let it try to apply the configuration live.
Attention
Before attempting a live migration, the cgtool will run a series of checks to verify that it is safe to try the migration.
Attention
Migrating machines with the cgtool is a pseudo-transactional procedure. If the migration process fails, a rollback procedure will be attempted to restore the initial machine state. The rollback operation is not guaranteed to succeed! However, some ‘extreme’ conditions must have occurred for the rollback to fail.
You can use the migration when:
Changing slice limits
Enabling hardware acceleration
Adding and removing services
Adding and removing disks
5.1. Migrating to new-style configs
Let’s look at the following machine:
$ storpool_cg print
slice: storpool.slice limit: 26G
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- nic | rdma
core:1 cpus:[ 2 3] -- server | block
core:2 cpus:[ 4 5] -- server_1 |
core:3 cpus:[ 6 7] -- server_2 | beacon
core:4 cpus:[ 8 9] --
core:5 cpus:[10 11] --
core:6 cpus:[12 13] --
core:7 cpus:[14 15] --
You can run a ‘fake’ migration with -NM to see the desired configuration and the steps the tool will take to achieve it. All other arguments of storpool_cg conf can be used with -M, so if you need to tweak the number of cores, for example, you can still use cores=4.
$ storpool_cg conf -NM
W: NIC is expected to be on cpu 2
########## START SUMMARY ##########
slice: storpool limit: 26696M
subslice: storpool/common limit: 26696M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [2, 3, 4, 5, 6, 7]
socket:0
core: 0 cpu: 0, 1
core: 1 cpu: 2, 3 <--- 2 - nic; 3 - rdma
core: 2 cpu: 4, 5 <--- 4 - server; 5 - server_1
core: 3 cpu: 6, 7 <--- 6 - server_2; 7 - block,beacon
core: 4 cpu: 8, 9
core: 5 cpu:10,11
core: 6 cpu:12,13
core: 7 cpu:14,15
########### END SUMMARY ###########
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 5 > /sys/fs/cgroup/cpuset/storpool.slice/server_1/cpuset.cpus
echo 2-7 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 7 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/common
echo 1 > /sys/fs/cgroup/memory/storpool.slice/common/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/common/memory.move_charge_at_immigrate
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/alloc
echo 1 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.move_charge_at_immigrate
echo 0G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes
echo 6143 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6682 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6692 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6913 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6926 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6977 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6987 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7174 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7185 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7585 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7604 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
When you are happy with the configuration and want to migrate to it, run the migration without -N and then check if everything is ok with storpool_cg print -S and storpool_cg check.
$ storpool_cg conf -M
W: NIC is expected to be on cpu 2
$ storpool_cg print -S
slice: storpool.slice limit: 26696M
subslice: storpool.slice/alloc limit: 0G
subslice: storpool.slice/common limit: 26696M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- system user | system user
core:1 cpus:[ 2 3] -- storpool: nic | storpool: rdma
core:2 cpus:[ 4 5] -- storpool: server | storpool: server_1
core:3 cpus:[ 6 7] -- storpool: server_2 | storpool: beacon,block
core:4 cpus:[ 8 9] -- system user | system user
core:5 cpus:[10 11] -- system user | system user
core:6 cpus:[12 13] -- system user | system user
core:7 cpus:[14 15] -- system user | system user
$ storpool_cg check
M: ==== memory ====
W: memory:system.slice has more than 80% usage
Attention
One more thing you need to check after a migration is the output of storpool_process reclassify -N -v. If it suggests moving some processes to different cgroups than the ones they are currently running in, you can do so by running just storpool_process reclassify. Note that storpool_process will make suggestions based on the current SP_X_CGROUPS variables in the storpool.conf.