Control groups
This document gives an overview of the kernel control groups (cgroups) feature, and describes how StorPool uses the cpuset and memory controllers to optimize performance and protect the storage system from unrelated events such as out-of-memory conditions.
Kernel control groups and StorPool
Cgroups
For a good overview of cgroups, check the Description and Control Groups Version 1 sections in the cgroups manual page. For more detailed information, see the kernel cgroups documentation.
Cgexec
All StorPool services are started via the cgexec utility. It runs the service and accounts its resources in the cgroups given as parameters. For example, cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test will run the test binary, limiting its cpuset resources by the limitations defined in the cpuset_cg cgroup, and its memory resources by the limitations defined in the memory_cg cgroup.
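As a minimal sketch, the two cgroups could be created beforehand with cgcreate from the same libcgroup package (the cgroup names and CPU numbers are illustrative):

cgcreate -g cpuset:cpuset_cg -g memory:memory_cg
# A cpuset cgroup needs cpus and mems populated before tasks can join it:
echo 0-1 > /sys/fs/cgroup/cpuset/cpuset_cg/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/cpuset_cg/cpuset.mems
cgexec -g cpuset:cpuset_cg -g memory:memory_cg ./test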
Slices
It is common practice to create cgroups with the same name under different controllers. Take, for example, the cpuset and memory controllers. If one creates a test cgroup in the memory controller, and also a test cgroup in the cpuset controller, the pair can be considered a slice. A more appropriate name for the two cgroups would be test.slice.
Defining slices makes it easier to keep track of the resources used by a process. If you think of it as ‘the process runs in the test.slice’, that implies both cpuset restrictions from cpuset:test.slice and memory restrictions from memory:test.slice.
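A sketch of creating such a slice with the cgcreate utility (the slice name is illustrative):

cgcreate -g cpuset:test.slice -g memory:test.slice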
Machines that run virtual guests have a machine.slice where all the virtual machines run, and a system.slice where the system processes run. Machines running systemd also have a user.slice, where all user session processes run.
StorPool and Cgroups
Machines that run StorPool also have a storpool.slice, where all StorPool core services run. On a properly configured machine all slices will have configured memory (and memory+swap) limits. This is done to guarantee two things:
- The kernel will have enough memory to run properly (explained later).
- The storage system (StorPool) will not suffer from an out-of-memory (OOM) situation that it did not trigger itself.
The cpuset controller in the storpool.slice is used to:
- Dedicate CPUs only for StorPool (that other slices do not have access to).
- Map the dedicated CPUs to StorPool services in a specific manner to optimize performance.
Memory configuration
For the root cgroup, memory.use_hierarchy should be set to 1, so that a hierarchical memory model is used for the cgroups.
For all slices:
- memory.move_charge_at_immigrate should be set to at least 1, and for the storpool.slice it should be set to 3.
- memory.limit_in_bytes and memory.memsw.limit_in_bytes should be set to the same appropriate value.
For the storpool.slice, memory.swappiness should be set to 0.
Note
To ensure enough memory for the kernel, the sum of the memory limits of all slices should be at least 1G short of the total machine memory.
memory:storpool.slice has two memory subslices - common and alloc. The storpool.slice/alloc subslice is used to limit the memory usage of the mgmt, iscsi, and bridge services, while the storpool.slice/common subslice is for everything else. Their memory limits should also be configured, and their sum should be equal to the storpool.slice memory limit.
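A sketch of applying these memory settings by hand, assuming cgroup v1 is mounted under /sys/fs/cgroup and using illustrative limit values:

echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/memory.move_charge_at_immigrate
echo 0 > /sys/fs/cgroup/memory/storpool.slice/memory.swappiness
# Set memory.limit_in_bytes first; memory.memsw.limit_in_bytes may not go below it:
echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
echo 4G > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
# The common and alloc subslice limits should sum to the slice limit:
echo 2G > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
echo 2G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes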
Cpuset configuration
Dedicated CPUs for StorPool should be set in cpuset.cpus of the storpool.slice. The cpuset.cpus of all other slices should be set to all the remaining CPUs. The cpuset.cpu_exclusive flag in the storpool.slice should be set to 1 to ensure that the other slices cannot use the CPUs dedicated for StorPool.
cpuset:storpool.slice should have a subslice for each StorPool service running on the machine. So, if you have two servers, beacon, mgmt, and block running, there should be storpool.slice/{server,server_1,beacon,mgmt,block} subslices. These are used to assign the services to specific CPUs. This is achieved by, for example, setting the cpuset.cpus of storpool.slice/beacon to 2, which will restrict storpool_beacon to run on CPU 2.
Note that on machines that do not have hardware-accelerated network cards the storpool.slice will also need a CPU for the NIC, although there is no nic subslice. That CPU must not be in any of the subslices (it should be left free).
For the storpool.slice and each of its subslices, the cpuset.mems option should be set to all available NUMA nodes on the machine (for example, 0-3 on a machine with 4 NUMA nodes).
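A sketch of these cpuset settings, assuming CPUs 2-3 are dedicated to StorPool on a 16-CPU machine with a single NUMA node (all values illustrative):

echo 2-3 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.mems
# Remove the dedicated CPUs from the other slices before marking them exclusive:
echo 0-1,4-15 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 0-1,4-15 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
echo 1 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpu_exclusive
# Pin a service subslice to a specific CPU:
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/storpool.slice/beacon/cpuset.mems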
Cgroup configuration
To make cgroup configurations persistent, set them in the configuration files in /etc/cgconfig.d/ and reboot. When the machine boots, the cgconfig service runs and applies the configuration using cgconfigparser.
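For illustration, a hand-written /etc/cgconfig.d/ snippet might look like the following sketch (the values are placeholders; storpool_cg generates the actual files):

group storpool.slice {
    cpuset {
        cpuset.cpus = "2-3";
        cpuset.mems = "0";
        cpuset.cpu_exclusive = "1";
    }
    memory {
        memory.move_charge_at_immigrate = 3;
        memory.swappiness = 0;
        memory.limit_in_bytes = 4G;
        memory.memsw.limit_in_bytes = 4G;
    }
}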
Writing configuration files for cgconfig by hand is a tedious and error-prone job. It is recommended to generate them using the storpool_cg utility provided by StorPool.
Note
After you create the configuration files you will need to restart the cgconfig service, or to parse the configuration files with cgconfigparser so that the configuration is applied to the machine.
Warning
Restarting the cgconfig service on machines that already have cgroups with processes running in them will move those processes to the root cgroup! This is dangerous, and it is strongly advised NOT to do so!
Introduction to storpool_cg
Before you start
Before running storpool_cg, make sure that all needed services are installed and the network interfaces for StorPool are properly configured in storpool.conf.
Format
The storpool_cg tool should be used as follows:
$ storpool_cg [command] [options]
For [command], you must use one of the conf, print, or check commands. You can also set options as needed. For details, see the following sections.
Viewing results before applying them
When using the conf command, it is always advisable to first run storpool_cg with the -N (--noop) option, as shown in the example below:
$ storpool_cg conf -N
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
core: 3 cpu: 3,23
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
This way you can see an overview of the configuration the tool would create for the machine. Note that the configuration is not written, because the -N option was used. This gives you the opportunity to decide whether the configuration is appropriate for the machine:
- Yes: You can apply the configuration by running storpool_cg without the -N option.
- No: Keep using the -N option and add some of the options described below, until you get a suitable configuration.
Creating cgroups configurations for freshly installed machines
Hypervisors
Setting slice limits
If you think some of the slice limits should be different - for example, you want the system.slice limit to be 4G - you can do the following:
$ storpool_cg conf -N system_limit=4G
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120872M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 4G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 21, 22]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 - beacon
core: 3 cpu: 3,23
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
In the same manner, you can pass machine_limit, user_limit, sp_common_limit, and sp_alloc_limit on the command line. Values in MB are also accepted, with the M suffix.
Setting number of CPUs
If you want to dedicate more CPUs to StorPool, you can run storpool_cg with the cores=<N> parameter:
$ storpool_cg conf -N cores=3
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 122920M
slice: storpool limit: 692M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 -
core: 2 cpu: 2,22 <--- 2 - rdma; 22 -
core: 3 cpu: 3,23 <--- 3 - block; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Note that on hyper-threaded machines one core adds two CPUs, while on machines without (or with disabled) hyper-threading one core adds one CPU.
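To see how cores map to CPUs on a given machine, you can inspect the topology with the standard lscpu utility (not part of StorPool):

$ lscpu --extended=CPU,CORE,SOCKET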
The storpool_cg tool detects which StorPool services that need their own cpuset subslice are installed on the machine. It might happen (though it is unexpected) that you do not have all services installed yet.
Overriding services detection
You can override the service detection by specifying <service>=true or <service>=1. For example, to add a mgmt service to the above configuration:
$ storpool_cg conf -N cores=3 mgmt=1
W: NIC is expected to be on cpu 1
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - nic; 21 - rdma
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
The storpool_cg tool will also detect which driver the network card uses, and whether it can be used with hardware acceleration in the current StorPool installation.
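If you want to check the driver yourself, the standard ethtool utility reports it (the interface name below is illustrative):

$ ethtool -i eth0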
Overriding hardware acceleration
You can override the hardware acceleration detection by specifying iface_acc=true/false on the command line. Here is an example of the above configuration with hardware acceleration enabled:
$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true
########## START SUMMARY ##########
slice: machine limit: 120744M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Note that storpool_cg will leave 1G of memory for the kernel.
Setting memory for the kernel
If you want to change the amount of memory left for the kernel, you can specify the kernel_mem=<X> command line parameter. For example, reserving 3G for the kernel:
$ storpool_cg conf -N cores=3 mgmt=1 iface_acc=true kernel_mem=3G
########## START SUMMARY ##########
slice: machine limit: 118696M
slice: storpool limit: 2868M
subslice: storpool/common limit: 692M
subslice: storpool/alloc limit: 2176M
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 21, 22, 23]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 1,21 <--- 1 - rdma; 21 -
core: 2 cpu: 2,22 <--- 2 - block; 22 -
core: 3 cpu: 3,23 <--- 3 - mgmt; 23 - beacon
core: 4 cpu: 4,24
core: 8 cpu: 5,25
core: 9 cpu: 6,26
core:10 cpu: 7,27
core:11 cpu: 8,28
core:12 cpu: 9,29
socket:1
core: 0 cpu:10,30
core: 1 cpu:11,31
core: 2 cpu:12,32
core: 3 cpu:13,33
core: 4 cpu:14,34
core: 8 cpu:15,35
core: 9 cpu:16,36
core:10 cpu:17,37
core:11 cpu:18,38
core:12 cpu:19,39
###################################
########### END SUMMARY ###########
Attention
storpool_cg will use CPUs for the storpool.slice from the local CPU lists of the StorPool network interfaces.
Dedicated storage and hyperconverged machines
All options described in the Hypervisors section can also be used on storage and hyperconverged machines.
Warning
Before running storpool_cg on a storage or hyperconverged machine, make sure it meets the following conditions:
- All its disks are initialized for StorPool. For details, see 7. Storage devices.
- If it has NVMe disks that will be used with the storpool_nvmed service, ensure that these drives are NOT unbound from the kernel nvme driver and are visible as block devices in /dev (see the sketch below). For details, see 9. Background services.
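A quick way to verify that the NVMe drives are still bound to the kernel nvme driver and visible as block devices, using standard tools:

$ lsblk -d -o NAME,MODEL | grep -i nvme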
Dedicated storage machines
Here is a sample output from storpool_cg on a dedicated storage machine, which has its disks configured to run in four storpool_server instances and will run the storpool_iscsi service:
$ storpool_cg conf -N
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
core: 4 cpu: 4,10
core: 5 cpu: 5,11
###################################
SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048
########### END SUMMARY ###########
The first thing to notice is the SP_CACHE_SIZE{_X} variables at the bottom. By default, when run on a node with local disks, storpool_cg will set the cache sizes for the different storpool_server instances. These values will be written to /etc/storpool.conf.d/cache-size.conf.
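Based on the summary above, the generated file would contain lines like the following (a sketch; the actual file is written by storpool_cg):

SP_CACHE_SIZE=2048
SP_CACHE_SIZE_1=2048
SP_CACHE_SIZE_2=2048
SP_CACHE_SIZE_3=2048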
Cache size
If you don’t want storpool_cg to set the server caches (maybe you have already done it yourself), you can set the set_cache_size command line parameter to false:
$ storpool_cg conf -N set_cache_size=false
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 3, 6, 7, 8, 9]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9 <--- 3 - server_2; 9 - server_3
core: 4 cpu: 4,10
core: 5 cpu: 5,11
########### END SUMMARY ###########
As shown in the example above, the SP_CACHE_SIZE{_X} variables disappeared from the config summary, which means they won’t be changed.
Number of servers
storpool_cg detects how many server instances will be running on the machine by reading the storpool_initdisk --list output. If you haven’t configured the right number of servers on the machine, you can override this detection by specifying the servers command line parameter:
$ storpool_cg conf -N set_cache_size=false servers=2
########## START SUMMARY ##########
slice: storpool limit: 26382M
subslice: storpool/common limit: 23054M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2445M
slice: user limit: 2G
###################################
cpus for StorPool: [1, 2, 6, 7, 8]
socket:0
core: 0 cpu: 0, 6 <--- 6 - mgmt,block,beacon
core: 1 cpu: 1, 7 <--- 1 - rdma; 7 - iscsi
core: 2 cpu: 2, 8 <--- 2 - server; 8 - server_1
core: 3 cpu: 3, 9
core: 4 cpu: 4,10
core: 5 cpu: 5,11
########### END SUMMARY ###########
Hyperconverged machines
On hyperconverged machines storpool_cg should be run with the converged command line parameter set to true (or 1). There are two major differences compared to configuring storage-only nodes:
- A machine.slice will be created for the machine.
- The memory limit of the storpool.slice will be calculated carefully to be minimal, but sufficient for StorPool to run without problems.
$ storpool_cg conf -N converged=1
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
subslice: storpool/common limit: 12806M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 2,22
core: 2 cpu: 4,24
core: 3 cpu: 6,26
core: 4 cpu: 8,28
core: 8 cpu:10,30
core: 9 cpu:12,32
core:10 cpu:14,34
core:11 cpu:16,36
core:12 cpu:18,38
socket:1
core: 0 cpu: 1,21
core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
core: 4 cpu: 9,29
core: 8 cpu:11,31
core: 9 cpu:13,33
core:10 cpu:15,35
core:11 cpu:17,37
core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########
Warning
If the machine does not boot with the kernel memsw cgroups feature enabled, you should specify that to storpool_cg conf by setting set_memsw to false (or 0).
Note that storpool_cg will use only CPUs from the network interfaces’ local cpulist, which is commonly restricted to one NUMA node. If you want to allow storpool_cg to use all CPUs on the machine, specify that to storpool_cg conf by setting numa_overflow to true (or 1).
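For example, a sketch combining these overrides on a hyperconverged machine (whether you need them depends on the machine):

$ storpool_cg conf -N converged=1 set_memsw=0 numa_overflow=1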
Configuring multiple similar machines
It may happen that you want to run storpool_cg with the same command line arguments on multiple machines. The easiest way to do this is to set the corresponding options in the configuration file of storpool_cg, as shown in the example below:
[cgtool]
CONVERGED=1
MGMT=0
SERVERS=2
SET_CACHE_SIZE=0
SYSTEM_LIMIT=4G
USER_LIMIT=4G
KERNEL_MEM=2G
IFACE_ACC=1
CORES=4
Having these options in a file called example.conf, you can simply run storpool_cg and point it to the file:
storpool_cg conf -N -C example.conf
This is equivalent to setting the options on the command line, like this:
storpool_cg conf -N converged=1 mgmt=0 servers=2 set_cache_size=0 system_limit=4G user_limit=4G kernel_mem=2G iface_acc=true cores=4
Hint
For more information about the options you can set in the configuration file, check the example in /usr/share/doc/storpool/examples/cgtool/example.cfg, or run storpool_cg conf -h.
Saving a configuration as a file
You can use the -D option to save the configuration detected by storpool_cg to a file. For example, by running storpool_cg conf -N converged=1 -D my-cgconf.cfg the configuration would be saved in the my-cgconf.cfg file. Note that my-cgconf.cfg is a valid configuration file for storpool_cg. As shown below, you can create the file and check its contents:
$ storpool_cg conf -N converged=1 -D my-cgconf.cfg
##########START SUMMARY##########
slice: machine limit: 356G
slice: storpool limit: 16134M
subslice: storpool/common limit: 12806M
subslice: storpool/alloc limit: 3328M
slice: system limit: 2836M
slice: user limit: 2G
#################################
cpus for StorPool: [3, 5, 7, 23, 25, 27]
socket:0
core: 0 cpu: 0,20
core: 1 cpu: 2,22
core: 2 cpu: 4,24
core: 3 cpu: 6,26
core: 4 cpu: 8,28
core: 8 cpu:10,30
core: 9 cpu:12,32
core:10 cpu:14,34
core:11 cpu:16,36
core:12 cpu:18,38
socket:1
core: 0 cpu: 1,21
core: 1 cpu: 3,23 <--- 3 - rdma; 23 - server
core: 2 cpu: 5,25 <--- 5 - server_1; 25 - mgmt,beacon
core: 3 cpu: 7,27 <--- 7 - iscsi; 27 - block
core: 4 cpu: 9,29
core: 8 cpu:11,31
core: 9 cpu:13,33
core:10 cpu:15,35
core:11 cpu:17,37
core:12 cpu:19,39
#################################
SP_CACHE_SIZE=1024
SP_CACHE_SIZE_1=4096
###########END SUMMARY###########
$ cat my-cgconf.cfg
##########START CONFIG##########
[cgtool]
CONVERGED=1
BLOCK=1
ISCSI=1
MGMT=1
BRIDGE=0
SERVERS=2
SET_CACHE_SIZE=1
SP_COMMON_LIMIT=12806M
SP_ALLOC_LIMIT=3328M
SYSTEM_LIMIT=2836M
KERNEL_MEM=1G
USER_LIMIT=2G
MACHINE_LIMIT=356G
IFACE=p4p1
IFACE_ACC=1
CORES=3
CONFDIR=/etc/cgconfig.d
CACHEDIR=/etc/storpool.conf.d
###########END CONFIG###########
Verifying machine cgroups state and configurations
storpool_cg print
storpool_cg print is a simple script that reads the cgroups filesystem and reports its current state in a StorPool-friendly readable format. It is the same format used by storpool_cg for printing configurations. storpool_cg print is useful for familiarizing yourself with the machine configuration. Here is an example:
$ storpool_cg print
slice: storpool.slice limit: 26631M
subslice: storpool.slice/alloc limit: 3328M
subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] --
core:1 cpus:[ 2 3] -- nic | rdma
core:2 cpus:[ 4 5] -- server | server_1
core:3 cpus:[ 6 7] -- iscsi | beacon,mgmt,block
socket:1
core:0 cpus:[ 8 9] --
core:1 cpus:[10 11] --
core:2 cpus:[12 13] --
core:3 cpus:[14 15] --
It can be used with the -N and -S options to display the NUMA nodes and cpuset slices for the CPUs:
$ storpool_cg print -N -S
slice: storpool.slice limit: 26631M
subslice: storpool.slice/alloc limit: 3328M
subslice: storpool.slice/common limit: 23303M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- numa:[0 0] -- system user | system user
core:1 cpus:[ 2 3] -- numa:[0 0] -- storpool: nic | storpool: rdma
core:2 cpus:[ 4 5] -- numa:[0 0] -- storpool: server | storpool: server_1
core:3 cpus:[ 6 7] -- numa:[0 0] -- storpool: iscsi | storpool: beacon,mgmt,block
socket:1
core:0 cpus:[ 8 9] -- numa:[1 1] -- system user | system user
core:1 cpus:[10 11] -- numa:[1 1] -- system user | system user
core:2 cpus:[12 13] -- numa:[1 1] -- system user | system user
core:3 cpus:[14 15] -- numa:[1 1] -- system user | system user
The last option it accepts is -U/--usage. It shows a table with the memory usage of each memory slice it usually prints, as well as how much memory is left for the kernel:
$ storpool_cg print -U
slice usage limit perc free
=========================================================
machine.slice 0.00 / 13.21G 0.00% 13.21G
storpool.slice 2.86 / 10.17G 28.09% 7.32G
storpool.slice/alloc 0.20 / 4.38G 4.61% 4.17G
storpool.slice/common 2.66 / 5.80G 45.81% 3.14G
system.slice 2.13 / 4.44G 47.84% 2.32G
user.slice 0.65 / 2.00G 32.73% 1.35G
=========================================================
ALL SLICES 5.64 / 29.82G 18.91% 24.19G
reserved total perc kernel
=========================================================
NON KERNEL 29.82 / 31.26G 95.40% 1.44G
=========================================================
cpus for StorPool: [1, 2, 3, 4, 5, 6, 7]
socket:0
core:0 cpus:[ 0 1] -- | bridge,mgmt
core:1 cpus:[ 2 3] -- nic | rdma
core:2 cpus:[ 4 5] -- server | server_1
core:3 cpus:[ 6 7] -- iscsi | beacon,block
socket:1
core:0 cpus:[ 8 9] --
core:1 cpus:[10 11] --
core:2 cpus:[12 13] --
core:3 cpus:[14 15] --
storpool_cg check
storpool_cg check runs a series of cgroup-related checks on the machine, and reports any errors or warnings it finds. It can be used to identify cgroup-related problems. Here is an example:
$ storpool_cg check
M: ==== cpuset ====
E: user.slice and machine.slice cpusets intersect
E: machine.slice and system.slice cpusets intersect
M: ==== memory ====
W: memory left for kernel is 0MB
E: sum of storpool.slice, user.slice, system.slice, machine.slice limits is 33549.0MB, while total memory is 31899.46875MB
M: Done.
storpool_process
storpool_process can find all StorPool processes running on the machine and report their cpuset and memory cgroups. It can be used to check in which cgroups the StorPool processes run, so you can quickly find problems (for example, StorPool processes in the root cgroup).
To list all StorPool processes, run:
$ storpool_process list
[pid] [service] [cpuset] [memory]
1121 stat system.slice system.slice/storpool_stat.service
1181 stat system.slice system.slice/storpool_stat.service
1261 stat system.slice system.slice/storpool_stat.service
1262 stat system.slice system.slice/storpool_stat.service
1263 stat system.slice system.slice/storpool_stat.service
1266 stat system.slice system.slice/storpool_stat.service
5743 server storpool.slice/server storpool.slice
14483 block storpool.slice/block storpool.slice
21327 stat system.slice system.slice/storpool_stat.service
27379 rdma storpool.slice/rdma storpool.slice
27380 rdma storpool.slice/rdma storpool.slice
27381 rdma storpool.slice/rdma storpool.slice
27382 rdma storpool.slice/rdma storpool.slice
27383 rdma storpool.slice/rdma storpool.slice
28940 mgmt storpool.slice/mgmt storpool.slice/alloc
29346 controller system.slice system.slice
29358 controller system.slice system.slice
29752 nvmed storpool.slice/beacon storpool.slice
29764 nvmed storpool.slice/beacon storpool.slice
30838 block storpool.slice/block storpool.slice
31055 server storpool.slice/server storpool.slice
31086 mgmt storpool.slice/mgmt storpool.slice/alloc
31450 beacon storpool.slice/beacon storpool.slice
31469 beacon storpool.slice/beacon storpool.slice
By default, processes are sorted by PID. You can change the sorting by using the -S parameter:
$ storpool_process list -S service pid
[pid] [service] [cpuset] [memory]
31450 beacon storpool.slice/beacon storpool.slice
31469 beacon storpool.slice/beacon storpool.slice
14483 block storpool.slice/block storpool.slice
30838 block storpool.slice/block storpool.slice
29346 controller system.slice system.slice
29358 controller system.slice system.slice
28940 mgmt storpool.slice/mgmt storpool.slice/alloc
31086 mgmt storpool.slice/mgmt storpool.slice/alloc
29752 nvmed storpool.slice/beacon storpool.slice
29764 nvmed storpool.slice/beacon storpool.slice
27379 rdma storpool.slice/rdma storpool.slice
27380 rdma storpool.slice/rdma storpool.slice
27381 rdma storpool.slice/rdma storpool.slice
27382 rdma storpool.slice/rdma storpool.slice
27383 rdma storpool.slice/rdma storpool.slice
5743 server storpool.slice/server storpool.slice
31055 server storpool.slice/server storpool.slice
1121 stat system.slice system.slice/storpool_stat.service
1181 stat system.slice system.slice/storpool_stat.service
1261 stat system.slice system.slice/storpool_stat.service
1262 stat system.slice system.slice/storpool_stat.service
1263 stat system.slice system.slice/storpool_stat.service
1266 stat system.slice system.slice/storpool_stat.service
21327 stat system.slice system.slice/storpool_stat.service
You can also use the storpool_process tool to reclassify misplaced StorPool processes into their right cgroups. If the proper cgroups are configured in storpool.conf, you can run storpool_process reclassify, and the tool will move each process to its right cpuset and memory cgroup. It is advisable to run storpool_process reclassify -N (or even storpool_process reclassify -N -v) first, to see which processes are affected and where they will be moved.
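Putting that together, a typical sequence would be:

$ storpool_process reclassify -N -v    # dry run: show which processes would be moved where
$ storpool_process reclassify          # actually move the processes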
Updating already configured machines
Sometimes you might need to update machines that are already configured. A possible solution for this scenario is to create a new configuration and reboot the machine. However, often you won’t be able to reboot, so storpool_cg offers a solution for ‘live’-migrating machines. It is activated by the -M (--migrate) command line option.
When you have a configuration that you want to apply to a machine (you know with what options you want to run storpool_cg), you have two options:
- Run storpool_cg to create the cgconfig.d files, and then reboot.
- Use storpool_cg conf with the same options plus the -M option, and let it try to apply the configuration live.
Attention
Note the following:
- Before attempting a live migration, storpool_cg will run a series of checks to verify that it is safe to try the migration.
- Migrating machines with storpool_cg is a pseudo-transactional procedure. If the migration process fails, a rollback procedure will be attempted to restore the initial machine state. The rollback operation is not guaranteed to succeed! Some ‘extreme’ conditions must have occurred for the rollback to fail, though.
You can use the migration in the following cases:
- Changing slice limits
- Enabling hardware acceleration
- Adding and removing services
- Adding and removing disks
- Migrating to a new-style configuration
Let’s look at the following machine:
$ storpool_cg print
slice: storpool.slice limit: 26G
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- nic | rdma
core:1 cpus:[ 2 3] -- server | block
core:2 cpus:[ 4 5] -- server_1 |
core:3 cpus:[ 6 7] -- server_2 | beacon
core:4 cpus:[ 8 9] --
core:5 cpus:[10 11] --
core:6 cpus:[12 13] --
core:7 cpus:[14 15] --
You can run a ‘fake’ migration with -NM to see the desired configuration and the steps the tool will take to achieve it. All other arguments of storpool_cg conf can be used with -M, so (for example) if you need to tweak the number of cores, you can still use cores=4.
$ storpool_cg conf -NM
W: NIC is expected to be on cpu 2
########## START SUMMARY ##########
slice: storpool limit: 26696M
subslice: storpool/common limit: 26696M
subslice: storpool/alloc limit: 0G
slice: system limit: 2G
slice: user limit: 2G
###################################
cpus for StorPool: [2, 3, 4, 5, 6, 7]
socket:0
core: 0 cpu: 0, 1
core: 1 cpu: 2, 3 <--- 2 - nic; 3 - rdma
core: 2 cpu: 4, 5 <--- 4 - server; 5 - server_1
core: 3 cpu: 6, 7 <--- 6 - server_2; 7 - block,beacon
core: 4 cpu: 8, 9
core: 5 cpu:10,11
core: 6 cpu:12,13
core: 7 cpu:14,15
########### END SUMMARY ###########
echo 2 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 5 > /sys/fs/cgroup/cpuset/storpool.slice/server_1/cpuset.cpus
echo 2-7 > /sys/fs/cgroup/cpuset/storpool.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/user.slice/cpuset.cpus
echo 0-1,8-15 > /sys/fs/cgroup/cpuset/system.slice/cpuset.cpus
echo 4 > /sys/fs/cgroup/cpuset/storpool.slice/server/cpuset.cpus
echo 3 > /sys/fs/cgroup/cpuset/storpool.slice/rdma/cpuset.cpus
echo 7 > /sys/fs/cgroup/cpuset/storpool.slice/block/cpuset.cpus
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.memsw.limit_in_bytes
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/common
echo 1 > /sys/fs/cgroup/memory/storpool.slice/common/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/common/memory.move_charge_at_immigrate
echo 26696M > /sys/fs/cgroup/memory/storpool.slice/common/memory.limit_in_bytes
mkdir /sys/fs/cgroup/memory/storpool.slice/alloc
echo 1 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.use_hierarchy
echo 3 > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.move_charge_at_immigrate
echo 0G > /sys/fs/cgroup/memory/storpool.slice/alloc/memory.limit_in_bytes
echo 6143 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6682 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6692 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6913 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6926 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6977 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 6987 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7174 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7185 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7585 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
echo 7604 > /sys/fs/cgroup/memory/storpool.slice/common/cgroup.procs
When you are happy with the configuration and want to migrate to it, run the migration without -N. Then check that everything is OK with storpool_cg print -S and storpool_cg check.
$ storpool_cg conf -M
W: NIC is expected to be on cpu 2
$ storpool_cg print -S
slice: storpool.slice limit: 26696M
subslice: storpool.slice/alloc limit: 0G
subslice: storpool.slice/common limit: 26696M
slice: system.slice limit: 2G
slice: user.slice limit: 2G
socket:0
core:0 cpus:[ 0 1] -- system user | system user
core:1 cpus:[ 2 3] -- storpool: nic | storpool: rdma
core:2 cpus:[ 4 5] -- storpool: server | storpool: server_1
core:3 cpus:[ 6 7] -- storpool: server_2 | storpool: beacon,block
core:4 cpus:[ 8 9] -- system user | system user
core:5 cpus:[10 11] -- system user | system user
core:6 cpus:[12 13] -- system user | system user
core:7 cpus:[14 15] -- system user | system user
$ storpool_cg check
M: ==== memory ====
W: memory:system.slice has more than 80% usage
Attention
One more thing you need to check after a migration is the output of storpool_process reclassify -N -v. If it suggests moving some processes to different cgroups than the ones they are currently running in, you can do so by running just storpool_process reclassify. Note that storpool_process will make suggestions based on the current SP_X_CGROUPS variables in storpool.conf.