Metrics collected and available from StorPool
Overview
A StorPool cluster collects and sends metrics about its performance and related statistics. These are described below.
Customers can access the data:

- Via analytics.storpool.com on pre-defined dashboards.
- Via direct access to the InfluxDB instance for their cluster (please contact StorPool support to get access).
- Via an InfluxDB instance (or another database that supports InfluxDB’s line protocol) of their own, by configuring SP_STATDB_ADDITIONAL in /etc/storpool.conf; see the last section in this document.
The Grafana dashboards used by StorPool at analytics.storpool.com are available on request, via StorPool support.
Internals
In this section you can find some details on the operation of the data collection and InfluxDB, which could be helpful to customers who would like to operate their own metrics database for StorPool.
storpool_stat operations
The storpool_stat tool is responsible for collecting and sending most of the
information to the InfluxDB instances. Its general flow is as follows:
- On start, it forks one child per measurement type and one per receiving database.
- All measurement processes collect data and atomically write a file every few seconds in /tmp/storpool_stat.
- The sending processes take the data from the files, push it to the databases, and then delete the files.
- If any file is found to be older than two days, it is removed.
Note
It has been known for storpool_stat to fill up /tmp on loss of
network connectivity for nodes with a large number of measured elements
(CPUs, volumes).
To configure an extra metrics database that data will be pushed to, set the
SP_STATDB_ADDITIONAL parameter in storpool.conf. It must
contain a full URL to the write endpoint of the database, for example
http://USER:PASSWORD@10.1.2.3:8086/write?db=metrics. Note that if the URL
scheme is https, the endpoint will need a valid certificate.
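In /etc/storpool.conf this is a single key=value line; USER, PASSWORD, the address, and the database name below are placeholders for your own setup:

SP_STATDB_ADDITIONAL=http://USER:PASSWORD@10.1.2.3:8086/write?db=metrics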
Data downsampling between s1 and m1 retention policies
In StorPool’s initial deployment, this was done with continuous queries. Below is what was used:
CREATE CONTINUOUS QUERY "1s_to_1m" ON DBNAME BEGIN SELECT mean(*) INTO m1.:MEASUREMENT FROM /.*/ GROUP BY time(1m),* FILL(NULL) END
CREATE CONTINUOUS QUERY "1s_to_1m_max" ON DBNAME BEGIN SELECT max(*) INTO m1.:MEASUREMENT FROM /.*/ GROUP BY time(1m),* FILL(NULL) END'
This solution did not scale to a larger number of databases, as the continuous
queries are executed sequentially with no parallelization. For this reason,
StorPool developed cqsched (available to customers on request from StorPool
support) to process multiple databases and measurements in parallel.
Disk usage and IO of InfluxDB databases
The I/O requirements of a single database are remarkably modest. For planning purposes, note that a cluster sends a data point every second for:
- every CPU thread
- every attached volume
- every HDD or SSD drive on a node.
For disk usage, as an example, a cluster with ~800 attached volumes and ~800 CPU
threads takes 11 GiB of space for the s1 data and 78 GiB for the m1 data.
Data structure
The main unit is a “measurement”, which contains the data for a specific type of metric (disk I/O, CPU usage, and so on).
All of the data below has tags and fields. In short, a “tag” is used to filter data, and a “field” is used for calculations and plotting graphs.
For more information, see InfluxDB’s documentation at https://docs.influxdata.com/influxdb/v1/concepts/key_concepts/.
There are two retention policies for data: the per-second data (s1) is
retained for 7 days, and the per-minute data (m1) is retained for 730 days (2
years). All data from storpool_stat is pushed into s1, and the InfluxDB
instances take care of downsampling it to per-minute data, either via continuous
queries or by other means.
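On a self-hosted InfluxDB 1.x instance, equivalent retention policies could be created as sketched below; DBNAME is a placeholder, and REPLICATION 1 assumes a single-node InfluxDB. s1 is marked as the default policy, since storpool_stat writes into it:

CREATE RETENTION POLICY "s1" ON DBNAME DURATION 7d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "m1" ON DBNAME DURATION 730d REPLICATION 1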
Measurements reference
The instances contain the following measurements:
bridgestatus
Collected by: storpool_monitor, pushed via the monitoring system.
Collected from: storpool remoteBridge status
This measurement is basically storpool remoteBridge status collected once a
minute.
The tcpInfo_* fields are a direct copy of the tcp_info structure in the
Linux kernel for the relevant connection. An example query over the counter fields is shown after the field list below.
Tags:
- clusterId - ID of the remote cluster
- connectionState - current state of the connection (string)
- ip - IP address of the remote bridge
- lastErrno - errno of the last error
- lastError - string representation of the last error
- protocolVersion - StorPool bridge protocol version.
Fields:
- countersSinceConnect_bytesRecv - bytes received in the current connection
- countersSinceConnect_bytesSent - bytes sent in the current connection
- countersSinceStart_bytesRecv - bytes received from the peer since the start of the process
- countersSinceStart_bytesSent - bytes sent to the peer since the start of the process
- tcpInfo_tcpi_advmss - this and all fields below are tcp_info fields
- tcpInfo_tcpi_ato
- tcpInfo_tcpi_backoff
- tcpInfo_tcpi_ca_state
- tcpInfo_tcpi_fackets
- tcpInfo_tcpi_last_ack_recv
- tcpInfo_tcpi_last_ack_sent
- tcpInfo_tcpi_last_data_recv
- tcpInfo_tcpi_last_data_sent
- tcpInfo_tcpi_lost
- tcpInfo_tcpi_options
- tcpInfo_tcpi_pmtu
- tcpInfo_tcpi_probes
- tcpInfo_tcpi_rcv_mss
- tcpInfo_tcpi_rcv_rtt
- tcpInfo_tcpi_rcv_space
- tcpInfo_tcpi_rcv_ssthresh
- tcpInfo_tcpi_rcv_wscale
- tcpInfo_tcpi_reordering
- tcpInfo_tcpi_retrans
- tcpInfo_tcpi_retransmits
- tcpInfo_tcpi_rto
- tcpInfo_tcpi_rtt
- tcpInfo_tcpi_rttvar
- tcpInfo_tcpi_sacked
- tcpInfo_tcpi_snd_cwnd
- tcpInfo_tcpi_snd_mss
- tcpInfo_tcpi_snd_ssthresh
- tcpInfo_tcpi_snd_wscale
- tcpInfo_tcpi_state
- tcpInfo_tcpi_total_retrans
- tcpInfo_tcpi_unacked
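Since the countersSinceStart_* fields are cumulative, a transfer rate can be derived at query time. A sketch, assuming direct InfluxQL access with the cluster’s database selected and the data available in the s1 retention policy:

SELECT non_negative_derivative(max("countersSinceStart_bytesSent"), 1s) FROM "s1"."bridgestatus" WHERE time > now() - 6h GROUP BY time(1m), "clusterId"

This plots the outgoing bandwidth of the bridge per remote cluster, averaged per minute.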
cpustat
Collected by: storpool_stat
Collected from: /proc/schedstat, /proc/stat
For extra information, see the documentation of the Linux kernel at https://docs.kernel.org/filesystems/proc.html for the two files above.
Tags:
- cpu - the CPU thread the stats are for
- hostname - hostname of the node
- labels - pipe-delimited (|) list of StorPool services that are pinned on the CPU
- server - SP_OURID of the node.
Fields:
- guest - Amount of time running a guest (virtual machine)
- guest_nice - Amount of time running a lower-priority (“nice”) guest (virtual machine)
- idle - Amount of time the CPU has been idle
- iowait - Amount of time the CPU has been idle and waiting for I/O
- irq - Amount of time the CPU has processed interrupts
- nice - Amount of time the CPU was running lower-priority (“nice”) tasks
- run - Amount of time a task has been running on the CPU
- runwait - Sum of run and wait
- softirq - Amount of time the CPU has processed software interrupts
- steal - Amount of time the CPU was not able to run because the host didn’t allow it (has meaning only for virtual machines)
- system - Amount of time the CPU was executing kernel (and non-IRQ) tasks
- user - Amount of time the CPU was executing in user space
- wait - Amount of time task(s) have been waiting to run on the CPU.
Note
run and wait come from the scheduler stats. Their main benefit
is that they allow the contention of the system to be measured;
for example, wait on a host running virtual machines translates
directly to steal inside the virtual machines.
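As an illustration (the hostname value is a placeholder, and the query assumes direct InfluxQL access with the cluster’s database selected), per-CPU contention over the last hour could be queried like this:

SELECT mean("wait") FROM "s1"."cpustat" WHERE "hostname" = 'node1' AND time > now() - 1h GROUP BY time(1m), "cpu"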
disk
Collected by: storpool_monitor, pushed via the monitoring system
Collected from: storpool disk list
This measurement is basically storpool disk list collected once a minute.
Tags:
- id - Disk ID in StorPool
- isWbc - Does the drive have write-back cache enabled
- model - Drive model
- noFlush - Is the drive initialized to not send FLUSH commands
- noFua - Is the drive initialized not to use FUA (Force Unit Access)
- noTrim - Is the drive initialized not to use TRIM/DISCARD
- serial - Serial number of the drive
- serverId - Server instance ID of the storpool_server working with the drive
- ssd - Is the drive an SSD/NVMe device.
Fields:
- agAllocated - Allocation groups currently in use
- agCount - Total number of allocation groups
- agFree - Free allocation groups
- agFreeNotTrimmed - (internal) free allocation groups that haven’t been trimmed yet
- agFreeing - (internal) allocation groups currently being freed
- agFull - (internal) allocation groups that are full
- agMaxSizeFull - (internal) allocation groups that are full with max-sized entries
- agMaxSizePartial - (internal) allocation groups that have only max-sized entries, but are not full
- agPartial - (internal) allocation groups that are partially full
- aggregateScore_entries - aggregate score for entries
- aggregateScore_space - aggregate score for space
- aggregateScore_total - combined aggregate score
- entriesAllocated - Entries currently in use
- entriesCount - Total number of entries
- entriesFree - Free entries
- lastScrubCompleted - Time stamp of the last completed scrubbing operation
- objectsAllocated - Objects currently in use
- objectsCount - Total number of objects
- objectsFree - Free objects
- objectsOnDiskSize - Total amount of user data on the drive (sum of all data in objects)
- scrubbedBytes - Progress of the current scrubbing operation
- scrubbingBW - Bandwidth of the current scrubbing operation
- scrubbingFinishAfter - Estimated ETA of the scrubbing operation
- scrubbingStartedBefore - Approximate start of the scrubbing operation
- sectorsCount - Number of sectors of the drive
- totalErrorsDetected - Total errors detected by checksum verification on the drive.
diskiostat
Collected by: storpool_stat
Collected from: /proc/diskstats, storpool_initdisk --list
This measurement collects I/O stats for all HDD and SATA SSD drives on the system. For extra information, see the documentation of the Linux kernel at https://docs.kernel.org/admin-guide/iostats.html.
Tags:
- device - device name
- hostname - host name of the node
- server - SP_OURID of the node
- sp_id - disk ID in StorPool (if applicable); journal devices are prefixed with j
- ssd - is the drive an SSD.
Fields:
- queue_depth - queue utilization
- r_wait - wait time for read operations
- read_bytes - bytes transferred for read operations
- reads - number of read operations
- reads_merges - merged read operations
- utilization - device utilization (time busy)
- w_wait - wait time for write operations
- wait - total wait time
- write_bytes - bytes transferred for write operations
- write_merges - merged write operations
- writes - number of write operations
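For example, a sketch of a query plotting per-device read and write latency from this measurement (same access assumptions as the earlier examples):

SELECT mean("r_wait"), mean("w_wait") FROM "s1"."diskiostat" WHERE time > now() - 1h GROUP BY time(1m), "hostname", "device"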
diskstat
Collected by: storpool_stat
Collected from: /usr/lib/storpool/server_stat
These metrics show the performance of the drives and their operations as seen from the
storpool_server processes.
Tags:
- disk - the ID of the disk in StorPool
- hostname - host name of the server
- server - SP_OURID of the server
Fields:
- aggregation_completion_time
- aggregations - number of aggregation operations performed
- disk_initiated_read_bytes
- disk_initiated_reads
- disk_read_operations_completion_time
- disk_reads_completion_time
- disk_trims_bytes
- disk_trims_count
- disk_write_operations_completion_time
- disk_writes_completion_time
- entry_group_switches - (internal)
- max_disk_writes_completion_time
- max_outstanding_read_requests - peak read requests in the queue
- max_outstanding_write_requests - peak write requests in the queue
- max_transfer_time
- metadata_completion_time
- pct_utilization_aggregation - drive utilization for aggregation
- pct_utilization_metadata - drive utilization for metadata operations
- pct_utilization_reads - drive utilization for read operations
- pct_utilization_server_reads - drive utilization for server reads
- pct_utilization_sys - drive utilization for system operations
- pct_utilization_total - drive utilization in total
- pct_utilization_total2
- pct_utilization_unknwon
- pct_utilization_user
- pct_utilization_writes - drive utilization for write operations
- queued_read_requests - number of read operations in the queue
- queued_write_requests - number of write operations in the queue
- read_balance_forward_double_dispatch
- read_balance_forward_double_dispatch_pct
- read_balance_forward_rcvd
- read_balance_forwards_sent
- read_bytes - bytes transferred for read operations
- reads - read operations
- reads_completion_time
- server_read_bytes - bytes transferred for server reads
- server_reads - server reads (requests from other servers)
- transfer_average_time
- trims - TRIM operations issued to the device
- write_bytes - bytes transferred for write operations
- writes - write operations
- writes_completion_time
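A sketch of a query comparing this server-side view of drive utilization across disks (same access assumptions as the earlier examples):

SELECT max("pct_utilization_total") FROM "s1"."diskstat" WHERE time > now() - 1h GROUP BY time(1m), "server", "disk"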
iostat
Collected by: storpool_stat
Collected from: /proc/schedstat, /proc/stat
This measurement collects I/O usage and latency data for the volumes attached to hosts via the native StorPool driver. The fields are the same as for diskiostat.
Tags:
- hostname - host name of the node
- server - SP_OURID of the node
- volume - volume name.
Fields:
- queue_depth - queue utilization
- r_wait - wait time for read operations
- read_bytes - bytes transferred for read operations
- reads - number of read operations
- reads_merges - merged read operations
- utilization - device utilization (time busy)
- w_wait - wait time for write operations
- wait - total wait time
- write_bytes - bytes transferred for write operations
- write_merges - merged write operations
- writes - number of write operations
iscsisession
Collected by: storpool_monitor, pushed via the monitoring system
Collected from: storpool iscsi sessions list
This measurement is storpool iscsi sessions list, collected once a minute.
The data in it consists of cumulative counters, not differences, except for the tasks_* fields, which
show the current usage of the task queue; see the example query after the field list below.
Tags:
- ISID - ISID
- connectionId - numeric ID of the connection
- controllerId - SP_OURID of the target exporting node
- hwPort - network interface number (0/1)
- initiator - initiator IQN
- initiatorIP - initiator IP address
- initiatorId - internal numeric ID for the initiator
- initiatorPort - initiator originating TCP port
- localMSS - MSS for the TCP connection
- portalIP - IP of the portal
- portalPort - TCP port of the portal
- status - status of the connection
- target - target name
- targetId - numerical target ID
- timeCreated - timestamp of connection creation
Fields:
- dataHoles - data “holes” observed, either because of dropped packets or reordering
- discardedBytes - amount of data discarded
- discardedPackets - number of packets discarded
- newBytesIn - bytes in SYN packets
- newBytesOut - bytes in SYN and/or ACK packets
- newPacketsIn - number of SYN packets
- newPacketsOut - number of SYN and/or ACK packets
- retransmitsAcks - number of fast retransmits
- retransmitsAcks2 - number of second retransmits
- retransmitsTimeout - number of retransmissions because of a timeout
- retransmittedBytes - amount of retransmitted data
- retransmittedPackets - number of retransmitted packets
- stats_dataIn - amount of data received
- stats_dataOut - amount of data sent
- stats_login - number of login requests
- stats_loginRsp - number of login responses
- stats_logout - number of logout requests
- stats_logoutRsp - number of logout responses
- stats_nopIn - number of NOPs received
- stats_nopOut - number of NOPs sent
- stats_r2t
- stats_reject
- stats_scsi
- stats_scsiRsp
- stats_snack
- stats_task
- stats_taskRsp
- stats_text
- stats_textRsp
- tasks_aborted - task slots with ABORT tasks
- tasks_dataOut - total number of tasks for sending data
- tasks_dataResp - tasks responding with data
- tasks_inFreeList - available task slots
- tasks_processing - task slots currently being processed
- tasks_queued - task slots queued for processing
- tcp_remoteMSS - MSS advertised by the remote side
- tcp_remoteWindowSize - remote side window size
- tcp_wscale - TCP window scaling factor
- totalBytesIn - total bytes received
- totalBytesOut - total bytes sent
- totalPacketsIn - total packets received
- totalPacketsOut - total packets sent.
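Because these are cumulative counters, rates are derived at query time. A sketch (the target name is a placeholder):

SELECT non_negative_derivative(max("totalBytesIn"), 1s) FROM "s1"."iscsisession" WHERE "target" = 'tgt1' AND time > now() - 1h GROUP BY time(1m), "initiator"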
memstat
Collected by: storpool_stat
Collected from: /sys/fs/cgroup/memory/**/memory.stat
This measurement describes the memory usage of the node and its cgroups. For the full description of the fields, see the documentation of the Linux kernel at https://docs.kernel.org/admin-guide/cgroup-v1/memory.html.
Tags:
- cgroup - name of the cgroup
- hostname - host name of the node
- server - SP_OURID of the node.
Fields:
- active_anon
- active_file
- cache
- hierarchical_memory_limit
- hierarchical_memsw_limit
- inactive_anon
- inactive_file
- mapped_file
- pgfault
- pgmajfault
- pgpgin
- pgpgout
- rss
- rss_huge
- swap
- total_active_anon
- total_active_file
- total_cache
- total_inactive_anon
- total_inactive_file
- total_mapped_file
- total_pgfault
- total_pgmajfault
- total_pgpgin
- total_pgpgout
- total_rss
- total_rss_huge
- total_swap
- total_unevictable
- unevictable
netstat
Collected by: storpool_stat
Collected from: /usr/lib/storpool/sdump
This measurement collects network statistics for the StorPool network protocol, for every StorPool service running on each node.
Tags:
- hostname - host name of the node
- network - network ID
- server - SP_OURID of the node
- service - name of the service.
Fields:
- getNotSpace - number of requests rejected because of no space in local queues
- rxBytes - received bytes
- rxChecksumError - received packets with checksum errors
- rxDataChecksumErrors - received packets with checksum errors in the data
- rxDataHoles - “holes” in the received data, caused by either packet loss or reordering
- rxDataPackets - received packets with data
- rxHwChecksumError - received packets with checksum errors detected by hardware
- rxNotForUs - received packets not destined to this service
- rxPackets - total received packets
- rxShort - received packets that were too short/truncated
- txBytes - transmitted bytes
- txBytesLocal - bytes transmitted to services on the same node
- txDropNoBuf - packets dropped because no buffers were available
- txGetPackets - transmitted get packets (requesting data from other services)
- txPackets - total transmitted packets
- txPacketsLocal - packets transmitted to services on the same node
- txPingPackets - transmitted ping packets.
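Assuming these fields are cumulative counters like the bridge status data, a simple packet-loss indicator could be sketched as:

SELECT non_negative_derivative(max("rxDataHoles"), 1s) FROM "s1"."netstat" WHERE time > now() - 1h GROUP BY time(1m), "hostname", "service"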
servicestat
Collected by: storpool_stat
Collected from: /usr/lib/storpool/sdump
Tags:
- hostname - hostname of the node
- server - SP_OURID of the node
- service - service name
Fields:
- data_transfers - number of successful data transfers
- data_transfers_failed - number of failed data transfers
- loops_per_second - processing loops done by the service
- slept_for_usecs - amount of time the service has been idle
task
Collected by: storpool_monitor, pushed via the monitoring system
Collected from: storpool task list
This measurement tracks the active tasks in the cluster.
Tags:
- diskId - disk initiating the tasks
- transactionId - transaction ID of the task (0 is RECOVERY, 1 is bridge, 2 is balancer)
Fields:
- allObjects - the sum of all objects in the task
- completedObjects - completed objects
- dispatchedObjects - objects currently being processed
- unresolvedObjects - objects not yet resolved.
template
Collected by: storpool_monitor, pushed via the monitoring system
Collected from: storpool template status
This measurement tracks the amount of used/free space in a StorPool cluster.
Tags:
- placeAll - placement group name for placeAll
- placeHead - placement group name for placeHead
- placeTail - placement group name for placeTail
- templatename - template name
Fields:
- availablePlaceAll - available space in the placeAll placement group
- availablePlaceHead - available space in the placeHead placement group
- availablePlaceTail - available space in the placeTail placement group
- capacity - total capacity of the template
- capacityPlaceAll - capacity of the placeAll placement group
- capacityPlaceHead - capacity of the placeHead placement group
- capacityPlaceTail - capacity of the placeTail placement group
- free - available space in the template
- objectsCount - number of objects
- onDiskSize - total size of the data stored on disks in this template
- removingSnapshotsCount - number of snapshots being deleted
- size - total provisioned size of the template
- snapshotsCount - number of snapshots
- storedSize - total amount of data stored in the template
- stored_internal_u1 - internal value
- stored_internal_u1_placeAll - internal value
- stored_internal_u1_placeHead - internal value
- stored_internal_u1_placeTail - internal value
- stored_internal_u2 - internal value
- stored_internal_u2_placeAll - internal value
- stored_internal_u2_placeHead - internal value
- stored_internal_u2_placeTail - internal value
- stored_internal_u3 - internal value (estimate of space “lost” due to disbalance)
- stored_internal_u3_placeAll - internal value
- stored_internal_u3_placeHead - internal value
- stored_internal_u3_placeTail - internal value
- totalSize - the number of bytes of all volumes based on this template, including the StorPool replication overhead
- volumesCount - number of volumes
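For capacity planning, the long-term m1 data is usually the more convenient source. A sketch of a per-template free-space trend over the last 30 days:

SELECT mean("free") / mean("capacity") FROM "m1"."template" WHERE time > now() - 30d GROUP BY time(1d), "templatename"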
per_host_status
Collected by: storpool_stat
Collected from: the ph_status splib module
This measurement collects inventory and per-host data for monitoring and host-wide system checks.
Fields:
- rootcgprocesses - Shows processes that run without a memory constraint in the node’s root cgroup; used for monitoring to prevent OOM and deadlocks.
- apichecks - Checks that the API address and port are both reachable, as a part of monitoring (catches blocked ports).
- iscsi_checks - Reports which iSCSI remote portals are unreachable from this node and their MTU, as part of monitoring node and cluster-wide network issues.
- service_checks - Reports the running and enabled state of all local StorPool services.
- inventorydata - Collects the following, used for inventory and comprehensive alerts:
  - allkernels - list of all installed kernel versions on the node.
  - by-id - list of all device symlinks in the /dev/disk/by-id directory.
  - by-path - list of all device symlinks in the /dev/disk/by-path directory.
  - conf_sums - map of sha256sum checksums of /etc/storpool.conf and all files in the /etc/storpool.conf.d/ directory.
  - cputype - the CPU architecture as recognized by the tooling in splib.
  - df - list of lines output by df.
  - dmidecode - raw output of dmidecode, used mostly for RAM compatibility alerts.
  - free_-m - the output of the free -m command.
  - fstab - the raw contents of /etc/fstab.
  - kernel - the running kernel of the node.
  - lldp - the output from lldpcli show neighbours in JSON.
  - lsblk - the raw output from lsblk.
  - lscpu - the raw output from lscpu.
  - lshw_-json - the JSON output from lshw.
  - lsmod - the raw output from lsmod.
  - lspci_-DvvA_linux-proc - the raw output from lspci -DvvA linux-proc.
  - lspci_-k - the raw output from lspci -k, used mostly for device driver inventory.
  - mounts - list of lines output from /proc/mounts.
  - net - map of symlink names and the paths they lead to, for all devices in the /sys/class/net directory.
  - nvme_list - the raw output from nvme list.
  - os - the operating system string as detected by the tooling in splib.
  - revision - the contents of /etc/storpool_revision, as well as the output from the storpool_revision tool in JSON format.
  - spconf - map of key/value pairs for all resolved values from the configuration files /etc/storpool.conf and /etc/storpool.conf.d/*.conf for this node.
  - sprdma - map of files and their contents in the /sys/class/storpool_rdma/storpool_rdma/state directory.
  - taint - list of all modules reported as tainted (to detect live-patched kernels).
  - unsupportedkernels - all kernels later than the presently running one that do not have StorPool kernel modules installed.
  - vfgenconf - the JSON configuration created for the network interfaces used by StorPool to enable hardware acceleration.
- net_info - Reports the present network state as reported by /usr/lib/storpool/storpool_ping netInfo from the point of view of each local service; used for more comprehensive monitoring checks.
- systemd - Reports the node’s systemd units and their state.
- sysctl - Reports the following sysctl values:
  - kernel.core_uses_pid
  - kernel.panic
  - kernel.panic_on_oops
  - kernel.softlockup_panic
  - kernel.unknown_nmi_panic
  - kernel.panic_on_unrecovered_nmi
  - kernel.panic_on_io_nmi
  - kernel.hung_task_panic
  - vm.panic_on_oom
  - vm.dirty_background_bytes
  - vm.dirty_expire_centisecs
  - vm.dirty_writeback_centisecs
  - vm.oom_dump_tasks
  - vm.oom_kill_allocating_task
  - kernel.sysrq
  - net.ipv4.ip_forward
  - net.ipv6.conf.all.forwarding
  - net.ipv6.conf.default.forwarding
  - net.nf_conntrack_max
  - net.ipv4.tcp_rmem
  - net.ipv4.conf.all.arp_filter
  - net.ipv4.conf.all.arp_announce
  - net.ipv4.conf.all.arp_ignore
  - net.ipv4.conf.default.arp_filter
  - net.ipv4.conf.default.arp_announce
  - net.ipv4.conf.default.arp_ignore
- proc_cmdline - The cmdline of the running kernel.
- one_version - The version of the OpenNebula plugin, if one is installed.
- cloudstack_info2 - Information about the installed CloudStack plugin for StorPool.
- kdumpctl_status - Reports the output from kdump-config status or kdumpctl status, depending on the OS version; used to ensure the kdump service is configured and running correctly.
- iscsi_tool - Reports the output from the following views of /usr/lib/storpool/iscsi_tool:
  - ip net list
  - ip neigh list
  - ip route list
These are used to monitor various local node parameters for more comprehensive monitoring alerts.