Metrics collected and available from StorPool

Overview

A StorPool cluster collects and sends metrics about its performance and related statistics. These are described below.

Customers can access the data in one of the following ways:

  • Via https://analytics.storpool.com/ on pre-defined dashboards

  • Via direct access to the InfluxDB instance for their cluster (please contact StorPool support to get access)

  • Via an InfluxDB instance (or another database that supports InfluxDB’s line protocol) of their own, by configuring SP_STATDB_ADDITIONAL in /etc/storpool.conf; see the storpool_stat operations section below.

The Grafana dashboards used by StorPool at https://analytics.storpool.com/ are available on request, via StorPool support.

Internals

This section provides details on how the data collection and the InfluxDB instances operate, which can be helpful to customers who would like to run their own metrics database for StorPool data.

storpool_stat operations

The storpool_stat tool is responsible for collecting and sending most of the information to the InfluxDB instances. Its general flow is as follows (a minimal sketch is shown after this list):

  • On start, it forks one child per measurement type and one per receiving database

  • All measurement processes collect data and atomically write a file in /tmp/storpool_stat every few seconds

  • The sending processes take the data from the files and push it to the databases, then delete the files

  • Any file found to be older than two days is removed.
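
The following minimal Python sketch illustrates this collect/push/cleanup cycle. It is not the actual storpool_stat implementation; the file names, the collect_points() helper, and the write endpoint URL are hypothetical, and the two loops stand in for the collector and sender children. The sketch only shows the idea of staging line-protocol files atomically and pushing them to an InfluxDB /write endpoint:

# Illustrative sketch only, not the actual storpool_stat code.
import glob
import os
import time

import requests

STAT_DIR = "/tmp/storpool_stat"
MAX_AGE = 2 * 24 * 3600                                # drop files older than two days
WRITE_URL = "http://10.1.2.3:8086/write?db=metrics"    # hypothetical endpoint

def collect_points():
    # Hypothetical stand-in for a per-measurement collector returning line protocol.
    return "cpustat,hostname=node1,cpu=0 user=1.5,idle=97.2"

def measurement_loop():
    """Collector child: atomically write a staging file every few seconds."""
    os.makedirs(STAT_DIR, exist_ok=True)
    while True:
        tmp = os.path.join(STAT_DIR, ".cpustat.tmp")
        final = os.path.join(STAT_DIR, "cpustat.%d" % time.time())
        with open(tmp, "w") as f:
            f.write(collect_points() + "\n")
        os.rename(tmp, final)      # atomic within the same filesystem
        time.sleep(5)

def sender_loop():
    """Sender child: push staged files to the database, then delete them."""
    while True:
        for path in glob.glob(os.path.join(STAT_DIR, "*")):
            if time.time() - os.path.getmtime(path) > MAX_AGE:
                os.unlink(path)    # stale data is dropped rather than sent
                continue
            with open(path) as f:
                body = f.read()
            if requests.post(WRITE_URL, data=body).ok:
                os.unlink(path)
        time.sleep(5)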

Note

storpool_stat has been known to fill up /tmp on loss of network connectivity on nodes with a large number of measured elements (CPUs, volumes).

To push data to an additional metrics database, set the SP_STATDB_ADDITIONAL parameter in storpool.conf. It must contain the full URL of the database’s write endpoint, for example http://USER:PASSWORD@10.1.2.3:8086/write?db=metrics. Note that if the URL scheme is https, the endpoint will need a valid certificate.
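
Whether such an endpoint is reachable and accepts writes can be verified in advance, for example with the short Python snippet below. The URL, measurement name, tag, and field are hypothetical and only illustrate a single write in InfluxDB’s line protocol:

import requests

# Hypothetical endpoint, in the same form as the SP_STATDB_ADDITIONAL value.
url = "http://USER:PASSWORD@10.1.2.3:8086/write?db=metrics"

# One point in line protocol: measurement name, tags, then fields.
point = "connectivity_test,hostname=node1 value=1"

resp = requests.post(url, data=point)
# InfluxDB 1.x replies with 204 No Content on a successful write.
print(resp.status_code, resp.text)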

Downsampling data from the s1 to the m1 retention policy

In StorPool’s initial deployment, this was done by continuous queries. Below is what was used:

CREATE CONTINUOUS QUERY "1s_to_1m" ON DBNAME BEGIN SELECT mean(*) INTO m1.:MEASUREMENT FROM /.*/ GROUP BY time(1m),* FILL(NULL) END
CREATE CONTINUOUS QUERY "1s_to_1m_max" ON DBNAME BEGIN SELECT max(*) INTO m1.:MEASUREMENT FROM /.*/ GROUP BY time(1m),* FILL(NULL) END'

This solution did not scale to a larger number of databases, as the continuous queries are executed sequentially with no parallelization. To address this, StorPool developed cqsched (available to customers on request via StorPool support) to process multiple databases and measurements in parallel.
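
As an illustration of the same approach, the Python sketch below runs one downsampling query per database in parallel against InfluxDB’s /query endpoint. The instance URL, credentials, database names, and one-hour time range are hypothetical, and this is not the cqsched implementation itself:

import concurrent.futures

import requests

INFLUX = "http://10.1.2.3:8086"                       # hypothetical instance
AUTH = ("USER", "PASSWORD")                           # hypothetical credentials
DATABASES = ["cluster_a", "cluster_b", "cluster_c"]   # hypothetical database names

# Backfill one hour of per-minute means from s1 into m1; same shape as the
# continuous query above, but with an explicit time range.
QUERY = ('SELECT mean(*) INTO m1.:MEASUREMENT FROM s1./.*/ '
         'WHERE time >= now() - 1h GROUP BY time(1m),*')

def downsample(db):
    resp = requests.post("%s/query" % INFLUX, auth=AUTH, params={"db": db, "q": QUERY})
    resp.raise_for_status()
    return db

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for db in pool.map(downsample, DATABASES):
        print(db, "downsampled")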

Disk usage and IO of InfluxDB databases

The IO requirements of a single database are remarkably modest. For planning purposes, note that a cluster sends a data point every second for:

  • every CPU thread

  • every attached volume

  • every HDD or SSD drive on a node.

For disk usage, as an example, a cluster with ~800 attached volumes and ~800 CPU threads takes about 11 GiB of space for the s1 data, and 78 GiB for the m1 data.
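
The numbers above translate into a rough point rate that can be used for capacity planning. The cluster size in the short calculation below is hypothetical and only illustrates the arithmetic:

# Hypothetical cluster, used only to illustrate the point-rate calculation.
cpu_threads = 800
attached_volumes = 800
drives = 60

points_per_second = cpu_threads + attached_volumes + drives
points_per_day = points_per_second * 86400

print(points_per_second)   # 1660 points per second pushed into s1
print(points_per_day)      # 143424000, i.e. ~143 million points per day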

Data structure

The main unit is a “measurement”, which contains the data for a specific type of metric (disk I/O, CPU usage, and so on).

All of the data described below has tags and fields. A “tag” is used to filter and group data, while a “field” holds the values used for calculations and graphs; for example, in the cpustat measurement hostname and cpu are tags, while user and idle are fields.

For more information, see InfluxDB’s documentation at https://docs.influxdata.com/influxdb/v1/concepts/key_concepts/.

There are two retention policies for data: the per-second data (s1) is retained for 7 days, the per-minute data (m1) is retained for 730 days (2 years). All data from storpool_stat is pushed into s1 and the InfluxDB instances take care of downsampling it to per-minute data either via continuous queries or other means.
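
For reference, when operating your own InfluxDB 1.x instance, the two retention policies can be created as shown in the Python sketch below. The instance URL, credentials, and database name are hypothetical, and making s1 the default retention policy is an assumption (so that writes without an explicit rp parameter land in s1) rather than a documented requirement:

import requests

INFLUX = "http://10.1.2.3:8086"     # hypothetical instance
AUTH = ("USER", "PASSWORD")         # hypothetical credentials

statements = [
    'CREATE DATABASE "metrics"',
    # Per-second data, kept for 7 days; assumed to be the default policy.
    'CREATE RETENTION POLICY "s1" ON "metrics" DURATION 7d REPLICATION 1 DEFAULT',
    # Per-minute downsampled data, kept for 730 days (2 years).
    'CREATE RETENTION POLICY "m1" ON "metrics" DURATION 730d REPLICATION 1',
]

for q in statements:
    requests.post("%s/query" % INFLUX, auth=AUTH, params={"q": q}).raise_for_status()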

Measurements reference

The instances contain the following measurements:

bridgestatus

Collected by: storpool_monitor, pushed via the monitoring system.

Collected from: storpool remoteBridge status

This measurement is basically storpool remoteBridge status collected once a minute.

The tcpInfo_* fields are a direct copy of the tcp_info structure in the Linux kernel for the relevant connection.

Tags:

  • clusterId - ID of the remote cluster

  • connectionState - current state of the connection (string)

  • ip - IP address of the remote bridge

  • lastErrno - errno of the last error

  • lastError - string representation of the last error

  • protocolVersion - StorPool bridge protocol version.

Fields:

  • countersSinceConnect_bytesRecv - bytes received in the current connection

  • countersSinceConnect_bytesSent - bytes sent in the current connection

  • countersSinceStart_bytesRecv - bytes received from peer since the start of the process

  • countersSinceStart_bytesSent - bytes sent to peer since the start of the process

  • tcpInfo_tcpi_advmss - this field and all of the tcpInfo_* fields below are copied from the kernel’s tcp_info structure.

  • tcpInfo_tcpi_ato

  • tcpInfo_tcpi_backoff

  • tcpInfo_tcpi_ca_state

  • tcpInfo_tcpi_fackets

  • tcpInfo_tcpi_last_ack_recv

  • tcpInfo_tcpi_last_ack_sent

  • tcpInfo_tcpi_last_data_recv

  • tcpInfo_tcpi_last_data_sent

  • tcpInfo_tcpi_lost

  • tcpInfo_tcpi_options

  • tcpInfo_tcpi_pmtu

  • tcpInfo_tcpi_probes

  • tcpInfo_tcpi_rcv_mss

  • tcpInfo_tcpi_rcv_rtt

  • tcpInfo_tcpi_rcv_space

  • tcpInfo_tcpi_rcv_ssthresh

  • tcpInfo_tcpi_rcv_wscale

  • tcpInfo_tcpi_reordering

  • tcpInfo_tcpi_retrans

  • tcpInfo_tcpi_retransmits

  • tcpInfo_tcpi_rto

  • tcpInfo_tcpi_rtt

  • tcpInfo_tcpi_rttvar

  • tcpInfo_tcpi_sacked

  • tcpInfo_tcpi_snd_cwnd

  • tcpInfo_tcpi_snd_mss

  • tcpInfo_tcpi_snd_ssthresh

  • tcpInfo_tcpi_snd_wscale

  • tcpInfo_tcpi_state

  • tcpInfo_tcpi_total_retrans

  • tcpInfo_tcpi_unacked

cpustat

Collected by: storpool_stat

Collected from: /proc/schedstat, /proc/stat

For extra information, see the documentation of the Linux kernel at https://docs.kernel.org/filesystems/proc.html for the two files above.

Tags:

  • cpu - the CPU thread the stats are for

  • hostname - hostname of the node

  • labels - pipe-delimited (|) list of StorPool services that are pinned on the CPU

  • server - SP_OURID of the node.

Fields:

  • guest - Amount of time running a guest (virtual machine)

  • guest_nice - Amount of time running a lower-priority (“nice”) guest (virtual machine)

  • idle - Amount of time the CPU has been idle

  • iowait - Amount of time the CPU has been idle and waiting for I/O

  • irq - Amount of time the CPU has processed interrupts

  • nice - Amount of time the CPU was running lower-priority (“nice”) tasks

  • run - Amount of time a task has been running on the CPU

  • runwait - Sum of run and wait

  • softirq - Amount of time the CPU has processed software interrupts

  • steal - Amount of time the CPU was not able to run because the host didn’t allow this (has meaning only for virtual machines)

  • system - Amount of time the CPU was executing kernel (and non-IRQ) tasks

  • user - Amount of time the CPU was executing in user-space

  • wait - Amount of time task(s) have been waiting to run on the CPU.

Note

run and wait come from the scheduler stats. Their main benefit is that they allow the contention of the system to be measured; for example, wait on a host running virtual machines translates directly to steal inside the virtual machines.

disk

Collected by: storpool_monitor, pushed via the monitoring system

Collected from: storpool disk list

This measurement is basically storpool disk list collected once a minute.

Tags:

  • id - Disk ID in StorPool

  • isWbc - Does the drive have write-back cache enabled

  • model - Drive model

  • noFlush - Is the drive initialized to not send FLUSH commands

  • noFua - Is the drive initialized not to use FUA (Force Unit Access)

  • noTrim - Is the drive initialized not to use TRIM/DISCARD

  • serial - Serial number of the drive

  • serverId - Server instance ID of the storpool_server working with the drive

  • ssd - Is the drive an SSD/NVMe device.

Fields:

  • agAllocated - Allocation groups currently in use

  • agCount - Total number of allocation groups

  • agFree - Free allocation groups

  • agFreeNotTrimmed - (internal) free allocation groups that haven’t been trimmed yet

  • agFreeing - (internal) allocation groups currently being freed

  • agFull - (internal) allocation groups that are full

  • agMaxSizeFull - (internal) allocation groups that are full with max-sized entries

  • agMaxSizePartial - (internal) allocation groups that have only max-sized entries, but are not full

  • agPartial - (internal) allocation groups that are partially full

  • aggregateScore_entries - aggregate score for entries

  • aggregateScore_space - aggregate score for space

  • aggregateScore_total - combined aggregate score

  • entriesAllocated - Entries currently in use

  • entriesCount - Total number of entries

  • entriesFree - Free entries

  • lastScrubCompleted - Time stamp of the last completed scrubbing operation

  • objectsAllocated - Objects currently in use

  • objectsCount - Total number of objects

  • objectsFree - Free objects

  • objectsOnDiskSize - Total amount of user data on drive (sum of all data in objects)

  • scrubbedBytes - Progress of the current scrubbing operation

  • scrubbingBW - Bandwidth of the current scrubbing operation

  • scrubbingFinishAfter - estimated time for the current scrubbing operation to finish

  • scrubbingStartedBefore - approximate start of the scrubbing operation

  • sectorsCount - Number of sectors of the drive

  • totalErrorsDetected - Total errors detected by checksum verification on the drive.

diskiostat

Collected by: storpool_stat

Collected from: /proc/diskstats, storpool_initdisk --list

This measurement collects I/O stats for all HDD and SATA SSD drives on the system. For extra information, see the documentation of the Linux kernel at https://docs.kernel.org/admin-guide/iostats.html.

Tags:

  • device - device name

  • hostname - host name of the node

  • server - SP_OURID of the node

  • sp_id - disk ID in StorPool (if applicable). Journal devices are prefixed with j

  • ssd - is the drive SSD.

Fields:

  • queue_depth - queue utilization

  • r_wait - wait time for read operations

  • read_bytes - bytes transferred for read operations

  • reads - number of read operations

  • reads_merges - merged read operations

  • utilization - device utilization (time busy)

  • w_wait - wait time for write operations

  • wait - total wait time

  • write_bytes - bytes transferred for write operations

  • write_merges - merged write operations

  • writes - number of write operations

diskstat

Collected by: storpool_stat

Collected from: /usr/lib/storpool/server_stat

These metrics show the performance of the drives and their operations as seen by the storpool_server processes.

Tags:

  • disk - the ID of the disk in StorPool

  • hostname - host name of the server

  • server - SP_OURID of the server

Fields:

  • aggregation_completion_time -

  • aggregations - number of aggregation operations performed

  • disk_initiated_read_bytes -

  • disk_initiated_reads -

  • disk_read_operations_completion_time -

  • disk_reads_completion_time -

  • disk_trims_bytes -

  • disk_trims_count -

  • disk_write_operations_completion_time -

  • disk_writes_completion_time -

  • entry_group_switches - (internal)

  • max_disk_writes_completion_time -

  • max_outstanding_read_requests - peak read requests in the queue

  • max_outstanding_write_requests - peak write requests in the queue

  • max_transfer_time -

  • metadata_completion_time -

  • pct_utilization_aggregation - drive utilization for aggregation

  • pct_utilization_metadata - drive utilization for metadata operations

  • pct_utilization_reads - drive utilization for read operations

  • pct_utilization_server_reads - drive utilization for server reads

  • pct_utilization_sys - drive utilization for system operations

  • pct_utilization_total - drive utilization in total

  • pct_utilization_total2 -

  • pct_utilization_unknwon -

  • pct_utilization_user -

  • pct_utilization_writes - drive utilization for write operations

  • queued_read_requests - number of read operations in the queue

  • queued_write_requests - number of write operations in the queue

  • read_balance_forward_double_dispatch -

  • read_balance_forward_double_dispatch_pct -

  • read_balance_forward_rcvd -

  • read_balance_forwards_sent -

  • read_bytes - bytes transferred for read operations

  • reads - read operations

  • reads_completion_time -

  • server_read_bytes - bytes transferred for server reads

  • server_reads - server reads (requests from other servers)

  • transfer_average_time -

  • trims - TRIM operations issued to the device

  • write_bytes - bytes transferred for write operations

  • writes - write operations

  • writes_completion_time -

iostat

Collected by: storpool_stat

Collected from: /proc/schedstat, /proc/stat

This measurement collects I/O usage and latency data for the volumes attached on hosts via the native StorPool driver. The fields are the same as for diskiostat.

Tags:

  • hostname - host name of the node

  • server - SP_OURID of the node

  • volume - volume name.

Fields:

  • queue_depth - queue utilization

  • r_wait - wait time for read operations

  • read_bytes - bytes transferred for read operations

  • reads - number of read operations

  • reads_merges - merged read operations

  • utilization - device utilization (time busy)

  • w_wait - wait time for write operations

  • wait - total wait time

  • write_bytes - bytes transferred for write operations

  • write_merges - merged write operations

  • writes - number of write operations

iscsisession

Collected by: storpool_monitor, pushed via the monitoring system

Collected from: storpool iscsi sessions list

This measurement is storpool iscsi sessions list, collected once a minute. The data consists of cumulative counters, not differences, except for the tasks_* fields, which show the current usage of the task queue.

Tags:

  • ISID - ISID

  • connectionId - numeric ID of the connection

  • controllerId - SP_OURID of the target exporting node

  • hwPort - network interface number (0/1)

  • initiator - initiator IQN

  • initiatorIP - initiator IP address

  • initiatorId - internal numeric ID for the initiator

  • initiatorPort - initiator originating TCP port

  • localMSS - MSS for the TCP connection

  • portalIP - IP of the portal

  • portalPort - TCP port of the portal

  • status - status of the connection

  • target - target name

  • targetId - numerical target ID

  • timeCreated - timestamp of connection creation

Fields:

  • dataHoles - data “holes” observed, caused by either dropped packets or reordering

  • discardedBytes - amount of data discarded

  • discardedPackets - number of packets discarded

  • newBytesIn - bytes in SYN packets

  • newBytesOut - bytes in SYN and/or ACK packets

  • newPacketsIn - number of SYN packets

  • newPacketsOut - number of SYN and/or ACK packets

  • retransmitsAcks - number of fast retransmits

  • retransmitsAcks2 - number of second retransmits

  • retransmitsTimeout - number of retransmissions because of a timeout

  • retransmittedBytes - amount of retransmitted data

  • retransmittedPackets - number of retransmitted packets

  • stats_dataIn - amount of data received

  • stats_dataOut - amount of data sent

  • stats_login - number of login requests

  • stats_loginRsp - number of login responses

  • stats_logout - number of logout requests

  • stats_logoutRsp - number of logout responses

  • stats_nopIn - number of NOPs received

  • stats_nopOut - number of NOPs sent

  • stats_r2t -

  • stats_reject -

  • stats_scsi -

  • stats_scsiRsp -

  • stats_snack -

  • stats_task -

  • stats_taskRsp -

  • stats_text -

  • stats_textRsp -

  • tasks_aborted - task slots with ABORT tasks

  • tasks_dataOut - total number of tasks for sending data

  • tasks_dataResp - tasks responding with data

  • tasks_inFreeList - available task slots

  • tasks_processing - task slots currently being processed

  • tasks_queued - task slots queued for processing

  • tcp_remoteMSS - MSS advertised from the remote side

  • tcp_remoteWindowSize - remote side window size

  • tcp_wscale - TCP window scaling factor

  • totalBytesIn - Total bytes received

  • totalBytesOut - Total bytes sent

  • totalPacketsIn - Total packets received

  • totalPacketsOut - Total packets sent.

memstat

Collected by: storpool_stat

Collected from: /sys/fs/cgroup/memory/**/memory.stat

This measurement describes the memory usage of the node and its cgroups. For the full description of the fields, see the documentation of the Linux kernel at https://docs.kernel.org/admin-guide/cgroup-v1/memory.html.

Tags:

  • cgroup - name of the cgroup

  • hostname - host name of the node

  • server - SP_OURID of the node.

Fields:

  • active_anon

  • active_file

  • cache

  • hierarchical_memory_limit

  • hierarchical_memsw_limit

  • inactive_anon

  • inactive_file

  • mapped_file

  • pgfault

  • pgmajfault

  • pgpgin

  • pgpgout

  • rss

  • rss_huge

  • swap

  • total_active_anon

  • total_active_file

  • total_cache

  • total_inactive_anon

  • total_inactive_file

  • total_mapped_file

  • total_pgfault

  • total_pgmajfault

  • total_pgpgin

  • total_pgpgout

  • total_rss

  • total_rss_huge

  • total_swap

  • total_unevictable

  • unevictable

netstat

Collected by: storpool_stat

Collected from: /usr/lib/storpool/sdump

This measurement collects network statistics for the StorPool network protocol, for every StorPool service running on each node.

Tags:

  • hostname - host name of the node

  • network - network ID

  • server - SP_OURID of the node

  • service - name of the service.

Fields:

  • getNotSpace - number of requests rejected because of no space in local queues

  • rxBytes - received bytes

  • rxChecksumError - received packets with checksum errors

  • rxDataChecksumErrors - received packets with checksum error in the data

  • rxDataHoles - “holes” in the received data, caused by either packet loss or reordering

  • rxDataPackets - received packets with data

  • rxHwChecksumError - received packets with checksum errors detected by hardware

  • rxNotForUs - received packets not destined to this service

  • rxPackets - total received packets

  • rxShort - received packets that were too short/truncated

  • txBytes - transmitted bytes

  • txBytesLocal - transmitted bytes to services on the same node

  • txDropNoBuf - dropped packets because no buffers were available

  • txGetPackets - transmitted get packets (requesting data from other services)

  • txPackets - total transmitted packets

  • txPacketsLocal - transmitted packets to services on the same node

  • txPingPackets - transmitted ping packets.

servicestat

Collected by: storpool_stat

Collected from: /usr/lib/storpool/sdump

Tags:

  • hostname - hostname of the node

  • server - SP_OURID of the node

  • service - service name

Fields:

  • data_transfers - number of successful data transfers

  • data_transfers_failed - number of failed data transfers

  • loops_per_second - processing loops done by the service

  • slept_for_usecs - amount of time the service has been idle

task

Collected by: storpool_monitor, pushed via the monitoring system

Collected from: storpool task list

This measurement tracks the active tasks in the cluster.

Tags:

  • diskId - disk initiating the tasks

  • transactionId - transaction ID of the task (0 is RECOVERY, 1 is bridge, 2 is balancer)

Fields:

  • allObjects - the sum of all objects in the task

  • completedObjects - completed objects

  • dispatchedObjects - objects currently being processed

  • unresolvedObjects - objects not yet resolved.

template

Collected by: storpool_monitor, pushed via the monitoring system

Collected from: storpool template status

This measurement tracks the amount of used/free space in a StorPool cluster.

Tags:

  • placeAll - placement group name for placeAll

  • placeHead - placement group name for placeHead

  • placeTail - placement group name for placeTail

  • templatename - template name

Fields:

  • availablePlaceAll - available space in placeAll placement group

  • availablePlaceHead - available space in placeHead placement group

  • availablePlaceTail - available space in placeTail placement group

  • capacity - total capacity of the template

  • capacityPlaceAll - capacity of the placeAll placement group

  • capacityPlaceHead - capacity of the placeHead placement group

  • capacityPlaceTail - capacity of the placeTail placement group

  • free - available space in the template

  • objectsCount - number of objects

  • onDiskSize - total size of data stored on disks in this template

  • removingSnapshotsCount - number of snapshots being deleted

  • size - total provisioned size on the template

  • snapshotsCount - number of snapshots

  • storedSize - total amount of data stored in template

  • stored_internal_u1 - internal value

  • stored_internal_u1_placeAll - internal value

  • stored_internal_u1_placeHead - internal value

  • stored_internal_u1_placeTail - internal value

  • stored_internal_u2 - internal value

  • stored_internal_u2_placeAll - internal value

  • stored_internal_u2_placeHead - internal value

  • stored_internal_u2_placeTail - internal value

  • stored_internal_u3 - internal value (estimate of space “lost” due to disbalance)

  • stored_internal_u3_placeAll - internal value

  • stored_internal_u3_placeHead - internal value

  • stored_internal_u3_placeTail - internal value

  • totalSize - The number of bytes of all volumes based on this template, including the StorPool replication overhead

  • volumesCount - number of volumes

per_host_status

This measurement collects inventory and per-host data for monitoring and host-wide system checks.

Collected by: storpool_stat

Collected from: ph_status splib module.

Fields:

  • rootcgprocesses - Shows processes that run without a memory constraint in the node’s root cgroup, used for monitoring to prevent OOM and deadlocks.

  • apichecks - Checks that the API address and port are both reachable as a part of monitoring (catches blocked ports).

  • iscsi_checks - Reports which iSCSI remote portals are unreachable from this node and their MTU, as part of monitoring node and cluster-wide network issues.

  • service_checks - Reports the running and enabled state of all local StorPool services.

  • inventorydata - Collects the following data, used for inventory and comprehensive alerts:

    • allkernels - list of all installed kernel versions on the node.

    • by-id - list of all device symlinks in the /dev/disk/by-id directory.

    • by-path - list of all device symlinks in the /dev/disk/by-path directory.

    • conf_sums - map of sha256sum checksums of /etc/storpool.conf and all files in the /etc/storpool.conf.d/ directory.

    • cputype - the CPU architecture as recognized by the tooling in the splib.

    • df - list of output lines from df.

    • dmidecode - raw output of dmidecode, used mostly for RAM compatibility alerts.

    • free_-m - the output of the free -m command.

    • fstab - the raw contents of /etc/fstab.

    • kernel - the running kernel of the node.

    • lldp - the output from lldpcli show neighbours in JSON.

    • lsblk - the raw output from lsblk.

    • lscpu - the raw output from lscpu.

    • lshw_-json - the JSON output from lshw.

    • lsmod - the raw output from lsmod.

    • lspci_-DvvA_linux-proc - the raw output from lspci -DvvA linux-proc.

    • lspci_-k - the raw output from lspci -k used mostly for device driver inventory.

    • mounts - list of lines output from /proc/mounts.

    • net - map of symlink names and the paths they point to for all devices in the /sys/class/net directory.

    • nvme_list - the raw output from nvme list.

    • os - the operating system string as detected by the tooling in splib.

    • revision - the contents of /etc/storpool_revision, as well as the output from the storpool_revision tool in JSON format.

    • spconf - map of key/value pairs for all resolved values from /etc/storpool.conf and the /etc/storpool.conf.d/*.conf files for this node.

    • sprdma - map of files and their contents in the /sys/class/storpool_rdma/storpool_rdma/state directory.

    • taint - list of all modules reported as tainted (to detect live patched kernels).

    • unsupportedkernels - all kernels later than the presently running one that do not have StorPool kernel modules installed.

    • vfgenconf - the JSON configuration created for the network interfaces used for StorPool to enable hardware acceleration.

  • net_info - Reports the current network state as reported by /usr/lib/storpool/storpool_ping netInfo from the point of view of each local service, used for more comprehensive monitoring checks.

  • systemd - Reports the node’s systemd units and their state.

  • sysctl - Reports the following sysctl values:

    • kernel.core_uses_pid

    • kernel.panic

    • kernel.panic_on_oops

    • kernel.softlockup_panic

    • kernel.unknown_nmi_panic

    • kernel.panic_on_unrecovered_nmi

    • kernel.panic_on_io_nmi

    • kernel.hung_task_panic

    • vm.panic_on_oom

    • vm.dirty_background_bytes

    • vm.dirty_expire_centisecs

    • vm.dirty_writeback_centisecs

    • vm.oom_dump_tasks

    • vm.oom_kill_allocating_task

    • kernel.sysrq

    • net.ipv4.ip_forward

    • net.ipv6.conf.all.forwarding

    • net.ipv6.conf.default.forwarding

    • net.nf_conntrack_max

    • net.ipv4.tcp_rmem

    • net.ipv4.conf.all.arp_filter

    • net.ipv4.conf.all.arp_announce

    • net.ipv4.conf.all.arp_ignore

    • net.ipv4.conf.default.arp_filter

    • net.ipv4.conf.default.arp_announce

    • net.ipv4.conf.default.arp_ignore

  • proc_cmdline - The cmdline of the running kernel.

  • one_version - The version of the OpenNebula plugin if one is installed.

  • cloudstack_info2 - Information about the installed CloudStack plugin for StorPool.

  • kdumpctl_status - Reports the output from kdump-config status or kdumpctl status depending on the OS version, used to ensure the kdump service is configured and running correctly.

  • iscsi_tool - Reports the output from the following views of /usr/lib/storpool/iscsi_tool:

    • ip net list

    • ip neigh list

    • ip route list

    Used to monitor various local node parameters for more comprehensive monitoring alerts.