Monitoring data collected

Overview

StorPool provides a hosted monitoring system that allows all StorPool customers to easily and reliably monitor the health and performance of their StorPool storage clusters. The health and performance information is collected by the storage and client nodes and sent over an encrypted connection to the StorPool monitoring servers, running in StorPool’s own infrastructure in a data center with restricted physical access.

What data is collected

The data that is collected includes performance metrics of the storage nodes, storage devices, storage network traffic, and some metadata of the stored volumes. Following is an exhaustive list of the data collected and stored by the StorPool hosted monitoring system:

Cluster status

  • List and status of the disks

  • List and status of the StorPool services

  • List and status of the storage networks

  • List and status of the attachments

  • List and status of the relocation tasks running

  • List and status of the volumes and snapshots

  • List and status of the placementGroups and templates

  • Status of relocator and balancer services

  • Cluster configuration (mgmtConfig)

  • Maintenances that are set or in progress

  • List and status of the iSCSI sessions

  • Pending I/O requests that have been active for more than 10 seconds (per client)

Per-host status

  • Running kernel version and other installed kernels

  • Processes running in the root cgroup

  • Connectivity to the node with the active API

  • Crash reports of the StorPool processes (that include internal logs and core dumps)

  • Crash reports of the Linux kernel

Performance monitoring

  • List and status of the disks

  • List and status of the templates

  • List and status of the iSCSI sessions

  • Storage server performance metrics: reads and writes per second, bytes/s, network transfer time, processing time, disk read/write time, disk busy time, queue lengths, system tasks utilization

  • Stats from /proc/diskstats (I/O stats) for all attached StorPool and system disk drives - IOPS, bytes/s, busy time, and so on

  • Stats for CPU usage and queue for every CPU

  • Memory usage for all cgroups in the system (cache, rss, and so on)

  • Network and other stats from the StorPool services on the node
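As an illustration of the disk I/O stats in the list above, the following minimal sketch shows how per-device rates can be derived from two snapshots of /proc/diskstats. It is not the StorPool agent code; the names DiskCounters, parse_diskstats, and rates are hypothetical and used only for this example. The field positions follow the Linux kernel's documented /proc/diskstats layout, where sector counts are always in units of 512 bytes.

```python
from typing import Dict, NamedTuple

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


class DiskCounters(NamedTuple):
    """Cumulative counters for one block device (hypothetical helper type)."""
    reads: int          # reads completed
    read_bytes: int     # sectors read, converted to bytes
    writes: int         # writes completed
    written_bytes: int  # sectors written, converted to bytes
    busy_ms: int        # milliseconds spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Parse the contents of /proc/diskstats into per-device counters."""
    result: Dict[str, DiskCounters] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip empty or malformed lines
        result[f[2]] = DiskCounters(
            reads=int(f[3]),
            read_bytes=int(f[5]) * SECTOR_BYTES,
            writes=int(f[7]),
            written_bytes=int(f[9]) * SECTOR_BYTES,
            busy_ms=int(f[12]),
        )
    return result


def rates(prev: DiskCounters, cur: DiskCounters, interval_s: float) -> Dict[str, float]:
    """Turn two cumulative snapshots taken interval_s seconds apart into rates."""
    ios = (cur.reads - prev.reads) + (cur.writes - prev.writes)
    nbytes = (cur.read_bytes - prev.read_bytes) + (cur.written_bytes - prev.written_bytes)
    return {
        "iops": ios / interval_s,
        "bytes_per_s": nbytes / interval_s,
        "busy_fraction": (cur.busy_ms - prev.busy_ms) / (interval_s * 1000.0),
    }
```

On a Linux host, the text passed to parse_diskstats would be the contents of /proc/diskstats read at two points in time. Only counters of this kind are involved; no data from the I/O requests themselves appears in the file.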

Metadata

The collected information listed above contains some metadata about the volumes stored. This includes:

  • Volume name

  • Volume size

  • Volume utilization - used space

  • StorPool template name used to create the volume

  • Replication factor for the volume

  • QoS parameters - configured IOPS and bandwidth limits

  • Tags of the volume

The format of the volume names and the information they contain depend on the cloud management system used. A name typically contains the UUID of the virtual disk (OpenStack) or disk sequence numbers (OpenNebula), like “one-{VM_ID}-{Disk_ID}”.

Cloud management systems may store additional metadata in the volume tags, such as the ID of the virtual machine using the volume or the backup policy (used by the volumecare service).
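To make the naming patterns above concrete, here is a small sketch of how the identifiers embedded in a volume name could be extracted. It assumes that VM_ID and Disk_ID are decimal integers and that a UUID appears in its canonical 8-4-4-4-12 hexadecimal form; both are assumptions for illustration, and the function names are hypothetical.

```python
import re
import uuid
from typing import Optional, Tuple

# Assumed pattern for OpenNebula-style names: one-{VM_ID}-{Disk_ID}
_ONE_NAME = re.compile(r"^one-(\d+)-(\d+)$")

# Canonical textual UUID form, which may be embedded anywhere in the name
_UUID = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def parse_opennebula_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (vm_id, disk_id) if the name matches the assumed pattern."""
    m = _ONE_NAME.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def find_uuid(name: str) -> Optional[uuid.UUID]:
    """Return the first UUID found in the name, if any."""
    m = _UUID.search(name)
    return uuid.UUID(m.group(0)) if m else None
```

The point of the sketch is that a volume name carries identifiers assigned by the cloud management system, not the contents of the virtual disk.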

User data

Attention

No user data is collected, processed, or transferred by the monitoring agents and StorPool monitoring servers. The content of the volumes and snapshots is never read or processed, except for the main function of the storage system - to store and retrieve the user data on user requests.

No user information (like data buffers) is recorded in the collected crash reports, and no information beyond what is described above can be obtained from them. For the Linux kernel crash reports, only the backtrace/log information is sent; the crash report itself is not transferred.

How the data is collected

The data is collected by agents that are part of the StorPool software and run on the storage nodes and on the client nodes where the StorPool client is installed. The agents regularly collect the health-check and performance metrics, perform preliminary processing (validation, aggregation, and calculation of derived parameters), and send the data to the hosted StorPool monitoring system.
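The three preliminary processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual agent code: the field names (reads_per_s, writes_per_s, bytes_per_s), the use of a plain mean for aggregation, and the derived parameters shown are all hypothetical.

```python
import math
from typing import Dict, Iterable, List

# Hypothetical field names for one raw performance sample
REQUIRED = ("reads_per_s", "writes_per_s", "bytes_per_s")


def validate(samples: Iterable[Dict[str, float]]) -> List[Dict[str, float]]:
    """Validation: keep samples where every required field is a finite, non-negative number."""
    valid = []
    for s in samples:
        values = [s.get(k) for k in REQUIRED]
        if all(
            isinstance(v, (int, float))
            and not isinstance(v, bool)
            and math.isfinite(v)
            and v >= 0
            for v in values
        ):
            valid.append(s)
    return valid


def aggregate(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregation: average each field over the collection interval."""
    n = len(samples)
    if n == 0:
        return {}
    return {k: sum(s[k] for s in samples) / n for k in REQUIRED}


def derive(agg: Dict[str, float]) -> Dict[str, float]:
    """Derived parameters: total IOPS and average request size."""
    out = dict(agg)
    if not agg:
        return out
    iops = agg["reads_per_s"] + agg["writes_per_s"]
    out["iops"] = iops
    out["avg_request_bytes"] = agg["bytes_per_s"] / iops if iops else 0.0
    return out


def process(samples: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Run the three steps in order before the result is sent."""
    return derive(aggregate(validate(samples)))
```

Doing this work on the node means that only compact numeric summaries need to leave the cluster.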

The agents that collect and send the information are implemented in scripting languages and are available for audit by the StorPool customers.

How the data is sent

All monitoring and performance metrics data is sent by the agents over encrypted connections, using either HTTPS (HTTP over TLS) or SSH. The connections are established by the agents on the storage and client nodes with installed StorPool software to a pool of redundant monitoring servers in StorPool’s own infrastructure. The TLS/SSH connections can be established over the Internet or through a purpose-built VPN between the storage cluster and the StorPool infrastructure. In both cases, TLS-encrypted communication is used between the agents and the monitoring servers.

The destination servers to which the monitoring and performance data is sent are configured locally on the storage nodes and are under the customer’s control.

Agents sending the data are authenticated by the monitoring servers using individual per-cluster pre-shared keys before the data is accepted for processing.
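The exact authentication scheme is not described here. As one common way to use a per-cluster pre-shared key, the sketch below signs each request body with HMAC-SHA256 and lets the server verify the signature before accepting the data. The use of HMAC, the header names X-Cluster-Id and X-Signature, and all function names are assumptions made for this example only.

```python
import hashlib
import hmac
import json
from typing import Dict, Tuple


def sign(psk: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature of the body with the pre-shared key."""
    return hmac.new(psk, body, hashlib.sha256).hexdigest()


def build_request(cluster_id: str, psk: bytes, metrics: dict) -> Tuple[Dict[str, str], bytes]:
    """Agent side: serialize the metrics and attach the cluster ID and signature."""
    body = json.dumps(metrics, sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Cluster-Id": cluster_id,       # hypothetical header name
        "X-Signature": sign(psk, body),   # hypothetical header name
    }
    return headers, body


def verify(known_keys: Dict[str, bytes], headers: Dict[str, str], body: bytes) -> bool:
    """Server side: accept the data only if the signature matches the cluster's key."""
    psk = known_keys.get(headers.get("X-Cluster-Id", ""))
    if psk is None:
        return False  # unknown cluster
    expected = sign(psk, body)
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, headers.get("X-Signature", ""))
```

With individual per-cluster keys, a key from one cluster cannot be used to submit data on behalf of another.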

How the data is processed

StorPool monitoring servers store and process the received data locally on the StorPool hosted monitoring system. No data is sent to external systems or third parties for storing or processing. The health check data is used to update the current health status of the elements. It is not stored in raw format, but as the processed health status of the elements and historical data about the events - changes in their status.
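The idea of keeping the processed health status plus a history of status changes, rather than the raw health check data, can be sketched like this. The class name, the element names, and the status values are hypothetical; the actual data model of the monitoring system is not documented here.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp, element, old_status, new_status)
Event = Tuple[float, str, Optional[str], str]


class HealthTracker:
    """Keeps the current status per element and records only status changes."""

    def __init__(self) -> None:
        self.current: Dict[str, str] = {}
        self.events: List[Event] = []

    def update(self, element: str, status: str, timestamp: float) -> bool:
        """Apply one health check result; return True if the status changed."""
        old = self.current.get(element)
        if old == status:
            return False  # unchanged: the raw check result is not stored
        self.current[element] = status
        self.events.append((timestamp, element, old, status))
        return True
```

Repeated checks that report the same status add nothing to the history, so the stored data grows with the number of status changes and not with the number of checks.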

Performance metrics are stored in a raw high-resolution data format in a time-series database and in an aggregated 1-minute resolution format. The aggregated data is stored for 12 months, after which it is deleted. The high-resolution data is stored for 48 hours.
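The relationship between the two resolutions and their retention periods can be illustrated with the sketch below. It assumes that aggregation means averaging the samples within each minute and approximates 12 months as 365 days; neither detail is specified above, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import timedelta
from typing import Dict, Iterable, List, Tuple

HIGH_RES_RETENTION = timedelta(hours=48)
AGGREGATED_RETENTION = timedelta(days=365)  # "12 months", approximated for this sketch


def downsample_1min(samples: Iterable[Tuple[float, float]]) -> List[Tuple[int, float]]:
    """Average (epoch_seconds, value) samples into 1-minute buckets.

    Each bucket is identified by the epoch second at which its minute starts.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts) // 60 * 60].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def expired(sample_ts: float, now_ts: float, retention: timedelta) -> bool:
    """True if a sample is older than its retention period and can be deleted."""
    return now_ts - sample_ts > retention.total_seconds()
```

For example, a high-resolution sample becomes eligible for deletion after 48 hours, while the 1-minute bucket it contributed to is kept until it is 12 months old.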

Information about monitoring events, such as a disk or node failure or a change in the cluster health status, can be sent to subscribed customers using third-party services. The notification services currently used are Slack (slack.com) and e-mail. Notifications are configured and enabled per cluster.

All servers where the monitoring and performance data is processed and stored are owned and managed by StorPool and are part of StorPool’s own infrastructure. The servers are located in a certified data center with restricted access.