Common monitoring via the StorPool API

1. Introduction

The operations of a StorPool cluster can be monitored via the API. It returns JSON and is easy to automate and integrate with different monitoring systems. Below is an explanation of the different elements that can be monitored and their meaning.

Most of this work is based on the running monitoring system of StorPool, available at https://spnagios.storpool.com. If you are a customer and don’t have access to it, please contact us via our ticketing system to request your credentials.

To see the JSON for a specific command, try storpool -B -j COMMAND, e.g.:

[root@one1 ~]# storpool -B -j disk list
{
   "data" : {
      "101" : {
         "agAllocated" : 544,
         "agCount" : 1444,
         "agFree" : 900,
         "agFreeNotTrimmed" : 1,
         "agFreeing" : 1,
         "agFull" : 0,
         "agMaxSizeFull" : 0,
         "agMaxSizePartial" : 4,
         "agPartial" : 535,
         "aggregateScore" : {
            "entries" : 0,
            "space" : 1,
            "total" : 1
         },
         "description" : "",
         "device" : "/dev/sda1",
         "empty" : false,
...

Note

The -B option means “batch”: the request is retried if there are transient errors.
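The same call can be scripted. The following is a minimal Python sketch, assuming the storpool CLI is in PATH and that every reply carries the top-level "data" key shown in the output above. The function name and the injectable run parameter are illustrative and not part of StorPool.

```python
import json
import subprocess


def storpool_json(command, run=subprocess.run):
    """Run `storpool -B -j COMMAND` and return the content of its "data" key.

    `command` is a list of CLI words, e.g. ["disk", "list"].
    `run` is injectable so the function can be exercised without a cluster.
    """
    result = run(
        ["storpool", "-B", "-j", *command],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero code
    )
    reply = json.loads(result.stdout)
    if "data" not in reply:
        # Assumption: a reply without "data" means the request failed.
        raise RuntimeError(f"unexpected reply: {reply!r}")
    return reply["data"]
```

For example, storpool_json(["disk", "list"]) would return the mapping of disk IDs to their attributes.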

2. Internal elements

These elements do not have a status of their own, but provide information about other components.

2.1. Tasks

Available in the CLI via task list.

The tasks that run in the cluster are operations that fall in one of these three categories:

  • Transaction ID 0 - recovery task, a drive is recovering the data changes it missed while its server was down;

  • Transaction ID 1 - bridge tasks, sending data between clusters (“Multisite”);

  • Everything else - balancer/relocator and similar tasks.

The following can be done with the information from the tasks:

  • To see if there is a server or drive in recovery. This information is useful to know when some service or node can be restarted, as it’s recommended to wait for the recoveries to finish before doing any other maintenance.

  • To see if there is re-balancing going on, which can affect cluster performance.

  • To see if any bridge operation is in progress.

To track the progress of the tasks, you can see the count of completed objects per task. Please note this cannot be used for estimating completion time, as the total size of the objects cannot be estimated from this output.
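The three categories above can be sketched as a small classifier. This is only an illustration: it takes the transaction IDs as plain integers, because the name of the JSON field that carries them is not given here, and the category labels are made up for the example.

```python
from collections import Counter


def task_category(transaction_id):
    """Map a task's transaction ID to one of the three categories."""
    if transaction_id == 0:
        return "recovery"
    if transaction_id == 1:
        return "bridge"
    return "balancer"  # balancer/relocator and similar tasks


def summarize_tasks(transaction_ids):
    """Count the tasks per category from an iterable of transaction IDs."""
    return Counter(task_category(t) for t in transaction_ids)
```

A non-zero "recovery" count would then mean that maintenance should wait, and a non-zero "balancer" count that re-balancing may be affecting performance.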

2.2. Attachments

Available in the CLI via attach list.

The attachments are volumes presented as block devices to a specific node. This information can be used to determine which clients have no attachments, so that their downtime does not affect any known users of the storage system.
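A minimal sketch of that decision, assuming the attachment list has already been reduced to (client ID, volume name) pairs. How those pairs are extracted from the JSON is left out, since the field names are not described here.

```python
def idle_clients(client_ids, attachments):
    """Return the client IDs that have no attached volumes.

    `attachments` is an iterable of (client_id, volume_name) pairs.
    """
    busy = {client_id for client_id, _volume in attachments}
    return sorted(set(client_ids) - busy)
```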

3. Visible elements

These are the elements of the system that should be monitored directly.

3.1. Networks

Available in the CLI via net list.

This is a list of the network interfaces and the state of the storpool_beacon service on each node. Please note that in the JSON there are two types of network definitions per host, network and rdma (for InfiniBand networks), and both count towards the number of interfaces for the node.

What should be monitored here:

  • that there is the same number of network interfaces for each node, and that it’s at least two. StorPool does not support clusters in production with fewer than two interfaces.

  • that all networks for the node are up.

  • that beaconStatus is in the NODE_UP state.

  • that clusterStatus is in the CNODE_UP state.

  • (as of 16.02) that joined is true.

  • that there is no node missing from the list of networks.
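Most of the checks above can be sketched per node as follows. The field names beaconStatus, clusterStatus, joined, network and rdma come from the text above. That network and rdma are containers with one item per interface is an assumption. The "all networks are up" check is omitted because the field that carries the link state is not named here.

```python
def check_node_network(node, expected_interfaces):
    """Return a list of problems found for one node entry of `net list`."""
    problems = []
    # Both "network" and "rdma" definitions count as interfaces.
    count = len(node.get("network", {})) + len(node.get("rdma", {}))
    if count < 2:
        problems.append(f"only {count} interface(s), at least 2 required")
    elif count != expected_interfaces:
        problems.append(f"{count} interfaces, expected {expected_interfaces}")
    if node.get("beaconStatus") != "NODE_UP":
        problems.append("beacon is not NODE_UP")
    if node.get("clusterStatus") != "CNODE_UP":
        problems.append("cluster status is not CNODE_UP")
    # "joined" exists as of 16.02, so it is only checked when present.
    if "joined" in node and node["joined"] is not True:
        problems.append("node has not joined")
    return problems
```

Detecting a node that is missing from the list needs a separate comparison against the set of nodes that is expected to be present.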

3.2. Services

Available in the CLI via service list.

These are the different services running on the nodes that perform their separate tasks.

Globals that need to be monitored for all services:

  • Is the service up (status in the JSON)?

  • Are all the services running the same major version (16.01, 16.02) of StorPool?
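The version comparison can be sketched as below. It assumes that each service reports a dotted version string whose first two components form the major version (for example "16.02" in a hypothetical "16.02.165"). The exact format and field name are not given here.

```python
def major_version(version):
    """Reduce a dotted version string to its first two components."""
    return ".".join(version.split(".")[:2])


def mixed_major_versions(versions):
    """Return the sorted major versions if more than one is in use, else []."""
    majors = sorted({major_version(v) for v in versions})
    return majors if len(majors) > 1 else []
```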

3.2.1. Server

Performs communication with drives and provides access to them.

Note

For installations with multi-server, the server’s ID (SID) in the JSON is in two parts, node ID (N) and server instance ID (I). The formula is SID=I*4096+N, and the instances are counted from 0.
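As a worked example of the formula, instance 1 on node 11 has SID 1*4096+11 = 4107. A small sketch of the conversion in both directions, assuming node IDs are always below 4096:

```python
def split_server_id(sid):
    """Split a SID into (node_id, instance_id), given SID = I*4096 + N."""
    instance_id, node_id = divmod(sid, 4096)
    return node_id, instance_id


def make_server_id(node_id, instance_id):
    """Inverse of split_server_id; assumes node_id is below 4096."""
    return instance_id * 4096 + node_id
```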

What should be monitored on server instances:

  • Is it in recovery (see tasks above)?

3.2.2. Client

Provides the OS and the processes running on it with access to StorPool volumes.

For every client, the list of active requests can be fetched via client N activeRequests in the CLI. Verify that no request has been waiting for more than 1 second, and generate a warning if any has.

Alerts for this service can be suppressed if there are no attachments on it, so that hosts in maintenance do not generate warnings.
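The client check, including the suppression rule, could look like the sketch below. It takes the waiting times as a plain list of milliseconds, because the name and unit of the corresponding field in the activeRequests output are not described here.

```python
def client_request_status(wait_times_ms, has_attachments, limit_ms=1000):
    """Return "ok" or "warning" for one client's active requests."""
    if not has_attachments:
        # No attachments: suppress alerts, the host may be in maintenance.
        return "ok"
    if any(wait > limit_ms for wait in wait_times_ms):
        return "warning"
    return "ok"
```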

3.2.3. Management

Provides access to the management of the StorPool cluster.

What should be monitored for this service:

  • There needs to be more than one such service in the cluster. The recommended number is 3.

  • There needs to be exactly one (1) node that’s active.
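Both conditions can be sketched as follows. The input is one boolean per management service, true for the active one. How that flag is read from the JSON is an assumption left to the caller.

```python
def check_management(active_flags):
    """Check the management services; one boolean per service instance."""
    problems = []
    if len(active_flags) < 2:
        problems.append("fewer than two management services")
    active = sum(1 for flag in active_flags if flag)
    if active != 1:
        problems.append(f"{active} active management nodes, expected exactly 1")
    return problems
```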

3.2.4. iSCSI

Provides access via iSCSI to the StorPool cluster.

If you have iSCSI clients, you need at least one of these to be running. Also, keep track of whether such a service has existed in the cluster, as it might not be reported if the whole cluster goes down and comes back up without these services present.

3.2.5. Bridge

Provides snapshot transfer service between clusters.

If you have the bridge services, you need at least one of these to be running. Also, keep track of whether such a service has existed in the cluster, as it might not be reported if the whole cluster goes down and comes back up without these services present.
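One way to keep track of services that have existed is to remember every service kind that was ever seen running, and to alert when a remembered kind has no running instance. A minimal sketch, in which the kind labels ("iscsi", "bridge") and the in-memory set are illustrative; a real check would persist the set between runs.

```python
def missing_service_kinds(known_kinds, running_counts):
    """Return the known service kinds that have no running instance.

    `known_kinds` is a set that is updated in place with every kind seen
    running; `running_counts` maps a kind to its number of running services.
    """
    for kind, count in running_counts.items():
        if count > 0:
            known_kinds.add(kind)
    return sorted(k for k in known_kinds if running_counts.get(k, 0) == 0)
```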

3.2.6. Disks

Available in the CLI via disk list.

Note

For monitoring disk usage (and template usage below) we recommend using hysteresis: if the change between consecutive checks is less than 1%, do not change the status, even if the value has gone above (or below) a specific watermark.
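The hysteresis rule can be sketched as below. Values are usage percentages, and the 1% threshold is read as one percentage point. The watermark values in the usage note are hypothetical.

```python
def level_for(value, warning, critical):
    """Plain watermark comparison for a usage percentage."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def next_status(prev_status, prev_value, value, warning, critical, min_change=1.0):
    """Apply hysteresis: keep the previous status when the value moved by
    less than `min_change` percentage points since the last check."""
    if prev_status is not None and abs(value - prev_value) < min_change:
        return prev_status
    return level_for(value, warning, critical)
```

With watermarks of 90 and 95, a move from 89.6 to 90.2 keeps the status "ok", while a move from 88.0 to 90.2 changes it to "warning". A possible variant compares against the value at the last status change instead of the last check, so that a slow drift is not missed; which of the two fits better is a judgment call.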

Disks have the most options that need to be monitored.

For disk state, the following needs to be monitored:

  • If the disk is seen by the cluster (the device field is not empty);

  • If the disk is not in recovery (see tasks above);

  • If the disk was scrubbed less than 2 weeks ago (lastScrubCompleted field);
    • StorPool scrubs the drives of the system every week, but that operation can be delayed because of load. Also please note that prior to 16.02 new drives get their lastScrubCompleted field set to 0.

  • If a disk that was known to be in the system is not there now.
    • This can happen while the server is starting and hasn’t added the disk yet. For this reason we recommend having an “inventory” system that keeps track of the drives reported by StorPool, and removing drives from it either manually or when a drive hasn’t shown up for more than 2 hours.
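Such an inventory can be sketched as a mapping from disk ID to the time the disk was last reported. The function below is an illustration only; timestamps are in seconds, and persisting the mapping between checks is left to the caller.

```python
def update_inventory(inventory, reported_ids, now, forget_after=2 * 3600):
    """Update a {disk_id: last_seen_timestamp} inventory in place.

    Returns the IDs of known disks missing from the current report.
    Disks not seen for more than `forget_after` seconds are dropped.
    """
    for disk_id in reported_ids:
        inventory[disk_id] = now
    missing = []
    for disk_id, last_seen in list(inventory.items()):
        if disk_id in reported_ids:
            continue
        if now - last_seen > forget_after:
            del inventory[disk_id]  # gone for too long, stop tracking it
        else:
            missing.append(disk_id)
    return sorted(missing)
```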

For disk usage stats, the following values need to be checked, with watermarks for warning/critical state, and with hysteresis:

  • General disk usage - agAllocated / agCount

  • Objects usage - objectsAllocated / objectsCount

  • entriesFree should be above a certain threshold. We recommend a warning below 100000 and critical below 70000.
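The three values above can be computed from one entry of the disk list as sketched below. The field names are the ones listed above. The sketch assumes agCount and objectsCount are non-zero, which may not hold for a disk that is missing from the cluster.

```python
def disk_usage(disk):
    """Compute the usage figures for one entry of `disk list`."""
    return {
        "space_pct": 100.0 * disk["agAllocated"] / disk["agCount"],
        "objects_pct": 100.0 * disk["objectsAllocated"] / disk["objectsCount"],
        "entries_free": disk["entriesFree"],
    }


def entries_status(entries_free, warning=100000, critical=70000):
    """Status for entriesFree using the recommended thresholds."""
    if entries_free < critical:
        return "critical"
    if entries_free < warning:
        return "warning"
    return "ok"
```

For the disk shown in the introduction (agAllocated 544, agCount 1444), the general disk usage comes out at roughly 37.7%.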

Disk errors should be monitored by their rate of change rather than by their absolute value. The recommended watermark for marking a disk critical is more than 100 errors within 48 hours.
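A minimal sketch of the rate check, assuming the monitoring system stores the cumulative error count of each disk at every check as (timestamp, count) pairs. The storage format and function names are illustrative.

```python
def errors_in_window(samples, now, window=48 * 3600):
    """Errors accumulated in the last `window` seconds.

    `samples` is a list of (timestamp, cumulative_error_count) pairs,
    oldest first, as collected by consecutive checks.
    """
    recent = [(ts, count) for ts, count in samples if now - ts <= window]
    if len(recent) < 2:
        return 0
    # Guard against a counter that was reset in the meantime.
    return max(0, recent[-1][1] - recent[0][1])


def error_status(samples, now, limit=100):
    """Return "critical" if more than `limit` errors fall in the window."""
    return "critical" if errors_in_window(samples, now) > limit else "ok"
```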

Currently there is no recommended way to monitor the imbalance of data on the disks.

For every drive, the list of active requests can be fetched via disk N activeRequests in the CLI. Verify that no request has been waiting for more than 1 second, and generate a warning if any has.

3.2.7. Templates

Available in the CLI via template status.

Template status is the standard way of seeing the amount of free space in the cluster.

Note

For a discussion of the meaning of “free space” in StorPool please refer to the documentation, as it’s not the same as “space you can allocate”, but “data you can write in”.

For easier understanding of the data, we recommend not to get data for all templates, but to group them based on placeHead, placeAll, placeTail and replication. Otherwise you’ll have repeated data.

Note

Also please note that there can be two or more templates that have overlapping placement groups, or placement groups with overlapping drives. This means that the sum of the free space of several templates can be more than the actual data you can write to the system.

For a template, first you need to check if there are any volumes allocated on it. A template without volumes (i.e. that’s not in use) should not generate any alerts.

The available space is stored in stored->free, the total space in stored->capacity. This is also recorded for the placement groups, and that information should also be displayed, so it would be known which placement group is the limiting factor.
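The grouping and the free-space calculation can be sketched as below. The field names placeHead, placeAll, placeTail, replication, free and capacity come from the text. That the template status is available as a mapping from template name to record is an assumption; the real layout may differ.

```python
def group_templates(templates):
    """Group template names by (placeHead, placeAll, placeTail, replication)."""
    groups = {}
    for name, tpl in templates.items():
        key = (tpl["placeHead"], tpl["placeAll"], tpl["placeTail"], tpl["replication"])
        groups.setdefault(key, []).append(name)
    return groups


def free_percent(stored):
    """Percentage of free space from a "stored" object."""
    return 100.0 * stored["free"] / stored["capacity"]
```

Reporting one figure per group avoids showing the same free space several times for templates that share their placement.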

3.2.8. Volumes

Available in the CLI via volume status.

As of 16.01/16.02, running this often is not recommended, as it would generate too much load on the cluster.

3.3. General cluster status

Most of the above checks are too low-level to provide an easy-to-understand picture of the state of the cluster. Here are some ideas for grouping the data above to get a better picture:

3.3.1. Disks

  • If disks are missing/gone from just one server, that should be a warning condition; if they are missing from more than one server, it’s critical.
    • This does not apply when using replication factor 2; in that case every missing drive should be treated as critical.

  • Any drive above the critical usage watermark should set the condition to critical.

  • Any drive in recovery should set the condition to warning.
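The rules above can be combined as in the sketch below. The inputs are assumed to be prepared by the per-disk checks, and the default replication factor of 3 is only a placeholder for the example.

```python
def cluster_disk_status(missing_by_server, any_critical_usage, any_recovery, replication=3):
    """Combine per-disk findings into one cluster-level status.

    `missing_by_server` maps a server ID to the number of its missing disks.
    """
    servers_affected = sum(1 for n in missing_by_server.values() if n > 0)
    if any_critical_usage:
        return "critical"
    if servers_affected > 1:
        return "critical"
    if servers_affected == 1:
        # With replication factor 2 every missing drive is critical.
        return "critical" if replication == 2 else "warning"
    if any_recovery:
        return "warning"
    return "ok"
```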

3.3.2. Networks

  • Any critical conditions in a network should be treated as such for the cluster.

  • A node that runs only a client and has no attachments should not trigger the above.

3.3.3. Services

In general, the loss of one service should be treated as a warning, and the loss of more than one as critical.

3.3.4. Cluster services

There are a few services for the whole cluster that need to be monitored:

  • relocator should always be on.

  • balancer is currently (as of 16.01 and 16.02) handled by an external service and shouldn’t be monitored.