StorPool iSCSI balance procedure

1. Introduction

Since the introduction of iSCSI scalable, each target has a list of controllers. The target is always served by the first controller in the list that is up: the portalGroup redirects connections to that controller, and it is the only controller that can serve the target/volume. If the first available controller in the list changes, the session is disconnected and reconnects to the new one.

iSCSI balancing is helpful when the controllers serve an uneven number of targets, which can cause performance issues, or when iSCSI controller nodes are added to or removed from the cluster. Manually editing the controller list of every target is difficult and error-prone, so the iscsi_balancer tool is provided for this purpose. The sections below explain how the tool works.
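
The rule that a target is served by the first available controller in its list can be sketched in a few lines of Python. This only illustrates the rule and is not StorPool code; the function name `serving_controller` and the `up` set are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def serving_controller(controllers: Sequence[int], up: Iterable[int]) -> Optional[int]:
    """Return the first controller in the target's list that is up, or None."""
    up_set = set(up)
    for controller_id in controllers:
        if controller_id in up_set:
            return controller_id
    return None


# Target with controller list [2, 1, 3] (hypothetical example).
print(serving_controller([2, 1, 3], up={1, 2, 3}))  # all up: served by 2
print(serving_controller([2, 1, 3], up={1, 3}))     # 2 is down: served by 1
```

When the result changes between two calls (for example because controller 2 goes down), that corresponds to the session being disconnected and reconnecting to the new controller.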

2. iSCSI balancing by controllers usage

We can check the current sessions on each of the controllers:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py current_usage

 Current Usage
--------------------------------------------------------------------------------------------------------------------
| Controller:1                                                                                                     |
--------------------------------------------------------------------------------------------------------------------
| iqn:volume-name                                        |  Current Controller Id:1 | controllers:[1, 2, 3]        |
--------------------------------------------------------------------------------------------------------------------
| Controller:2                                                                                                     |
--------------------------------------------------------------------------------------------------------------------
| iqn:volume-name                                        |  Current Controller Id:2 | controllers:[2, 1, 3]        |
| iqn:volume-name                                        |  Current Controller Id:2 | controllers:[2, 3, 1]        |
--------------------------------------------------------------------------------------------------------------------
| Controller:3                                                                                                     |
--------------------------------------------------------------------------------------------------------------------
| iqn:volume-name                                        |  Current Controller Id:3 | controllers:[3, 2, 1]        |
| iqn:volume-name                                        |  Current Controller Id:3 | controllers:[3, 2, 1]        |
| iqn:volume-name                                        |  Current Controller Id:3 | controllers:[3, 2, 1]        |
--------------------------------------------------------------------------------------------------------------------
| Total usage:                                                                                                     |
| Controller:1 --> Targets count:1                                                                                 |
| Controller:2 --> Targets count:2                                                                                 |
| Controller:3 --> Targets count:3                                                                                 |
--------------------------------------------------------------------------------------------------------------------
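
The "Total usage" summary is a count of targets per current controller. A minimal sketch of that count, assuming the data is available as (target, current controller ID) pairs; the target names below are placeholders and `targets_per_controller` is a hypothetical helper, not part of the tool:

```python
from collections import Counter

# (target, current controller ID) pairs mirroring the sample output above.
# Target names are placeholders.
current = [
    ("target-a", 1),
    ("target-b", 2), ("target-c", 2),
    ("target-d", 3), ("target-e", 3), ("target-f", 3),
]


def targets_per_controller(pairs):
    """Count how many targets each controller currently serves."""
    return Counter(controller_id for _, controller_id in pairs)


usage = targets_per_controller(current)
# Difference between the most and the least used controller.
spread = max(usage.values()) - min(usage.values())
print(dict(usage), "spread:", spread)
```

A large spread suggests that the targets are unevenly distributed and that balancing may be worthwhile.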

There are two types of balancing:

  • Intrusive - changes the current controller ID in the target’s controller list, which causes iSCSI sessions to reconnect.

  • Non-intrusive - keeps the current controller ID and only adds and re-arranges controllers in the target’s controller list.

Based on the current usage per controller, we can determine which type of balancing is necessary: intrusive or non-intrusive.

Note

The target’s controller list can contain a maximum of 7 controllers.
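
To make the non-intrusive idea concrete, here is a minimal sketch that keeps the current controller first and re-arranges the remaining ones. The ordering criterion (fewest targets first) is an assumption made for illustration; the actual ordering used by iscsi_balancer is not described in this document. `non_intrusive_list` is a hypothetical name.

```python
MAX_CONTROLLERS = 7  # limit stated in the note above


def non_intrusive_list(current_id, all_controllers, usage):
    """Keep current_id first; order the others by ascending target count.

    The ordering rule is an assumption for illustration only.
    """
    others = sorted(
        (c for c in all_controllers if c != current_id),
        key=lambda c: (usage.get(c, 0), c),
    )
    return ([current_id] + others)[:MAX_CONTROLLERS]


usage = {1: 1, 2: 2, 3: 3}  # targets per controller, as in the sample output
print(non_intrusive_list(3, [1, 2, 3], usage))
```

Because the first element never changes, the serving controller stays the same and no session has to reconnect.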

Attention

Before starting the iSCSI balancing procedure, it is highly recommended to dump the iSCSI sessions using the check_iscsi_sessions tool:

/usr/lib/storpool/check_iscsi_sessions -m dump <file.dump>

2.1. Non-intrusive balancing

Print the pre-balance and post-balance status:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_non_intrusive -p
...
[root@SPECTA_STP_NODE1 iscsi_balancer]# /opt/storpool/python3/bin/python3 iscsi_balancer.py balance_non_intrusive -p
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol19 --> currentControllerId:1 --> Controllers list after update:[1, 2, 3]
iqn.2023-01.com.spectra:vmware-datastore --> currentControllerId:2 --> Controllers list after update:[2, 3, 1]
iqn.2023-01.com.spectra:iopstesting --> currentControllerId:2 --> Controllers list after update:[2, 3, 1]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol20 --> currentControllerId:2 --> Controllers list after update:[2, 1, 3]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol22 --> currentControllerId:2 --> Controllers list after update:[2, 1, 3]
iqn.2023-01.com.spectra:spectra-hdd-vol1 --> currentControllerId:3 --> Controllers list after update:[3, 2, 1]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol17 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol18 --> currentControllerId:3 --> Controllers list after update:[3, 2, 1]
iqn.2023-01.com.spectra:mgmt-cls-vol01 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol21 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
...

Execute balancing:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_non_intrusive --exec

2.2. Intrusive balancing

Attention

This operation will change the currentControllerId and will cause iSCSI sessions to reconnect.

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_intrusive -p
...
[root@SPECTA_STP_NODE1 iscsi_balancer]# /opt/storpool/python3/bin/python3 iscsi_balancer.py balance_non_intrusive -p
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol19 --> currentControllerId:1 --> Controllers list after update:[1, 2, 3]
iqn.2023-01.com.spectra:vmware-datastore --> currentControllerId:2 --> Controllers list after update:[2, 3, 1]
iqn.2023-01.com.spectra:iopstesting --> currentControllerId:2 --> Controllers list after update:[2, 3, 1]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol20 --> currentControllerId:2 --> Controllers list after update:[2, 1, 3]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol22 --> currentControllerId:2 --> Controllers list after update:[2, 1, 3]
iqn.2023-01.com.spectra:spectra-hdd-vol1 --> currentControllerId:3 --> Controllers list after update:[3, 2, 1]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol17 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol18 --> currentControllerId:3 --> Controllers list after update:[3, 2, 1]
iqn.2023-01.com.spectra:mgmt-cls-vol01 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
iqn.2023-01.com.spectra:prod-vpc-cls01-stp-prim-vol21 --> currentControllerId:3 --> Controllers list after update:[3, 1, 2]
...

Execute balancing:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_intrusive --exec

3. iSCSI balancing by target load

Balancing is needed not only to distribute targets evenly across the controllers, but also to distribute the SCSI load between targets and initiators. The load value of a session is its SCSI load divided by the age of the session, that is, the current time minus the time the session was created.

Here is an example with jq:

storpool -j iscsi sessions list | jq -r '.data.sessions[] | select(.stats) | .target + " " + ((.stats.scsi / (now - .timeCreated)) | tostring)'
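
The same calculation can be sketched in Python, which may be easier to extend than a jq one-liner. The field names (`data.sessions`, `target`, `stats.scsi`, `timeCreated`) are taken from the jq example; the sample document and the function name `session_loads` are made up for illustration.

```python
import json
import time


def session_loads(doc, now=None):
    """Return (target, load) pairs, where load = stats.scsi / session age in seconds."""
    now = time.time() if now is None else now
    loads = []
    for session in doc["data"]["sessions"]:
        stats = session.get("stats")
        if not stats:  # roughly the same filter as select(.stats)
            continue
        age = now - session["timeCreated"]
        loads.append((session["target"], stats["scsi"] / age))
    return loads


# Made-up sample in the shape the jq filter expects.
sample = json.loads("""
{"data": {"sessions": [
  {"target": "iqn.example:vol-a", "timeCreated": 1000, "stats": {"scsi": 5000}},
  {"target": "iqn.example:vol-b", "timeCreated": 1500}
]}}
""")
print(session_loads(sample, now=2000))
```

In practice the input would be the output of `storpool -j iscsi sessions list` parsed with `json.load`, with `now` left at its default.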

We can balance sessions across controllers by load value: moving excessive load off one controller and spreading it over the others evens out the average load per controller and improves the overall performance of the cluster.

First, check the summed load value for each of the controllers:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py print_by_load -L

Controller: 1  Load: 5580
Controller: 3  Load: 3500
Controller: 2  Load: 2660

Print all sessions on controllers by load value:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py print_by_load -l

All controllers by load: [2, 3, 1]
Controller: 2
----------------------------------------------------------------------------------------------------
iopstesting --> 2
prod-vpc-cls01-stp-prim-vol20 --> 507
mgmt-cls-vol01 --> 639
vmware-datastore --> 1511
----------------------------------------------------------------------------------------------------
Controller: 3
----------------------------------------------------------------------------------------------------
spectra-hdd-vol1 --> 41
prod-vpc-cls01-stp-prim-vol17 --> 269
prod-vpc-cls01-stp-prim-vol21 --> 590
prod-vpc-cls01-stp-prim-vol18 --> 1130
prod-vpc-cls01-stp-prim-vol22 --> 1469
----------------------------------------------------------------------------------------------------
Controller: 1
----------------------------------------------------------------------------------------------------
prod-vpc-cls01-stp-prim-vol19 --> 5580
---------------------------------------------------------------------------------------------------
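
The per-controller totals and the "All controllers by load" ordering follow from the per-session values by simple summation. The sketch below uses the values from the listing above; the nested-dictionary layout is an assumption made for illustration.

```python
# Per-session load values copied from the listing above,
# grouped as {controller ID: {volume: load}}.
sessions = {
    2: {"iopstesting": 2, "prod-vpc-cls01-stp-prim-vol20": 507,
        "mgmt-cls-vol01": 639, "vmware-datastore": 1511},
    3: {"spectra-hdd-vol1": 41, "prod-vpc-cls01-stp-prim-vol17": 269,
        "prod-vpc-cls01-stp-prim-vol21": 590,
        "prod-vpc-cls01-stp-prim-vol18": 1130,
        "prod-vpc-cls01-stp-prim-vol22": 1469},
    1: {"prod-vpc-cls01-stp-prim-vol19": 5580},
}

# Summed load per controller.
totals = {c: sum(v.values()) for c, v in sessions.items()}
# Controllers ordered from least to most loaded.
by_load = sorted(totals, key=totals.get)

print(totals)   # {2: 2659, 3: 3499, 1: 5580}
print(by_load)  # [2, 3, 1]
```

The sums differ by one from the 2660 and 3500 shown by `print_by_load -L`, presumably because the listing shows truncated values of fractional loads.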

To manually migrate a session to a controller:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_by_load -m <volumeName> <controller Id> <print/exec>

For example, list the current sessions by load to choose a volume to migrate:

[root@SPECTA_STP_NODE1 iscsi_balancer]# sp-python3 ./iscsi_balancer.py print_by_load -l
All controllers by load: [2, 3, 1]
Controller: 2
----------------------------------------------------------------------------------------------------
iopstesting --> 2
prod-vpc-cls01-stp-prim-vol17 --> 265
prod-vpc-cls01-stp-prim-vol20 --> 502
mgmt-cls-vol01 --> 629
vmware-datastore --> 1578
----------------------------------------------------------------------------------------------------
Controller: 3
----------------------------------------------------------------------------------------------------
spectra-hdd-vol1 --> 41
prod-vpc-cls01-stp-prim-vol21 --> 577
prod-vpc-cls01-stp-prim-vol18 --> 1103
prod-vpc-cls01-stp-prim-vol22 --> 1496
----------------------------------------------------------------------------------------------------
Controller: 1
----------------------------------------------------------------------------------------------------
prod-vpc-cls01-stp-prim-vol19 --> 5503
----------------------------------------------------------------------------------------------------

First, we can print the command:

[root@SPECTA_STP_NODE1 iscsi_balancer]#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_by_load -m prod-vpc-cls01-stp-prim-vol17 3 print
Your request:
  storpool_q -d '{ "commands" : [ { "targetSetControllers" : { "volumeName" : "prod-vpc-cls01-stp-prim-vol17", "controllers" :  [3, 2, 1]  } } ] }' iSCSIConfig
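
The printed request moves the chosen controller to the front of the target's controller list. A minimal sketch of how such a payload could be built; the previous list [2, 3, 1] is an assumption (the output above does not show it), and `migrate_request` is a hypothetical helper rather than part of the tool.

```python
import json


def migrate_request(volume_name, controllers, new_first):
    """Build a targetSetControllers request with new_first moved to the front."""
    reordered = [new_first] + [c for c in controllers if c != new_first]
    return {
        "commands": [
            {"targetSetControllers": {
                "volumeName": volume_name,
                "controllers": reordered,
            }}
        ]
    }


# Assumed previous controller list: [2, 3, 1].
req = migrate_request("prod-vpc-cls01-stp-prim-vol17", [2, 3, 1], 3)
print(json.dumps(req))
```

The relative order of the remaining controllers is preserved, so only the serving controller changes.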

And then execute the command:

[root@SPECTA_STP_NODE1 iscsi_balancer]#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_by_load -m prod-vpc-cls01-stp-prim-vol17 3 exec

Interactively migrate sessions:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_by_load -i
[2, 3, 1]
Choice controller from list:3
Choice volume from Controller 3:
spectra-hdd-vol1 --> 41.19008797883139
prod-vpc-cls01-stp-prim-vol17 --> 269.1417953826738
prod-vpc-cls01-stp-prim-vol21 --> 590.1786538742178
prod-vpc-cls01-stp-prim-vol18 --> 1130.8843531276598
prod-vpc-cls01-stp-prim-vol22 --> 1469.2086930345286
Choice volume from list:prod-vpc-cls01-stp-prim-vol17
OK
Controller to migrate:2
Please choice from following actions(print,update,exit):update
Success!

Auto migrate sessions:

#/opt/storpool/python3/bin/python3 iscsi_balancer.py balance_by_load -a

Auto migrate takes the busiest session of the busiest controller and migrates it to the controller with the least load (one session per execution).
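
The selection rule for one auto-migrate step can be sketched as follows. This implements only the rule as stated above; whether iscsi_balancer applies additional checks (for example, that the move actually improves the balance) is not described here. The function name and the sample numbers are hypothetical.

```python
def pick_auto_migration(sessions):
    """Pick one migration from {controller ID: {volume: load}}.

    Returns (volume, source, destination), or None if there is nothing to move.
    """
    totals = {c: sum(v.values()) for c, v in sessions.items()}
    source = max(totals, key=totals.get)       # busiest controller
    destination = min(totals, key=totals.get)  # least loaded controller
    if source == destination or not sessions[source]:
        return None
    # Busiest session on the busiest controller.
    volume = max(sessions[source], key=sessions[source].get)
    return volume, source, destination


# Hypothetical load values.
example = {
    1: {"vol-a": 300, "vol-b": 200},
    2: {"vol-c": 100},
    3: {"vol-d": 250},
}
print(pick_auto_migration(example))
```

Because only one session moves per execution, the command is typically repeated, checking the load distribution between runs.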

Attention

When the balancing is complete, check for missing or not-yet-reconnected iSCSI sessions against the file dumped earlier:

/usr/lib/storpool/check_iscsi_sessions -m check <file.dump>