Reliant - Key Performance Indicators

We are continuing our series on performance troubleshooting tips and tricks this week. Today we are looking into queue length and how queue length can tie into other performance indicators and affect the overall health of your array.

Performance Troubleshooting at Various Levels
Multiple performance indicators, such as SP utilization, cache utilization, and queue length, tend to move together and can be examined at different levels. There are primarily two levels at which you can examine these indicators: the LUN level and the array level.

Starting at the array level can help you narrow down where the performance problem is occurring. Then, you can dig down into individual LUNs to determine which LUN is causing your performance issue and figure out why the issue is occurring.

Correlation between Performance Indicators

So, you may start out by pulling various graphs showing your response times, queue length, SP utilization, and cache utilization. These graphs may show a correlation between multiple performance indicators. For example, you could look at your response time and see that it spikes considerably when your cache utilization gets too high.

What this type of correlation will tell you is that your system is most likely engaging in forced flushing. This is where your array can no longer send any more I/O to cache, so that I/O must be sent directly to the RAID group or storage pool where the LUN resides. Because the array is not able to use cache, the amount of time required to accept a request and send a response goes up, which is why your response time increases when your cache utilization is high.
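To put a number on how closely two of these graphs track each other, you could compute a correlation coefficient over the same sampling intervals. The following is only a minimal sketch: the `pearson` helper and the sample values are made up for illustration and do not come from any particular array or monitoring tool.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same polling intervals (not real data).
cache_util_pct = [40, 45, 50, 70, 85, 95, 98]
response_ms = [4, 4, 5, 6, 12, 25, 31]

r = pearson(cache_util_pct, response_ms)
print(f"correlation between cache utilization and response time: {r:.2f}")
```

A value close to 1 suggests the two indicators rise together, which fits the forced-flushing pattern described above. It does not prove cause on its own, so you would still want to look at the graphs and the LUN-level detail.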

Queue Length as Indicator of Performance

Another performance indicator you may want to take a look at is queue length. So, what exactly is queue length? When you think of a queue, what first comes to mind is probably waiting in line. The longer the line – the longer you have to wait. When looking at queue length, you are basically trying to figure out where your requests are in line for processing. 

Queue lengths tend to be best at around four and, really, no higher than about 10. If you start seeing that number spike, it is an indication of a performance issue. Since arrays are multi-tenant systems, you will have more than one server trying to access data simultaneously. Your array manages the queue for each system to determine when that system will be able to make a request and have it handled. Essentially, you don’t want that queue length getting too high, because that means you will have systems sitting there waiting for I/O for an extended period of time.
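As a rough illustration of those guideline numbers, you could band queue length samples per LUN and flag the ones that need a closer look. This is a sketch under assumptions: the three band names, the choice to judge each LUN by its peak sample, and the LUN names and values are all hypothetical. Only the figures of four and 10 come from the text above.

```python
IDEAL_QUEUE_LENGTH = 4   # "best around four"
MAX_QUEUE_LENGTH = 10    # "no higher than about 10"


def classify_queue_length(value):
    """Band a single queue length sample (band names are an assumption)."""
    if value <= IDEAL_QUEUE_LENGTH:
        return "ok"
    if value <= MAX_QUEUE_LENGTH:
        return "watch"
    return "investigate"


def flag_luns(samples):
    """Map each LUN to the band of its peak queue length sample."""
    return {
        lun: classify_queue_length(max(values))
        for lun, values in samples.items()
        if values  # skip LUNs with no samples
    }


# Hypothetical per-LUN queue length samples (not real data).
samples = {
    "LUN_10": [2, 3, 4, 3],
    "LUN_11": [3, 6, 8, 5],
    "LUN_12": [4, 9, 22, 17],
}

for lun, band in flag_luns(samples).items():
    print(lun, band)
```

Judging by the peak is deliberately strict. If short bursts are normal in your environment, an average or a percentile over the interval may be a better fit.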

Now, the question is: why does the queue length tend to spike? There can be multiple reasons, but they tend to all relate to the amount of stress and the number of transactions being pushed onto your SPs at any given point in time. So, if you have very high SP and cache utilization, you’ll definitely start to see that queue length climb.

So, what do you do to control your queue length? 

Lowering Your Queue Length

One way to control your queue length is to learn what type of workload is coming in from each of your systems and then segregate or isolate those workloads so that they are not hitting the same controller with similar requests at the same time. Another way you can lower your queue length is to move these workloads to a less taxed system that may have the additional CPU cycles necessary to handle those requests simultaneously with another system.
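One hypothetical way to plan that kind of separation is a simple greedy pass: take the heaviest workloads first and place each one on whichever controller currently carries the least load. The sketch below assumes you already have a per-workload load estimate (IOPS here). The function name, workload names, numbers, and the "SP A" and "SP B" labels are illustrative, and this is a planning aid only, not something the array does for you.

```python
def spread_across_controllers(workloads, controllers=("SP A", "SP B")):
    """Greedily assign workloads to controllers, heaviest first.

    workloads: dict of workload name -> estimated load (for example, IOPS).
    Returns (placement, load): workload names per controller, and the
    total estimated load on each controller.
    """
    load = {c: 0 for c in controllers}
    placement = {c: [] for c in controllers}
    heaviest_first = sorted(workloads.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in heaviest_first:
        # Pick the controller with the lowest load so far.
        target = min(controllers, key=lambda c: load[c])
        placement[target].append(name)
        load[target] += demand
    return placement, load


# Hypothetical workloads and load estimates (not real data).
workloads = {"db": 5000, "mail": 3000, "backup": 2500, "web": 1000}

placement, load = spread_across_controllers(workloads)
for controller in placement:
    print(controller, placement[controller], load[controller])
```

This balances volume only. It does not know which workloads issue similar requests at the same time, so you would still apply what you learned about each workload's profile and schedule before settling on a layout.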