Reliant - Cache Utilization

Let's say you look at your storage system and SP utilization is not an issue, but you're still seeing lackluster performance overall. The next thing to look at is cache utilization on the SP. The SP, or storage processor, has a layer of cache, which is basically volatile memory where data is held on the processor before it is written to disk.

Now, let’s say you have a storage environment with very high IOPS requirements, and someone made the decision to run this environment on SATA drives. Let’s face it: you won’t have the IOPS capability that the environment requires. So what happens? Your storage processor identifies the hot blocks on these LUNs and keeps all of this data in cache. As the transactions and I/O come in, everything is written to the cache. The disks are simply not fast enough for the storage processor to unload this data to disk in a timely fashion and make room for new data in the cache.
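To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The function name, the parameters, and the figures in the usage line are all made up for illustration; they are not taken from any particular array. It only shows that when writes arrive faster than the disks can absorb them, the cache fills in a predictable amount of time.

```python
def seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s, dirty_mb=0.0):
    """Estimate how long until the write cache is full.

    All arguments are hypothetical inputs:
      cache_mb      - usable write cache size in MB
      incoming_mb_s - rate at which hosts write into the cache (MB/s)
      drain_mb_s    - rate at which the disks can absorb flushed data (MB/s)
      dirty_mb      - dirty data already sitting in the cache (MB)

    Returns None when the disks keep up and the cache never fills.
    """
    net_fill = incoming_mb_s - drain_mb_s
    if net_fill <= 0:
        return None  # disks drain at least as fast as data arrives
    return (cache_mb - dirty_mb) / net_fill


# Example with invented numbers: 1000 MB of cache, 300 MB/s in, 100 MB/s out.
print(seconds_until_cache_full(1000, 300, 100))
```

With those invented numbers the cache gains 200 MB every second, so it is full in a few seconds. Slow drives do not change how fast data arrives, only how fast it leaves.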

Key Performance Indicators

A key performance indicator to look at here is the percentage of dirty pages on your storage processor. So, what is a dirty page? Dirty pages are pages of cache that have not yet been written to disk. The cache usually has a low watermark and a high watermark, typically around 60% and 80%. Your SP cache will hit 80% and start flushing data to disk, which brings the cache back down to 60%; it then works its way back up to 80%, and so on. So, what you will see in your performance graph is cache utilization with peaks and valleys that stay between 60% and 80%. This indicates healthy cache utilization and is the way it is designed to work.
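The sawtooth pattern is easy to reproduce with a toy model. The sketch below is only an illustration, not how any array actually schedules its flushes: the function name, the per-step fill and flush amounts, and the one-step-at-a-time loop are all assumptions. The 60% and 80% defaults are the watermarks described above.

```python
def simulate_dirty_pages(steps, fill_per_step, flush_per_step,
                         low=60.0, high=80.0, start=60.0):
    """Toy model of watermark flushing; returns dirty-page % after each step.

    fill_per_step  - percentage points of cache dirtied by incoming writes
    flush_per_step - percentage points the disks can absorb while flushing
    (both are hypothetical knobs for this sketch)
    """
    dirty = start
    flushing = False
    history = []
    for _ in range(steps):
        dirty += fill_per_step            # new writes land in cache
        if dirty >= high:
            flushing = True               # high watermark reached: start flushing
        if flushing:
            dirty -= flush_per_step       # disks absorb some dirty pages
            if dirty <= low:
                flushing = False          # low watermark reached: stop flushing
        dirty = max(0.0, min(100.0, dirty))  # cache cannot go below 0% or above 100%
        history.append(dirty)
    return history


healthy = simulate_dirty_pages(12, fill_per_step=5, flush_per_step=10)
print(healthy)
```

When the disks flush faster than writes arrive, as in the call above, the values bounce between the two watermarks. Make `flush_per_step` smaller than `fill_per_step` and the same model climbs past the high watermark and keeps going.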

100% Cache Utilization

So, what happens in a situation with very high cache utilization? Your percentage of dirty pages will climb to 100% and then flat-line. The array is no longer able to flush the data to disk fast enough. At this point, you are in danger of what is called “forced flushing.” Forced flushing is basically your array saying, “I am not taking any more I/O into my cache until I can dump what is currently in the cache to disk.” So, any additional I/O coming into the array can no longer use the cache. Instead, it goes directly to the RAID group or the storage pool where the LUN resides.
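If you can export the dirty-page percentages from your monitoring tool as a plain list of samples, a few lines of Python will find the flat-lines for you. This is a rough sketch: the function name, the 99% threshold, and the three-sample minimum are arbitrary choices for illustration, not values recommended by any vendor.

```python
def find_flatlines(samples, threshold=99.0, min_samples=3):
    """Return (start_index, end_index) pairs for runs of samples that stay
    at or above `threshold` for at least `min_samples` consecutive readings.

    `samples` is a list of dirty-page percentages, oldest first.
    `threshold` and `min_samples` are hypothetical defaults.
    """
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i                 # a new run begins here
        else:
            if start is not None and i - start >= min_samples:
                runs.append((start, i - 1))
            start = None                  # run ended (or never started)
    # Handle a run that is still in progress at the end of the data.
    if start is not None and len(samples) - start >= min_samples:
        runs.append((start, len(samples) - 1))
    return runs


readings = [65, 80, 95, 100, 100, 100, 100, 70, 100, 60]
print(find_flatlines(readings))
```

A single reading at 100% is ignored here on purpose; it is the sustained run that suggests the array cannot flush fast enough and may be forced flushing.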