EMC UNITY: Drive IOPS Explained

I am sure a lot of you have seen some of the hero numbers for drive IOPS in Unity. I think we all suspect this is Marketing's doing, so I looked into it. First, are these numbers “real”? Absolutely! These values are directly observed via Unity metrics, looking at the “Disk IOPS” per-disk metric, while running performance tests against LUNs and file systems in the Unity system. These performance tests are designed to closely mimic I/O that is observed in real customer systems: we are not counting cache hits, we are not short-stroking drives to decrease latency and increase performance, and we are not using artificially small I/O sizes or sequential streams to inflate IOPS.

The basic methodology for the testing is to configure the desired storage pool, designed so that the drives will be the limiting factor for performance. Then we create LUNs / file systems to fill the pool to capacity, and write to all available space in order to populate the thin objects. Then we run a 100% random workload across 100% of the pool capacity. The workload is 50% read / 50% write, using a mix of I/O sizes between 8KB and 32KB. We run the workload as fast as it will go, and this gives us the peak IOPS that the drive can deliver under that load. Response time at this point is horrible, and we would never advise a customer to run anywhere close to that level, so we back off the workload to a point where reasonable response times are achieved. By no coincidence, this is usually around 70% of the maximum achieved. We then verify the per-drive IOPS level, and it is this number that is reported. That being said, these numbers are probably not what you can expect to see in the real world.
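To make the back-off step concrete, here is a minimal Python sketch of picking the reported number from a load sweep. Everything in it is an assumption for illustration: the `reported_iops` helper, the sample data, and the 10 ms response-time target are hypothetical and are not the actual test harness or its thresholds.

```python
def reported_iops(samples, max_response_ms):
    """Return the highest per-drive IOPS whose response time is acceptable.

    samples: list of (per_drive_iops, response_ms) pairs from a load sweep.
    max_response_ms: assumed response-time target (not a documented value).
    """
    acceptable = [iops for iops, response_ms in samples
                  if response_ms <= max_response_ms]
    if not acceptable:
        raise ValueError("no sample meets the response-time target")
    return max(acceptable)


# Hypothetical sweep: response time climbs sharply near the peak.
sweep = [(100, 4.0), (200, 6.0), (280, 9.0), (350, 14.0), (400, 30.0)]

peak = max(iops for iops, _ in sweep)               # fastest the drive will go
reported = reported_iops(sweep, max_response_ms=10.0)

print(f"peak={peak}, reported={reported}, ratio={reported / peak:.0%}")
```

With this made-up data the reported value lands at 70% of peak, in line with the ratio described above, but that is a property of the sample numbers chosen here, not a measurement.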

This makes the way we traditionally sized pools a little more difficult. When it comes to Unity, apparently you should not do any sizing based on drive IOPS anymore. That is the major change in Unity. There is no simple way to estimate system/background IOPS just based on the load the hosts are generating. There is no “rule of thumb” for calculating IOPS. The RAID calculations done previously were straightforward, but now, with metadata and other background operations, there is no easy way to calculate what will happen on a customer’s system.

So what are these numbers good for, then? They basically provide an upper limit on how far you can push a drive without getting into trouble. Think of them as a replacement for drive utilization statistics. (Unity does not provide the same drive utilization % metrics that we have had in other products.) If a system is showing sustained per-drive IOPS above this level, it may be running into problems. Just like max per-port bandwidth data, these values are an indicator of when an individual component might be the bottleneck.
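As a rough illustration of using the published number as a ceiling, the Python sketch below flags drives whose sampled IOPS stay above a limit for several consecutive samples. The function names, the drive names, the sample values, the limit of 350, and the choice of three consecutive samples as "sustained" are all assumptions made up for this example; this is not a Unity API, and the samples would have to come from however you export the “Disk IOPS” metric.

```python
def sustained_over_limit(samples, limit, min_consecutive=3):
    """True if `samples` exceeds `limit` for `min_consecutive` samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0  # reset the run on any dip
        if run >= min_consecutive:
            return True
    return False


def drives_at_risk(per_drive_samples, limit, min_consecutive=3):
    """Return the sorted names of drives with sustained IOPS above `limit`."""
    return sorted(
        name
        for name, samples in per_drive_samples.items()
        if sustained_over_limit(samples, limit, min_consecutive)
    )


# Hypothetical per-drive IOPS samples, one list per drive.
metrics = {
    "disk_0": [300, 360, 370, 365, 200],  # three samples in a row above 350
    "disk_1": [100, 400, 120, 390, 110],  # short spikes only
}

print(drives_at_risk(metrics, limit=350))
```

Short spikes are deliberately ignored here, since the point above is about sustained load rather than momentary bursts.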

The idea behind the rule-of-thumb (RoT) numbers was that you could translate expected front-end load into expected back-end load based on simple RAID algorithms, and then figure out how many drives it would take to meet that load based on some per-drive performance rating. That model has been falling apart for years, with the switch from RAID Groups to Storage Pools, then from thick to thin, and the addition of multi-tier pools, redirect-on-write snapshots, compression, deduplication, etc. Even without all that, the RoT numbers were artificially low so as to be conservative in sizing estimates. The official RoT might have been 150 or so, but a quick review of USPEED data would routinely show that the drives were doing above 400 or 500 IOPS in mixed testing.
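For anyone who never did it by hand, the old model looked roughly like the Python sketch below. It uses the commonly cited RAID write penalties (2 for RAID 1/0, 4 for RAID 5, 6 for RAID 6); the function names and the front-end load of 10,000 IOPS are made up for illustration, and the per-drive rating of 150 is just the "150 or so" figure mentioned above. This is the legacy approach being described, not a way to size Unity.

```python
import math

# Commonly cited back-end I/Os generated per host write, by RAID type.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(frontend_iops, read_fraction, raid):
    """Translate host (front-end) IOPS into drive (back-end) IOPS."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * WRITE_PENALTY[raid]


def drives_needed(frontend_iops, read_fraction, raid, per_drive_rot):
    """Drive count needed to carry the back-end load at a per-drive RoT rating."""
    return math.ceil(backend_iops(frontend_iops, read_fraction, raid) / per_drive_rot)


# Hypothetical workload: 10,000 host IOPS, 50% read, RAID 5, RoT of 150 per drive.
load = backend_iops(10_000, 0.5, "raid5")      # 5,000 reads + 5,000 writes * 4
count = drives_needed(10_000, 0.5, "raid5", 150)

print(f"back-end IOPS: {load:.0f}, drives needed: {count}")
```

Note that nothing in this calculation accounts for metadata, thin provisioning, snapshots, or any other background work, which is exactly why the model stopped holding up.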

So the RoT numbers were artificially low, but they have also been falling behind. Drives, even the spinning rust variety, HAVE gotten faster over the years: moving from 3.5″ to 2.5″ form factor (thereby reducing head seek distance), going from FC to 3Gb SAS to 6Gb SAS to 12Gb SAS, and moving from 512B to 4K sector size. Not to mention firmware enhancements on the drives, and code improvements in our own products in the ways that we use the drives. Each provides a small improvement, and they have been adding up over time.

On the Flash side of the house, the improvements are vast, and the change is coming much more quickly there. The currently shipping Unity Flash drives offer at least double the performance of the first SLC drives that were introduced into the CX series. And Flash vendors today are offering drives that should double that performance again.

 
