This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
Enterprises have long struggled with the storage conundrum of high capacity vs. high performance. Hard disk and solid state drives (SSD) each offer advantages the other cannot match. Hard drives are the reigning kings of capacity, but lack the performance, ruggedness and power-saving features of SSDs. For all their advantages, SSDs are miles away from hard disk drives in delivering the capacity and cost-effectiveness required by enterprise applications.
Because their strengths are so complementary, these two technologies make tiered storage a relevant and highly logical approach to managing data. It makes sense to combine the best of disk- and flash-based technology within one enclosure, and a new category of tiered storage called real-time tiering does exactly that.
The administrative challenge with automated tiered storage is ensuring the hottest data is on the SSD level at all times. With a dynamic workload that is often presented by multiple virtual machines, this is no easy task. With virtual server environments, applications are responding to changing activities, resulting in an unpredictable and ever-changing workload on the storage. Real-time tiering in the storage array tackles this data-allocation challenge, taking automated tiered storage to a new level by ensuring the most critical data always reside on SSDs, while less active data blocks are moved to the hard disk drive tiers.
Autonomic, real-time tiering with SSDs and built-in virtualization overcomes the two major limitations found in most tiered storage systems today. By automating the migration of data in real time, the storage system virtualizes both the SSDs and hard disk drives at the sub-LUN level across multiple RAID sets. Intelligent algorithms continuously monitor I/O access patterns and automatically move hot data to the SSDs to maximize the speed of I/O operations and, therefore, improve performance of the aggregate application workload.
So when a Web server, for example, experiences a rush of high-volume activity based on the latest social trends, real-time tiering ensures the hot data is handled by the fast SSD tier. As the trend fades, a SAS hard disk drive tier houses "warm" data, and as activity continues to decrease, "cold" data is moved to an archive layer consisting of near-line SAS disk drives. All of this is done in real time, automatically and seamlessly, without any need for intervention from administrators.
Real-time tiering goes beyond automated tiering alternatives
Traditional automated tiered storage offerings suffer from two common and sometimes significant limitations. The most crippling is insufficient frequency of data migration between tiers. Many SAN solutions can move data to the SSD tier only once a day, owing to the disruptive nature of the migration itself. Because halting an application to migrate the data is normally unacceptable, the migration must occur during periods of minimal or no demand, which is usually overnight. Such a daily or nightly migration may be sufficient for applications where the datasets can be known in advance (and fit entirely in the SSD tier), but it is of little or no value for all other applications where the most active data changes throughout the day.
The second limitation of traditional automated tiering involves how both the system itself and the data migration process are managed. Most systems require the IT department to determine which data sets will be migrated at what times between the tiers. This might be part of a regular schedule (e.g. a certain day of the week or month), or on an ad-hoc basis as needed (e.g. an event or project scheduled for the following day). Either way, a sub-optimal determination inevitably results in minimal improvement in performance.
SSD tiering extends cache performance
For years, flash technology has been used as cache to improve storage array performance. Performance gains from cache are limited by the practical size of a cache memory. Overcoming these limitations will extend the performance gains afforded by caching into the SSD storage tier as shown in the diagram below. Note how the practical size of a cache limits its ability to achieve gains beyond a certain point.
SSDs' far larger capacities (in the hundreds of gigabytes range), which are also available at a significantly lower cost-per-byte than cache memory, make it possible to scale the performance gains considerably. But this will only be the case if the migration of data between tiers can be made sufficiently intelligent and dynamic to keep pace with the real-time changes in hot data.
How real-time tiering works
Real-time tiering uses three separate processes, all of which operate autonomically and in real time:
- Scoring to maintain a current page ranking on each and every I/O using an efficient process that adds less than one microsecond of overhead. The algorithm takes into account both the frequency and recency of access. For example, a page that has been accessed five times in the last 100 seconds would get a high score.
- Sorting of all high-scoring pages occurs every five seconds, utilizing less than 1 percent of the system's CPU. Those pages with the highest scores then become candidates for promotion to the higher-performing SSD tier.
- Migration is the process that actually moves, or migrates, the pages: high scoring pages from hard disk drive to SSD and low scoring pages from SSD back to hard disk. Less than 80MB of data are moved during any five-second sort, so the impact on overall system performance is minimal.
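The three processes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the vendor's algorithm: the article says the score reflects frequency and recency of access but does not give a formula, so the exponentially decayed access counter here is a stand-in. The 100-second half-life, the 4 MB page size, and all names (Page, record_access, current_score, plan_migration) are hypothetical. The five-second sort interval and the 80 MB cap on data moved per sort come from the article.

```python
from dataclasses import dataclass

HALF_LIFE_S = 100.0                        # assumed: how fast old accesses fade
SORT_INTERVAL_S = 5.0                      # from the article: sort every five seconds
MIGRATION_BUDGET_BYTES = 80 * 1024 * 1024  # from the article: under 80 MB per sort
PAGE_BYTES = 4 * 1024 * 1024               # assumed page size


@dataclass
class Page:
    page_id: int
    score: float = 0.0
    last_access: float = 0.0
    on_ssd: bool = False


def record_access(page, now):
    """Scoring: fade the old score by elapsed time, then count this I/O."""
    elapsed = now - page.last_access
    page.score = page.score * 0.5 ** (elapsed / HALF_LIFE_S) + 1.0
    page.last_access = now


def current_score(page, now):
    """Score as of `now`, including decay since the page's last access."""
    return page.score * 0.5 ** ((now - page.last_access) / HALF_LIFE_S)


def plan_migration(pages, now, ssd_slots):
    """Sorting and migration planning: return (promote, demote) page ids."""
    ranked = sorted(pages, key=lambda p: current_score(p, now), reverse=True)
    target = {p.page_id for p in ranked[:ssd_slots]}
    # Hottest pages not yet on SSD, best first.
    promote = [p.page_id for p in ranked
               if p.page_id in target and not p.on_ssd]
    # Coldest pages still on SSD, worst first.
    demote = [p.page_id for p in reversed(ranked)
              if p.page_id not in target and p.on_ssd]
    # Trim the least urgent moves until the plan fits the per-sort budget.
    max_pages = MIGRATION_BUDGET_BYTES // PAGE_BYTES
    while len(promote) + len(demote) > max_pages:
        if len(promote) >= len(demote):
            promote.pop()
        else:
            demote.pop()
    return promote, demote


# Demo: page 0 is read five times within 100 seconds, page 1 once,
# and page 2 sits on the SSD without being touched.
pages = [Page(0), Page(1), Page(2, on_ssd=True)]
for t in (0.0, 20.0, 40.0, 60.0, 80.0):
    record_access(pages[0], t)
record_access(pages[1], 0.0)

promote, demote = plan_migration(pages, now=100.0, ssd_slots=1)
print("promote to SSD:", promote)
print("demote to HDD:", demote)
```

In a real array the planning step would run once per sort interval and the moves would be carried out in the background. At the article's figures, 80 MB per five-second sort works out to at most 16 MB per second of migration traffic.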
Measuring performance advantages of real-time tiering
Real-time tiering solutions can deliver up to 100,000 random read and 32,000 random write I/Os per second. The chart below shows the potential performance gains achievable with real-time tiered storage. Naturally, the higher the "hit rate" in the SSD tier, the higher the gains. But even a conservative hit rate of 70% (easily attainable in most environments) can deliver a 3x improvement in application performance. Far greater performance gains can be realized when the SSD tier handles 80% or more of the I/O load, even though that tier represents only 5% to 10% of the system capacity.
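One rough way to see why a 70% hit rate lands near a 3x gain is a simple average-service-time model. This is a back-of-the-envelope sketch, not the method behind the article's chart: it assumes I/Os are served one at a time, that the hard disk tier is the baseline, and that the SSD tier is faster by a fixed factor. The function name tiered_speedup and the choice of 2,000 as the SSD factor (the low end of the range the article cites) are illustrative assumptions.

```python
def tiered_speedup(hit_rate, ssd_factor):
    """Overall speedup when `hit_rate` of I/Os are served by a tier that is
    `ssd_factor` times faster and the remainder go to hard disk."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if ssd_factor <= 0:
        raise ValueError("ssd_factor must be positive")
    # Average time per I/O relative to an all-hard-disk system (= 1.0).
    relative_time = (1.0 - hit_rate) + hit_rate / ssd_factor
    return 1.0 / relative_time


for rate in (0.70, 0.80, 0.90):
    print(f"hit rate {rate:.0%}: about {tiered_speedup(rate, 2000):.1f}x")
```

Under these assumptions the result is dominated by the misses: at a 70% hit rate the remaining 30% of I/Os still run at hard disk speed, which caps the gain at a little over 3x, and each further increase in hit rate raises the ceiling sharply.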
In summary, many, if not most, applications today are I/O-constrained, which limits their performance when using traditional hard disk drives, whether directly attached or in virtualized storage area networks. Caching helps improve performance to a point, but fails to scale because it quickly reaches the point of diminishing returns on the investment. SSD technology that uses fast flash memory increases I/O rates by 2,000 to 3,000 times compared with hard disk drives, and a combination of the two in a tiered storage configuration offers the most cost-effective way to achieve significant performance gains today.
Dot Hill empowers the OEM and channels community to bring unique storage solutions to market, quickly, easily and cost-effectively. Offering high performance and industry-leading uptime, Dot Hill's RAID technology is the foundation for best-in-class storage solutions offering enterprise-class security, availability and data protection. The Company's products are in use today by the world's leading service and equipment providers, common carriers and advanced technology and telecommunications companies, as well as government agencies and small and medium enterprise customers.
This story, "Real-time tiering in a two tiered storage configuration offers the best performance gains" was originally published by Network World.