Mazda Motor Corp., with 900 dealers and 800 employees in the U.S., manages around 90TB of data. Barry Blakeley, infrastructure architect at Mazda's North American operations, says business units and dealers are generating ever-increasing amounts of data analytics files, marketing materials, business intelligence databases, Microsoft SharePoint data and more. "We have virtualized everything, including storage," says Blakeley. The company uses tools from Compellent, now part of Dell, for storage virtualization and Dell PowerVault NX3100 as its SAN, along with VMware systems to host the virtual servers.
The key, says Blakeley, is to migrate "stale" data quickly onto tape. He says 80% of Mazda's stored data becomes stale within months, meaning those blocks of data are no longer accessed at all. To accommodate these usage patterns, the virtual storage is set up in a tiered structure. Fast solid-state disks connected by Fibre Channel switches make up the first tier, which handles 20% of the company's data needs. The rest of the data is archived to slower disks running at 15,000 rpm on Fibre Channel in a second tier and to 7,200-rpm disks connected by serial-attached SCSI in a third tier.
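As a rough illustration of the tiering idea described above, the following Python sketch assigns blocks to one of three tiers based on how recently they were accessed. It is not how Mazda's Compellent system works internally; the `Block` type, the function names and the 30-day and 90-day cutoffs are all hypothetical, since Blakeley says only that data goes stale "within months."

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the article says only that data goes stale "within months".
HOT_DAYS = 30
STALE_DAYS = 90

# Tier descriptions taken from the article.
TIERS = {
    1: "solid-state disk over Fibre Channel",
    2: "15,000-rpm disk over Fibre Channel",
    3: "7,200-rpm disk over serial-attached SCSI",
}


@dataclass
class Block:
    """Hypothetical record for one block of stored data."""
    block_id: str
    days_since_access: int


def assign_tier(block, hot_days=HOT_DAYS, stale_days=STALE_DAYS):
    """Return the tier number (1, 2 or 3) for a block, based on access age."""
    if block.days_since_access < hot_days:
        return 1
    if block.days_since_access < stale_days:
        return 2
    return 3


def plan_placement(blocks):
    """Group block IDs by the tier each block would be placed on."""
    placement = {tier: [] for tier in TIERS}
    for block in blocks:
        placement[assign_tier(block)].append(block.block_id)
    return placement


if __name__ == "__main__":
    sample = [Block("a", 2), Block("b", 45), Block("c", 400)]
    for tier, ids in plan_placement(sample).items():
        print(f"Tier {tier} ({TIERS[tier]}): {ids}")
```

A production system would more likely track access at the block level inside the storage array itself; the sketch only shows the classification logic.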
Blakeley says Mazda is putting less and less data on tape -- about 17TB today -- as it continues to virtualize storage.
Overall, the company is moving to a "business continuance model" as opposed to a pure disaster recovery model, he explains. Instead of having backup and offsite storage that would be available to retrieve and restore data in a disaster recovery scenario, "we will instead replicate both live and backed-up data to a colocation facility." In this scenario, Tier 1 applications will be brought online almost immediately in the event of a primary site failure. Other tiers will be restored from backup data that has been replicated to the colocation facility.
Adapting the Techniques
These organizations are a proving ground for handling tremendous amounts of data. StorageIO's Schulz says other companies can mimic some of their processes, including running checksums against files, monitoring disk failures by using an alert system for IT staff, incorporating metadata and using replication to make sure data is always available. However, the critical decision in managing massive amounts of data is choosing the technology that matches the needs of the organization, not the system that is cheapest or just happens to be popular at the moment, he says.
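The practice of running checksums against files, which Schulz mentions, can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `hashlib`, not a description of any tool the organizations in this story use; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_corrupt_files(root, manifest):
    """Return relative paths that are missing or whose checksum no longer matches."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        path = root / relative_path
        if not path.is_file() or file_checksum(path) != expected:
            problems.append(relative_path)
    return problems
```

In this sketch, a manifest would be built when the data is first stored and checked again on a schedule; any path returned by `find_corrupt_files` could then be restored from a replica.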
In the end, the biggest lesson may be that while big data poses many challenges, there are also many avenues to success.