September 14, 2011, 11:30 AM — In the first part of my 2-part review of Windows Server 8 I looked at some of the best of the more than 300 new features Microsoft packed into the upcoming server OS. Now it's time to turn our attention to some massive storage enhancements.
My personal highlight of the entire three days of the Windows Server 8 reviewers workshop was the series of talks about storage. The killer feature for me is the new, built-in data deduplication, which detects duplicate data in files and folders, moves it into a separate store (under System Volume Information) and simply gets rid of the redundant bytes. The file itself stays 100% intact; when it gets accessed, the (now missing) data is pulled back from that one single data store.
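To make the idea concrete, here is a minimal Python sketch of a single-instance chunk store. It is not Microsoft's algorithm, which wasn't disclosed in detail at the workshop. The fixed 4-byte chunk size, the SHA-256 keys and the names ChunkStore, dedup and read are all assumptions made purely for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


class ChunkStore:
    """Toy single-instance store: every unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(key, data)  # store the chunk only if it is new
        return key

    def get(self, key: str) -> bytes:
        return self.chunks[key]


def dedup(data: bytes, store: ChunkStore) -> list:
    """Split data into fixed-size chunks and return references into the store."""
    return [store.put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]


def read(refs: list, store: ChunkStore) -> bytes:
    """Rebuild the original bytes by pulling each chunk back from the store."""
    return b"".join(store.get(key) for key in refs)


store = ChunkStore()
file_a = b"AAAABBBBCCCCAAAA"
file_b = b"BBBBCCCCDDDD"
refs_a = dedup(file_a, store)
refs_b = dedup(file_b, store)

# Both files still read back byte for byte.
assert read(refs_a, store) == file_a
assert read(refs_b, store) == file_b

stored = sum(len(chunk) for chunk in store.chunks.values())
original = len(file_a) + len(file_b)
savings = 1 - stored / original  # ignores the space taken by the references
```

In this toy case the two files share most of their chunks, so the store ends up holding fewer bytes than the files did. A real implementation also has to account for the reference metadata, which this sketch ignores.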
Now, deduplication isn’t something groundbreaking. It's been done before, and it's been done well, but until now dedup has never found its way into the OS itself; having it built in means it's deeply integrated and highly manageable. Microsoft Research spent two years on this algorithm and came up with techniques to minimize the performance impact of pulling one piece of a file from one part of the disk and other parts from the deduplication store (fragmentation!). According to Microsoft's server team, dedup has an impact of less than 3-4% on overall performance when accessing the data, although only performance tests will tell the true story.
However, the benefit greatly outweighs the possible downsides. Generally, you can expect a chunking rate of between 30% and 90%, which is absolutely amazing. On day 3 of the Windows Server 8 reviewers workshop, I had the chance to catch up with the development and program management team behind data deduplication and found out a couple of interesting tidbits:
Deduplication automatically runs on "idle". Say you've enabled deduplication on drive E and copy 20 gigs of files over: deduplication wouldn't start immediately. Instead, it would wait until the server isn't quite as busy and then perform the deduplication process. Keep in mind that going through files and detecting duplicate data is quite an I/O eater.
Admins can determine which files get deduplicated based on their age. Maybe you don’t want to dedup files that the server just created.
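The two policies above (wait for idle, skip files that are too new) can be sketched roughly as follows in Python. This is only an illustration of the idea, not how Windows Server 8 implements or exposes it. The function names, the use of the last-modified time as a file's "age", and the caller-supplied is_idle and process callbacks are all hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400


def files_old_enough(root: str, min_age_days: float, now: float = None) -> list:
    """Return files under root whose last modification is at least min_age_days old."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) <= cutoff:
                selected.append(path)
    return sorted(selected)


def run_when_idle(root: str, min_age_days: float, is_idle, process) -> int:
    """Process eligible files only if is_idle() says the server is quiet.

    is_idle and process are supplied by the caller (hypothetical hooks).
    Returns the number of files handed to process.
    """
    if not is_idle():
        return 0  # server is busy: leave the files alone for now
    paths = files_old_enough(root, min_age_days)
    for path in paths:
        process(path)
    return len(paths)
```

A caller might invoke run_when_idle("E:\\", 5, is_idle=some_load_check, process=some_dedup_step) on a timer, where some_load_check and some_dedup_step stand in for whatever load measurement and deduplication routine are actually available.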