Today, the industry is trying to find an easy way to store ephemera from our lives. From the places we visit to the toss-away comments between friends, the goal is to find a fast and efficient way to store endless tidbits from everyone on earth.
The smartest people approaching this problem quickly realized they could make their job dramatically easier by cutting corners and blithely ignoring any glitch. If some status update disappeared, who would notice? If somebody checked in to a service while at a coffee shop and failed to be crowned mayor of that coffee shop, it wasn't a big deal because they would probably return again tomorrow. After the new class of data caretakers recognized that they could save a fortune on compute cycles and infrastructure simply by loosening requirements, they started building NoSQL and other so-called data stores.
Now, saving time and money by trading away accuracy rules the Web. Try searching for an older email message with some of the Web-based mail tools: they quietly leave some older messages out of the index. This reflects a slow erosion of standards for search. Google, for instance, quietly ended the ability to use true boolean searches with the plus sign. Expect to see more and more Web engineers subtly tossing aside the fanatical commitment to accuracy that was once common among database administrators.
Programming trend No. 10: Real parallelism begins to get practical for all
Computer architects have been talking about machines with true parallel architectures for years, but the programmers in the trenches are just starting to get the tools that make it possible.
The parallelism is appearing in two prominent areas: multinode databases and Hadoop jobs. Some projects mix the two.
Most NoSQL data stores offer to help spread the workload over multiple machines. Some offer automatic sharding, which splits the data set into pieces, keeps the machines that host a given piece in sync, and routes each query to the right machines. Some offer replication, an older feature that duplicates data across machines as a backup; some do both.
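To make the sharding-plus-replication idea concrete, here is a toy sketch in Python. It assumes simple hash-modulo placement, with each key copied to its primary node and the next nodes in line. That is a simplification: many real data stores use consistent hashing or range partitioning so that adding a machine doesn't reshuffle most keys. The class `ShardedStore`, its `put`/`get`/`owners` methods, and the `down` parameter are hypothetical illustrations, not any particular NoSQL product's API, and each "node" is just an in-memory dict.

```python
import hashlib


class ShardedStore:
    """Toy store that spreads keys across nodes and keeps replica copies.

    Hypothetical sketch: each "node" is a local dict standing in for a machine.
    """

    def __init__(self, num_nodes, replicas=2):
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def _primary(self, key):
        # Stable hash so a key always lands on the same shard; Python's
        # built-in hash() is salted per process, so it can't be used here.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def owners(self, key):
        # The primary node plus the next (replicas - 1) nodes hold copies.
        start = self._primary(key)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every node that owns this key.
        for n in self.owners(key):
            self.nodes[n][key] = value

    def get(self, key, down=()):
        # Route the read to the first owner that is still up; the data
        # survives as long as fewer than `replicas` owners are down.
        for n in self.owners(key):
            if n not in down and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)
```

For example, with `store = ShardedStore(num_nodes=4, replicas=2)` and `store.put("user:42", "checked in")`, a call to `store.get("user:42", down={store.owners("user:42")[0]})` still finds the value on the second replica.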
Hadoop is an open source framework that coordinates a number of machines working on a problem and combines their work into a single answer. The project imitates parts of the MapReduce framework that Google developed to process the huge volumes of data gathered by its Web crawls, but it has grown well beyond these roots.
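The map/shuffle/reduce pattern behind Hadoop can be sketched with the classic word-count example. This is not Hadoop's Java API; it is a plain-Python illustration of the programming model in which threads stand in for the separate machines a real cluster would use, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_lists):
    # Group every emitted value by its key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all the values for one key into a single result.
    return key, sum(values)


def word_count(documents, workers=4):
    # Each document is one split; the pool's threads play the role of
    # worker machines running map and reduce tasks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)


print(word_count(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The appeal of the model is that map and reduce tasks share nothing, so the framework is free to scatter them across as many machines as are available and only needs to coordinate the shuffle in between.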
Tools like these make it easier than ever to throw more than one machine at a problem. The infrastructure is now solid enough that enterprise architects can count on deploying racks of machines with only a bit of hand-holding and fussing.