Storage Tip: Intelligence for unstructured data
Send your Storage question to David Hill today! | See other Storage tips from David
What seems to be the problem? A recent newspaper article stated that the latest job of surveillance cameras is to interpret the threats they see. This is a real-time use of unstructured data enabled by software intelligence, rather than the after-the-fact use of intelligent forensic analysis tools on that data (see the previous storage tip on digital surveillance). It is just the latest example of how unstructured data is being used in organizations, and IT organizations are likely to acquire custodial responsibility for such applications. And therein lies your challenge. Not only will there be more data to manage and store, but the data protection strategies are likely to differ from those used for structured information.
What do you need to know? A great deal of confusion surrounds the discussion of the structure of data. General agreement exists that database information is structured, because the data and its associated metadata are tightly coupled. The way to determine whether data is structured is to ask whether it can be sorted. If the answer is yes, the data is structured.
The disagreement is over what counts as semi-structured data and what counts as unstructured data. General agreement exists that e-mail is semi-structured and that videos, pictures, audio files, and medical images are unstructured. Word processing documents and presentations, however, are often considered unstructured when they are really semi-structured.
The difference between a semi-structured file and an unstructured file is simple. Both have file metadata, but you can search the content of a semi-structured file, such as an HTML document, using standard tools (think Google). You cannot do that natively with an unstructured file; you can only sense it, for example by viewing a video or listening to an audio file. (You can sense a word processing document too, but you can also search it.)
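The two tests described above (can the data be sorted, and can its content be searched with standard tools) can be sketched as a small decision rule. This is only an illustrative sketch in Python: the `Structure` enum, the `classify` function, and the example data types and their flags are assumptions made for this example, not part of any product or standard.

```python
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


def classify(sortable: bool, searchable: bool) -> Structure:
    """Apply the tip's two questions in order (hypothetical helper).

    sortable:   can the data itself be sorted (e.g. database rows)?
    searchable: can the content be searched natively with standard tools?
    """
    if sortable:
        return Structure.STRUCTURED
    if searchable:
        return Structure.SEMI_STRUCTURED
    # Only file metadata is available; the content can only be "sensed".
    return Structure.UNSTRUCTURED


# Illustrative flags for the data types named in the tip (assumed values).
EXAMPLES = {
    "database table": (True, True),
    "e-mail message": (False, True),
    "word processing document": (False, True),
    "surveillance video": (False, False),
    "medical image": (False, False),
}

if __name__ == "__main__":
    for name, (sortable, searchable) in EXAMPLES.items():
        print(f"{name}: {classify(sortable, searchable).value}")
```

In practice the flags would come from whatever inventory or indexing tool you use; the point is only that the sort test is asked first and the search test second.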
The reason for distinguishing the types is that each of the three is managed differently. However, there is a movement afoot to add intelligence to unstructured data and thereby make it more manageable.