When it comes to big data analytics, there is a vigorous discussion about how to extract value from unstructured data.
It is estimated that the majority of data, some 80–90%, is unstructured¹, and we continue to generate more. This includes social media content such as Facebook posts, tweets and LinkedIn discussions, as well as blogs and emails. There are also social networks within the enterprise, such as Jive, Yammer, Huddle and Salesforce Chatter. On top of that, there is machine-to-machine data emerging from the Internet of Things.
It is accepted that better, timelier access to the right information, structured or unstructured, can yield significant business benefits: greater productivity, increased revenue, reduced costs, faster delivery of more innovative products, and better customer relationships.
But what does it take to get at this information? Are we asking the right questions?
Extracting value from unstructured data is a classic big data challenge. Simply organizing information before processing it with Hadoop or MapReduce can be a project in itself. Though semantic and contextual search and natural language processing (NLP) tools have made progress and continue to do so, these approaches generally assume you know what question to pose. I’ll get to those questions in a moment. First, let’s look at how you can set yourself up to ask them.