Does "Big Data" just mean lots of data?
It would be interesting to discover when the term "Big Data" first appeared and why this particular term was chosen. I have tried using Google but they removed the "Timeline" option, which makes uncovering the answer to this type of question very difficult. If anyone knows the answer, please let me know.
Anyway, we are stuck with this term and many of the customers that I meet are completely confused by this strange term - "Big Data". Nearly every customer I talk to assumes that the word "Big" implies huge quantities of data. Many of the storage vendors have done a great job of convincing the technical teams that big data means storing petabytes or even zettabytes of data. I heard one customer the other week tell me that their storage vendor had told them to expect a 50X growth in data volume over the next 5-10 years. While this might be true, the worrying part was that the customer was taking this as fact and claiming that "Big Data" would be central to their future IT plans (note that this related to IT plans and not business plans).
In reality, "Big Data" is the wrong term for a very important subject area that has the potential to revolutionize many businesses. Big Data is not just about collecting and storing data that is complex in terms of the way it is structured. Many software and hardware vendors typically focus on just these two elements: large data volumes and complex data structures. The other key elements of the "Big Data" story, velocity and value, are somehow getting lost, yet these two elements are critical in turning big data sets into real business revenue. Storing big data sets simply incurs costs. Analysis of big data sets, usually in real time, is where the real opportunities lie and where the money is.
The majority of use cases for Big Data show how information can be captured from individual customers, individual web sessions, specific machines or components, individual patients, individual sportsmen and sportswomen etc... All the use cases I have seen focus on being able to track and record discrete interactions both inside and outside the business. The focus is the complete reverse of "Big". The focus is on micro or nano level data. A better term would be "Nano-Data". According to Wikipedia, nanotechnology is the "engineering of functional systems at the molecular scale". This relates very nicely to my definition of Big Data since Nano-Data or Nano-Analytics can be considered the "analysis of operational interactions at an individual level". The "individual" can be defined as any of the items I listed earlier (web clicks, customers, machine components, tracking data etc).
The accumulation of lots of nano data sets makes it possible to extract deep analytics about the combined, group or herd data set, i.e. moving the analysis from nano to micro to macro levels. Of course you do not have to store zettabytes of nano data to provide effective analysis. Sometimes the analysis can be based on reasonably small data sets (tens of TB).
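The nano-to-micro-to-macro idea can be sketched in a few lines of code. This is purely illustrative: the record fields (`customer`, `duration_s`) and the `rollup` function are hypothetical, standing in for whatever individual interactions a business actually captures.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical nano-level records: one row per individual web interaction.
# Field names are illustrative only, not from any real system.
events = [
    {"customer": "A", "session": 1, "duration_s": 12},
    {"customer": "A", "session": 2, "duration_s": 30},
    {"customer": "B", "session": 1, "duration_s": 45},
    {"customer": "B", "session": 2, "duration_s": 15},
]

def rollup(events):
    """Aggregate nano-level events up to micro (per-customer)
    and macro (whole-population) views."""
    per_customer = defaultdict(list)
    for e in events:
        per_customer[e["customer"]].append(e["duration_s"])
    # Micro level: average session duration per individual customer.
    micro = {c: mean(durations) for c, durations in per_customer.items()}
    # Macro level: average session duration across the herd.
    macro = mean(e["duration_s"] for e in events)
    return micro, macro

micro, macro = rollup(events)
print(micro)  # {'A': 21, 'B': 30}
print(macro)  # 25.5
```

The point is that the same individual-level records serve every level of analysis; the macro view is just an aggregation of the nano data, not a separate data set.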
Last month I spoke to a customer in the US who had not considered Big Data as a suitable technology for their business because they only had about 10TB of data that they needed to analyze. Once we (the sales team and myself) had talked to them about the analytical opportunities that "Big Data", or Nano-Analytics, could open up for them, they became very excited and we are now working with their business and IT teams to define a data platform that allows users to build data filters using data mining tools and then pivot, slice and dice in every way possible.
This makes me wonder how many other customers have already turned away from Big Data because they have incorrectly assumed that their data volumes are too small. If we had a better term, such as Nano-Data or Nano-Analytics, then maybe more customers would be looking to invest in this new area of analytics. Over the next few months I will try out this term on our sales teams and customers to get their reaction. Let me know what you think.