What Does Big Data Actually Mean?

2014-09-24 03:17:47

The term Big Data is ubiquitous and enigmatic. It’s so overused that it has practically morphed into a meme for using fancy math to make technology better. A Center for Technology Innovation analysis of Big Data in education defined the term as a “group of statistical techniques that uncover patterns.” Others disagree, so what is Big Data?

While the term is nebulous and often co-opted for other purposes, I’ve understood “big data” to be about analyzing data that is really messy, or where you don’t know the right questions or queries to ask. It is analysis that helps you find patterns, anomalies, or new structures amid otherwise chaotic or complex data points. Usually this involves datasets whose byte size seems large relative to our frame of reference of files on a desktop PC (e.g., larger than a terabyte), and many big data tools exist to help deal with that volume. To me, though, the most important concepts of big data don’t have much to do with the data being “big” in this sense, especially since that is such a relative term these days. In fact, they can often be applied to smaller datasets as well. Natural language processing and Lucene-based search engines are good examples of big data techniques and tools that are often used with relatively small amounts of data.
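To make the point about small datasets concrete, here is a minimal sketch of an inverted index, the core idea that Lucene-style search engines are built around, applied to just three short documents. This is not Lucene's API; the function names (tokenize, build_index, search) and the sample documents are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's matches, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: a "dataset" of three sentences.
docs = {
    1: "Big data helps find patterns in messy data",
    2: "Search engines index small document collections too",
    3: "Anomalies hide in messy logs",
}

index = build_index(docs)
print(sorted(search(index, "messy data")))
```

The same structure works whether the collection holds three sentences or billions of pages. What changes at scale is the engineering around it (storage, sharding, ranking), not the underlying technique.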

As computational efficiency continues to increase, “big data” will be less about the actual size of a particular dataset and more about the specific expertise needed to process it. With that in mind, “big data” will ultimately describe any dataset large enough to necessitate high-level programming skill and statistically defensible methodologies in order to transform the data asset into something of value.

“Big data” is the situation where an organization can (arguably) say that they have access to what they need to reconstruct, understand, and model the part of the world that they care about. Using their big data, they can then (try to) predict future states of the world, optimize their processes, and otherwise be more effective and rational in their activities.