Big Data is More Than More Data

by Brian Timoney

As a buzz-worthy term, “Big Data” has a lot going for it: easy to remember, vague enough that a shared clear meaning is always in doubt, and its own O’Reilly conference. Like programmers describing the merits of their software only in terms of the number of lines of code, talking about big-ness merely in terms of the number of records in a database seems to be missing some larger point.

Though we in the geo world have experience with bulky data (e.g., LIDAR, maybe SCADA…), what’s looming with the sensor web, ubiquitous GPS, etc. is on a different scale altogether. One would hope that, given our advantage of being schooled in the analysis and display of location, our industry would have more than a leg up on our uninitiated brethren. But then haven’t we already seen the story of companies who succeeded by understanding scale first, then figuring out the geo part later?

Paradoxically enough, success with Big Data may be as much a question of Art as of Science. The phrase “the data tells the story,” which was never true, is even more misleading in the context of Big Data because of its size and speed. A common analogy is that of sticking one’s face in front of an open fire hydrant: expect the data to tell its own story and you’ll emerge a bit dazed, only able to conclude that you experienced some sort of odorless liquid.

                      Big Data is unwieldy and difficult to bend to one's will


Context and narrative are key no matter what data you’re dealing with, but without them Big Data in particular has little value. To put it another way, the value of Big Data is directly correlated to an organization’s ability to mine it for meaningful, actionable information. The decidedly mixed record of the enterprise in doing anything interesting with its data besides storage, retrieval, and elementary reporting fueled this great take: “There’s no such thing as big data.” That’s why Michael Driscoll sees “Big Analytics” as a necessary complement to Big Data. This is where geo can shine: there is no more immediate context than location context; throw in spatial analysis and now you’re cooking with Crisco®.

~  ~  ~  ~  ~  ~  ~  ~  ~

So Big Data requires more than adding a couple of sub-select statements to your trusty SQL queries.  Parallel processing strategies, NoSQL databases, and much faster methods of moving data from server to browser (Node.js) are some of the weaponry needed to tame the beast. Now I mentioned O’Reilly’s Strata Conference in New York in September: who doesn’t feel smarter and better-looking at an O’Reilly conference despite (or because of) its $3245 price tag?  But I have a better deal:  how about high-fiber, roll-your-sleeves-up, geospatial-centric sessions on Big Data scalability, turbo-charged spatial analysis, and moving all these bits and bytes around in the web world for one-third the cost!  Now, truth in advertising, we’re a homely bunch, but that will improve after a few craft brews at our Mile High altitude. If money is no object, how player would it be to hit up FOSS4G in Denver, then be primed to drop the knowledge amongst the black-clad, rimless-specs set at Strata New York two weeks later?

Answer: very player.


—Brian Timoney


* photo courtesy of the hitthatswitch Flickr stream