Big Data as “Computation Brought to the Data”

A recurrent meme at the Hadoop World conference last week was the idea that part of “Big Data” — or even its “heart” — is “bringing the computation to the data.”
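To make that concrete, here is a toy sketch in Python. The node names, log records, and the local_count function are all made up for illustration; the point is just the shape of the thing: instead of hauling every record to one machine, a small function is shipped to each node that owns data, and only the small per-node summaries travel back to be merged.

```python
from collections import Counter

# A toy "cluster": each node holds a local partition of log records.
# (Hypothetical data; node names and records are illustrative only.)
partitions = {
    "node-1": ["GET /index", "GET /about", "POST /login"],
    "node-2": ["GET /index", "GET /index", "POST /login"],
    "node-3": ["GET /about", "POST /signup"],
}

def local_count(records):
    """Runs *on* the node that owns the records; only the small
    summary (the counts), not the raw data, ever leaves the machine."""
    return Counter(r.split()[1] for r in records)

# "Bring the computation to the data": ship local_count to each node,
# then merge the small per-node summaries centrally. In a real system
# the framework does the shipping; here it is just a loop.
merged = Counter()
for node, records in partitions.items():
    merged.update(local_count(records))

print(merged)  # Counter({'/index': 3, '/about': 2, '/login': 2, '/signup': 1})
```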

At first I thought that the main impact of the idea — beyond the very real observation that “share-as-little-as-possible” architectures are great for scaling data processing — was poetic: a sort of democratization of compute power, liberating it from the dark satanic mills of Oracle and the like.

But there appear to be architectural implications as well.  A stateless or practically stateless approach to data weakens any hope of transactional integrity, for example.  If the nodes coordinate tightly enough to guarantee that every operation can be undone, you’ll never get anywhere on your data.  You need probabilistic assurances, not logical ones.
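Here is a back-of-the-envelope sketch of that trade, again in Python with toy numbers and a made-up flaky_partition_sum task. The per-partition task is idempotent, so a failure is handled by simply re-running it rather than by coordinating an undo across nodes; the guarantee is “it will almost surely finish,” not “it either all committed or none of it did.”

```python
import random

# Hypothetical per-partition task: idempotent by construction, so instead
# of coordinating an undo-able transaction across nodes, we just re-run it
# until it succeeds.
def flaky_partition_sum(partition_id, values, failure_rate=0.3):
    if random.random() < failure_rate:   # simulated node/task failure
        raise RuntimeError(f"task on partition {partition_id} died")
    return partition_id, sum(values)

partitions = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7, 8, 9]}
results = {}

for pid, values in partitions.items():
    while pid not in results:            # retry instead of rollback
        try:
            key, total = flaky_partition_sum(pid, values)
            results[key] = total         # re-running is harmless: same input, same output
        except RuntimeError:
            pass                         # nothing partial to undo

print(sum(results.values()))             # 45 — eventually, with probability approaching 1
```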

Also, new approaches will be needed for security and storage in an architecture where a vast universe of data/computation nodes must coordinate.  Maybe there are startups looking at this today; I’d love to hear of anything interesting going on in these areas.