Big Data and Turkeys

Since a lot of people grumble about the “Big Data” meme — “what is this _big_ data anyhow” — I thought an analogy might help.

Big Data:Data::turkeys:chickens

A turkey is “really” just a big chicken.  Same limbs.  Same white and dark meat, same spices and herbs, similar taste.

But the scale of the turkey introduces new problems and requires new solutions:

  • Will it fit in your oven?
  • Will anything else fit in your oven if the turkey is there?
  • Where will you cook the other things if they won’t fit?
  • Do you have a roasting pan and rack big enough for a turkey?
  • Can you muscle the turkey up and down the stairs to brine it in the cooler (the only place it will fit)?

Ok, I won’t belabor the point: Big Data is different from data because the scale means your old techniques won’t always work.

Have a great holiday.

Big Data, Big Dreams

We’ve got to be at or near the Peak of Inflated Expectations in the Hype Cycle for Big Data.  It’s the point where the meme seems so powerful that everyone wants to associate themselves with it.

But, as happened with data mining, unstructured data mining, and other fevered dreams of extracting ponies from the manure heap of raw data, what if the insights we all believe are lurking in our data… aren’t lurking, or can’t be lured out of hiding?

I ran across a couple of posts this week that bear on the issue.

A post from Jeff Jonas, who can always be relied on to smash false idols, deals with this question.  As Jonas says:

The problem being; often the business objectives (e.g., finding a bomb) are simply not possible given the proposed observation space (data sources).

Dan Woods re-posts another variation on this theme:

…the data created and maintained outside your company is becoming much more important than the data that you can acquire from internal sources. Yet, few companies realize this and fewer are taking action. Instead, they are suffering from the Data Not Invented Here Syndrome.

In other words, there’s a difference between Big Data techniques and magic.  Sigh.

Your thoughts?

Where is the Big Data market at today?

Valhalla has been looking over the Big Data market, trying to answer the question: “how far along is the market?”  Are there really only four or so Big Data users (the likes of Google, Yahoo, Facebook, and Twitter), or are there more?  Is it an Early Adopter (or even merely a Tech Enthusiast) market, or has it crossed the chasm?  What are the use cases?

Here are some of our findings:

1. The Big Data market is an Innovator/Early Adopter market overall, with possible Early Majority beachheads in web analytics and adtech

Although our interviewees described a larger number of use cases – “voice of the customer” analytics in marketing, M2M sensor processing, fraud and risk analysis, predictive analytics of various types – there was no hard evidence for widespread uses of Big Data today in these use cases, and many of the interviewees described them as “nascent” or “near-future” use cases.

There was, however, agreement that web analytics and adtech platforms were much further along in terms of using Big Data techniques for projects which were important to the customers’ businesses and mainstream today.

  • AdTech users employ Big Data technologies for real-time bidding (RTB) and for managing and matching 3rd-party data to ad inventory or online user data. This area seems to be called “data management platforms”, and DemDex (which was acquired by Adobe for $xxxM) is perhaps the poster child.

  • Web analytics users employ Big Data technologies for indexing web pages and extracting performance indicators from raw weblogs.

2. Informants believe that the Hadoop stack is likely to remain the central platform for the Big Data market, but there is contradictory evidence

I don’t personally agree with this finding, but our interviewees all said, implicitly and explicitly, that the Hadoop stack was going to be the basis for Big Data technologies going forward.

One very thoughtful analyst said explicitly that the MapReduce/Hadoop stack would evolve over time, and that new technologies – like Dremel or Storm or Spanner and so forth – would be incorporated into the Hadoop ecosystem rather than creating new ecosystems of their own.

The only problem with this point of view is that “legacy” Big Data techniques (data warehousing, RDBMS, classic Business Intelligence suites) have a vast market share and a long history of productive use cases.  How these platforms will interoperate in the future is unknown, and it is still too early to call whether an approach like Hadapt’s (where a “classic” RDBMS or BI technology suite runs within the Hadoop stack) will prevail.

3. Wikibon’s analysis sizes the Big Data market today at $5B

A quantitative Wikibon analysis, which is quite thoughtful, concludes that $480M of this revenue comes from what they call “pure play” vendors (i.e., Hadoop infrastructure vendors and some other NoSQL or NewSQL vendors) and the balance from legacy players.
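To put those two figures side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the two numbers quoted above and treats “$5B” as a round $5,000M, which is my assumption. The variable names are mine, not Wikibon’s.

```python
# Figures quoted in the post (Wikibon's estimate), in millions of USD.
# Assumption: "$5B" is treated as a round 5,000M.
total_market_musd = 5_000   # total Big Data market
pure_play_musd = 480        # revenue from "pure play" vendors

# The balance is attributed to legacy players.
legacy_musd = total_market_musd - pure_play_musd
pure_play_share = pure_play_musd / total_market_musd

print(f"Legacy players: ${legacy_musd}M")
print(f"Pure-play share of market: {pure_play_share:.1%}")
```

On these inputs, roughly $4.5B of the $5B goes to legacy players, and the pure-play vendors account for a bit under a tenth of the market.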

Very curious about your thoughts on this.