Big Data: where do you start?

Up until now I haven’t written about Big Data, but the events of spring 2015 have meant I can’t avoid it any longer.  Let me explain.

Last year we were engaged by a fairly large financial services business to help them envision how they could commercialise their data.  Our main conclusion was that they could use it to extend beyond being a Business to Business service provider, into a number of Business to Consumer markets.  The nature of their data meant that they were well placed to develop insights that few other companies would have, and to prove the point we undertook a proof of concept with reasonably promising results.  The data volumes weren’t huge and we easily ran the predictive modelling algorithms on our laptops.

Then the organisation’s CIO started to talk about the need to buy a strategic Big Data platform.  We couldn’t see a business case: we explained that, taking into account all the potential analyses highlighted, we couldn’t recommend spending a few million on a platform when a reasonably meaty laptop would do the job fine.  And if they did want to foray into the world of Hadoop and Spark, an Amazon cluster could easily be spun up for a short time at minimal cost.  Any business would be glad to keep its money in the bank, right?

Wrong.  We were told we were clearly not the right company for them, goodbye.  They were after someone who would give them a platform.

How did we get it so wrong?  Well, sometimes the shoe just doesn’t fit, and anyway we had plenty more going on, including a request from an insurance company to design a dependable, real-time system for providing internet quotations.  There was a tight business deadline to go live and a lot riding on the timing.  The quotation structure was standard and the query highly repetitive, so we judged that low risk with dependable, fast performance at relatively low cost was the way to go: we opted for SQL Server 2014, a relational database with in-memory capabilities.  Feedback was good and we were duly short-listed.

Then, at a very honest face-to-face meeting, we were told we had not been successful and that a little-known proprietary database platform designed for Hadoop was going to be used instead.  It was more risky, but the perception was that it offered something new, as yet not fully worked out; the business case wasn’t fully clear yet, but in the era of Big Data it’s worth trying…

A hark back to the late nineties?

Is data really the new oil?

The point is this: we’re seeing some pretty level-headed and serious businesses willing to risk some sizeable investments in setting up Big Data platforms even where there’s no clear business case at all.  Two forces seem to be at work together: a push from the fear of being left behind in the dash for the “new oil”, and a pull from the potential of a greatly increased stock valuation if a company comes to be perceived as a data-led technology business.

Back in 1999 I was fresh out of university and working for Unilever in Business Intelligence.  Anyone who was anyone was working in a “dot com” company and, if you were, money seemed to be no object.  Investors piled money in on the hope of future profits.  The internet was the thing, and everyone wanted websites, even staid old Unilever.  Some pretty level-headed and serious businesses were willing to risk some sizeable investments in becoming e-businesses, even where there was no clear business case at all.  Sound familiar?  Everyone was learning how to hand-code HTML, even though it wasn’t clear how these websites would benefit the business.

.COM stock collapse

Then came the crash.  Some companies never recovered; others did: eBay.com’s shares fell from $107 at the peak to just $7 and then rose to over $400 a decade later.

So what?

All the hype aside, analytics based on Big Data does have very real business potential, but the technologies and the business models are going to take time to mature.  For anyone who’s been working with data for any length of time, knitting together a Big Data solution out of the current crop of open source tools can feel like winding the clock back 20 years to a time when hand coding was the only option.  Over the next few blog articles we’re going to attempt to chart a way through some of the potential pitfalls and identify where real value can be found in Big Data.

.COM RIPs

We’re going to start by considering where Big Data can be successfully applied.  We’ll then examine some of the methods that can be used to bring about those applications.  Finally we’ll turn our attention to some of the Big Data infrastructure that’s needed underneath.