The problem:

A large company in the retail and consumer goods sector, with many hundreds of purchase points, millions of customers and many tens of thousands of products, described a problem to us: their data warehouse holds billions of rows, and they want to store up-to-the-second information while getting responses from the full data set in about one second.

One of their business drivers is the analytics behind “next best action”: the ability to interact with each customer in a fully personalised way and make an offer so relevant to that individual that it is very likely to be accepted.  The window in which to interact with each customer and act is very short, and today this is impossible for them: disk-based technology is simply not fast enough.  Solutions such as SAP HANA have been investigated, but are perceived as so expensive that the business case becomes very weak.

Enter in-memory analytics:

The solution Red Olive proposed was an analytical appliance based on emerging IBM technology.  Partnering with a database performance specialist, we have built this appliance on relatively modest IBM hardware (2 × quad-core Intel Xeon E7520 CPUs at 1.86 GHz and 512 GB RAM) costing under £50k.

A test data set has been used for benchmarking.  The data is structured as a star schema with a fact table of just over 600 million line items, plus 150 million orders, 15 million customers, 20 million products and a million suppliers.  Two main query types have been tested: the first retrieves results relating to a single customer, as in the analytical business case described above; the second returns a very large result set, replicating an operational application such as supply chain replenishment forecasting.  The results have been little short of phenomenal.
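
To make the two query types concrete, here is a minimal sketch in Python using an in-memory SQLite database.  The table and column names (line_items, orders, customers, products) are our own hypothetical illustration rather than the customer's actual schema, the supplier dimension is omitted for brevity, and a handful of sample rows stand in for the hundreds of millions used in the benchmark.

import sqlite3

# Illustrative star schema: the names here are hypothetical, not the
# customer's actual schema.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers  (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products   (product_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders     (order_id    INTEGER PRIMARY KEY,
                             customer_id INTEGER, order_date TEXT);
    -- Fact table: one row per line item (~600 million rows in the benchmark).
    CREATE TABLE line_items (order_id INTEGER, product_id INTEGER,
                             quantity INTEGER, amount REAL);
""")

# A few sample rows so the queries below actually run.
con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
con.executemany("INSERT INTO products VALUES (?, ?)", [(10, "Tea"), (11, "Coffee")])
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(100, 1, "2013-01-15"), (101, 2, "2013-01-16")])
con.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?)",
                [(100, 10, 2, 5.00), (100, 11, 1, 3.50), (101, 10, 5, 12.50)])

# Query type 1: a narrow lookup for a single customer, as in the
# "next best action" case -- fetch that customer's purchase history.
single_customer = con.execute("""
    SELECT p.name, SUM(li.quantity) AS units, SUM(li.amount) AS spend
    FROM   line_items li
    JOIN   orders   o ON o.order_id   = li.order_id
    JOIN   products p ON p.product_id = li.product_id
    WHERE  o.customer_id = ?
    GROUP BY p.name
""", (1,)).fetchall()

# Query type 2: a broad scan returning a very large result set, as in
# supply chain replenishment forecasting -- demand per product per day.
replenishment = con.execute("""
    SELECT li.product_id, o.order_date, SUM(li.quantity) AS demand
    FROM   line_items li
    JOIN   orders o ON o.order_id = li.order_id
    GROUP BY li.product_id, o.order_date
""").fetchall()

print(single_customer)
print(replenishment)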

Results:

The acceleration in both query types has reliably been of the order of 900-1,000 times the maximum speed achievable using a fully indexed disk-based system: a query that would take around 15 minutes on disk returns in about one second.  Increasing the number of CPUs has been shown to further decrease response time in a predictable and linear fashion.
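
As a rough illustration of what those figures imply, the short Python sketch below works through the arithmetic.  The input numbers are assumptions chosen to match the ranges reported above, not measurements taken from the benchmark itself.

# Illustrative arithmetic only: the inputs are assumptions, not benchmark output.
disk_seconds = 900.0   # a query taking ~15 minutes on a fully indexed disk system
speedup      = 900     # low end of the observed 900-1,000x acceleration
base_cpus    = 8       # the test appliance's 2 quad-core CPUs

in_memory_seconds = disk_seconds / speedup   # ~1.0 s on the test hardware

# Linear scaling: doubling the CPU count roughly halves the response time.
for cpus in (8, 16, 32):
    print(f"{cpus:2d} cores -> ~{in_memory_seconds * base_cpus / cpus:.2f} s")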

Conclusions:

The benchmarking is not yet complete, but our conclusion so far is that, while this solution has some limitations in handling changing workloads, it offers exciting and affordable options where reliable, repeatable performance is needed on data warehouses and data sets in the sub-5 TB region.  Further development of the software is planned during 2013, and these limitations are expected to be removed within the next 12-18 months.

For more detailed information about the benchmark or to arrange to try this out for free on your own company’s data, please contact us or call us on 01256 831100.