When Google launched its big data service, BigQuery, the company had three simple goals: make it cheaper, faster and easier to use than the competition. These are lofty ambitions, but ones that we’ve found to hold up under scrutiny and real workloads.
A key benefit of BigQuery is that it is a completely hosted service. You pay small amounts of money for data storage and then for the processing time used, with Google devoting the necessary resources automatically. It’s a big shift from the old way of working, where you paid a fixed amount for virtual nodes, specified to match the maximum workload.
Huge time savings
Google further pushes its price advantage with incredible performance. Because, beyond storage, you pay only for the processing time you use, the faster you can complete a job, the cheaper it is.
When Ocado switched from using Apache Hive (a data warehouse infrastructure built on top of Hadoop), it saw that BigQuery was 80 times faster.
Red Olive has seen similar performance gains with its clients, too. Mark Fulgoni, Principal Consultant at Red Olive, says: “One of our clients, a major media publisher, had to wait about three hours to process a day’s worth of data. Since switching to Google BigQuery, it has seen a dramatic drop in processing time, down to under 15 minutes.”
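To put the publisher's figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes, as this article does, that cost scales directly with processing time. The function name `speedup` is hypothetical, the 180 and 15 minute values round the quoted "about three hours" and "under 15 minutes", and no actual BigQuery prices are used.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("durations must be positive")
    return before_minutes / after_minutes


# Illustrative values based on the quote: about three hours, down to 15 minutes.
before = 3 * 60
after = 15

factor = speedup(before, after)

# Assumption: cost is proportional to processing time,
# so the cost relative to the old run is 1 / factor.
relative_cost = 1 / factor

print(f"Speed-up: {factor:.0f}x")
print(f"Processing cost relative to before: {relative_cost:.1%}")
```

Under that assumption, a job that runs twelve times faster costs roughly a twelfth as much in processing, with storage charged separately.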
To add to this, BigQuery now supports standard ANSI SQL, letting you query your data in a familiar way. This matters because SQL-trained staff will already be at home in the BigQuery environment, so they can easily query large datasets and pull out meaningful information.
Moving to BigQuery
As a tool, BigQuery is incredibly powerful, but shifting to it isn’t quite as simple as it may seem. With its database-as-a-service model, we’re moving into a new world where many people aren’t familiar with the best architecture for ETL (Extract, Transform, Load – the process of converting collected data into the right structure for processing). As a result, there’s a real danger that a BigQuery service could be over-specified to replicate existing ways of working, which negates the potential cost benefit of BigQuery.
Jefferson Lynch, Client Director at Red Olive, says: “We are one of the few UK companies with experience of working on BigQuery – now at 18 months and counting – and many years’ experience working with databases as a service.
“As a result, our experts are well placed to engineer the ideal architecture to make the most of Google’s service. By splitting ETL jobs into lots of little packages with no dependencies, you can pay small amounts of money for each BigQuery job as you need it. This will keep your BigQuery costs on the lowest tier possible while maximising performance. With our business experience of analysing big data, we can help format and query your data, to give you the insights that can transform your business.”
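The idea of splitting ETL work into small packages with no dependencies can be sketched in Python. This is a minimal illustration and not Red Olive's actual architecture: `run_package`, the `packages` dictionary and its sample rows are all hypothetical, and the transform runs in memory where a real package would submit its own BigQuery job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_package(name: str, rows: list) -> tuple:
    """Hypothetical ETL package: clean one batch of rows and summarise it.

    It reads only its own input, so it has no dependency on any other package.
    """
    cleaned = [r for r in rows if r.get("views") is not None]  # drop incomplete rows
    total = sum(r["views"] for r in cleaned)
    return name, total


# Made-up sample data: one small, self-contained package per day.
packages = {
    "day-1": [{"page": "/home", "views": 10}, {"page": "/news", "views": None}],
    "day-2": [{"page": "/home", "views": 7}, {"page": "/news", "views": 5}],
}

# With no dependencies between packages, they can run in any order or in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_package, name, rows) for name, rows in packages.items()]
    results = dict(f.result() for f in futures)

print(results)
```

Because each package stands alone, a failed one can be rerun without touching the rest, and you pay for each small job only when it is needed.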