Good business-oriented data warehouse design and implementation was one of the main reasons Red Olive was set up, but I feel like we haven’t given it much attention recently so I thought I’d go back to the company’s roots, particularly in the context of the current interest in agile data warehouse development.
My first data warehouse experience: the sloth
Many years ago while I was at a former employer we began implementing SAP BW v2.0b with the aim of establishing an enterprise data warehouse. We attended the training, read books and took what I’d later realise was a design approach of “if we build it, they will come”. We switched on the standard SAP extractors and used what BW called the “Persistent Storage Area” (PSA) as an initial entry point for newly loaded data. It was quite literally a bottom-up design. We created what we viewed as a data warehouse layer out of BW “Operational Data Store” (ODS) objects with data in third normal form. We then created what we viewed as data marts on top, out of more ODS objects, then cubes which each fed various reports.
Once we had all this data in the warehouse and about 3 months after starting the project we began organising meetings to gather business requirements.
The initial requirements were to come from the neighbouring team who were implementing SAP R/3, the ERP system that’s now called SAP ECC. We wrote detailed user spec’s, functional spec’s and technical spec’s for each report requirement, and we had hand-offs between user and reporting analyst, reporting analyst and area technical analyst, area technical analyst and central data warehouse analyst. There were testing documents at each stage and the process of creating a report took 6-7 weeks. It got much slower once we started sending work off-shore and formalised the process even further. There was no real data available for us to test against until well into the ERP project. It took 9 months to go live with the first 15 reports.
Once it was live the data warehouse ballooned and the cost of the servers to process and store so much data ballooned too. After about 12 months of live running we checked and less than 5% of the data we were storing had been accessed at all in the previous 3 months. We were devastated.
Later experience: the horse
A few years later I moved company and had a completely different experience. I worked with a designer familiar with dimensional modelling. We started by interviewing a range of business representatives. The discussions focused on what they wanted in fairly broad terms, and in the space of a few days we spoke to people in multiple departments. The descriptions we got were in business language and often fairly vague, and there was nothing about specific reports or where the data was going to come from; the design was completely top-down. This high-level design was recorded as a “bus matrix” in Excel and soon there was a meeting to present back to the users, to check this logical model did what they wanted. The model didn’t just do what one person wanted, it was for all of them, in business language they understood. There were some changes to the model, but broadly it did what the business people said they wanted. We were about 4 weeks in.
The designer added more detail to the bus matrix and specified where the data came from and how the data transformations would be carried out. A single ETL developer was added to the team and another 4 weeks later the first data for the first few reports was loaded, from a stable source system. Another 4 weeks later and the first few reports had been designed, built and were in testing. After 14 weeks we went live with a small but functioning system on fairly cheap hardware. We were only extracting the data that the business people needed right now. The reports were fairly basic but they were used and liked by the business people and they soon demanded more. The project felt like it was being done on a shoestring, but it was a big success.
Now, blink and you’ll miss it: the cheetah
Now things have moved on again. We’re not looking to get into the Inmon / Kimball design debate here, it depends what you’re trying to do, but there are some clear lessons. We at Red Olive are fans of dimensional modelling because we’ve seen it give business people great results many times and that’s why we offer training in it. We still believe the bus matrix is a terrific control tool to help prevent data warehouse project managers from getting lost once they’re in the woods, but these days we use much better, quicker, self-documenting tools to go through the design and build of the data warehouse (more on that in a later blog).
Like a growing number of others we take the best principles from the agile approach although we’re not agile practitioners per se and we tend to apply “scrum” more than extreme programming. Here are the main things we like about the approach:
- The accountability between the product owner and the scrum master makes for really clear and crisp deliverables.
- The absolute clarity of the list of deliverables in a sprint, which then remains fixed once the work is underway, is a satisfying way for the team to focus and deliver without getting distracted.
- The accumulating, accretive deliverables which deliver business benefit from the start makes sure you maintain business support throughout and keeps the business risk low.
- The sheer speed!
Next time we’ll look at where an agile approach to data warehousing works well and how to get the most business benefit out of it.