10 steps to ETL
Whether you buy or build your ETL tool, ETL development can be broken down into 10 steps:
- Create a high-level design for the target model.
- Choose, install, configure and learn the ETL tool of choice.
- Determine your default strategies.
- Drill down by target table.
- Develop the historical load for dimensions.
- Populate historical fact tables.
- Develop the dimension table incremental processing.
- Develop the fact table incremental processing
- Implement aggregate and OLAP loads.
- Create the plan for ETL operation (ie, production support), automation and future enhancements.
Thank you for the great post Tod McKenna.
Data Warehousing simplified
Two of my favorite articles about data warehousing are Data Warehousing for Cavemen by Philip Greenspun, and the longer version, which is the Data Warehousing chapter of his book, SQL for Web Nerds. While some of the article is dated (HP and Sybase are no longer the leading vendors they were 12 years ago), it is surprising how little has changed. Now there are even more choices of platforms, tools, and programming languages, but the data warehousing problem remains fundamentally the same: how do you build a system that allows non-techies to explore the data?
The goal is that a business expert can sit down at a Web browser, use a sequence of forms to specify a query, and get a result back in an amount of time that seems reasonable.
It turns out that the solution has not changed either, as data warehousing remains an important function for any business that want to make use of the data being collected by their transactional systems. Greenspun walks through a project that his company, ArsDigita, worked on for Levi Strauss. The implementation took two programmers three days to complete. While it is a simple example, it illustrates data warehousing fundamentals, and proves that complexity is not a prerequisite for results that are valuable to the business.
Ruby for BI and DW
Ruby is gaining ground as a programming language, but is far from the most popular choice, especially in the corporate consulting world. However, it has a loyal following for a reason: hackability. It is open source, supported by a dedicated developer community, and it requires little overhead to do powerful things. When resources are tight and time is constrained, a hackable solution has great appeal. It is also great for building prototypes, for prototypes that could end up as production code (think Visual Basic), or for applications that will require frequent changes. Many popular web applications (Basecamp, for example) are built with Ruby, which speaks to both the power and robustness of the language, the libraries and the Ruby on Rails framework.
I came across this blog post which suggests a three Ruby based tools for data warehousing and reporting:
- ActiveWarehouse - Based on the Ruby on Rails framework, ActiveWarehouse simplifies creating the data warehouse dimension and fact tables.
- ActiveWarehouse ETL - The ETL component of ActiveWarehouse that loads the data.
- Ruport - A collection of Ruby tools to facilitate building reports.
I don’t know how maintainable a Ruby DW/BI solution is, but it would be worthy of consideration if you need a cheap and hackable custom solution (and you like hacking Ruby, of course). I would certainly want to build the business rules in the database (probably a good practice anyways), with stored procedures and views, so that if another BI solution was introduced, either as a replacement or as a complement, you would not have to start again from scratch.
4 + 2 Leading Commercial Vendors of BI software
With the dust settling from the 2007 acquisitions, four vendors dominate the business intelligence software market. Information Week recently published a detailed article about the four bright lights of BI, who’s products comprise about half of all the business intelligence installations out there. These are the four, plus two others that are close to being on this list:
- IBM - IBM has a broad line of business intelligence and data warehousing products, sold by its Information Management software division. It’s recent acquisition of Cognos rounded out the offerings and bolstered IBM’s commitment to being a “neutral player”, being able to work with data from any source.
- Microsoft - Compared to the other vendors listed here, Microsoft has only recently entered the enterprise Business Intelligence market. However, it has been able to leverage it’s massive install base of SQL Server, Office and Sharepoint to gain traction and to pave the way for Performance Point Server. Microsoft has a more compelling offer for the SMB market, where the other large vendors have had limited success.
- Oracle - Like IBM, Oracle has an array of BI and DW products, mostly from acquisitions. Most notable is Hyperion, a brand that Oracle has maintained. Integration among the product sets is generally weak, many products overlap and there is no coherent technology roadmap.
- SAP - With the acquisition of Business Objects, SAP displaced its Netweaver brand and became a major BI vendor overnight. Business Objects has been allowed to operate independently so far, but tighter integration with SAP products is on its way.
Here are two others that are not considered leading BI vendors, but they are close:
- Microstrategy - While significantly smaller than the other companies listed here, Microstrategy is now the largest independent, pure-play BI software vendor.
- SAS - SAS is best known for analytic applications, but they do have a business intelligence solution. It is also known as one of the largest privately held companies in the United States.
Build vs. buy, CPM edition
Rick Sherman, of Athena IT Solutions, has published in his latest newsletter a discussion of buy versus build for Corporate Performance Management (CPM) or Business Performance Management (BPM) software. The lower total cost of ownership (TCO) for packaged software assumes that business and technical requirements are met, and this is a grand assumption with CPM for the following three reasons:
- The particular challenge with a CPM package is that the built-in metrics, metadata, business logic, etc. must be a good fit for the business. This requires delving into both the business processes and the technical details of the software implementation.
- CPM relies on BI and ETL tools to feed it data. Is the CPM package compatible with the existing BI tools? Or does the CPM package come with its own set of BI tools that must be adopted?
- If multiple CPM packages are required, is there are risk of creating CPM silos? These will be even harder to break down than data silos, especially if the packages are from different vendors.
More often than not with CPM, the buy vs. build question becomes buy vs. build vs. buy and customize, as the cost of retro-fitting business processes to fit the software is so high.
Joel’s Architecture Astronauts
One of my favorite software gurus, Joel Spolsky, has published a rant against the architecture astronauts, in particular Microsoft and Ray Ozzie. It was several years before when he first coined the term, as “smart thinkers” who “create these absurd, all-encompassing, high-level pictures of the universe”. His point this time is that far too much computer science talent is wasted by deep pocketed companies (Google and Microsoft), working on technology that is doomed to fall flat, because it fundamentally does nothing new, or worse, does nothing.
it’s a fun programming exercise that you’re doing because it’s just hard enough to be interesting but not so hard that you can’t figure it out.
I agree with Joel. The noise the architecture astronauts create is bad for the software industry. As buzzwords are invented, new names are given to old concepts, intangible software of dubious benefit is marketed and sold, the IT industry losses credibility. I suspect that all software vendors that reach a certain size are guilty of this, particularly publicly traded ones, with their eternal quest for explosive revenue growth. In the business intelligence niche, we sees this same phenomenon. Take this example, courtesy of Oracle:
Oracle offers the industry’s most comprehensive set of analytic and business intelligence applications with capabilities such as corporate performance management, interactive dashboarding, and embedded analytic functionality.
Perhaps less airy-fairy then Microsoft’s Hailstorm, but still pretty high level. With my company, I am trying to find the opportunity in all this noise, by sorting the wheat from the chafe for my clients and focusing on business value. I love exploring new technologies, but if today’s astro-architecture is valuable enough for business, there will be plenty of time to get on board tomorrow.
Do spreadmarts have their place?
Business Intelligence software vendors spend a great deal of effort preaching against the dangers of spreadmarts: valuable corporate data trapped inside Excel files all over the place. On the other hand, when the BI tool fails to meet a requirement, every vendor is happy to demonstrate how they can be a hybrid solution. Reports can be exported to Excel, reports can be opened directly from an Excel plug-in, and spreadsheets can be updated automatically with figures from the data warehouse.
This speaks to the maturity of BI software in general. There is a grudging acceptance of two things:
- not every forseealbe requirement can be met with a shrinkwrapped software solution
- people like Excel
As BI vendors move into the broader SMB market, the interoperability with Excel will be key. For many, Excel is the de facto standard for reporting and analysis.
Cognos Business Intelligence Technical Articles
I have started posting some technical articles on my company website. My focus is on Cognos 8 Business Intelligence, as I am currently interested in that type of work. Any feedback would be appreciated, either through the contact page or by leaving a comment. If you work with Cognos Business Intelligence, let me know if there is anything that you would like to see as I am open to suggestions for future topics.
Secure Business Intelligence Development
In order to build a secure business intelligence system, business intelligence developers need to be more security conscious as they go about creating data models, cubes, and reports. eWeek has an article titled 5 Steps to Secure Development, which outlines how to make security an integral part of the enterprise software development process. These lessons are equally applicable to Business Intelligence projects.
- Definition - Start thinking about security from the beginning of the project and build it into the project plan. Most BI vendors will have a security framework for preventing unintended access to data, but how well does it match up with existing business processes? Will the BI system will leverage the existing security infrastructure? Is there any custom coding required?
- Education - According to the article, there is lack of security training across the IT industry. Be sure that the team knows how to roll out secure applications, and how to establish appropriate responses to security breaches. Shutting everything down is effective, but such drastic actions will quickly undermine the confidence of end users.
- Equipment - An emphasis on security can risk slowing down a project, but having the right software tools can mitigate this risk. Look for analyzers and automated testing tools that have security testing features.
- Test, test, test - Testing must be expanded beyond functionality, performance and data validation. Security testing means studying potential failures to see they can be exploited. How a component or the systems fails is as important as preventing it from failing in the first place.
- Monitoring - As part of the roll-out, alerts and processes must be put in place to monitor for failures and suspicious activity. For example, being alerted to huge spikes in activity and abnormal amounts of data being downloaded by a single user or in a single location.
Most business intelligence vendors take security seriously, with published guidelines for implementing security and details about how their software handles various threats. Here are two examples from Cognos and Microsoft. However, despite these convincing assurances, the responsibility for a secure system ultimately lies with the project team.
Searching for the next big thing in Business Intelligence
Like most people in the IT industry, I am always looking for the next big thing in my niche. As my company does BI consulting with IBM Cognos 8, a trend I am keenly following is the evolution of the next BI platform. Clearly, SOA, SaaS and cloud computing will have an impact, and the details are slowly becoming evident. Shawn Rogers, blogging on the b-eye-network, recently wrote about three recent developments at Google that provide some insight into the future of business intelligence:
- Panorama Software announces a BI solution that integrates with Google Docs.
- SalesForce, the poster child for hosted enterprise software, announces the integration of SalesForce with Google Apps.
- Google’s Big Table, the database component of the Google App Engine, is available for storing data “in the cloud”, opening the door for application developers to leverage Google’s massive infrastructure.
While it is debatable if Google can (or even wants to) become a player in the Enterprise BI market, they are setting trends in software development and providing clues as to what future of BI software suites will look like.








