10 steps to ETL
Whether you buy or build your ETL tool, ETL development can be broken down into 10 steps:
- Create a high-level design for the target model.
- Choose, install, configure and learn the ETL tool of choice.
- Determine your default strategies.
- Drill down by target table.
- Develop the historical load for dimensions.
- Populate historical fact tables.
- Develop the dimension table incremental processing.
- Develop the fact table incremental processing
- Implement aggregate and OLAP loads.
- Create the plan for ETL operation (ie, production support), automation and future enhancements.
Thank you for the great post Tod McKenna.
Data Warehousing simplified
Two of my favorite articles about data warehousing are Data Warehousing for Cavemen by Philip Greenspun, and the longer version, which is the Data Warehousing chapter of his book, SQL for Web Nerds. While some of the article is dated (HP and Sybase are no longer the leading vendors they were 12 years ago), it is surprising how little has changed. Now there are even more choices of platforms, tools, and programming languages, but the data warehousing problem remains fundamentally the same: how do you build a system that allows non-techies to explore the data?
The goal is that a business expert can sit down at a Web browser, use a sequence of forms to specify a query, and get a result back in an amount of time that seems reasonable.
It turns out that the solution has not changed either, as data warehousing remains an important function for any business that want to make use of the data being collected by their transactional systems. Greenspun walks through a project that his company, ArsDigita, worked on for Levi Strauss. The implementation took two programmers three days to complete. While it is a simple example, it illustrates data warehousing fundamentals, and proves that complexity is not a prerequisite for results that are valuable to the business.
Teradata Express Edition
I don’t know for how long this has been available, but it is worth pointing out now: a free developer edition of Teradata, known as Teradata Express Edition. Teradata is the most scalable database on the market, making it a popular choice for massive data warehouses. The express edition is limited to 4GB of data and it is not for production use. However, I am a strong proponent of these express editions. They are a great way for IT professionals to have an in depth look at a technology without committing anything to the vendor’s sales team. For the vendor, having an express edition is great marketing, attracting companies, who they probably would not have reached anyways, to have a look at the product in their own environment and on their own time.








