Five Security Questions to ask about your BI Project
Security is often an afterthought on BI projects. The conventional thinking, as encouraged by BI vendors, is that the BI implementation will leverage the existing security infrastructure, be it LDAP, Active Directory Server, or otherwise. While that might be the case, thinking about security cannot end there. The following five questions need to asked, and answered, early on in any BI project:
- How will penetration testing be done?
- How are user IDs and passwords managed? (Single sign-on is the preferred answer.)
- What is the encryption policy? 128-bit SSL encryption is typical, but sensitive data might require more.
- Are the servers shared with other applications? Are the servers virtual servers? If so, what is in place to prevent data leaks, malicious or accidental?
- What is the backup and disaster recovery plan?
As BI moves outside the firewall, onto mobile devices, and perhaps running on servers hosted by a 3rd party, the questions around security need to be asked early, and if necessary, often.
Nine Steps for an Effective Change Managment Process
Once a business intelligence system goes “live” and into production, a change management process is an essential tool for keeping production servers humming, the jobs running and the business users happy. A balance must be struck between being strict enough to maintain control and accountability, but agile enough that changes can be made quickly for the business. Here are the nine steps for an effective change management process:
- Change Request
- Change Approval
- Planning - includes the evaluation of any risks and a back-out plan
- Testing - simulation in a lab or on a non-production system
- Scheduling - can the change fit into the regular maintenance window?
- Communication - all affected groups need to be notified
- Implementation - actually making the change
- Documentation
- Follow Up - closing the loop with all involved parties
As business intelligence and data warehouse applications move into more mission-critical, operational roles in the enterprise, change management is an essential consideration.
Windows Power Users: Unleash your Inner Unix!
Microsoft has released Windows PowerShell as part of Windows 2008 Server, and it is also available here for 2003 Server and Vista and XP. Microsoft has always had scripting languages for its various operating systems: DOS, Windows Scripting Host (basically, Visual Basic lite), and a myriad of 3rd party languages. So what is different this time? The depth and breadth of the Windows PowerShell brings it close to the functionality of Unix shell scripting. Here are a few highlights:
- The syntax and keywords are similar to C#
- The commands can be strung or piped together, i.e. the output of one command becomes the input for another.
- Windows servers can be administered remotely, including the starting and shutting down of services, retrieving errors from log files, etc.
- Many resources and sample scripts are available, and a blog is being written by the PowerShell team.
Windows PowerShell is built in to the operating system, and it can automate many of the endless tasks necessary to keep a server farm running. If you do any Windows server administration work at all, time spent learning PowerShell would be time well spent.
Free Database Roundup
- IBM DB2 Express-C - Find a review of it here. I am currently running it on my laptop, primarily for use with IBM Cognos 8 Business Intelligence. I have posted an install guide, and a guide to setting up the Cognos 8 samples on it.
- Oracle 10g Express - A starter Oracle database. Resources and comparisons with the other free databases can be found here on TechRepublic.
- Sybase ASE Express - The free version of Sybase ASE is only available for Linux. The developer edition is available for other platforms, but it has much more restrictive terms of use.
- SQL Server 2005 Express - A light version of SQL Server for desktops, small servers and for embedding into applications. Lots of good resources here on the Microsoft site.
Open source databases
- Apache Derby - A small Java relational database which can be embedded or used in client - server mode. IBM Cognos 8 BI Server ships with Apache Derby for use as the default content store, which shortens the time to getting a demo or dev server up and running.
- Berkeley DB - Now distributed Oracle after their acquisition of SleepyCat. Berkeley DB is an embedded database with three separate editions: the original relational database, written in C, a Java
version and an XML edition. - MySQL - The “M” in LAMP. This database is behind many web applications, including WordPress and Wikipedia. MySQL is now owned by Sun Microsystems.
- PostgresSQL - A mature and well supported open source relational database.
- SQLite - As the name implies, a featherweight and self-contained database. It is designed for embedding into applications, in particular applications on mobile devices.
Other free database resources:
- Azzurrie Clay - a data modeling plugin for Eclipse.
- Freebyte’s guide to free databases - a more comprehensive list of open source databases and free database programming libraries.
- Getting started with the NetBeans Database Explorer.
- Free data models to help get started on database schema development.
- QuantumDB - an open source Eclipse plugin for database development.
- SQL Explorer - another Eclipse plugin for database development.
Tool #1 in the BI developer’s toolbox
If you are an IT professional working with business intelligence software, or any software for that matter, than you accumulate a collection a of software tools that help you get the job done. Tool #1 is the text editor. Text is software’s raw material, used when coding, scripting, configuring and documenting. The default text editor for Windows is Notepad, and it is what most people end up using for everything. However, there are many better options. The importance of a text editor in the IT industry is evidenced by the massive collection of open source editors available on SourceForge and Freshmeat.
The Pragmatic Programmer offers some guidelines for making an informed choice of text editor:
- Configurable - the look and feel must be comfortable, and keyboard shortcuts must be available
- Extensible - integrates with other tools (like Cygwin or Perl), has a plug-in architecture
- Programmable - ability to automate multi-step tasks that have to be repeated ad infinitum
- Syntax highlighting
- Auto-completion
- Auto-indentation
- Templates
- Access to help systems from within the editor
- IDE features - compile, debug, preview
The best advice the authors offer is simply this: “Use a Single Editor Well” (Tip 22)
My current favorite editors for Windows are Metapad (which replaces Notepad) and Notepad++. For Unix/Linux, I like VIM. I rarely work on a Mac, but if I did more often, I would surely use TextMate.
10 steps to ETL
Whether you buy or build your ETL tool, ETL development can be broken down into 10 steps:
- Create a high-level design for the target model.
- Choose, install, configure and learn the ETL tool of choice.
- Determine your default strategies.
- Drill down by target table.
- Develop the historical load for dimensions.
- Populate historical fact tables.
- Develop the dimension table incremental processing.
- Develop the fact table incremental processing
- Implement aggregate and OLAP loads.
- Create the plan for ETL operation (ie, production support), automation and future enhancements.
Thank you for the great post Tod McKenna.
Data Warehousing simplified
Two of my favorite articles about data warehousing are Data Warehousing for Cavemen by Philip Greenspun, and the longer version, which is the Data Warehousing chapter of his book, SQL for Web Nerds. While some of the article is dated (HP and Sybase are no longer the leading vendors they were 12 years ago), it is surprising how little has changed. Now there are even more choices of platforms, tools, and programming languages, but the data warehousing problem remains fundamentally the same: how do you build a system that allows non-techies to explore the data?
The goal is that a business expert can sit down at a Web browser, use a sequence of forms to specify a query, and get a result back in an amount of time that seems reasonable.
It turns out that the solution has not changed either, as data warehousing remains an important function for any business that want to make use of the data being collected by their transactional systems. Greenspun walks through a project that his company, ArsDigita, worked on for Levi Strauss. The implementation took two programmers three days to complete. While it is a simple example, it illustrates data warehousing fundamentals, and proves that complexity is not a prerequisite for results that are valuable to the business.
Rules for IT Consulting
While browsing around on IT Toolbox, I came across this post by Josh Berkus about database contracting. This was posted almost a year ago, but his “rules” are timeless, and they can equally apply to any type of IT consulting, be it BI, CRM, web application development, or whatever. Anyone in the business of hiring IT consultants would also find this useful, if for no other reason than to see the work from a different perspective. Here are some highlights and common threads:
- Anything deemed temporary is more permanent than you think. (rule 3)
- Everything takes more time and or costs more money. (rules 2, 6, and 7)
- Documentation is worth the trouble. (rules 8, 12, and 14)
Teradata Express Edition
I don’t know for how long this has been available, but it is worth pointing out now: a free developer edition of Teradata, known as Teradata Express Edition. Teradata is the most scalable database on the market, making it a popular choice for massive data warehouses. The express edition is limited to 4GB of data and it is not for production use. However, I am a strong proponent of these express editions. They are a great way for IT professionals to have an in depth look at a technology without committing anything to the vendor’s sales team. For the vendor, having an express edition is great marketing, attracting companies, who they probably would not have reached anyways, to have a look at the product in their own environment and on their own time.
Ruby for BI and DW
Ruby is gaining ground as a programming language, but is far from the most popular choice, especially in the corporate consulting world. However, it has a loyal following for a reason: hackability. It is open source, supported by a dedicated developer community, and it requires little overhead to do powerful things. When resources are tight and time is constrained, a hackable solution has great appeal. It is also great for building prototypes, for prototypes that could end up as production code (think Visual Basic), or for applications that will require frequent changes. Many popular web applications (Basecamp, for example) are built with Ruby, which speaks to both the power and robustness of the language, the libraries and the Ruby on Rails framework.
I came across this blog post which suggests a three Ruby based tools for data warehousing and reporting:
- ActiveWarehouse - Based on the Ruby on Rails framework, ActiveWarehouse simplifies creating the data warehouse dimension and fact tables.
- ActiveWarehouse ETL - The ETL component of ActiveWarehouse that loads the data.
- Ruport - A collection of Ruby tools to facilitate building reports.
I don’t know how maintainable a Ruby DW/BI solution is, but it would be worthy of consideration if you need a cheap and hackable custom solution (and you like hacking Ruby, of course). I would certainly want to build the business rules in the database (probably a good practice anyways), with stored procedures and views, so that if another BI solution was introduced, either as a replacement or as a complement, you would not have to start again from scratch.








