Give users offline access to DB2 data in a versatile pivot table format.
In "Crunching AS/400 Data with OLAP Cubes and Excel 2000," I outlined how to use the Pivot Table wizard with Excel 2000 (and higher) to create pivot tables from DB2 data using an ODBC data source. This TechTip will go a step further.
Pivot tables are extremely useful because they allow a user to aggregate data in innumerable ways. For instance, sales dollars and quantity can be summarized by calendar quarter, by customer, by item classification, etc. Pivot tables, therefore, relieve IT from writing the same basic report over and over. (How many times have developers written the same basic report, just summarized a little differently?) Instead, pivot tables allow IT to collect the basic data and leave it to the user to summarize and format it as desired.
In this TechTip, I'll demonstrate how a script process can create a local OLAP cube file from DB2 data without using the Excel interface. (An OLAP cube file is a specialized file that stores aggregations in an optimized format.) Once a cube file is created, it can be distributed via a network share, emailed, or posted on a Web site for users to access and review using pivot tables within Excel. Since this is a script process, the cube data build can be scheduled to refresh at regular intervals. Furthermore, creating a local cube file alleviates the need for the user to have DB2 connectivity software and to go through the complicated Create Cube wizard.
Requirements
I'll demonstrate the technique to build a cube file using VBScript, although you could use VBA, .NET, or any other development environment that can take advantage of the ActiveX Data Objects library. The ODBC component from the System i Access CD (formerly iSeries Access) is required to be installed on the computer that will create the cube file. Alternatively, a third-party ODBC driver can be substituted. Keep in mind that almost any database server in your organization can be used as a data source for an offline cube.
You computer should also have the latest Microsoft Data Access Components (MDAC) library installed, along with a version of Excel (2000 or higher.) Most computers already have a recent MDAC version installed.
The Data
Since IBM hasn't given DB2 for i users a large test data set to tinker with (hint to IBM: the CORPDATA sample data is good, but we need more data!), I decided to re-use the data from the original article, an export of Microsoft's Northwind Traders Access database to DB2. The Northwind database stores customer order data. Figure 1 below shows the tables and relationships in the Northwind schema.
Figure 1: These are the tables and relationships in the Northwind schema example. (Click images to enlarge.)
A Quick Review
How can we make this relational data useful to management? One way is to create a cube file that will "pre-aggregate" the sales data, such as quantity and extended price (aka measures) by a variety of summarization columns (aka dimensions) such as item number, item description, item category, and customer name. Once the data is pre-aggregated in a local cube file, we can allow users to quickly review the aggregations in Excel using pivot tables (more about that in a minute).
But first, remember that when defining a cube we need to define what is being aggregated (sometimes called measures, facts, or values) and how to summarize these aggregates (called dimensions or labels). The table below shows a list of measures and dimensions for the sample cube.
Sample Cube Definition |
||
Dimensions |
Measures / Aggregate |
|
Product Category |
Quantity |
Sum |
Country |
Extended Price |
Sum |
Customer |
Highest Price |
Max |
Order Date |
|
|
Product |
|
|
Supplier |
|
When this cube is built, it will contain summarizations of the aggregates by all combinations of dimensions. For example, a user could ask for total quantity and sales by supplier, by country within product, etc., and the cube will give an answer. This example is simple, but other meaningful measures, such as gross profit (extended price minus cost), could be added to the cube as a measure.
If you find yourself lost at this point, it is probably a good idea to review the prior article. Here is a warning from the previous article when designing a cube: local cube files are limited in scalability, so choose your dimension combinations wisely. Each additional dimension exponentially bumps up the number of aggregations done and increases the cube file size, processing time, and wait time accordingly.
A Brief Review of Pivot Tables
What is a pivot table? A pivot table is a tool in Excel that allows summarization, sorting, and analysis of data from a data source. Possible data sources to feed a pivot table are a range of cells in a spreadsheet or, you guessed it, an OLAP cube file.
The best way to talk about pivot table is to show one. Figure 2 shows an Excel 2007 spreadsheet containing a pivot table based on our sample cube.
Figure 2: This Excel spreadsheet contains a pivot table based on our sample cube.
As you can see, the Extended Price column (i.e., sales data) is summarized by date (columns) and product category (rows). The Country box at the top left is called a report filter. In this case, the report filter can be used to drill down to one or more countries.
The beauty is that, by a simple drag-and-drop operation, the data can be summarized in another way. Instead of summarizing rows by product category, we can summarize by supplier or customer or customer within product. If we wanted, we could put the order date (year, quarter, etc.) as a report filter in place of country. The possibilities are endless, and users can do this themselves so that IT doesn't have to write yet another variation on an existing report.
For those who need pivot table help, there are plenty of tutorials in the Microsoft help and on the Web.
Building a Local Cube Offline
In the prior article, we used the "Microsoft Query add-in for Excel" to build the cube data within Excel. Now however, we want to write a program to build a cube. In order to do this, we'll need the following:
•· A connection to the cube's data source (in this case, ODBC will get data from DB2)
•· A cube definition that defines the cube's measures and dimensions
•· A DB2 SQL statement that can gather all of the required cube information into a single result set
With these things in mind, look at the end of this TechTip for the sample VBScript code (CreateSalesCube.vbs) required to build a cube file.
A local cube file is created using a specialized OLE DB provider from Microsoft called MSOLAP. As with any OLE DB provider, to use MSOLAP, you need to do the following:
•· Instantiate an ActiveX Data Objects (ADO) connection object. (ADO is a standard Microsoft technology that allows a data connection to be made to a variety of data sources, usually database servers but also Exchange, Active Directory, and spreadsheets. An OLE DB provider is a vendor-specific plug-in to the ADO library.)
•· Set a connection string to a data source.
•· Open the connection.
Once the connection is opened, a program performs operations against a data source and then closes the connection.
The great thing about using MSOLAP is that it will build a local cube file from a remote data source using instructions specified all within the connection string. Once the connection is opened, the local cube will be built with no subsequent instructions. Therefore, the VBScript program does the following four steps:
•· Create a connection object.
•· Set the connection string to create a cube from a remote data source.
•· Open the connection (which executes the connection string instructions).
•· Close the connection.
Now that's easy enough to understand, but the connection string itself needs explanation. Not your typical connection string, the connection string for MSOLAP is divided into the following sections:
•· LOCATION identifies where the local cube file will be created. It can be a drive path or a UNC path.
•· SOURCE_DSN is the ODBC data source name to use when executing the underlying data source query (in this case, it will be an ODBC DSN called AS400).
•· CREATECUBE is the statement that defines the structure of the cube, including the measures and dimensions. A tutorial on this topic would be extensive, but I will give a hint on an easy way to do this! After conceptually designing my local cube and building a supporting SQL statement, I use the Microsoft Query cube wizard to build this statement for me! (This method is outlined in the previous article.) The file name for an OLAP query created by MS Query is .OQY. Once you've created an .OQY file, simply open it with Notepad and extract the CREATECUBE section.
•· INSERTINTO is the statement that populates the cube from the supplied SQL statement. The DB2 SQL joins various tables to get all the relevant data for the cube. As a side note, if your dimension columns have the possibility of containing a NULL, it is a good idea to modify your SQL to use a function such as COALESCE to replace NULLs with a value such as *Undefined*. Also notice here that the number and sequence of columns in the SELECT statement should correspond with the cube's columns.
With a mammoth connection string like that, it's no wonder all the work is done when the connection is opened!
What to Do with the Local Cube File
Once a cube file is created, there are a number of options for getting the data to the users. Here are a few ideas:
1. Cube files can be opened directly by Excel using the File-Open dialog. Unfortunately, the Excel installation process doesn't automatically associate the cube (.CUB) extension with the Excel application. However, this can be easily remedied by associating Excel with a .CUB file extension using Windows Explorer. Also, if Excel squawks about a data source being potentially unsafe, just tell Excel to "get over it" and move on. When opening a cube file directly, Excel will present an empty pivot table.
2. Cube files can be placed on a network drive for access by Excel users. If my memory serves me, though, with Excel 2000 only one user at a time can access a cube file.
3. The cube creation script can be enhanced to zip and distribute cube files to users via email. This is a good solution for road warrior sales personnel or busy execs who can't necessarily connect directly to the DB2 server but may want to analyze current numbers.
4. If the network configuration allows, copy the cube file right to a user's "My Documents" folder for easy access.
5. Create a pre-built spreadsheet with pivot tables and pivot charts that reference the cube data on a network share. This option alleviates the need to have users start the pivot table process every time they open a cube file. In fact, a workbook can be created that has several predefined pivot table views that can serve as a starting point for users.
6. It is a good idea to schedule the cube to be rebuilt automatically for a sensible period of time (depending on the application) such as daily, weekly, or monthly.
Number Crunching the Fun Way
The combination of OLAP cubes and pivot tables is compelling because it allows users to create their own report variations with existing tools. This allows developers to do more useful things than writing a million variations of a single report. The cube file also allows quite a bit of data to be analyzed without having to purchase an OLAP product.
OLAP cubes allow DB2 users to crunch data offline in a flexible and easy-to-use format. With a little training, the bean counters and sales reps in your company will love them!
Sample VBScript (CreateSalesCube.vbs) Code Required to Build a Cube File
'
' CreateSalesCube.VBS
'
'
'
' This script is intended to build
' a local OLAP cube file (.CUB) from
' "DB2 for i" data on a regular basis.
'
'
' Michael Sansoterra 09/20/2008
'
'
'
'
' Build Connection String
' (4 Components required for MSOLAP provider)
'
' 1. Location of Cube
'
Dim ConnLocation
ConnLocation="LOCATION=C:NorthwindExample.cub;"
'
' 2. ODBC Data Source Name
' (Watch for those embedded double quotes!)
'
' Place your own DSN name, credentials and other
' connection string keywords here.
'
'
' This example assumes AS400 uses the iSeries Access
' (aka System i Access) ODBC Driver
'
Dim ConnDSN
ConnDSN="SOURCE_DSN=""DSN=AS400;UID=USERID;PWD=PASSWORD"";"
'
' 3. Create Cube Statement (Define Dimensions & Measures)
'
Dim ConnCreateCube
ConnCreateCube = _
"CREATECUBE=CREATE CUBE [OrderCube] (" & _
"DIMENSION [Customer]," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [CompanyName]," & _
"DIMENSION [Country]," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [Country]," & _
" LEVEL [Region], " & _
"DIMENSION [Product]," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [ProductName]," & _
"DIMENSION [Supplier]," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [CompanyName1]," & _
"DIMENSION [Category]," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [CategoryName]," & _
"DIMENSION [OrderDate] TYPE TIME," & _
" LEVEL [All] TYPE ALL," & _
" LEVEL [Year] TYPE YEAR," & _
" LEVEL [Quarter] TYPE QUARTER," & _
" LEVEL [Month] TYPE MONTH," & _
" MEASURE [Quantity] FUNCTION SUM," & _
" MEASURE [Ext Price] FUNCTION SUM," & _
" MEASURE [Highest Price] FUNCTION MAX);"
'
' 4. Insert Data into Cube
'
Dim ConnInsertCube
ConnInsertCube="INSERTINTO=INSERT INTO OrderCube(" & _
"[Quantity], [Ext Price], [Highest Price], [CompanyName], " & _
"[Country].[Country], [Region], [ProductName], " & _
"[CompanyName1], [CategoryName], [OrderDate]) " & _
"OPTIONS ATTEMPT_ANALYSIS " & _
"SELECT OD.Quantity, " & _
" OD.UnitPrice*OD.Quantity ExtPrice1," & _
" OD.UnitPrice*OD.Quantity ExtPrice2," & _
" CS.CompanyName, CS.Country," & _
" CS.Region, PD.ProductName," & _
" SP.CompanyName," & _
" CT.CategoryName, OH.OrderDate" & _
" FROM Northwind.OrderDetails OD " & _
"INNER JOIN Northwind.Orders OH ON OH.OrderID=OD.OrderID " & _
"INNER JOIN Northwind.Products PD ON PD.ProductID=OD.ProductID " & _
"INNER JOIN Northwind.Suppliers SP ON SP.SupplierID=PD.SupplierID " & _
"INNER JOIN Northwind.Customers CS ON CS.CustomerID=OH.CustomerID " & _
"INNER JOIN Northwind.Categories CT ON CT.CategoryID=PD.CategoryID "
'
' Define ADO Connection Object
' Set the connection provider to MSOLAP
'
Dim Connection
Set Connection = CreateObject("ADODB.Connection")
Connection.Provider = "msolap"
'
' Combine 4 components of the MSOLAP connection string
' into one
'
Connection.ConnectionString = _
ConnLocation & _
ConnDSN & _
ConnCreateCube & _
ConnInsertCube
'
' Open the connection (this will create the cube and
' populate the cube from the remote data source. The
' provider works to pre-aggregate the cube's measures by
' the various combinations of the cube's dimensions)
'
Connection.Open
'
' Close The Connection
'
Connection.Close
Set Connection=Nothing
'
' Signal user the script is done
'
MsgBox "Cube has been created!"
LATEST COMMENTS
MC Press Online