What Is OLAP?
Online analytical processing (OLAP) databases allow you to create a data warehouse that is built as a summary of data from another source (SQL Server, ODBC database, OLE DB database, etc.) An OLAP database is built by creating groupings, called dimensions, that are based on related fields, called levels, within your data. These dimensions are organized in a tree structure to allow you to drill down from one level in the dimension to another. The example in Figure 1 shows what a dimension based on geographic region, state, ZIP code, and customer name would look like. In this example, regions are made up of states, which are broken down into ZIP codes. The ZIP codes contain customer names.
Region | State | ZIP Code | Customer |
Northeast | | | |
| New York | | |
| | 10001 | |
| | | Widget World |
| | | Widget Warehouse |
| | 12205 | |
| | | Widgets R Us |
| Pennsylvania | | |
| | 18062 | |
| | | The Widget Man |
| | | Widget Mania |
Figure 1: This is an example of data in an OLAP database.
The OLAP database will perform a defined aggregation on specified value fields. These value fields are referred to as measures. The measures are available at any level within the dimension. This means that the measure at any level in the dimension is a summary of the levels directly below it. Within an OLAP database, you can define multiple measures and multiple dimensions. Your data can be viewed using more than one dimension at a time. Because of the three-dimensional view of your data that this gives you, this collection of dimensions is referred to as a cube. If you've ever worked with a PivotTable in Microsoft Excel, you've basically used a sized-down version of an OLAP database. In fact, you can easily link data from an OLAP database into an Excel PivotTable.
Why Use an OLAP Database?
You may be asking this exact question. OLAP databases give you fast access to data aggregations within any of your defined dimensions, at any of the defined levels. The key word there, however, is "defined." If you require access to information that is not part of a currently available dimension, you will need to build a new dimension. This can be time-consuming, depending on the amount of data and the speed of your OLAP server. For this reason, you need to carefully examine your business intelligence (BI) strategy prior to deciding to use an OLAP database.
One of the primary goals of this type of system should be to make data easily available to your users. The power of an OLAP database can be overwhelming, and it's easy to go overboard and try to move all of your data into the OLAP database. The phrase "if it ain't broken, don't fix it" comes to mind here; if your BI goals don't require access to the dynamic views of data that OLAP offers, then it probably doesn't make sense to add the overhead required to maintain an OLAP database.
You also need to carefully consider the design of your database. One person in your organization might want to see the data in one hierarchy, while another person has an entirely different concept of what the hierarchy of the same data should be. It's also important to note that once your database is initially built, you still need to perform periodic updates to refresh the data. If your data is time-sensitive, this may be another reason not to use an OLAP database.
Having said all that, the reasons you should use an OLAP database are numerous. With an OLAP database, you can create dynamic views of otherwise "flat" data and allow decision-makers to see new, important data in a new way. Instead of looking at four reports to get the data they need, they can now start out by looking at total level information and drill down as far as they need to go to the answers they are looking for. For example, sales analysts looking at a "Product Sales" dimension might be viewing total product line sales and notice that sales are off 5% for last month. They drill down to see that the lost sales are in one specific product category. They drill down still further to find out that the lost sales are coming from a specific item. Now, they bring the "Customer Sales" dimension into the fold to find out that the sales decrease relates to a specific region. After drilling down further into this dimension, they are able to trace the drop in sales to specific customers. This is how an OLAP database can really help improve your BI.
Installing Analysis Services
As I mentioned, to use SQL Server 2000 Analysis Services, you must have SQL Server 2000. Analysis Services must be installed separately from SQL Server 2000. To set up this product, look for the MSOLAP folder on the SQL Server 2000 CD. Run Setup.exe and go through the setup wizard. Once setup is complete, you'll be ready to go.
A sample OLAP database called Food Mart is included with Analysis Services. To open the Analysis Services Manager, select Microsoft SQL Server from the Programs menu, then select Analysis Services, and finally Analysis Services Manager. The Microsoft Management Console (MMC) display shown in Figure 2 will be displayed.
Figure 2: This management console is used to create and maintain OLAP databases.
You'll use this console to create and modify OLAP database components. Expand the branches of the Food Mart database down to the Sales cube. Now, right-click on the Sales cube and select Browse Data. The window shown in Figure 3 will be displayed.
Figure 3: You can browse an OLAP database dynamically from this screen.
You can switch the dimension displayed by dragging one of the dimensions displayed at the top of the screen down to the workspace at the bottom of the screen. Double-click on a level within the dimension to expand it and display its "children."
Now that you know what an OLAP database is, I'll show you how to create an OLAP cube with your iSeries data.
OLAP to iSeries via ODBC
I'll start by showing you how to create an OLAP database that connects directly to the iSeries through an ODBC connection. For this example, you'll use the SYSTABLES file in QSYS2 library, simply because it exists on every iSeries. This file contains information about all of the files on your system.
One problem with using this file through an ODBC connection is that, since it resides in a library within the system portion of the library list, it's not accessible through the library list on an ODBC connection. You can get around this problem by creating your own version of this file, which is actually an SQL view of the files QADBXREF in QSYS. Use the source in Figure 4 to create the logical file SYSTABLES in QGPL. Now, create a new ODBC connection to your iSeries that has QGPL in its library list.
|
Figure 4: Information on files from your iSeries will be located in this logical file.
Once you have prepared your data, you can start creating your OLAP database. Open the Analysis Manager and expand the tree so that your server name is visible. Right-click on the server name and select "New Database" from the context menu. Name your new database Sample OLAP DB. Then, expand the tree under your new database and right-click on the Cubes folder and select Cube Wizard under New Cube menu. The cube wizard will be displayed. Since you don't have a data source defined for your OLAP database, you will be prompted to create one. From the initial screen, you will need to select the OLE DB provider for ODBC data sources. On the next screen, select the ODBC data source you created earlier from the drop-down box provided. It's important that you enter a valid user name and password for your iSeries on this screen and select the checkbox labeled Allow Password Saving. This enables the Analysis Manager to access the data source without user interaction later on.
After entering all of the required values, click the Test Connection button to verify that the ODBC data source can be accessed. When you are returned to the cube wizard, expand the new data source and locate the SYSTABLES file created earlier and highlight it. This file will be used as your facts table; that is, the table from which your measures will be obtained. Click Next to display the screen shown in Figure 5, which is used to define the measures for your cube. Remember that the measures define what values will be displayed within the cube.
Figure 5: Any numeric fields from your facts table can be used as a measure.
Double-click on the field named DBXNFL, which contains the number of fields in a specified file and click Next. The next screen displayed is used to add dimensions to your cube. Within an Analysis Services OLAP database, a dimension can be defined as either private, meaning that it is only available to the cube in which it's defined, or shared, which means that the dimension is available to all cubes within the database. Click the New Dimension button to begin defining your first dimension. You will be prompted for the type of dimension being defined. The table in Figure 6 explains each type of dimension.
Dimension Type | Description |
Star Schema | This type of dimension gets its data from a single table that is associated to the fact table. |
Snowflake Schema | As its name suggests, this dimension type contains data from multiple related tables. |
Parent-Child | A Parent-Child dimension uses two fields from a single table to create a parent-child structure. |
Virtual Dimension | A virtual dimension uses data from another dimension's member properties. Each member properties creates one level within the dimension. |
Mining Model | A data mining model and predictable data columns are used to create this type of dimension. |
Figure 6: There are five different types of dimensions that can be defined.
For this example, select a Star Schema dimension type and then click Next.
Now, you need to define the table that contains the data to be contained in your dimension. In this case, that table is going to be the same table used for your facts table, SYSTABLES. Locate that file and click Next. If the dimension you are defining had been based on date and/or time fields, you would have been able to specify this on the screen displayed. It's not, so you'll accept the default of Standard Dimension and click Next to display the list of the fields within the selected table, as shown in Figure 7. When you reach this step in the Dimension Wizard, you must select the fields in the order they are to appear within the hierarchy. Select DBXLIB, which contains the file library, and then DBXFIL, which contains the file name.
Figure 7: This screen defines the fields that make up your level dimensions.
Please note that, as you define the levels for your dimensions, Analysis Services will count the number of different entries within that level. In some cases, this may take quite a while. If a level contains fewer entries than the level directly above it, Analysis Services will display a warning.
After selecting the two fields, click Next. The next screen displayed will allow you to specify the key field for each level within the dimension. In most cases, this will be the same field used to define the level. The next screen is used to specify advanced options for your dimension (which I won't get into right now). After clicking Next, all that's left to do is name your dimension. For this example use the name "File Dimension." Then, by using the check box at the bottom of the window, define whether or not you want this dimension to be a shared dimension. When the cube wizard is redisplayed, click Next. The screen shown in Figure 8 will be displayed.
Figure 8: This screen is used to name our new dimension.
This screen shows your cube structure and is used to name the new cube. Use LibraryFile as the new cube name and click Finish. The screen shown in Figure 9 will be displayed. Use this screen to modify your new cube.
Figure 9: Use this screen to modify your cube structure.
You can use the data tab at the bottom of the screen to browse your data after you process the cube. To do that, click on the Tools menu and select Process Cube. This option is what actually builds the summaries defined within your cube. The first time you process a new cube or anytime you modify a cube's structure, you must define the storage method to be used for the data in your cube. The next screen (not shown here) allows you to choose from three storage methods: MOLAP, ROLAP, or HOLAP.
MOLAP stores a copy of the data and the aggregations within a multidimensional database. This method offers the best performance but uses the most disk space. ROLAP leaves the data in the source database and also stores aggregations within the same database. While this option uses the least amount of disk space, it also offers the worst performance of the three. Finally, HOLAP offers the happy medium because the data is left in the existing database, while the aggregations are stored within a multidimensional database. This method performs well but doesn't use the disk space that MOLAP will use. Which method you choose greatly depends on your circumstances. If disk space is an issue, choose ROLAP or HOLAP. If database performance is your primary concern, MOLAP is the obvious choice. Make your selection and click Next.
The next window displayed, as shown in Figure 10, allows you to specify how many aggregation levels should be built at this time.
Figure 10: The number of aggregations to be built can be specified here.
The number of aggregations you initially build can improve performance at the cost of disk space. You can specify a maximum amount of disk space to use, indicate a performance threshold to reach, or allow the application to continue building aggregations until you tell it to stop. The general idea is that the more aggregations you build up front, the faster you'll be able to retrieve the data from the OLAP cube later. The downside is that the more aggregations built, the more disk space used. Which method you choose will greatly depend on your OLAP server itself. If disk space is a critical resource on the machine, you would want to use the first method to limit the amount of space used. If overall performance is your primary concern, you would want to build aggregations based on a performance gain threshold.
For this example, click the middle button and specify a percent increase of 50%. Click Next on this screen and the screen after to begin processing the cube. Depending on your system, this may take a while. When processing is complete, click Close on the process window. Then, right-click on your new cube and select Browse Data to see the data in your new cube.
The one drawback to using data from the iSeries via ODBC is that the performance of your OLAP database won't be quite as fast as it would be if you were using an SQL Server database located on the same server you're running Analysis Services on. This is because of the overhead used on both sides by the ODBC connection. In Part II of this article, I'll explain how to re-host the data from the iSeries into an SQL Server database to get maximum performance out of your OLAP database. I'll also cover how to access your OLAP database from other applications, including Microsoft Excel, and even how to create an Active Server Page (ASP) that reads and displays data from your OLAP database.
Now that you understand what OLAP is and how to create an OLAP database with your iSeries data, run with it. You'll find that Analysis Services can help you get a handle on all of that "out of control" data.
Mike Faust is the MIS Manager for The Lehigh Group in Macungie, PA. Mike has nearly 15 years experience with midrange systems and personal computers. Be sure to check out Mike's new book The iSeries & AS/400 Programmer's Guide to Cool Things available now from MC Press. You can contact Mike via email at
References
Data Mining with Microsoft® SQL Server 2000 Technical Reference (Microsoft Press)
Microsoft® SQL Server 2000 Reference Library (Microsoft Press)
10 Steps to BI Success http://www-1.ibm.com/servers/eserver/iseries/bi/db2olap.htm
LATEST COMMENTS
MC Press Online