Although common table expression (CTE) support has been available since V4R4, CTEs are still an often-overlooked SQL feature that's useful for generating complicated reports and queries. The simplest way to describe a CTE is that it's like creating a view just for the duration of an SQL statement. CTEs enable you to easily break a complicated query into components (similar to breaking an RPG program into reusable modules). SQL queries can also reuse a CTE (like reusing an RPG module) by referencing the expression multiple times.
Let's look at a simple example to help explain CTEs. Programmers new to SQL often try to code an SQL query as shown below because they want the query to first return a derivation or expression (e.g., the year value extracted from a date column) and then reference the name of that result column later in the query. The following statement names the year extraction result as year_val and then attempts to use that result column name on the WHERE clause.
WHERE year_val>=2000 AND region= :inp_region
The SQL standard, however, does not allow the result column name to be referenced on the WHERE clause. To make this SQL statement legal, the programmer would have to duplicate the year(postdate) function on the WHERE clause or create an SQL view that includes this derivation. If no other programs need this derivation, then creating another DB2 object might be overkill. Instead, this might be a good place to use CTEs, as demonstrated in the following query.
In this query, the ByYear CTE is used to allow reuse of the extracted year value. A CTE is defined with the WITH clause at the beginning of the SQL statement. Similar to variables defined in the beginning of a program, once the CTE is defined, it can be referenced and used on the main SQL statement in exactly the same way you'd use a DB2 table. (A later example shows a DB2 table joined to a table expression).
(SELECT year(postdate) year_val, region, revenue, profit, store_id
FROM s ales WHERE region=:inp_region)
SELECT *
FROM ByYear
WHERE year_val>=2000
This example also demonstrates one advantage that CTEs have over SQL views: A CTE definition can include references to host variables or parameter markers (e.g., :inp_region). In this example, the program sets the value of the region host variable to 'NE'.
This simple example explains how to use a CTE, but CTEs are usually used for more than just reuse of a derivation or expression. They are more often used to break a complicated query into multiple steps. For instance, suppose you have a table or physical file with different types of data in each row and there's a record type field that identifies the type of data in that row. In this case, you'd want to join the table to itself to see related data in a consolidated fashion. The database monitor file produced by the STRDBMON command is an example of a table containing different types of data. Thus, CTEs can be quite helpful when trying to query and analyze monitor data.
The following query uses a CTE to filter and select only the SQL statements that are performing a table scan. Table scans sometimes indicate that the SQL statement needs tuning by creating an index. The tablescans CTE first goes through the monitor data selecting only rows with a type or record ID of 3000 (i.e., QQRID field). A 3000 row is generated for each SQL request that is implemented by DB2 as a table scan. Then, the tablescans CTE is joined to the same underlying file (mydbmon) to pair the table scan information with the SQL statement text, which is contained in the 1000 row type.
WITH tablescans(StmtID,EstRows, TotRows) AS
(SELECT DISTINCT qqucnt, qqrest, qqtotr FROM mydbmon WHERE qqrid=3000)
SELECT SUM(qqi6) "Total Time", COUNT(*) "Times Run", qqucnt “Statement ID”,
integer(avg(bEstRows)) "Est Rows Selected", integer(avg(TotRows)) "Table Rows",
qq1000 “Stmt Text”
FROM mydbmon a, tablescans b
WHERE qqrid=1000 AND qqucnt = b.StmtID)
GROUP BY qqucnt, qq1000 ORDER BY 1 DESC
The table below shows the result of the query.
(Click image to enlarge.)
Using the tablescans CTE simplifies the base query by hiding some of the selection into the expression definition. Note that the optimizer does not automatically create a temporary table with the contents of the CTE; instead, it tries to access the result set defined by the CTE without making a temporary copy of the data. The CTE support does allow for multiple CTEs to be defined and used on a single SQL statement.
Kent Milligan is a DB2 UDB Technology Specialist on IBM's eServer Solutions Enablement team. He spent the first seven years at IBM as a member of the DB2 development team in Rochester. He can be reached at
LATEST COMMENTS
MC Press Online