If your environment shares one set of embedded SQL programs or SQL packages among multiple library lists, you may be paying a performance penalty for access plan rebuilds.
Thanks to the spectacular library list concept, it's common to have multiple instances of application data libraries on the IBM i. Further, a typical IBM i application stores its programs in one or more object libraries, separate from the data libraries. This configuration allows a single set of application programs to operate on multiple, independent data libraries.
Developers who know the power of the library list perform due diligence to make sure their SQL code can run over multiple sets of libraries (such as dev, test, and multiple production libraries) so that only one copy of the program is required. However, if you share SQL code containing prepared statements (such as embedded SQL, persistent stored modules (PSMs, aka SQL routines), prepared dynamic statements stored in an SQL package, etc.), there may be a subtle performance tax lurking.
For the record, the term program used herein refers to any DB2 for i embedded SQL program that contains SQL statements that have a reusable access plan and use the system (*SYS) naming convention. This list includes SQL-based triggers, functions, and stored procedures because, under the covers, they are created as C embedded SQL programs.
To illustrate the hidden tax problem, consider the following simple application library lists:
Application Library List #1
QTEMP
APPDATA (data library for company #1 containing tables, views & data areas)
APPOBJ (program object library including embedded SQL programs)
QGPL
Application Library List #2
QTEMP
APPDATA2 (data library for company #2 containing tables, views & data areas)
APPOBJ (program object library including embedded SQL programs)
QGPL
The application programs in library APPOBJ are shared in common and can run against one of two data libraries: APPDATA and APPDATA2. A new embedded SQL RPG program called ORDERSR is introduced into APPOBJ and is called with library list number #1 in play. The program contains one prepared SQL query.
After the program finishes, you execute the Print SQL Information (PRTSQLINF) command against the program object (PRTSQLINF ORDERSR *PGM). Inside the command's spool file output, you find an SQL4021 message like this:
SQL4021 Access plan last saved on 02/09/14 at 14:21:22.
A saved access plan is a good thing because the next time the program runs, DB2 normally won't take the time to build it from scratch. If you run the program again with the same library list, DB2 will not resave the access plan associated with the SQL statement in the program.
Next, you change the job's library list to use library list #2 (with the data library changing from APPDATA to APPDATA2) and call the program again. You run PRTSQLINF again and see message SQL4021 again, but this time with a more recent timestamp than the original. Something like this:
SQL4021 Access plan last saved on 02/09/14 at 14:24:30.
What happened? Why was the access plan saved again?
The answer is because DB2 has detected that the tables it's now querying with library list #2 are different from the ones it started with in the original access plan. Even though the tables and indexes may indeed be structurally the same between libraries APPDATA and APPDATA2, DB2 (for various reasons) has to rebuild the query's access plan.
The point is that the rebuild process takes time for DB2 to complete. For programs that are called relatively infrequently, it's probably not a big deal. But, if your environment features the same SQL programs being called rapidly and repeatedly against several alternative data libraries, there is a tax on DB2 to constantly rebuild and save the access plans.
In this situation, one possible solution would be to create separate program libraries for SQL objects (a companion SQL program library to each data library). This would allow each data set to have its own access plans without the cost of rebuilding the access plans. The drawback of this configuration is that every time a program is created or changed it needs to be deployed to multiple libraries.
For a second solution, IBM's Scott Forstie said his approach to tackle this problem would be to change the static SQL program to use dynamically prepared SQL statements (using PREPARE rather than EXECUTE IMMEDIATE). Apparently, this will perform better, and the only drawbacks are that dynamic SQL statements can be a problem to manage and you lose the convenience of having newer versions of DB2 for i automatically catalog your static statements (see below). Thanks for the advice, Scott!
Using the New Program and Package Statement Views in IBM i 7.1 TR8
So how do you know if you have this access plan rebuild problem? It's hard to tell, especially because PRTSQLINF is limited in what it reports about access plan updates. With PRTSQLINF, you can tell an access plan was updated recently (but not how frequently.)
Starting in IBM i 7.1 TR8 (group PTF level 29), two new catalog views were added to the QSYS2 schema: SYSPROGRAMSTMTSTAT and SYSPACKAGESTMTSTAT. These views are intended to return information about individual statements within an embedded SQL program/PSM or SQL package. The views contain a row for all static statements stored in all programs and packages on the IBM i. Of interest to this topic is a column called NUMBER_REBUILDS that can be used to determine whether DB2 is spending precious time rebuilding an access plan.
To illustrate how to find unwanted access plan rebuilds, consider an SQL procedure that runs a prepared SQL statement. When this procedure is run alternately against multiple data libraries, the SYSPROGRAMSTMTSTAT view can be queried to check whether DB2 is rebuilding the access plan:
Step 1: Build an SQL stored procedure with a prepared SQL statement:
CREATE OR REPLACE PROCEDURE DEV.ORDERS_SQL
(IN StartDate TIMESTAMP,
IN EndDate TIMESTAMP,
IN CustomerId INT)
LANGUAGE SQL
SPECIFIC DEV.ORDERS_SQL
DETERMINISTIC
RESULT SETS 1
SET OPTION DATFMT=*ISO,USRPRF=*OWNER,COMMIT=*NONE
BEGIN
-- NOTE: Make sure that *SYS naming is used when creating procedure
DECLARE Orders CURSOR WITH RETURN TO CLIENT FOR
SELECT soh.SalesOrderId,CarrierTrackingNumber,OrderQty,
UnitPrice,UnitPriceDiscount,soh.ModifiedDate,
Cast(ORDERDATE AS DATE) OrderDate,
Cast(DUEDATE AS DATE) DueDate,
Cast(SHIPDATE AS DATE) ShipDate,
Status,
PurchaseOrderNumber,
AccountNumber,CustomerId,
p.Name AS ProductName,p.ProductNumber,p.Color
FROM SalesOrderDetail sod
JOIN SalesOrderHeader soh On soh.SalesOrderId=sod.SalesOrderId
JOIN Product p On p.ProductId=sod.ProductId
WHERE soh.OrderDate BETWEEN StartDate AND EndDate
AND soh.CustomerId=CustomerId
FOR READ ONLY;
OPEN Orders;
SET RESULT SETS CURSOR Orders;
END
;
Step 2: In "Run SQL Scripts" call the procedure several times while changing the data library in the library list:
CL: CHGLIBL (ADVWORKS);
CALL DEV.ORDERS_SQL('2001-01-01-00.00.00','2005-01-01-00.00.00',676);
CALL DEV.ORDERS_SQL('2001-01-01-00.00.00','2005-01-01-00.00.00',670);
CL: CHGLIBL (ADVWORKS08);
CALL DEV.ORDERS_SQL('2002-01-01-00.00.00','2009-01-01-00.00.00',30070);
CL: CHGLIBL (ADVWORKSTS);
CALL DEV.ORDERS_SQL('2003-01-01-00.00.00','2004-05-01-00.00.00',670);
Step 3: Query the SYSPROGRAMSTSMTSTAT view and find out the rebuild count:
SELECT *
FROM QSYS2.SYSPROGRAMSTMTSTAT
WHERE PROGRAM_SCHEMA='DEV'
AND PROGRAM_NAME='ORDERS_SQL'
AND NUMBER_REBUILDS>0;
Assuming the procedure was called four times as shown in Step 2, the SYSPROGRAMSTMTSTAT view reveals the procedure has been executed four times and the plan was rebuilt three times (one for each library list change). Of course, examining the output of PRTSQLINF can be used as well to show when the access plan was last updated (although it doesn't give a count of how many times it was rebuilt.)
The abridged query results from Step 3 are shown here:
PROGRAM_NAME |
NUMBER_TIMES_EXECUTED |
NUMBER_REBUILDS |
LAST_QDT_REBUILD_REASON |
STATEMENT_TEXT |
ORDERS_SQL |
4 |
3 |
B6 |
DECLARE ORDERS CURSOR … |
The last reason (B6) for the plan being rebuilt is:
B6 - The query was reoptimized because the value of a host variable changed and the access plan is no longer valid
The reason codes in column LAST_QDT_REBUILD_REASON explaining why the access plan was rebuilt can be found in this web page. (Interestingly, this B6 code is the given reason for the access plan rebuild even if the host variable values are the same across library sets.)
There are many other useful columns contained in this view (such as cumulative rows affected, module name, host variable count, etc.), so please check out the references at the end of this tip. To recap, SYSPROGRAMSTMTSTAT contains run-time information about all of the static prepared statements in all of the embedded SQL programs on the IBM i. That is extremely useful to a DBA or developer who is searching for queries that return large row sets or take a long time to run.
The same "rebuild access plan" demonstration can be made for SQL packages. Consider the following embedded SQL RPG code snippet that connects to a remote system (using DRDA) and runs a parameterized INSERT statement against one of three application data libraries. When this program (named PKGTESTR) is called, an SQL package object (*SQLPKG) will be created on the remote system (SYS520DEV). Unless overridden, the SQL package name and schema on the remote system will be the same as the program name and schema from the local system. The primary purpose of the remote SQL package is to store the access plan for various prepared SQL statements.
Exec SQL CONNECT TO SYS520DEV;
Exec SQL SET CONNECTION SYS520DEV;
StartStamp=%TimeStamp;
For i=1 To 2000;
If %Rem(i:3)=0;
Exec SQL CALL QCMDEXC ('CHGLIBL (ADVWORKS)');
ElseIf %Rem(i:2)=0;
Exec SQL CALL QCMDEXC ('CHGLIBL (ADVWORKS08)');
Else;
Exec SQL CALL QCMDEXC ('CHGLIBL (ADVWORKSCP)');
EndIf;
Exec SQL
INSERT INTO DATA (C1,C2)
SELECT PRODUCTID,COUNT(*) NO_TRANS
FROM TRANSACTIONHISTORY
WHERE TRANSACTIONDATE BETWEEN :START AND :END
GROUP BY PRODUCTID
ORDER BY NO_TRANS DESC
FETCH FIRST 3 ROWS ONLY;
If SQLCode<*Zero;
Dsply SQLCode '';
EndIf;
EndFor;
EndStamp=%TimeStamp;
Exec SQL DISCONNECT SYS520DEV;
The INSERT statement is run 2000 times, executing alternately against one of three libraries. After running the program from the local system, the SYSPACKAGESTMTSTAT can be queried on the remote system to see how many rebuilds occurred.
SELECT *
FROM SYS520DEV.QSYS2.SYSPACKAGESTMTSTAT
WHERE PACKAGE_NAME='PKGTESTR'
AND NUMBER_REBUILDS>0;
The number of rebuilds in the package for the INSERT statement on the remote system shows 4000, a number higher than I expected. Perhaps it has something to do with the association between the SQL package and the embedded SQL program. The query against SYSPACKAGESTMTSTAT also showed the statement was executed 2024 times (instead of 2000), so there may be some extra things going on in the DRDA world that I don't understand. But the point is that the view reveals there are quite a few rebuilds happening.
Remember, there are legitimate reasons, such as a significant row count change in a table or a new index that can cause DB2 to rebuild a query plan. Therefore, having some rebuilds over time shouldn't be surprising. But a significant ratio of rebuilds to executions can reveal some areas where DB2 may be spending too much time on rebuilds.
A query against these new views can be used to find, for example, all packages and programs that have statements with greater than 10% rebuild:execution ratio:
SELECT PACKAGE_SCHEMA,PACKAGE_NAME,NUMBER_TIMES_EXECUTED,
NUMBER_REBUILDS,STATEMENT_TEXT,LAST_QDT_REBUILD_REASON
FROM QSYS2.SYSPACKAGESTMTSTAT
WHERE CASE WHEN NUMBER_TIMES_EXECUTED=0 THEN 0 ELSE
DEC(NUMBER_REBUILDS,9,0)/NUMBER_TIMES_EXECUTED END>.1
AND PACKAGE_SCHEMA<>'QSYS'
UNION ALL
SELECT PROGRAM_SCHEMA,PROGRAM_NAME,NUMBER_TIMES_EXECUTED,
NUMBER_REBUILDS,STATEMENT_TEXT,LAST_QDT_REBUILD_REASON
FROM QSYS2.SYSPROGRAMSTMTSTAT
WHERE CASE WHEN NUMBER_TIMES_EXECUTED=0 THEN 0 ELSE
DEC(NUMBER_REBUILDS,9,0)/NUMBER_TIMES_EXECUTED END>.1
AND PROGRAM_SCHEMA<>'QSYS';
In summary, it's customary to share a single program library over many data library sets. Unlike other database platforms, the library list concept available to DB2 for i is convenient and prevents unnecessary duplication of program code. However, there is a slight drawback to this approach because the changing of the library list can cause an access plan rebuild for prepared statements.
How much of an impact this has on a system depends on its horsepower (or CPW), the frequency of invocations of the trigger/function/procedure, the complexity of the statement(s) being rebuilt, and the frequency of library set switches. Unfortunately, I'm not familiar with any place in the DB2 world that indicates how much time DB2 spent on an access plan rebuild, so the actual cost is hard to quantify.
The rebuild-related columns in the new SYSPROGRAMSTMTSTAT and SYSPACKAGESTMTSTAT views can help identify which procedures and packages have SQL statements that are undergoing frequent rebuilds. If you suspect these rebuilds are causing a performance drag, potential solutions include distributing multiple copies of the embedded SQL program or, better, changing the program to dynamically prepare the SQL statement(s).
References
QSYS2.SYSPACKAGESTMTSTAT—IBM Documentation
QSYS2.SYSPROGRAMSTMTSTAT—IBM Documentation
LATEST COMMENTS
MC Press Online