10
Fri, Jan
4 New Articles

The Power and Magic of LPG

Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

With the availability of V5R3 i5/OS and DB2 UDB for iSeries, the SQL Query Engine now supports a powerful strategy for minimizing query input/output (I/O) and maximizing query performance. This new strategy appears as magic, given the Query Optimizer's ability to rewrite the query using generated local selection predicates where none were specified. This article explores the technique and benefits of look-ahead predicate generation (LPG).

In the world of query optimization, minimizing or even eliminating the reading and processing of unnecessary data is the key to good performance because I/O operations are relatively slow operations. The job of the query optimizer is to choose the appropriate methods and build a strategy that will access and process the rows as quickly and efficiently as possible. With a high number of strategies available, the optimizer will normally be able to build a query plan that meets the user response time requirements.

Within a query, the local selection predicates are used to specify which rows are to be selected for processing. One technique for eliminating rows from further processing requires reading the data from the table(s) and testing values. Another technique for eliminating rows is accomplished without actually reading the data from the table(s) but instead relies upon mathematic principles and other database objects to avoid testing rows in the table(s). For example, given a query with local selection that identifies one row out of a one-billion-row table, the database engine can read and test every row (one billion tests), or it can use an index to identify only the one matching row, eliminating all the other rows without reading and testing them. While the result will be the same, the difference in query performance between these two strategies will be significant.

Another query scenario that can cause a lot of reading (and poor performance) is joining of tables. Specifying a join usually causes the database engine to select rows from one table and perform a read operation to find any matching rows in another table. By definition, the join can reduce the result set, but only by testing for the existence of the row. The potential for reading and ultimately rejecting many rows makes this type of query problematic. Again, if the query optimizer can employ specific strategies to eliminate rows before the join, query performance can be increased and use of database server resources can be reduced.

DB2 UDB for iSeries supports many methods and employs many strategies to substantially reduce the reading of rows and the processing of data. One such strategy is called look-ahead predicate generation (LPG). This powerful and elegant feature can significantly reduce query times.

A simple join scenario will help explain why the LPG support can have such a positive affect on a query's performance.

A Couple of Methods

Assume two tables: one relatively small (SmallTableA) and one relatively large (LargeTableB). A join query is issued referencing both tables. Local selection is defined against SmallTableA, but not LargeTableB. The selectivity of the local selection predicate is very high (i.e., it identifies two rows in SmallTableA).

SELECT   *
FROM     SmallTableA  A,
         LargeTableB  B
WHERE    A.Join_Col = B.Join_Col
AND      A.Col1 IN (112358, 132134)

With no indexes defined on LargeTableB, you have various table access and join processing combinations:

One strategy is to join LargeTableB to SmallTableA. Select all rows via a full table scan of LargeTableB and join every row to SmallTableA via an index or a hash table. Figure 1 illustrates this join.


http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140500.jpg

Figure 1: Here, LargeTableB is joined to SmallTableA. (Click images to enlarge.)

Given no local selection, every row selected from LargeTableB will be used to try the join to SmallTableA. This combination will produce a large number of reads.

Another strategy is to join SmallTableA to LargeTableB. Select all rows via full table scan of LargeTableB to create a temporary indexed list or temporary hash table, as shown in Figure 2.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140501.jpg

Figure 2: Here, SmallTableA is joined to LargeTableB.

Given no local selection to reduce the rows, the temporary indexed list or temporary hash table on LargeTableB will have to represent all the rows from LargeTableB, and both temporary data structures will take a relatively long time to populate. These structures will be relatively large as well.

A Couple of Better Methods

With DB2's LPG and query rewrite techniques, the join scenarios can be enhanced considerably.

If LargeTableB has some local selection defined and this local selection is somewhat selective (i.e., narrowing down the rows to be selected and processed), then the database engine can use methods that will reduce the number of rows accessed. This is where LPG comes in! The query optimizer can "look ahead" to generate local selection from other tables in the query and then transfer that local selection to another table. When this local selection is available, methods that take advantage of indexing technology can be employed to speed up the query request. (Click here for more information about indexing strategies.)

Using the example above, the local selection specified for SmallTableA will be used to identify not only the rows in SmallTableA but also the corresponding join column values. For example, the SmallTableA row containing A.Col1 = 112358 also contains A.Join_Col = Value1 (i.e., the corresponding join column value).

These distinct join column values derived from SmallTableA are used to generate local selection on LargeTableB. Now that LargeTableB has local selection, additional techniques can be applied to minimize or eliminate reading large sets of rows.

Here's a representation of the rewritten query:

SELECT   *
FROM     SmallTableA  A,
         LargeTableB  B
WHERE    A.Join_Col = B.Join_Col
AND      A.Col1 IN (112358, 132134)
AND      B.Join_Col IN (Value1Value2)

The new local selection allows the optimizer to consider additional join strategies. If the (generated) local selection on LargeTableB significantly reduces the number of rows accessed, LargeTableB may be placed in the first (primary) join position and accessed via an index.

Another strategy has LargeTableB placed in the second join position. If the join on LargeTableB is implemented via a temporary hash table, the amount of data placed in the hash will also be reduced by the (generated) local selection. Populating the temporary hash table can be much faster if an index is created for the (generated) local selection. Figure 3 illustrates this strategy.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140502.jpg

Figure 3: Here, LargeTableB is placed in the second join position.

The real magic of LPG comes into play when there is more than one table joined to the larger table.

Assume four tables: three relatively small (SmallTableA, SmallTableB, SmallTableC) and one relatively large (LargeTableD). A join query is issued referencing both small and large tables. Local selection is defined against the small tables, but not the larger LargeTableD. The selectivity of the local selection predicates is very high (i.e., it identifies few rows in SmallTableA, SmallTableB, and SmallTableC).

SELECT    *
FROM     SmallTableA  A,
         SmallTableB  B,
         SmallTableC  C,
         LargeTableD  D
WHERE    A.Join_Col = D.Join_Col1
AND      B.Join_Col = D.Join_Col2
AND      C.Join_Col = D.Join_Col3
AND      A.Col1 IN (112358, 132134)
AND      B.Col6= 'ABC'
AND      C.Col4= 2005
AND      C.Col5= 'January'

These are the various table access and join processing combinations for the first join pair:

  • SmallTableA joined to LargeTableD
  • SmallTableB joined to LargeTableD
  • SmallTableC joined to LargeTableD

Select all rows via full table scan of LargeTableB to create a temporary indexed list or a temporary hash table. For all rows joined to LargeTableD, join to other two SmallTables, as shown in Figure 4.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140503.jpg

Figure 4: Here, all rows joined to LargeTableD join to other two SmallTables.

Now, consider these table access and join processing combinations:

  • LargeTableD joined to SmallTableA
  • LargeTableD joined to SmallTableB
  • LargeTableD joined to SmallTableC

Select all rows via full table scan of LargeTableD and join every row to SmallTableA via an index or a hash table. For all rows from LargeTableD, join to the other three SmallTables, as shown in Figure 5.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140504.jpg

Figure 5: Here, all rows from LargeTableD join to the other three SmallTables.

Using the four-table query example above, the local selection specified for SmallTableA, SmallTableB, and SmallTableC will be used not only to identify the rows in the respective tables, but also to identify the corresponding join column values. These distinct join column values (derived from SmallTableA, SmallTableB, and SmallTableC) are used to generate local selection on LargeTableD using LPG. See Figure 6.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140505.jpg

Figure 6: The distinct join column values use LPG to generate local selection on LargeTableD.

The rewritten query would be represented like this:

SELECT   *
FROM     SmallTableA  A,
         SmallTableB  B,
         SmallTableC  C,
         LargeTableD  D
WHERE    A.Join_Col = D.Join_Col1
AND      B.Join_Col = D.Join_Col2
AND      C.Join_Col = D.Join_Col3
AND      A.Col1 IN (112358, 132134)
AND      B.Col6= 'ABC'
AND      C.Col4= 2005
AND      C.Col5= 'January'
AND      D.Join_Col1 IN (Value1, Value2)
AND      D.Join_Col2 IN (Value1, Value2, ... Value n)
AND      D.Join_Col3 IN (Value1, Value2, ... Value n)

Now that LargeTableD has local selection defined, an appropriate set of indexes can be created and applied to minimize (or eliminate) reading large sets of rows.

Yet another DB2 UDB for iSeries feature can be employed to handle the newly generated local selection--namely, index ANDing. This is the ability of the query optimizer and database engine to use more than one index to specifically identify and access rows within a given table. Using single-column-key indexes defined for each join column in LargeTableD, the database engine can merge the list of rows identified by each respective index and use the final list to access the table. The intersection (ANDing) of all three local selection predicates should result in a much narrower set of rows to be accessed and processed in LargeTableD. Using the indexes allows the database engine to avoid reading a large set of rows, resulting in a more efficient and faster query. See Figure 7.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140506.jpg

Figure 7: Index ANDing allows the database engine to avoid reading a large set of rows.

The optimizer can also use LPG with common join requests that contain a one-to-one relationship. In this case, LPG is used to recursively look ahead and generate local selection from the neighboring table "downstream." This has the tremendous benefit of allowing the optimizer much more latitude in setting the join order. Furthermore, this strategy reduces the risk of any one particular join order delivering poor performance.

Assume four tables all roughly the same size: TableA, TableB, TableC, and TableD. A join query is issued referencing all four tables with a one-to-one relationship (A to B, B to C, C to D). Local selection is defined against only one table, TableA.

SELECT   *
FROM     TableA  A,
         TableB  B,
         TableC  C,
         TableD  D
WHERE    A.Join_Col1 = B.Join_Col1
AND      B.Join_Col2 = C.Join_Col2
AND      C.Join_Col3 = D.Join_Col3
AND      A.Col5 = 'ABC'

Determining the best-performing query plan can be problematic, depending on the selectivity of the local selection predicates defined for TableA and the relative position of TableA in the join order. If all the tables have some local selection defined, the optimizer has many more choices for the query join order and consequently more opportunities to provide multiple high-performance query plans.

Using the four-table query example above, the local selection specified for TableA will be used not only to identify the rows in TableA, but also to identify the corresponding join column values. These distinct join column values (derived from TableA) are used to generate local selection on TableB.

The generated local selection specified for TableB will be used to identify the rows in TableB and identify the corresponding join column values. These distinct join column values (derived from TableB) are used to generate local selection on TableC.

The generated local selection specified for TableC will be used to identify the rows in TableD and identify the corresponding join column values. These distinct join column values (derived from TableC) are used to generate local selection on TableD.

Now the query has been rewritten and optimized with local selection on all four tables (Figure 8).

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140507.jpg

Figure 8: The query is optimized with local selection on all four tables.

Here's a representation of the rewritten query:

SELECT   *
FROM     TableA  A,
         TableB  B,
         TableC  C,
         TableD  D
WHERE    A.Join_Col1 = B.Join_Col1
AND      B.Join_Col2 = C.Join_Col2
AND      C.Join_Col3 = D.Join_Col3
AND      A.Col5 = 'ABC'
AND      B.Join_Col1 IN (Value1, Value2, ... Value n)
AND      C.Join_Col2 IN (Value1, Value2, ... Value n)
AND      D.Join_Col3 IN (Value1, Value2, ... Value n)

Being aware of and planning for LPG will allow the creation of indexes to effectively support join queries. Without this knowledge, and the corresponding indexes, better query performance will be squandered. The strategy for creating indexes for the generated local selection predicates is identical to creating indexes for user-supplied local predicates. Identifying the join columns is a clue to identifying which columns to create indexes on.

Query Magic

Like a good magician, the query optimizer uses techniques that are hidden from the audience. Only the final result is obvious. Given that query rewrite and LPG occurs out of sight, it will be helpful to get some feedback from the optimizer when this strategy is employed. One simple technique is to observe the estimated selectivity of each table in the query when that table has no user-supplied local selection predicates. Without LPG, the estimated number of rows selected should be equal to the total number of rows in the table. If the estimated number of rows selected is less than the total, then some selection is being applied.

Another more accurate technique is to rely on iSeries Navigator - Visual Explain to render the query graph. Once the query graph is displayed, any nodes that are under the influence of LPG can be highlighted (in green). To identify the use of LPG, select the "Highlight LPG" option on the View menu within the Visual Explain window.

http://www.mcpressonline.com/articles/images/2002/LPGArticle(Final)DRAFTV3--02140508.png

Figure 9: iSeries Navigator - Visual Explain renders the query graph.

The key to fully utilizing an optimization capability is in integrating that capability into the normal flow of query optimization--in effect, making that capability pervasive. Traditionally, sophisticated optimization techniques rely on interrogation of the query or query environment to recognize when and where to apply the specific technique. If this identification is successful, then the technique can be successful; otherwise, the technique is useless or even detrimental.

While the DB2 UDB for iSeries LPG may seem like a specialized and rarely needed technique, it really is a pervasive and powerful tool within the optimizer's bag of tricks. The simple join examples illustrated here are but one area in which this capability can be applied. Other areas include queries against star schema and snowflake schema data models, as well as SQL requests with subqueries, common table expressions, or derived tables.

Rob Bestgen is a senior technical staff member and query optimizer design leader on the IBM DB2 for iSeries team in Rochester, Minnesota. Rob can be reached at This email address is being protected from spambots. You need JavaScript enabled to view it..

Mike Cain is a senior technical staff member and leader of the DB2 for iSeries Center of Competency in Rochester, Minnesota. Mike can be reached at This email address is being protected from spambots. You need JavaScript enabled to view it..

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: