Long ago, before my hair was gray, I learned about cross joins. Now, maybe it's my failing memory, but I cannot remember learning any good examples of why I would ever want to do a cross join. However, strangely enough, I recently came across a couple of great examples of using cross joins.
First, let's discuss what a cross join, also called a Cartesian join, is. This type of join connects all records from one file to all records in another file. So if file A has three records and file B has four records, the resulting join has 12 records (see Figure 1). The size of the resulting table grows exponentially with the size of joined tables, leading to performance issues. This is fine with small files, but as the files grow larger, the performance penalty grows higher. So be careful which files you apply this technique to.
Figure 1: Here's a sample of what you get with a cross join.
My favorite example of using a cross join answers a frequent request: "How can I get totals on the same line with my detail?" (See Figure 2.)
Part On Hand Qty Total Qty Percent
101 1,000 31,000 3
102 20,000 31,000 65
103 10,000 31,000 32
Figure 2 : This is the result of a Select with totals on the detail line.
Figure 3 shows the SQL statement used to generate those results. The statement pulls data from the Item Master file. Column 1 is the part number, column 2 is the current on-hand quantity in inventory, column 3 is the total quantity of all items in inventory, and column 4 is each item's percent of the total inventory.
|
Figure 3: This is an SQL statement with a cross join to summary record.
The second file (Totalfile) is not really a file, but rather a SELECT statement that builds a temporary file object for use with just this statement. The column named Total is actually the result of a SUM() on the on-hand quantity column. Figure 4 illustrates how this join works.
Figure 4: Here's how this type of cross join works.
The many variations on this theme allow programmers to pull column functions--such as totals, averages, minimums, maximums, standard deviations, and more--into each detail line. The cross join provides a simple vehicle to bring these column functions into the detail lines.
A second and more expensive use of the cross join is to connect every record in a table to every other record in the same table. Figure 5 shows the basic concept behind this kind of cross join.
Figure 5: Here are the results of a cross join to self.
Finding a good example for this kind of cross join always eluded me until a coworker came to me asking for suggestions on using SQL to solve a problem for a client. The challenge was this: Identify all items whose part number contains one of a selected list of parts. Figure 6 shows a list of sample parts. What the client wanted was to select parts 101, 102, and 103 and then produce a list of other part numbers that contained those part numbers.
Figure 6: These are the sample part numbers.
The SQL statement in Figure 7 accomplishes this task using a cross join. The file is joined to a sub-select of itself, including only selected records. The item numbers of all of these candidate records are evaluated to see if the candidate part numbers are like the selected part numbers. Finally, all records in which both items are exactly alike are omitted.
|
Figure 7: Here's an SQL statement with a cross join to a sub-select.
Figure 8 shows the results. The use of LIKE, STRIP, and CONCAT were critical to performing the comparisons of one item number to another. This problem provides a great example of a situation in which cross joining a file to itself or another file is a great solution to a problem.
Figure 8: And here's your final cross join illustration.
Remember that large numbers of records in the joined files can create serious performance issues. Consider using sub-selects where possible to reduce the size of the result table.
Kevin Forsythe is the author of the new book SQL for eServer i5 and iSeries. He has over 18 years of experience working with the iSeries platform and its predecessors. He has been a member of the DMC team for the past nine years. Kevin's primary responsibility is providing iSeries education, but he also provides customers with project management, system design, analysis, and technical construction. In addition to his technical skills (RPG IV, CL, OS/400, SQL, FTP, Query, VB, Net.Data), Kevin possesses the ability to communicate new and complex concepts to his students. He has been the primary instructor for DMC's iSeries-based AS/Credentials training courses since 1997 and has authored courses such as Advanced ILE, SQL, Embedded SQL, Operations Navigator, and Intro to WebSphere Studio. An award-winning speaker, he has spoken at every COMMON Conference since the spring of 2000.
LATEST COMMENTS
MC Press Online