One-table queries are OK, but data is usually spread over different tables in the database. To write a query that uses data from more than one table, you need to join the tables. This article shows ways to do that.
Using one table in a Select statement won't get you very far. Most databases are designed to avoid repeating data, and they use multiple tables to achieve that goal. As a result, the actual data and its human-readable descriptions usually live in different tables, and you'll probably have to use both.
Let me explain this with an example. My InvMst table (which I started using in a previous article of this series) has a few columns that are foreign keys; in other words, they point to the primary keys of other tables. I can easily write a Select statement that returns the Item ID from InvMst and the item description from the ItmMst table, a simplified Item Master table that contains the Item ID (ItemID) and description (ItemDesc) columns. I'm going to refine the Select statement from the previous article, which lists the stock of item A123 in warehouse 333, so that it also shows the item description:
SELECT InvMst.ItemID
, ItemDesc
, ItemQty
, ExpDate
FROM InvMst InvMst
INNER JOIN ItmMst ItmMst ON InvMst.ItemID = ItmMst.ItemID
WHERE InvMst.ItemID = 'A123'
AND WHID = 333
ORDER BY ExpDate
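If you want to try the query without an IBM i at hand, here is a minimal sketch that runs it through Python's built-in sqlite3 module against an in-memory SQLite database. SQLite is only a stand-in for DB2 for i here, and the column types and sample rows are hypothetical, chosen so that item A123 has two rows in warehouse 333 and one in warehouse 444. The SQL text is the statement shown above.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for DB2 for i.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ItmMst (ItemID TEXT PRIMARY KEY, ItemDesc TEXT);
    CREATE TABLE InvMst (ItemID TEXT, WHID INTEGER, ItemQty INTEGER, ExpDate TEXT);
    INSERT INTO ItmMst VALUES ('A123', 'Blue widget'), ('C789', 'Sprocket');
    INSERT INTO InvMst VALUES
        ('A123', 333, 10, '2024-03-01'),
        ('A123', 333,  5, '2024-01-15'),
        ('A123', 444,  7, '2024-02-01');
""")

# The statement from the article, unchanged.
rows = con.execute("""
    SELECT InvMst.ItemID
         , ItemDesc
         , ItemQty
         , ExpDate
    FROM InvMst InvMst
    INNER JOIN ItmMst ItmMst ON InvMst.ItemID = ItmMst.ItemID
    WHERE InvMst.ItemID = 'A123'
      AND WHID = 333
    ORDER BY ExpDate
""").fetchall()

for row in rows:
    print(row)
# ('A123', 'Blue widget', 5, '2024-01-15')
# ('A123', 'Blue widget', 10, '2024-03-01')
```

SQLite has no dedicated DATE type, so the sketch stores ExpDate as ISO-format text, which still sorts chronologically in the ORDER BY.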
Even though this statement is similar to the original one, there are some important differences. Let's analyze them carefully. The first is the InvMst prefix on the ItemID column (InvMst.ItemID); it's necessary because the ItemID column exists in both the InvMst and ItmMst tables, so the database engine needs to know which one you mean. The next is the new ItemDesc column in the column list, which includes the item's description in the output and makes it a bit more "human readable."
Then notice that I've changed the FROM clause: the table name appears twice. No, it's not a typo; the second name is an "alias" (also called a correlation name). Basically, I'm providing an alternative name for the table, which I can then use to qualify columns that have the same name in different tables. Query Manager and IBM i Access' Data Transfer tools automatically add T1, T2, …, Tn aliases when you specify more than one file in the respective windows. In this example, I chose to use the table name as the alias because it's not a very long name (even though you can use really long names for SQL objects, as you'll see later in this series).
Then comes the connection between the InvMst and ItmMst tables, using the INNER JOIN expression. If you're a WRKQRY user, you've seen something similar to this: when you add a second table to the query (Figure 1)…
Figure 1: Add two tables in WRKQRY.
…the system asks for the type of relation between the tables in a new screen (Figure 2).
Figure 2: Choose the type of join in this screen.
Notice those three options? I'll explain them in a moment. When you press ENTER, the system shows another screen, asking how the files should be joined (Figure 3).
Figure 3: Specify how the files should be joined in this WRKQRY screen.
The INNER JOIN syntax is the following:
INNER JOIN <table> ON <join conditions>
Note that <join conditions> is the list of conditions that link the tables to each other. Besides columns from the tables being joined, it can include literals or even expressions. In this example, the link is established via the ItemID column, which has the same name in both tables: InvMst.ItemID = ItmMst.ItemID. As you've probably guessed by now, the inner join is the intersection of the InvMst and ItmMst tables: only rows with a matching ItemID in both tables make it into the result. This is the simplest and most common type of join. In fact, it's so common that it's often written in a more "informal" way; the inner join statement can also be written like this:
SELECT InvMst.ItemID
, ItemDesc
, ItemQty
, ExpDate
FROM InvMst InvMst, ItmMst ItmMst
WHERE InvMst.ItemID = 'A123'
AND WHID = 333
AND InvMst.ItemID = ItmMst.ItemID
ORDER BY ExpDate
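A quick way to convince yourself that the two forms are equivalent is to run both against the same data. The sketch below uses Python's built-in sqlite3 module and an in-memory SQLite database rather than DB2 for i, and the table layouts and rows are hypothetical. It also runs the comma-separated form without the InvMst.ItemID = ItmMst.ItemID condition, to show what happens when the join condition is forgotten: every remaining InvMst row is paired with every ItmMst row.

```python
import sqlite3

# Hypothetical sample data: two A123 rows in warehouse 333, two item master rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ItmMst (ItemID TEXT PRIMARY KEY, ItemDesc TEXT);
    CREATE TABLE InvMst (ItemID TEXT, WHID INTEGER, ItemQty INTEGER, ExpDate TEXT);
    INSERT INTO ItmMst VALUES ('A123', 'Blue widget'), ('C789', 'Sprocket');
    INSERT INTO InvMst VALUES
        ('A123', 333, 10, '2024-03-01'),
        ('A123', 333,  5, '2024-01-15'),
        ('A123', 444,  7, '2024-02-01');
""")

COLUMNS = "SELECT InvMst.ItemID, ItemDesc, ItemQty, ExpDate"
FILTER = "InvMst.ItemID = 'A123' AND WHID = 333"

# Explicit syntax: the join condition lives in the ON clause.
explicit_rows = con.execute(f"""{COLUMNS}
    FROM InvMst InvMst
    INNER JOIN ItmMst ItmMst ON InvMst.ItemID = ItmMst.ItemID
    WHERE {FILTER}
    ORDER BY ExpDate""").fetchall()

# Implicit syntax: tables separated by a comma, join condition in WHERE.
implicit_rows = con.execute(f"""{COLUMNS}
    FROM InvMst InvMst, ItmMst ItmMst
    WHERE {FILTER}
      AND InvMst.ItemID = ItmMst.ItemID
    ORDER BY ExpDate""").fetchall()

# Same implicit query without the join condition: a Cartesian product.
cross_rows = con.execute(f"""{COLUMNS}
    FROM InvMst InvMst, ItmMst ItmMst
    WHERE {FILTER}
    ORDER BY ExpDate""").fetchall()

print(explicit_rows == implicit_rows)       # True
print(len(explicit_rows), len(cross_rows))  # 2 4
```

The missing-condition case returns 2 × 2 = 4 rows, which is why the comma-separated form is easy to get wrong in larger queries: nothing in the syntax forces you to write the join condition.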
The inner join can be "hidden" like this, but the other types of joins can't. These other joins are called outer joins. An inner join combines each row of the left table (InvMst in this example) with every row of the right table (ItmMst) and keeps only the combinations for which the <join condition> is true. An outer join returns those same rows plus rows that found no match, filling the columns that come from the other table with nulls. Which unmatched rows are added depends on the type of outer join:
- A left outer join, or simply left join, includes the rows from the left table that were missing from the inner join.
- A right outer join, also known as right join, includes the rows from the right table that were missing from the inner join.
- A full outer join includes the rows from both tables that were missing from the inner join.
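The sketch below shows the three outer joins side by side, using Python's built-in sqlite3 module and hypothetical data: InvMst holds a row for item Z999, which has no ItmMst record, and ItmMst holds item C789, which has no inventory. SQLite versions before 3.39.0 don't support RIGHT or FULL OUTER JOIN, so the sketch writes the right join as a left join with the tables swapped and builds the full outer join from a left join plus the unmatched right-table rows. On DB2 for i you would normally write RIGHT OUTER JOIN or FULL OUTER JOIN directly.

```python
import sqlite3

# Hypothetical data: Z999 exists only in InvMst, C789 only in ItmMst.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ItmMst (ItemID TEXT PRIMARY KEY, ItemDesc TEXT);
    CREATE TABLE InvMst (ItemID TEXT, WHID INTEGER, ItemQty INTEGER, ExpDate TEXT);
    INSERT INTO ItmMst VALUES ('A123', 'Blue widget'), ('C789', 'Sprocket');
    INSERT INTO InvMst VALUES
        ('A123', 333, 10, '2024-03-01'),
        ('A123', 333,  5, '2024-01-15'),
        ('A123', 444,  7, '2024-02-01'),
        ('Z999', 333,  3, '2024-05-01');
""")

# Each query returns (InvMst.ItemID, ItmMst.ItemID, ItemDesc);
# Python's sqlite3 turns the nulls of unmatched rows into None.
LEFT_JOIN = """
    SELECT InvMst.ItemID, ItmMst.ItemID, ItmMst.ItemDesc
    FROM InvMst
    LEFT OUTER JOIN ItmMst ON InvMst.ItemID = ItmMst.ItemID
"""

# Right join emulated by swapping the tables in a left join.
RIGHT_JOIN = """
    SELECT InvMst.ItemID, ItmMst.ItemID, ItmMst.ItemDesc
    FROM ItmMst
    LEFT OUTER JOIN InvMst ON InvMst.ItemID = ItmMst.ItemID
"""

# Full outer join emulated: left join rows plus the ItmMst rows
# that have no matching InvMst row.
FULL_JOIN = LEFT_JOIN + """
    UNION ALL
    SELECT NULL, ItmMst.ItemID, ItmMst.ItemDesc
    FROM ItmMst
    WHERE NOT EXISTS
        (SELECT 1 FROM InvMst WHERE InvMst.ItemID = ItmMst.ItemID)
"""

results = {
    name: con.execute(sql).fetchall()
    for name, sql in (("left", LEFT_JOIN), ("right", RIGHT_JOIN), ("full", FULL_JOIN))
}
for name, rows in results.items():
    print(name, rows)
```

The left join keeps Z999 with a null description, the right join keeps C789 with a null InvMst.ItemID, and the full outer join keeps both. The three A123 rows appear in all three results because they are the inner-join part. Row order isn't guaranteed without an ORDER BY.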
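Note that similar concepts exist in the WRKQRY Type of Join screen, shown in Figure 2: option 1 (Matched records) is an inner join, and option 2 (Matched records with primary file) is a left outer join. Option 3 (Unmatched records with primary file) is not a right outer join, though: it returns only the primary-file rows that have no match in the secondary file, which corresponds to what SQL on IBM i calls an exception join. Figure 4 illustrates the inner and outer join concepts using Venn diagrams.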
Figure 4: These Venn diagrams illustrate the different join types.
There are other, more complex forms of join—the exception joins—that I won't discuss here. The ones I've shown are the most commonly used, and they'll suffice for the majority of situations you'll come across.
Nothing replaces practice when it comes to understanding these concepts, so be sure to write some Select statements with inner and outer joins over your own tables. If you want a real challenge, try using a single Select statement to get the same output that one of your report programs provides! It may not be possible, but it's good, fun practice.
MC Press Online