In 1936, track athlete Jesse Owens competed in the Berlin Olympic Games, capturing four gold metals. On his feet were the work of the master running shoe makers, the Dassler brothers. These brothers eventually had a large fight and split. Each brother formed his own firm: One started Puma, and the other started Adidas. Sometimes things get even better when they are split up.
Think of SQL in the same way. There are some tasks you just cant accomplish, or at least not elegantly, without breaking a statement up into subqueries. Subqueries are also known as subselects, inner queries, or inner selects. A subquery is a nested query within another query. You can even nest a subquery within another subquery. A subquery may be used in the WHERE, HAVING, or FROM clauses of an SQL statement. The subquery answers some intermediate question, the result of which will be used to complete the main query.
To demonstrate the subquery feature in SQL, I created four tables. Figures 1 through 4 show sample data for the Customer, Inventory, Order, and Item Type tables. If youd like to put these tables on your system so that you can try out my examples, download the text file that accompanies this article from the Midrange Computing Web site at www.midrangecomputing.com/mc. The text file contains the SQL statements for creating the tables and for inserting rows into them. Note that SQL naming conventions are used here. That is, the term table is used instead of physical file. Row means record, and column means field.
The WHERE Clause
Use a subquery in an SQL WHERE clause whenever there is a need for comparison to summarized data within a table. Suppose, for example, you work for the Adidas outlet. One morning, the sales manager comes and asks for a simple query. She wants to know which orders have a value that is greater than the average value of all the orders. She wants to reward the customers who placed the largest orders with a special gift. Your first reaction may be to type in an SQL statement:
select * from orders
where ordervalue > avg(ordervalue)
Unfortunately, this is not allowed. You may not use a column function like AVG (average) in this manner. Instead, you need a summary of the orders table that will return the average of the ordervalue column. This is a subquery. The subquery will look like this: In our hypothetical scenario, this subquery will return the value of 335.833. To get the query to work, insert the subquery into the WHERE clause comparison: This returns a result set with the orders 22, 23, 25, 30, and 36all of the orders that have a value over $335.83. The parentheses around the subquery are required. They indicate to the SQL engine that this is a subquery and not a syntax error on the users part.
Be careful how you define your subquery when you will use the result set to compare to a column in the outer query. The result set of the subquery must be compatible with the columns datatype to which the comparison is being made. In this example, ordervalue is decimal, so the result set from the subquery must return a numeric value.
Correlated Subqueries
You can also use a subquery in an SQL WHERE clause whenever you need a comparison to find out if a row exists in another table. Suppose the sales manager liked the first report so well that she came back and had another request. She decides to pay a personal visit to customers who are not doing business with the company right now. What she needs is a list of every customer who does not have anything on order. The SQL code to accomplish this looks like the following: Notice that the inner and outer queries are tied together by using the customernumber from the outer query in the subquery. This has a special name. It is called a correlated subquery.
In this statement, the outer query takes each row from the customers table and keeps it in the result set if a row does not exist in the result set of the subquery. In this case, the only row returned is that of customer 208; all other existing customers have something on order. For each row in the outer query, the result set in the subquery is recalculated. EXISTS is a keyword in SQL that returns true if the subquery returns any rows at all. It returns false if no rows are returned. The NOT keyword was added because the request was for customers without orders. If the NOT were removed from the WHERE clause, the result set of the outer query would be all customers who have orders.
ALL and ANY
The next thing you know, the inventory manager pops his head in the door. He needs to know all of the inventory items for which the quantity on hand is greater than or equal to the average quantity on hand of any item type. SQL has an answer here as well:
select avg(ordervalue) from orders
select * from orders
where ordervalue >
(select avg(ordervalue)
from orders)
select firstname, lastname,
customernumber
from customers
where not exists
(select * from orders
where customernumber =
customers.customernumber)
select itemnumber,
itemdescription,
itemtype,
quantityonhand
from inventory
where quantityonhand >= all
(select avg(quantityonhand)
from inventory
group by itemtype)
The subquery returns a result set that will include the average quantity on hand for each item type. The outer query takes each row from the inventory table and compares the quantity on hand from that row with each row in the subquerys result set. The row is included in the result set of the outer query if the outer querys quantity on hand is greater than or equal to all entries in the subquery result set. This statement returns items 102, 104, 105, 108, and 110.
You can take the same example and replace the ALL keyword with the keyword ANY to achieve a different result set. ANY will compare each row in the outer query with the complete result set of the subquery. The row will be included in the outer query result set if the quantity on hand is greater than or equal to any of the subquerys rows. With this change, the statement would return items 101, 102, 104, 105, 106, 108, and 110.
Set Membership
Two minutes later, the inventory manager comes back. He thinks there is a data integrity problem and wonders if there may be a bug in one of your programs that is allowing item types into the item master table that are not in the item type master table.
The keyword IN can be used to compare the rows generated from the outer query to a subselects set of values: In this case, the item type from the outer query is compared against a list of all of the item types contained within the item type master table. If the statement finds a type in the item master that is not in the item type master table, the row will be in the result set. In this case, that would be CASUL. When you run this query, item numbers 106 and 109 appear in the result set.
I have shown how to construct a subselect and how to use a subselect in a WHERE clause. In addition, Ive given examples of a variety of essential keywords like IN, NOT, ALL, ANY, and EXISTS. These are fundamental building blocks for advanced SQL use. From here, I will show you examples of other places within SQL that the subselect can be used. These examples will use the basic syntax already seen in this article.
Using Subselects in the HAVING Clause
The HAVING clause is similar to the WHERE clause. The WHERE clause selects individual rows from tables. The HAVING clause selects summaries of rows. During processing, the query selects the rows that should be included based on the WHERE
select itemnumber,
itemdescription,
itemtype,
quantityonhand
from inventory
where itemtype not in
(select tittyp
from itemtype)
clause. Then, the records are grouped based on the GROUP BY clause. It is this intermediate result set that is then acted upon by the HAVING clause to produce the final result set.
This statement produces a summary of the inventory table by itemtype that shows each item type and the average quantity on hand for that item type.
ITEMTYPE AVG
CASUL 20.0000
DRESS 60.0000
RUN 53.2857
Perhaps you were really only interested in item types that had an average quantity on hand greater than fifty. If so, use the HAVING clause to exclude some of the summarized records. The following example does this and its result set is the same as the previous result set without the row for CASUL.
As with the WHERE clause, you can add subqueries to the HAVING clause for more powerful SQL statements. The inventory manager needs a new report. Once again, he wants to know the average quantity on hand for each item type. But this time, he only wants the item types reported where the average quantity on hand is less that the overall average quantity on hand for the entire inventory. To accomplish this, use a subquery in the HAVING clause.
The result set contains only the CASUL shoe item type.
The JOIN Clause
Imagine that you were limited to just one F-spec per RPG program. Life would be quite interesting trying to build sophisticated information systems with only one file allowed per program. Storing data in relational databases implies normalized data and multiple tables connected by key columns. Whether you are coding in RPG or SQL, you will need to access more than one table at a time to solve complex problems. SQL implements access to multiple tables in the FROM clause by using the JOIN keyword. A JOIN will always specify at least two tables.
A simple join would look like this:
Select itemtype,
avg(quantityonhand)
from inventory
group by itemtype
Select itemtype,
avg(quantityonhand)
from inventory
group by itemtype
having avg(quantityonhand)<50
select itemtype, avg(quantityonhand)
from inventory
group by itemtype
having avg(quantityonhand) <
(select avg(quantityonhand)
from inventory)
Select ordernumber,
ord.customernumber,
firstname, lastname
from orders ord
join customers cust
on ord.customernumber =
cust.customernumber
This joins the order table and the customer table by the customer number so that the result set of order numbers can also include the first and last name of the person ordering. Figure 5 shows the result set.
SQL is so flexible that even here in the join clause, a subselect can be used. How can this be an advantage? Look at this example: What will you get as output? (Hey, you should know this stuff by now!) You will get a list of all orders in which the order value is greater than the average of all order values for that customer. The SQL command certainly was a very simple way to get out this information, much easier and faster than coding in RPG. However, something is missing. The result set does not tell you what the average value is. To get the average value in the result set, you must execute a second query containing the subquerynot very elegant.
Once again, subqueries can come to the rescue. By using a subquery in the FROM clause, you get a final result set that includes the average amount. The ability to combine detailed and summarized data using a subquery in the FROM clause is very powerful and little known.
Select ord.ordernumber,
ord.customernumber,
ord.ordervalue,
ordsum.val
from orders ord join
(select customernumber,
avg(ordervalue) as val
from orders
group by customernumber) ordsum
on ord.customernumber=ordsum.customernumber
where ord.ordervalue>ordsum.val
order by ordernumber
See Figure 6 for a comparison of the result sets of this and the previous examples.
Select This, Select That
So far, every example has been with a select statement. Is it possible to use subqueries elsewhere in SQL? It is not only possible, it is profitable to do so. Subqueries in select statements make your reporting options easier and more powerful. Subqueries in other SQL statements make your maintenance tasks a breeze.
select ord.ordernumber,
ord.customernumber,
ord.ordervalue
from orders ord
where ordervalue>
(select avg(ordervalue)
from orders sums
where sums.customernumber =
ord.customernumber)
What if you needed to create a subset of data in a table? Lets say you have to create a copy of your item master table that only contains items of a single item type. Assuming the copy of the table is called partialInventoryMaster, your query would look like this: This would save all of those pesky CASUL rows to partialInventory Master. As you may recall, they caused the data integrity problem since CASUL did not exist in the item type master table. Be careful, you want to match the layout of the target table with the layout of the result set from the subquery. I used an asterisk (*) to get all of the columns in itemmaster. The copy of the table, partialInventoryMaster, did not have to have all of the columns defined in it that are in itemmaster. That, too, could have been a subset. In that case, the asterisk would be replaced by the explicit column names, in the explicit column order found in partialInventoryMaster. A subselect can be used just as easily in a DELETE statement or an UPDATE statement for powerful control.
Tying It All Up
Some of these examples could have had the exact same results with simpler, non-subquery SQL statements, but I have tried to keep the examples as simple as possible so that you can build upon them later. The key thing to remember is that a subquery may be included in the WHERE, HAVING, and FROM clauses in SQL. When in doubt, just try it out. At worst, the SQL processor will give you a nasty message. At best, you will save yourself hours of RPG programming. It is always a joy to do in one SQL statement what would have taken many lines of RPG, DDS, and CL code.
If more RPG programmers made the small effort to learn SQL, there would be a measurable leap in RPG programmer productivity. On top of that, SQL is cross-platform, so it is a skill that will travel with you wherever you may roam. Now put on your running shoes and keep up with the competition by learning SQL.
FIRSTNAME LASTNAME CUSTOMERNUMBER CUSTOMERTYPE
BILLY SMITH 200 RUN JOHNNY WILLIAMS 201 RUN MYRON WILTON 202 DRESS GUNTHER MASTERSON 204 RUN LARRY LEVIATHAN 205 CASUL HARRY NARVISTUS 206 RUN LARRY LION 207 DRESS CYNTHIA CELLO 209 RUN WENDY LAVA 210 RUN TONYA QUEST 203 RUN YIMMEE AWAY 208 CASUL
Figure 1: This is the sample data for the Customer table.
Insert into partialInventoryMaster
select *
from inventory
where itemtype = CASUL
ITEMNUMBER ITEMDESCRIPTION ITEMTYPE QUANTITYONHAND
100 Fast Running Shoe RUN 16 101 Dress Shoe w/Holes DRESS 45 102 Slow Running Shoe RUN 105 103 Blue Running Shoe RUN 12 104 Sprinting Shoe RUN 66 105 Pilgrim Shoe DRESS 75 106 Wagon Wheel Shoe CASUL 35 107 Pole Vault Shoe RUN 2 108 Fast Green Shoe RUN 80 109 Earthy Shoe CASUL 5 110 Yellow Slow Shoe RUN 92
Figure 2: This is the sample data for the Inventory table.
ORDERNUMBER CUSTOMERNUMBER ORDERVALUE LINEITEMS
19 207 100.00 4 20 201 75.00 3 22 210 500.00 4 23 210 600.00 6 24 205 67.00 1 25 206 360.00 10 30 209 1000.00 35 35 202 100.00 1 36 201 890.00 23 37 200 98.00 2 38 204 62.00 3 39 203 178.00 5
Figure 3: This is the sample data for the Order table.
ITEMTYPE TYPEDESCRIPTION
RUN Running Shoe DRESS Dress Shoe
Figure 4: This is the sample data for the Item Type table.
ORDERNUMBER CUSTOMERNUMBER FIRSTNAME LASTNAME
19 207 LARRY LION 20 201 JOHNNY WILLIAMS 22 210 WENDY LAVA
23 210 WENDY LAVA
24 205 LARRY LEVIATHAN 25 206 HARRY NARVISTUS 30 209 CYNTHIA CELLO
35 202 MYRON WILTON
36 201 JOHNNY WILLIAMS 37 200 BILLY SMITH
38 204 GUNTHER MASTERSON 39 203 TONYA QUEST
Figure 5: Joins are simple to master and can be used to build complex queries.
First Query: ORDERNUMBER CUSTOMERNUMBER ORDERVALUE 23 210 600.00
36 201 890.00 With Subquery:
ORDERNUMBER CUSTOMERNUMBER ORDERVALUE VA 23 210 600.00 550.00 36 201 890.00 482.50
Figure 6: A subquery in a FROM
LATEST COMMENTS
MC Press Online