If you are a System i application developer using SQL for data access, you may have encountered the need to implement a full outer join. Or perhaps you have implemented full outer joins on another database platform and are porting your application to support DB2 for i5/OS. In either case, a full outer join enables a query to return both matching and non-matching rows from both of the tables specified in the join. This is different from inner joins (which return only the matching rows), left outer joins (which return all rows from the table on the left, with null values for non-matching rows on the right), and right outer joins (which return all rows from the table on the right, with null values for non-matching rows on the left). Full outer joins are less common than the other three, but sometimes you might need this sort of behavior in your application reports. Unfortunately, as of V5R4, DB/2 for i5/OS does not support full outer joins. However, you can simulate this functionality.
To demonstrate how to implement full outer joins, let's walk through an example. Suppose your company has an application with three tables: Employees, Departments, and Customers. The format and data of the tables look something like the following:
(Note: Click images to enlarge.)
Now, suppose management wants a report that shows all department names and all employees in each department. This would be easily implemented using an inner join, such as this one:
SELECT departments.departmentname, employees.employeeName,
employees.salesrepid
FROM departments
INNER JOIN employees
ON departments.departmentNumber = employees.departmentNumber
The report would look like the following:
But this report shows only the intersection of rows of the two tables—that is, only the matching rows. What if management also wants the same report to include all departments that contain no employees? Piece of cake: Modify the query to use the following left outer join:
SELECT departments.departmentname, employees.employeeName,
employees.salesrepID
FROM departments
LEFT OUTER JOIN employees
ON departments.departmentNumber = employees.departmentNumber
The report now includes this information:
Management is still not satisfied. Now, they want to include all employees that are not in a valid department (yes, this company has some referential integrity issues). This is where it gets a bit dicey. Since the previous report used a left outer join to include all departments without an employee, adding this additional information to the report would also require a right outer join in the query. The combination of a left outer join and a right outer join in the same query is known as a full outer join, and as mentioned, is a type of join that is not currently supported on DB2 for i5/OS.
If the data was stored in a database that supported full outer joins, the query would look something like this:
SELECT departments.departmentname, employees.employeeName,
employees.SalesRepID
FROM departments
FULL OUTER JOIN employees
ON departments.departmentNumber = employees.departmentNumber
If your database is DB2 for i5/OS, don't worry. You can simulate the full outer join functionality by issuing a union against the results of a left outer join and a right exception join (in case you were wondering, issuing a union of the left outer join query with a right outer join query would not work because it would produce duplicates of all the matching rows in the result table). The simulation would look like the following query:
SELECT departments.departmentname, employees.employeeName,
employees.salesrepid
FROM departments LEFT OUTER JOIN employees ON departments.departmentNumber = employees.departmentNumber
UNION ALL
SELECT departments.departmentname, employees.employeeName,
employees.salesrepid
FROM departments RIGHT EXCEPTION JOIN employees
ON departments.departmentNumber = employees.departmentNumber
ORDER BY departmentname, employeeName
The report now looks like this:
You now have a report that shows all departments (including those with no employees), employees in those departments, and employees without a department. You may think your work is done. But not so fast: Management has one final request: Expand the report to include...
- Customers assigned to the employees
- Employees with no customers assigned to them
- Customers not assigned to a valid sales representative
Including this information in the report requires adding another full outer join implementation to the query. After the necessary modifications have been made, the query looks like this:
WITH FOJ1 AS
(SELECT departments.departmentname AS dept_name,
employees.employeeName AS emp_name, '' AS cust_name,
employees.SalesRepID AS rep_id
FROM departments LEFT OUTER JOIN employees
ON departments.departmentNumber =
employees.departmentNumber
UNION ALL
SELECT departments.departmentname AS dept_name,
employees.employeeName AS emp_name, '' AS cust_name,
employees.SalesRepID AS rep_id
FROM departments RIGHT EXCEPTION JOIN employees
ON departments.departmentNumber =
employees.departmentNumber),
FOJ2 AS
(SELECT dept_name, emp_name, customers.customerName AS cust_name,
rep_id
FROM FOJ1 LEFT OUTER JOIN customers
ON rep_id = customers.SalesRepID
UNION ALL
SELECT dept_name, emp_name, customers.customerName AS
cust_name, rep_id
FROM foj1 RIGHT EXCEPTION JOIN customers
ON rep_id = customers.SalesRepID)
SELECT dept_name, emp_name, rep_id, cust_name
FROM FOJ2
ORDER BY dept_name, emp_name;
Implemented in this more complex full outer join example query are a couple of different techniques to note:
- First, notice the use of the WITH keyword to create common table expressions. If you are unfamiliar with common table expressions, think of them as temporary views that exist only during the query's execution. This recommended approach helps reduce the query's complexity by minimizing the amount of text, especially when referenced multiple times in the query. It also helps explain the steps taking place: FOJ1 and FOJ2 are encapsulations of the full outer join steps. The different segments are color-coded to illustrate how each is mapped. Blue is the first full outer join, red is the second full outer join, and green is the final SELECT statement. The blue full outer join expression is executed first, and the results are referenced by FOJ1, which becomes the left table of the second outer join expression. The results of that expression are referenced as FOJ2 and are used as the table reference in the final SELECT statement.
- The second thing to note is the use of the AS clause to rename columns. This will eliminate column ambiguity as the resulting columns advance through the expressions.
In addition, you should be aware of some performance considerations with full outer join simulations:
- Avoid using SELECT *. Instead, select only those columns that will be used in a subsequent common table expression or by the query. This can improve the performance of the query by minimizing the amount of data that DB2 has to move around.
- This simulation of a full outer will not be as efficient as a true full outer join that is supported natively by the database. Just be aware that you can proactively address potential performance issues by having a good indexing strategy. This means that you should at least have indexes built over all of the join columns. For more information on indexing considerations, refer to the white paper "Indexing and statistics strategies for DB2 for i5/OS."
Join Up!
Full outer joins provide a way for your reports to show all matching and non-matching rows in specified tables. And now you know how to simulate them on the System i.
Gene Cobb is a DB2 Technology Specialist on IBM's ISV Enablement team for System i. He has worked on IBM midrange systems since 1988, with 10 years in the IBM Client Technology Center (CTC), IBM Rochester. While in the CTC, he assisted customers with application design and development using RPG, DB2 for i5/OS, CallPath/400, and Lotus Domino. His current responsibilities include providing consulting services to System i developers, with special emphasis on application and database modernization.
LATEST COMMENTS
MC Press Online