A word that sometimes strikes fear into the programmers, "recursion" is sometimes the only tool for the job, and this article shows you how to recurse in SQL.
Today's article is very simple and straightforward. I'm going to present you with one of the more complex basic SQL statements you'll run across and then break it down piece by piece. I know that the "complex basic" seems somewhat paradoxical, but it's not. The SQL statement I'll be showing you is basic because it has no frills; pretty much everything in the statement is needed to do what we need to do. At the same time, the statement is complex because even at its minimum the statement requires a lot of moving parts. Fear not, gentle reader, I shall act as your guide through this dangerous territory, and at the end, you'll have yet one more powerful weapon in your programming arsenal.
The Database
Before we get to the SQL, let's take a look at the data. First, the relevant portion of the DDS source for the customer master file, CUSMAS:
R CUSMASR
CMCUST 6 0
CMCORP 6 0
CMNAME 50
I've highlighted three fields: the customer number (CMCUST); the corporate customer number, or parent, (CMCORP); and the customer name (CMNAME). Now let's look at the data in the file.
CMCUST CMCORP CMNAME
800,000 0 Top of the Food Chain
801,000 800,000 West Coast Outlet
801,100 801,000 California Unit
801,101 801,000 Washington State Unit
802,000 800,000 East Coast Outlet
802,100 802,000 Illinois Unit
This isn't the entire file, just the records that are specifically relevant to this discussion. These records define a fictitious organization with several levels, the highest level being the corporation Top of the Food Chain, customer number 800000. You can tell it's a top-level organization because it has no parent (CMCORP is zero). Top of the Food Chain has two outlets, one on the west coast and one on the east coast. West Coast Outlet has a customer number of 801000 and a parent of 800000. East Coast Outlet is similar, but with a customer number of 802000. That's it for the second level in the hierarchy. West Coast Outlet has two children, California Unit and Washington State Unit, while East Coast Outlet has only one child, Illinois Unit (yes, I know it's stretching it to put Illinois on the east coast, but bear with me for this example).
The SQL Statement
The business issue is simple. Can I use SQL to iterate through all the parents of a given entity? The answer is absolutely yes, although it takes a little work The statement to get all the parents and grandparents for customer 802100 is shown below:
with Corporate (ID, Parent, Name) as (
select
CMCUST as Id, CMCORP as Parent, CMNAME as Name
from CUSMAS
where CMCUST = 802100
union all
select
CMCUST as Id, CMCORP as Parent, CMNAME as Name
from CUSMAS
join Corporate on
Corporate.Parent = CMCUST
) select * from Corporate
As I noted before, this is a basic but complex statement. There really isn't anything here that can be left out, so I will walk you through each part of the statement. Let me start, though, by explaining the underlying concept. What happens here is that we have defined a common table expression (CTE), and we join that against itself using a UNION. The first half of the UNION is an initial SELECT that defines the root of the query, while the other half is a JOIN that defines the relationship used to get the next level of data. SQL is then smart enough to execute this JOIN over and over until all levels are read (I'm not sure how that magic works; my guess is that the SQL engine repeats the JOIN until it returns no records).
OK, on to the statement, piece by piece.
with Corporate (ID, Parent, Name) as (
This is the defining statement. It lays out all of the elements that will be returned from the query. These names don't have to be the same as they are in the file, and indeed you might find it easier to make your own longer, more intuitive names for the fields in the database. In this case, I want the final result to contain the customer ID, the parent customer, and the name. The with/as syntax is standard for CTEs; note the left parenthesis that identifies the start of the SELECT statement that populates the CTE.
select
CMCUST as Id, CMCORP as Parent, CMNAME as Name
from CUSMAS
where CMCUST = 802100
And this is that statement. Remember, I said there are two components connected via a UNION. The first is the "seed" statement. That starts the initial query. In this case, I've hard-coded it to select a specific customer, but in a more generic application (say in an embedded SQL RPG program), you'd probably use a host variable to contain the seed value. The point is that this first SELECT statement gets the initial record(s) that will then be processed recursively. In this example, I want CMCUST, CMCORP, and CMNAME, but I have to rename them to match the names in the original CTE (thus the AS modifier on each field).
union all
This merges the two halves together. This must be a UNION ALL or else the query will fail with SQL error -342. If you want to remove duplicates, you'll have to do it on the final SELECT.
select
CMCUST as Id, CMCORP as Parent, CMNAME as Name
from CUSMAS
join Corporate on
Corporate.Parent = CMCUST
This is the part of the statement that actually defines the business logic of the recursion. Note that the first three lines simply repeat the original SELECT statement from the first half. That's the nature of a recursive relationship like this; each iteration gets more records from the same table. The recursion is defined in the JOIN, where the next set of records is selected by joining to the CTE using the business relationship. In this case, I am asking for records in the table CUSMAS where the customer field (CMCUST) matches the Parent field (which is CMCORP) in the CTE. Put in English, I am asking for all the parents of the records currently in the CTE. SQL magic then occurs: the new records are added, and the same statement is run on those records. This repeats until no new records are found.
) select * from Corporate
And this is the finale. Now that the CTE (Corporate) has all the records I need, I can select them and process them as needed. In this case, I just want to list them, and when I do, I see this:
ID PARENT NAME
802,100 802,000 Illinois Unit
802,000 800,000 East Coast Outlet
800,000 0 Top of the Food Chain
The seed row is the first row shown, Illinois Unit (customer 802100). The first iteration of the JOIN then retrieved that record's immediate parent, 802000, East Coast Outlet. The final iteration returned the Top of the Food Chain, customer 800000. This is exactly what was expected and a perfect illustration of recursion in action.
Other Uses
This is only one of an entire class of business problems that can be solved by recursion. A tweak here or there, and the same basic technique can provide other results. For example, on the final select, add the clause "where CMCORP = 0" and you'll get only the highest-level parent. Switch the inner join to "Corporate.ID = CMCORP" and now instead of parents, you'll get children.
And this isn't limited to a strict one-to-one or even one-to-many relationship, either. Take the classic many-to-many relationship, the bill of materials (BOM) in manufacturing. A parent item will have many components, and conversely a component item can have many parents. The relationship can go many levels deep. Finding all children and their children for an item is a multi-level BOM explosion, while finding all the parents of a component gives you a where-used query, which can also be used to roll costs up or to drive net change MRP regeneration.
Recursion in SQL is a powerful tool, and once you're comfortable with it, I'm certain you'll find a lot more uses for it in your business applications.
LATEST COMMENTS
MC Press Online