Not everything that's old is outdated!
In my previous article, I discussed the goals I had set out for our programming group and how those goals helped determine the skills that I was looking for among the candidates for a job in our company. One of those goals was to encourage the use of SQL among our programmers for database I/O.
There are lots of reasons for this, including the facts that SQL is a standard database access language that can be used across many systems and programming languages and that the SQL query engine is the only one that IBM is enhancing on the IBM i to improve efficiency in data access. Of course, many other reasons have been listed by other authors here before. My main argument to my programmers, however, is that it makes your programming so much more efficient, especially if you take advantage of some of the more than 100 SQL functions that are available in the language on the IBM i. Furthermore, if you don't find the function you need, you can create your own user-defined function. So if you're stuck on the basics and just using SELECTs to retrieve and sort data in your programs, then you're missing out on the true power of SQL. The IBM Infocenter has a handy list of these functions.
If you're new to SQL or want to move beyond the basics, one book I'd recommend is SQL for eServer i5 and iSeries by Kevin Forsythe. It is well-written, gives a nice overview of the language, and has enough examples that you'll want to keep it as a reference.
To illustrate why SQL helps you to be more efficient as a programmer, I'd like to look at a situation that happened in our shop and discuss one unusual SQL function I had not used before that we used to solve the problem quickly. I'll conclude with some caveats on its use with SQL.
As part of our daily routine, our system copies invoices that have been paid from a current invoice file to an invoice history file. Due to an unusual set of circumstances that I won't discuss here, the invoice records for one day were copied twice, creating duplicate invoice records in our history file. While there are a number of ways to remove the duplicate records, one very easy way is with the use of an SQL statement that includes two functions: the Relative Record Number function RRN and the Minimum function MIN.
For this example, I've abbreviated our INVOICE table so that each row consists of a customer number, an invoice number, the date sold, and the total dollars.
CUSTNO |
INVCNO |
DATSLD |
TOTAL |
10658 |
5717045 |
20120614 |
118.64 |
12546 |
5717041 |
20120614 |
222.76 |
7056 |
5717132 |
20120614 |
398.36 |
After the invoice records were inadvertently copied twice, the INVOICE table looked as shown below:
CUSTNO |
INVCNO |
DATSLD |
TOTAL |
10658 |
5717045 |
20120614 |
118.64 |
12546 |
5717041 |
20120614 |
222.76 |
10658 |
5717045 |
20120614 |
118.64 |
7056 |
5717132 |
20120614 |
398.36 |
12546 |
571741 |
20120614 |
222.76 |
7056 |
5717132 |
20120614 |
398.36 |
Now let's look at the SQL statement that was used to delete the duplicate records.
DELETE from INVOICES A
where rrn(A) > (SELECT Min(rrn(B)) from INVOICES B where A.invcno = B.invcno)
In this case, we're using the rrn() function to help uniquely identify each row in the table. In the subquery, we're finding the row with the smallest relative record number for each unique invoice number by using another function, min(), to take the minimum of the result of the relative record number function for those rows with matching invoice numbers. In the main clause, we use the DELETE statement to delete all rows whose relative record number is greater than the minimum. In the case of our file, we had two rows for each invoice, but the above statement will remove all duplicate rows no matter how many there are, since it is deleting all rows with the same invoice number that have a relative record number greater than the minimum. Finally, we're using two alias table names for the INVOICE table, A and B, so we can join the INVOICES table to itself with invcno in the subquery SELECT.
Before this problem occurred, I don't think I had thought about relative record numbers since the days of the System/34/36, and I frankly didn't realize that IBM had an SQL function to retrieve them. In a more modern table with an identity column, I wouldn't have needed rrn(), since I could have used the identity instead to uniquely identify the rows. In that case, the DELETE statement above would be the same, with the substitution of the identity column name for the rrn() function, and then we'd use the min() to find the minimum identity value. However, in this case, it was a simple table that had no identity column, and the rrn() function certainly came in handy.
One note of caution: You might be tempted to view the rrn() as something similar to either an identity for a row or a number representing the relative position of a row in a table. For example, if rrn(A) = 3, then it's referring to the third row in the table. This is not the case. After deleting the duplicate records from INVOICES, we can review the table with the following SQL command:
SELECT CUSTNO, INVCNO, DATSLD, TOTAL, rrn(A) from INVOICES A
You might have expected to see three rows with relative record numbers 1, 2, and 3. If so, you would be surprised to see the following:
CUSTNO |
INVCNO |
DATSLD |
TOTAL |
RRN |
10658 |
5717045 |
20120614 |
118.64 |
1 |
12546 |
5717041 |
20120614 |
222.76 |
2 |
7056 |
5717132 |
20120614 |
398.36 |
4 |
This is because there are deleted records in the DB2 file that retain their relative record numbers, and we had actually deleted records 3, 5, and 6. If we added another record to the file, and the file is defined not to reuse deleted records, then our next record would be relative record number 7. Furthermore, if we were to reorganize the physical file, then the relative record numbers would be sequenced back to 1 – 3.
Finally, it's important to remember that when accessing records through a logical file or view, the rrn() function returns the relative record number of its base table, not the view. Similarly, if using a distributed or partitioned table, then the rrn() function identifies the relative record number of the row in the partition or the distributed table where it is found; in that case, the rrn() function will not be unique.
In summary, this was an interesting function to use with SQL to access some DB2 information that was being kept about our INVOICES table, and it did exactly what we needed it to do in this particular case. A word of caution should be noted: When mixing older file concepts with SQL, make sure you're aware of the consequences.
LATEST COMMENTS
MC Press Online