Format and name derived columns when you use SQL to do ad hoc analysis of information in an SQL table.
I frequently use Interactive SQL or Run SQL Scripts in Navigator to do ad hoc analysis of information in an SQL table, which often results in one or more new columns being derived or created based on existing columns. When I first began doing these types of queries, I was frustrated with the way SQL identified and formatted a derived column in the result set.
Sometimes a numeric (zoned or packed decimal) derived column had a very large number of decimal positions, or a derived column did not have a meaningful name, or the data type was not what I required, and each of these issues could render the derived column useless or make it difficult to work with.
If you know the tricks, these formatting issues for derived columns can be easily resolved using a couple of SQL column and scalar functions. This article will focus on the DEC and ROUND functions used in conjunction with the AS operator to assign a meaningful name to a derived column and change its data format to something more workable.
This article is based on the SQL function available in V5R4 of i5/OS. I'm assuming that you have a basic understanding of relational database concepts and SQL and that you are familiar with basic use of the SELECT statement.
The SQL examples in this article use the following syntax:
1. Anything in all UPPERCASE is SQL required syntax.
2. Anything in all lowercase is SQL parameter data that the end user must supply.
3. Anything enclosed in single quotes ('S')--upper, lower, or mixed case--is a literal used for comparison purposes and is supplied by the end user.
The SQL examples used in this article will reference a table called EMP (Employee Master) to show how to use the DEC and ROUND functions and the AS operator. EMP contains a total of six rows, one for each employee in the company, which are shown in Table 1.
EMP (Employee Master) |
|||||
Nbr |
Name |
Class |
Sex |
Dept |
Salary |
10 |
Ed |
5 |
M |
911 |
7000 |
20 |
Heikki |
2 |
M |
901 |
6000 |
30 |
John |
5 |
M |
977 |
3200 |
40 |
Mike |
4 |
M |
977 |
6500 |
50 |
Marcela |
3 |
F |
911 |
7500 |
60 |
Frank |
2 |
M |
990 |
6500 |
Decimal Positions Gone Wild
When you derive a new column by operating on an existing numeric column with a column function such as AVG (average), * (multiply), or / (divide), the resulting derived column can have a frustratingly large number of decimal positions. I am frequently asked where all those decimal positions come from and how to get rid of them.
The number of decimal positions for a derived column is determined by the function that created it. For example, the number of decimal places resulting from a multiplication is the sum of the decimal places for the multiplicand and the multiplier. If the multiplicand and multiplier do not have any decimal positions, the multiplication result will not have any. If the multiplicand has three decimal positions and the multiplier has two, the resulting derived column will have five decimal positions.
However, the number of decimal positions resulting from a division or the AVG column function will most likely always be large. In both these situations, the total length of a derived numeric column is 31 positions, which includes decimal positions. The number of decimal positions for a numeric derived column can be determined by this formula:
31 minus (total length minus number of decimal positions)
As an example, let's use SQL to derive or create the average salary for all employees in the employee master table called EMP. The salary field is defined as packed decimal with a precision (total length) of 5 and a scale (number of decimal positions) of 0 (zero). Using the above formula, the result set for average salary will have 26 decimal positions (31 - (5 - 0) = 26). The SQL statement to derive the average salary is relatively simple and is coded as follows:
SELECT AVG(salary)
FROM emp
The results of executing this SQL statement using Interactive SQL derives the column called AVG(SALARY) and is shown below.
AVG(SALARY)
6,116.66666666666666666666666666
If you count the number of positions in this number, including the leading zero that is suppressed (don't count the comma and decimal point), you will find there are 31, which is what we would expect it to be. If you count all those sixes after the decimal point you will find that there are 26, which is what the formula (31 - (5-0) = 26) tells us there should be.
Getting the Decimal Positions Under Control
To get the number of decimal positions under control, the data format for average salary has to be changed from a length of 31 with 26 decimal positions to something more reasonable, such as a length of seven with two decimal positions. The scalar function DEC (or DECIMAL) can be used to easily accomplish this.
The DEC function returns the decimal representation of a number with the specified precision (column length or total digits in the number not counting the sign) and scale (number of decimal positions). A decimal value is a packed decimal or zoned decimal number with an implicit decimal point. The precision and the scale of the number determine the position of the decimal point, and the scale cannot be negative or greater than the precision. The DEC function has the following syntax:
DEC(expression, precision, scale)
The SQL statement to derive the average salary using the DEC function to change the data type of the derived column to have a length of seven with two decimal positions is coded as follows:
SELECT DEC(AVG(salary), 7, 2)
FROM emp
The result of executing this SQL statement derives a column or value called DEC that is seven positions long with two decimal positions (remember that the leading zero in the result is suppressed), and is shown below.
DEC
6,116.66
Now you know where all the decimal positions come from and how to change the data format of the derived column to have a workable length and number of decimal positions.
To Truncate or Round?
From a data analysis perspective, in the example we have been discussing, should the derived value be 6,116.66 or 6,116.67? Should the extra decimal positions in the derived column just be truncated or dropped, or should the ROUND scalar function be used to round the rightmost decimal position to derive a more precise number? The ROUND function can be a very useful tool when doing data analysis, and the decision to use it should be made based on the end use for the derived column.
The ROUND function returns a numeric value rounded to some number of places to the right or left of the decimal point. It has the following syntax:
ROUND(expression, decimal-position)
The decimal-position specified in the ROUND function can be a positive or negative number. If it is a positive number, this is the scale or number of decimal positions that will be to the right of the decimal point, with the rightmost decimal position being rounded. If it is a negative number, this is the number of significant digits to the left of the decimal point that will be set to zero, with the next significant digit to the left of the decimal point being rounded.
If rounding is not required, then the SQL statement using the DEC function in the previous example is OK, and the derived column will be truncated (no rounding) to the scale or number of decimal positions specified by the DEC function. If rounding is desired when using the DEC function, the SQL statement would be coded as follows.
SELECT DEC(ROUND(AVG(Salary), 2), 7, 2)
FROM emp
Note that the ROUND function is placed between the DEC and AVG functions and that you need to specify the decimal position for which the rounding will occur. In this case, it is the number 2 immediately after (Salary), which says the rounding will occur for the second decimal position. Also note that the decimal-position value for the ROUND function must be equal to or less than the scale value for the DEC function if rounding is to occur.
This SQL statement will derive a column or value called DEC that is seven positions long (leading zero is suppressed), with two decimal positions where the second decimal position is rounded. Therefore, as a result of the rounding, the derived value becomes 6,116.67 instead of 6,116.66. The result of executing this SQL statement is shown below.
DEC
6,116.67
What's in a Name?
When a derived column is created using Interactive SQL, it is given a name related to the function that created it. In the first example, which used the AVG function, the derived column was given the name (and column heading) of AVG(SALARY). In the next two examples, which employed the DEC and ROUND functions, the derived column was given the name of DEC. While AVG(SALARY) is more meaningful than DEC, neither of these names is very useful, they can be confusing, and they can be difficult to reference in a complex SQL statement. The simple solution is to us the AS operator to provide a name for the derived column when it is created.
The AS Operator
The AS operator is used in SQL to replace an existing name with a new name or to assign a meaningful name to any value derived in an SQL statement. In many cases, the name that is assigned by the AS operator can be referenced later in the SQL statement in which the AS operator is used.
The syntax of the AS operator clause is simple and shown below:
existing-SQL-name AS new-SQL-name
or
SQL-function AS SQL-name
If we want the rounded, average salary derived in our last example to have the name of AVGSAL, the SQL statement to do that would be coded as follows:
SELECT DEC(ROUND(AVG(Salary), 2), 7, 2)
AS avgsal
FROM emp
Note that the AS operator immediately follows the expression or function it is naming, and there is no comma between the expression or function and the AS operator. The result of executing this SQL statement is shown below.
AVGSAL
6,116.67
Summary
SQL is used to do ad hoc analysis of information in an SQL table, and one or more new columns are often derived as a result of that analysis. Depending on the function used, a numeric (zoned or packed decimal) derived column may not have the desired data type, may not have a meaningful or useful name, and can have a very large number of decimal positions. This can make the derived column unusable or difficult to work with.
The DEC and ROUND functions, in conjunction with the AS operator, can be used to resolve these issues. The DEC function is used to change the length of a derived column and remove any excess decimal positions, the ROUND function is used to round a numeric derived value to a specific number of places to the right or left of the decimal point, and the AS operator is used to change or assign a meaningful and useful name to a derived column.
LATEST COMMENTS
MC Press Online