New in 7.1, FieldProc allows column-level encryption without requiring changes to existing applications.
With all the scrutiny and legislation associated with protecting personal sensitive data, a good number of IBM i developers have had to investigate the effort it would take to add column-level encryption logic into their application programs. Anyone who has done this analysis quickly discovers that it's not a trivial amount of work.
Usually column-level encryption requires changing both the length and the data type of any database field that is going to be protected with encryption. These changes are required because encryption algorithms produce a binary string value whose length is also determined by the same algorithm. Even if you have just one sensitive column in your database that needs the protection offered by encryption, you must change every application that references that DB2 table to accommodate the change in the length and type of that column. Then, you need to figure out how to encrypt and decrypt that column when data access is done without an application—for instance, with the Display Physical File Member (DSPPFM) and Start Data File Utility (STRDFU) commands. As a result of this large work effort and complexity, numerous column-level encryption projects have been postponed or canceled.
While the IBM i 7.1 release does not include a complete column-level encryption solution, it does provide an enabling technology known as field procedure (FieldProc). With this new DB2 field procedure support, column-level encryption can be dropped into your databases without almost no impact to your applications. The FieldProc feature allows developers to register an ILE program at the column level that DB2 automatically calls each time that a row (record) is written or read. This user-supplied program can be written to include logic to encrypt the credit card number for a write operation and automatically decrypt the credit card number for read operations, as shown in Figure 1. This is exciting news to those IBM i customers with a large number of RPG and COBOL programs that are looking to encrypt sensitive data at rest without any application changes.
Existing application programs do not need to be changed or recompiled because the DB2 field procedure interface allows the column's original data type, length, and Coded Character Set Identifier (CCSID) to be unchanged. Instead, the user-supplied FieldProc program specifies the data type and length that DB2 needs to use when writing the encoded version of a column value to disk.
Figure 1: By design, FieldProc encoding and decoding is transparent. (Click images to enlarge.)
Figure 1 portrays a FieldProc-based solution where all of the application interfaces process the credit card number value as a 16-byte character string even though the field procedure causes DB2 to store the credit card number as a 20-byte binary string. When a column is associated with a Field Procedure, the column definition in the SQL table (or physical file) really just becomes the application programmer's view of the column attributes and has no impact on how DB2 internally allocates storage for the column.
While encryption and decryption are expected to be the most common usage of field procedures, this new DB2 support can be used for almost any data transformation or encoding. It is pretty common for DB2 for i tables to contain a long memo or a description column of a couple of thousand bytes in length; often, the column contains actual values that are way under the maximum column size. In this situation, the DB2 FieldProc support could be easily employed to deliver automatic compression and decompression of these longer columns to reduce the disk storage requirements for your tables. Again, all application programs would be completely unaware that a 2000-byte column is being stored internally by DB2 as a 200-byte compressed value.
Registering an existing column to use a DB2 FieldProc program is a simple operation that requires using SQL. This ALTER TABLE statement shows how the named column (cardnum) is associated with user-supplied ILE program (pgmlib/cardencpgm).
ALTER TABLE datalib/orders
ALTER COLUMN cardnumber
SET FIELDPROC pgmlib/cardencpgm
If the table contains data, DB2 for i calls the FieldProc program to encode the card number value in each of the existing rows. Thus, each card number value is stored in encrypted form when this ALTER TABLE statement completes, assuming the supplied field procedure program uses encryption for its encode operation. As you might expect, the SQL CREATE TABLE statement also supports the FIELDPROC clause so that FieldProc registration can be included in a new table definition.
While you can use the SQL ALTER TABLE statement to register FieldProc programs for DDS-created physical files, it is safer to first convert the physical file to an SQL table definition. This is due to the fact that if the Change Physical File (CHGPF) command is used to apply a new DDS definition after the FieldProc registration has been completed, CHGPF will remove the FieldProc without any warning since it's a column attribute that can be defined only with SQL. IBM has documented a methodology that allows most physical files to be converted to SQL tables without requiring any application changes or recompiles.
DB2 field procedure programs can be created by any developer, but please be aware that it takes a deep knowledge of encryption algorithms and best practices to implement a secure encryption solution. If your team does not have this level of expertise, then it may be wise to work with one of the encryption software providers (Linoma Software, nuBridges, and Townsend Security) that have updated their software to support the DB2 FieldProc interface.
Creating a DB2 field procedure program is fairly straightforward, but several guidelines need to be followed. The program has to be an ILE program (service programs are not supported), and the program cannot contain any SQL statements. In addition, the FieldProc program has to be capable of running in a thread since DB2 may invoke the program from a separate system thread. A complete list of the considerations and restrictions can be found in the DB2 for i SQL Programmer's Guide.
From a logic perspective, the Field Procedure program must contain application code that supports being called for three different functions: Registration of FieldProc, Encode a Data Value, and Decode a Data Value. The field procedure program determines which function to perform by examining the parameter list that DB2 passes in on every call. The source member, SQLFP, in QSYSINC/QRPGLESRC or QSYSINC/H, describes these parameters. Figure 2 contains the pseudo-code outlining the basic logic flow that a FieldProc must support.
Figure 2: This is pseudo-code for a field procedure program.
The ALTER TABLE or CREATE TABLE statements will perform the first call of the field procedure, requesting that the registration function be carried out. When this one and only invocation of the registration function completes, the program will return the attributes of the encoded value back to DB2. DB2 stores the encoded value attributes in the DB2 table object so that DB2 does not have to call the registration function again to determine the attributes of the encoded value.
The field procedure program is called for the encode function anytime an interface attempts to write a value into a FieldProc column. Here's a listing of some of those interfaces to give you a better idea of the types of events that can cause an Encode operation on a FieldProc column:
- SQL Insert, Update, and Merge statements
- Native record-level Write operations
- "Writing" CL Commands: CPYF, RGZPFM, STRDFU…
- Trigger Processing
- Query Processing
- Creation of SQL index or keyed logical file
Trigger processing is probably one event that you didn't expect to see on the list. Remember that a trigger program gets passed a copy of the before and after record images for the row operation that initiated the trigger call. To produce these record images for the trigger, DB2 for i uses some of its internal processing steps for insert operations. These internal insert steps are the reason behind the FieldProc program calls for encoded values during trigger processing.
Do not overlook the fact that the creation of an SQL index or a keyed logical file also results in the FieldProc program being called to encode values. What this means is that DB2 uses the encoded value of a FieldProc column when computing the internal key values for an index or logical file. As a result, the ordering of keyed values in a logical file is determined by the encoded value instead of the original data value. If an application is using a logical file with a FieldProc column as one of the key fields to performed keyed sequential access, then there's a good chance that the application will process the records in a different order.
Correspondingly, the field procedure program is called for the decode operation anytime an interface attempts to retrieve a value from a FieldProc column. Here's a listing of some of the retrieval interfaces that can cause a FieldProc decode function call:
- SQL Select and Fetch statements
- Native record-level Read operations
- "Reading" CL commands: CPYF, RGZPFM, DSPPFM, DSPJRN…
- Trigger processing
- Query processing
Query processing is an operation that shows up on the list for both FieldProc decode and encode function calls. When the query optimizer processes a predicate that references a FieldProc column such as empID = '1122', it has two ways of implementing this comparison. The default DB2 behavior for equal comparisons is to call the associated field procedure program and request that an encoded value be created for the search string ('1122'). This encoded value approach offers the best performance because DB2 can compare the encoded version of the search string directly with the encoded values stored in the field procedure column. The other method used by query optimizer for inequality comparisons such as empID >= '1122' by default is to decode all of the values in the FieldProc column before making the comparison with the search string. As you might expect, this method can perform slowly when the query processes a large number of rows. The query optimizer's behavior for FieldProc comparisons is controlled by the FIELDPROC_ENCODED_COMPARISON QAQQINI option. By default, DB2 for i only utilizes the encode methodology for equal and not-equal (<>) predicates as well as for Group By and Distinct processing. Reference the DB2 for i Performance and Query Optimization Guide for more details on this QAQQINI option.
One of the most common questions that developers ask about the IBM i 7.1 field procedure support is whether or not it's possible for the decode function to conditionally return a masked version of the original data. "No" is the current answer to this question. However, the good news is that IBM is working on a set of 7.1 PTFs that will deliver this capability. Check back in a few months for a follow-on article on the new support.
The possibility of data loss is the reason that IBM strongly recommends that the field procedure decode function always return the original value. One of the scenarios where data loss could occur when the decode logic conditionally returned a masked value is with applications using the native record-level access interface to perform updates. Consider a customer service application that is used by call center agents to change account data for clients. Also, assume that the credit card number column is associated with a FieldProc program that returns a masked value for all users except QSECOFR. A customer service agent is called to correct a misspelling in a cardholder's address. When the RPG program retrieves the cardholder's information to display on the screen, the data buffer will contain a masked version of the card number since the agent is not authorized to see the original value. After the agent makes the address correction, the RPG program will write this data buffer containing the masked credit card into the database to complete the update. On this write operation, the field procedure will now be passed a masked version of the card number to encrypt; this is where the original card number is lost.
Now that all of FieldProc details have been covered, let's review the sample field procedure program below.
D FuncCode S 2B 0
D p_FuncCode S *
D OptParms DS LikeDs(SQLFOPVD)
D*
D EnCodTyp DS LikeDs(SQLFPD)
D*
D DeCodTyp DS LikeDs(SQLFPD)
D*
D EnCodDta S 512
D DeCodDta S 512
D*
D SqlState S 5
D SqMsgTxt DS LikeDs(SQLFMT)
D*
D i S 10I 0
D En_ary S 1 DIM(512) Based(En_ary_p)
D De_ary S 1 DIM(512) Based(De_ary_p)
D e S 10I 0
D d S 10I 0
D
D/COPY QSYSINC/QRPGLESRC,SQLFP
C *Entry Plist
C Parm FuncCode
C Parm OptParms
C Parm DeCodTyp
C Parm DeCodDta
C Parm EnCodTyp
C Parm EnCodDta
C Parm SqlState
C Parm SqMsgTxt
/Free
SqlState = '00000' ;
If FuncCode = 8 ; // Return attributes about the Encoded value
// Verify this FieldProc program only being used for
// fixed-length character column
If DeCodTyp.SQLFST <> 452 and DeCodTyp.SQLFST <> 453 ;
// Return error for unsupported data type
SqlState = '38001' ;
Else ;
// The Encoded value has almost all of the same attributes
// as the decoded value
// Start by making the encoded attributes identical
EnCodTyp = DeCodTyp ;
// Encoded value length is twice that of the decoded value
EnCodTyp.SQLFL = DeCodTyp.SQLFL * 2;
EnCodTyp.SQLFBL = DeCodTyp.SQLFBL * 2;
EndIf;
ElseIf FuncCode = 0 ;
// Being Called to Encode value
// set basing pointers for data arrays
En_Ary_p = %Addr(EnCodDta);
De_Ary_p = %Addr(DeCodDta);
// set array indexes and counter
e = 1;
i = 1;
// Encode data by reversing order of card numbers and
// adding increasing number between each card number
For d = DeCodTyp.SQLFL downto 1;
En_Ary(e) = De_ary(d);
e += 1;
En_Ary(e) = %Char(i);
e += 1;
i += 1;
ENDFOR;
//
ElseIf FuncCode = 4 ;
// Being Called to Decode value
// set basing pointers for data arrays
En_Ary_p = %Addr(EnCodDta);
De_Ary_p = %Addr(DeCodDta);
// set array indexes and counter
d = 1 ;
For e = EnCodTyp.SQLFL-1 By 2 DownTo 1;
De_Ary(d) = En_ary(e);
d += 1;
ENDFOR;
Else ; // Invalid function code
SqlState = '38003' ;
EndIf ;
Return ;
/End-Free
As you might expect from the pseudo-code in Figure 2, the first action performed in the program is to check the value of the function code. The function code determines which operation the field procedure program should execute. The first IF statement grabs control when the program is called for registration of the field procedure. Before returning the attributes of the encoded value to DB2, this field procedure program verifies that the column registered has a fixed-length character data type. The SQL data type constant values are defined in the SQL member of the QSYSINC/H source file. Any type of data type can be encoded; this field procedure program contains logic for encoding and decoding only fixed-length character values. If this logic detects an unsupported data type, the returned SQLSTATE's class code value of 38 will cause the FieldProc registration request to fail. The registration logic continues by making the encoded value attributes identical to the original attributes of the column. The decoded data attributes are derived from the column definition. The encoding algorithm used in this example does double the size of the original data value, so that is why the code sets the length attributes of the encoded value to twice the length of the decoded data. Thus, in this example, a card number that is 8 characters in length will have an encoded value that is 16 bytes in length.
The second IF statement handles all of the requests to create the encoded value, which DB2 will store on disk. The encoding algorithm in this example encodes the original data value by reversing the digits of the card number and adding an extra increasing digit value between each card number. For instance, a card number value of '11223344' will have an encoded value of '4142333425261718'. Obviously, this encoding scheme is not an encryption algorithm that will be approved by any security auditor, but this allows you to see how the encoded value can be a different length and value than the original column value. The RPG code uses array processing to reverse the order of the digits and the %CHAR built-in function to insert the extra number between each of the reversed card number values.
The final IF statement is responsible for converting the encoded value into a decoded value that can be consumed by all of the applications and user interfaces. The decode logic converts the encoded version of the card number back to the original card number. The For loop takes cares of the decoding by using arrays to reverse the card numbers into their original order and skipping over the extra digit values that were added into the encoded value.
From this example program, it's clear that Field Procedures gives developers complete flexibility when it comes to creating an encoding solution to meet business requirements. Hopefully, you also now have a clear understanding of the big-picture benefit of the 7.1 FieldProc delivery, which is transparent deployment of column-level encryptions without having to change any of your existing applications. A capability that is bound to make both security auditors and your boss happy.
To find out more about the new FieldProc support, see "DB2 Field Procedures Finally Support Conditional Masking."
as/400, os/400, iseries, system i, i5/os, ibm i, power systems, 6.1, 7.1, V7, V6R1
LATEST COMMENTS
MC Press Online