In recent years,
the perception of the database's importance in the application architecture has
been slowly eroding. Often, the database is reduced to an ambiguous notion of a
persistent data repository. Yet it is really hard to imagine an enterprise-class
solution without the services of a robust and scalable database engine.
In this article, I tackle some of the more esoteric issues often
encountered when using DB2 for i5/OS in a multi-tier environment. I examine how
various cursor attributes and SQL statement clauses impact object-level and
record-level locking. I discuss how the DB2 engine detects and resolves
deadlocks and how to avoid deadlocks altogether. You'll also learn what happens,
in terms of locking and concurrency, when the data is accessed by software
components that use different isolation levels. This discussion is backed by
Java code samples. However, most of the topics covered in this article are
applicable to other programming languages and SQL interfaces.
First Things First
Java jobs and threads use database objects—such
as tables, views, and indexes—to perform transactional work. DB2 locks
these objects to ensure data integrity and to foster the highest possible level
of concurrent access. Thanks to locking, multiple jobs or threads can read the
object and perform non-conflicting updates at the same time. The object-level
and record-level locks are held for the duration of a database transaction. A
transaction allows you to group multiple SQL statements together. You can either
commit a transaction, which guarantees that the database changes are completed,
or you can roll the transaction back, in which case the changes are backed out.
In DB2 literature, a transaction is often referred to as a "unit of work." The
current transaction context is called a "commitment definition." The transaction
processing, locking, and concurrent access are managed by DB2 commitment
control, which is legendary for its mature code base and scalability. In fact,
it can handle up to 500 million distinct row changes in a single transaction!
The object-level locks are obtained based on the intended use (read or
update) and sharing capability (single or multiple job/thread access). From the
i5/OS point of view, SQL tables, indexes, and views are regular operating
system objects: physical and logical files. The files and members are always
locked with *SHRRD (shared read). The data space, which is a structure that
holds the data, can be locked with one of five lock states. Those lock states
and the compatibility between them are shown in the table below.
Object-Level Lock States and
Compatibility
|
||
Object-Level Lock
|
Object-Level Lock States
|
Incompatible Locks (locks that cannot be obtained by other
processes)
|
EXCL
|
Lock exclusive, no read allowed
|
EXCL, EXCLRD, SHRUPD, SHRNUP, SHRRD
|
EXCLRD
|
Lock exclusive, read allowed
|
EXCL, EXCLRD, SHRUPD, SHRNUP
|
SHRUPD
|
Lock shared for update
|
EXCL, EXCLRD, SHRNUP
|
SHRNUP
|
Lock shared, no update
|
EXCL, EXCLRD, SHRUPD
|
SHRRD
|
Lock shared for read
|
EXCL
|
Here's an example of how to interpret the info provided in the
table: The EXCL lock state reserves the object for the exclusive use of the
requesting job/thread. All five lock states are incompatible with EXCL, which
means that no other job/thread is able to obtain any of the five lock states as
long as the given EXCL is held. So EXCL guarantees a single user access until
the work is completed.
The record-level locks are obtained to disallow
conflicting updates and prevent reading of uncommitted changes. DB2 for i5/OS
supports just two record-level lock types: READ and UPDATE. The following table
shows the semantics and the compatibility of the record-level locks.
Record-Level Lock States and
Compatibility
|
||
Record-Level Lock
|
Record-Level Lock States
|
Incompatible Locks (locks that cannot be obtained by other
processes)
|
READ
|
The record is locked for read. Another job may read the same record but
cannot lock the record for update intent. The record cannot be changed by
another job as long as one job holds a read lock on the record.
|
UPDATE
|
UPDATE
|
The record is locked for update intent. Another job may read the record but
may not obtain a read or update lock on it until the lock is released.
|
READ
|
Note: There is another record-level lock called INTERNAL. It is a
short-lived read lock obtained by i5/OS to access the record (row). An
application program has no ability to control this lock, so I will not further
discuss it. Check the help text for the Display Record Lock (DSPRCDLCK) command
for more info.
In DB2, a combination of object- and record-level locks is
used to enforce various isolation levels. In addition to the four ANSI standard
isolation levels, DB2 for i5/OS supports a non-standard isolation level of NONE.
The ANSI SQL isolation level naming differs from the naming adopted by the DB2
family. DB2 for i5/OS, in turn, uses the traditional operating system naming.
This is often a source of confusion and misunderstanding. Therefore, I've
illustrated the ANSI standard names and their DB2 and i5/OS equivalents in the
following table.
Isolation-Level Naming
|
||
ANSI SQL Isolation Level
|
DB2 Isolation Level
|
I5/OS Isolation Level (Commit Level)
|
|
NO COMMIT (NC)
|
*NC
|
READ UNCOMMITTED
|
UNCOMMITTED READ (UR)
|
*CHG
|
READ COMMITTED
|
CURSOR STABILITY (CS)
|
*CS
|
REPEATABLE READ
|
READ STABILITY (RS)
|
*ALL
|
SERIALIZABLE
|
REPEATABLE READ (RR)
|
*RR
|
Keep this table handy when analyzing locking and concurrency issues in
DB2 for i5/OS. The JDBC interfaces use ANSI naming for isolation levels, while
i5/OS utilities such as the Work with Commitment Definition (WRKCMTDFN) command
use the i5/OS naming. The semantics of the isolation levels is well-described in
an article
in the InfoCenter.
When reviewing the isolation-level definitions,
please keep in mind that the ANSI isolation-levels specification does not deal
with locking. Rather, the ANSI spec defines three phenomena: Dirty Read,
Non-Repeatable Read, and Phantom. Then, the isolation levels are defined by the
phenomena that are forbidden to occur. The spec doesn't define how a database
manager is to enforce the restrictions defined for a given isolation level.
Isolation Levels and Locking
DB2 and most other database managers choose to implement the isolation levels by locking. The locks obtained to enforce isolation levels are governed by commitment control. Commitment control provides additional functionality that does not impact or alter the behavior required by the isolation-levels definition. An example of such a value-add functionality is support for updateable cursors and for the SQL FOR UPDATE clause. Note that neither of those features is a part of the ANSI isolation-level spec. The DB2 locking guarantees the ANSI standard compliance. The type of locks acquired to provide this additional functionality is irrelevant from the ANSI spec point of view as long as the disallowed phenomena do not occur.
Basic SQL Operations
Armed with the basic understanding of DB2 locking and concurrency control, let's now analyze several code samples that illustrate the behavior of the i5/OS database. (The sample Java code discussed in this article can be downloaded from the MC Press Web site.) I use two simple Java programs that test various aspects of locking and concurrency when executed at the same time over two separate database connections:
- TestLockingWithIsolationLevels.java—This program creates a table called COFFEES, sets an isolation level, and then performs the four fundamental SQL operations: DELETE, INSERT, UPDATE, and SELECT. The run-time attributes, such as current isolation level, database connection keywords, and so on, are controlled by the properties contained in the testlocking.properties file.
- TestLockingSelectOnly.java—This program sets an isolation level and retrieves a set of rows from the COFFEES table. The program attempts to access the same set of rows that have been modified by TestLockingWithIsolationLevels. The run-time attributes of the program are controlled by the properties contained in the testlockingselectonly.properties file.
The
testing methodology is quite straightforward: First, I modify the
testlocking.properties file to set the isolation level, cursor attributes, etc.
Then, I execute the TestLockingWithIsolationLevels program. I suspend the
execution of the program after, for example, the UPDATE statement but
before the corresponding COMMIT so that all locks obtained in a current
transaction are still held. The following code snippet illustrates these
actions:
+ " SET SUP_ID = sup_id + 100 ");
pu.executeUpdate();
Thread.sleep(60000); [1]
if (!isAutoCommit)
con.commit(); [2]
So, at [1], the thread (program) execution is suspended for 60 seconds.
All locks obtained on behalf of the UPDATE statement are held until the commit
is executed at [2]. Once TestLockingWithIsolationLevels reaches the sleep method
at [1], I execute the TestLockingSelectOnly, which attempts to select the set of
rows that was just updated by the other program. Depending on the isolation
level set for the TestLockingSelectOnly program, the SELECT may require locks
that are incompatible with the locks obtained for the UPDATE. A robust
multi-user application should monitor and handle situations in which these
conflicting locks cannot be obtained within the wait-for-record timeout.
Here's a code example that illustrates these concepts:
outaHere:
while ( i <= numOfAttempts) { {
try {
s = con.
prepareStatement("SELECT t.* FROM " + tableName + " t " ,
cursorType,cursorConcur); [1]
ResultSet rs = s.executeQuery();
int j = 0;
while (rs.next()) {
++j;
}
rs.close();
if (!isAutoCommit)
con.commit();
s.close();
} catch (SQLException sqlexception) { [2]
if(sqlexception.getErrorCode() == -913 && i < numOfAttempts{ [3]
i++;
System.out.println("SQLState: " + sqlexception.getSQLState());
System.out.println("Message: " + sqlexception.getMessage());
System.out.println("Vendor code: " +
sqlexception.getErrorCode());
System.out.println("Trying to recover. Attempt #: " + i);
con.rollback();
Thread.sleep(random); [4]
} else
{throw sqlexception;} [5]
}
}
At [1], the SELECT statement is prepared. It is assumed that the cursor
attributes are set to the default values of ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY. (I cover other cursor attributes later in the
section about ResultSet attributes and additional SQL clauses.) At [2], SQL
exceptions are caught. At [3], I test whether the exception was caused by the
lock timeout. If so, at [4], the program waits for a random amount of
milliseconds before control is returned to the beginning of the FOR loop. The
program then tries to perform the SELECT operation again. In a
typical—even a very busy—environment, the number of attempts should
not exceed five. The random wait set at [4] should be in the range of one to six
seconds. This assumes that the record wait time for the COFFEES table has been
set to five seconds. The rationale behind this setting is that even on a busy
system, a business transaction should run to completion within five seconds. You
may need to adjust this setting for your particular environment. I used the
following i5/OS CL command to set the record wait time:
Finally, at [5], an SQLException is thrown under two conditions: The
initial exception caught at [2] has had an error code other than -913 or the
program has already reached the maximum number of attempts to re-run the
SELECT.
You can check the object-level lock state using the following
Work with Object Lock (WRKOBJLCK) CL command:
Press F6 on the Work with Object Lock screen to see the member-level and
data-space-level lock states. Here's the lock info for the COFFEES table when
both test programs discussed in this section are running:
Figure 1: These are the object-level lock
states for the COFFEES table.
In Figure 1, one job obtained the *SHRUPD
data-space lock, while the other job acquired the *SHRRD data-space lock.
According to the object-level locks table, these locks are compatible, so there
is no conflict on the object level.
The record-level locks can be
examined with the Display Record Locks (DSPRCDLCK) CL command:
Figure 2 shows the record-level lock info for the situation in which the
first program updated the records in the COFFEES table but did not commit the
changes yet. At the same time, the other program attempts to obtain a
short-lived READ lock that is necessary to implement the READ COMMITTED
isolation level.
Figure 2: We have conflicting record-level
locks.
As highlighted in Figure 2, one job holds the UPDATE locks, and
the other job tries to obtain a READ lock on the first row. According to the
table that shows record-level locks, these locks are incompatible, so the second
job lock request is put in a lock wait state. If the UPDATE lock does not get
released within the specified timeout (five seconds in this case), the second
job receives the SQLException with an error code of -913.
I used the
methodology outlined in this section to compile a table shown below that
illustrates the expected locking behavior for the four fundamental SQL
operations and the two most frequently used in Java applications isolation
levels: Read Committed and Repeatable Read. Read Committed is the default
isolation level used by DB2 for Linux, UNIX, Windows (DB2 LUW), and Oracle while
Repeatable Read is the default isolation level used by WebSphere.
Note:
On the DB2 for i5/OS, we recommend the Read Uncommitted isolation level. This
ensures a high level of concurrency and a low level of locking contention.
Remember, however, that the Dirty Reads are allowed with an Uncommitted
Read. The default setting for the DB2 for i5/OS JDBC driver is Uncommitted
Read.
Object and Record-Level Locks for RC and RR Isolation
Levels
|
|||||
Isolation Level
|
SQL Operation
|
Statement Clause
|
Member Lock
|
Data-Space Lock
|
Record Lock
|
READ_COMMITTED
|
DELETE (fast delete)
|
Without WHERE clause
|
*EXCL
|
|
|
READ_COMMITTED
|
DELETE
|
With WHERE clause (searched DELETE)
|
*SHRRD
|
*SHRUPD
|
UPDATE deleted rows
|
READ_COMMITTED
|
INSERT
|
|
*SHRRD
|
*SHRUPD
|
UPDATE inserted rows
|
READ_COMMITTED
|
SELECT
|
|
*SHRRD
|
*SHRRD
|
READ (short-lived at the time cursor is opened)
|
READ_COMMITTED
|
UPDATE
|
|
*SHRRD
|
*SHRUPD
|
UPDATE updated rows
|
REPEATABLE_READ
|
DELETE (fast delete)
|
Without WHERE clause
|
*EXCL
|
|
|
REPEATABLE_READ
|
DELETE
|
With WHERE clause
|
*SHRRD
|
* SHRRD
|
UPDATE deleted rows
|
REPEATABLE_READ
|
INSERT
|
|
*SHRRD
|
*SHRUPD
|
UPDATE inserted rows
|
REPEATABLE_READ
|
UPDATE
|
|
*SHRRD
|
*SHRUPD
|
UPDATE updated rows
|
REPEATABLE_READ
|
SELECT
|
|
*SHRRD
|
*SHRRD
|
READ on open rows cached by the ResultSet implementation; eventually all
rows read by the cursors.
|
ResultSet Attributes and Additional SQL Clauses
As mentioned, DB2 for
i5/OS supports all JDBC spec ResultSet attributes, such as scrollable and
updatable cursors. In addition, the semantics of a SELECT statement can be
modified by adding SQL clauses such as FOR UPDATE or WITH CS KEEP LOCKS. DB2
implements this value-add functionality by modifying its locking behavior. The
modified locking, however, never compromises the ANSI isolation-level
requirements. In other words, the modified locking always guarantees that the
current isolation-level restrictions are satisfied. My experience shows that
this topic may cause a bit of confusion—even among seasoned DB2 and Java
developers—so I compiled the following table that illustrates object- and
record-level locking for various ResultSet attributes and additional SQL
clauses.
Object and Record-Level Locking for Different ResultSet
Attributes
|
||||||
Line
|
Isolation Level
|
Cursor Attributes
|
Statement Clause
|
Member Lock
|
Data-Space Lock
|
Record Lock
|
1
|
READ-COMMITTED
|
READ_ONLY
FORWARD_ONLY Blocked Fetch |
|
*SHRRD
|
*SHRRD
|
READ short-lived, at the time the SQ fetches the row to
build a result set
|
2
|
READ-COMMITTED
|
READ_ONLY
FORWARD_ONLY 1 Row Fetch |
FOR UPDATE
|
*SHRRD
|
*SHRUPD
|
UPDATE the row on which the cursor is currently
positioned
|
3
|
READ-COMMITTED
|
UPDATEABLE
FORWARD_ONLY 1 Row Fetch |
|
*SHRRD
|
*SHRUPD
|
UPDATE the row on which the cursor is currently
positioned
|
4
|
READ-COMMITTED
|
READ_ONLY
FORWARD_ONLY Blocked Fetch |
WITH CS KEEP LOCKS
|
*SHRRD
|
*SHRRD
|
READ all rows read by the cursor (blocks of rows at a
time)
|
5
|
REPEATABLE_READ
|
READ_ONLY
FORWARD_ONLY Blocked Fetch |
|
*SHRRD
|
*SHRRD
|
READ all rows read by the cursor (blocks of rows at a
time)
|
6
|
REPEATABLE_READ
|
READ_ONLY
FORWARD_ONLY 1 Row Fetch |
FOR UPDATE
|
*SHRRD
|
*SHRUPD
|
UPDATE all rows accessed by the cursor
|
7
|
REPEATABLE_READ
|
UPDATEABLE FORWARD_ONLY
1 Row Fetch |
FOR READ ONLY
|
*SHRRD
|
*SHRRD
|
READ all rows read by the cursor; a batch of locks is
obtained
|
8
|
REPEATABLE_READ
|
UPDATEABLE FORWARD_ONLY
1 Row Fetch |
|
*SHRRD
|
*SHRUPD
|
UPDATE all rows accessed by the cursor; locks obtained one
row at a time
|
9
|
Any
|
READ_ONLY
FORWARD_ONLY Blocked Fetch |
WITH RS USE AND KEEP EXCLUSIVE LOCKS
|
*SHRRD
|
*SHRUPD
|
UPDATE all rows ready by the cursor; a batch of locks is
obtained.
|
In this table, the additional FOR UPDATE clause in line 2 is
equivalent to the ResultSet.CONCUR_UPDATABLE cursor attribute in line 3. Using
the cursor attribute is probably a cleaner approach, because it does not involve
changes in the statement text, thus making it more portable. Note how the WITH
RS USE AND KEEP EXCLUSIVE LOCKS clause can be used to switch the isolation level
for a given statement to REPEATABLE_READ and additionally obtain exclusive
(UPDATE) locks for all the rows accessed by the cursor. Use this clause with
caution because it has an adverse impact on the concurrent access to
data.
To recap: The isolation levels define what phenomena are
disallowed. The database manager may choose the method of enforcing the
restrictions. The DB2 for i5/OS locking guarantees the ANSI standard compliance.
The type of locks acquired to provide additional functionality is irrelevant
from the ANSI spec point of view as long as the disallowed phenomena do not
occur.
Deadlock Detection, Resolution, and Avoidance
Simply said, deadlock can be described as the
circular waiting for resources. For example, Job A owns Resource 1 and is
waiting for Resource 2 while Job B owns Resource 2 and is waiting for Resource
1. The database deadlock detection and resolution can be implemented in many
ways. With respect to the deadlock detection, DB2 for i5/OS uses a proactive
approach. The deadlocks are avoided, because two timeout values are associated
with database objects: the maximum file wait time and the maximum record wait
time. In case of conflicting lock requests, one or more of the jobs involved
will fail after a timeout. As mentioned, you can change these timeout values
using the CHGPF or CHGLF commands. For example:
Note that these timeout values are object-specific. In other
words, there is no system value that you could use to change the timeout
behavior for all objects on a system.
DB2 for i5/OS also implements so-called
"simple deadlock detection," which covers the cases where only two jobs have
mutually exclusive locking requests. In this case,
if the lock is not granted
after 10 seconds, DB2 tries to figure out if there is another job that holds a
required lock and that, in turn, waits for a lock that a current job is holding.
If so, one of the jobs will be "booted" out. Again, this works only for two-way
conflicts; all other cases are resolved by the timeout values. There is no
parameter that you could use to override the simple deadlock detection
behavior.
The best strategy to cope with deadlocks is to avoid them
altogether. Several application development strategies achieve this goal. I
favor the method in which all processes request database resources in a
particular order. This effectively prevents any cyclic use of resources. Let's
illustrate these concepts with an example.
Handling a DB2 Deadlock
I use the three Java programs to create a deadlock
scenario. The tables accessed in the scenario are contained in the DB2 sample
schema. The sample schema can be created with an SQL stored procedure call shown
below:
Here's a short description of the Java program's
functionality:
DeadlockExampleConnection1.javaThis class updates a
row in a table called DEPARTAMENT and then reads all rows from the EMPLOYEE
table for that department (A00). The program pauses between the UPDATE and the
SELECT to allow the other program to catch up. The SQL statements executed by
this program are shown below:
WHERE deptno = 'A00'
SELECT empno, firstnme, lastname, workdept FROM employee
WHERE workdept = 'A00'
DeadlockExampleConnection2.javaThis class updates a row in the
EMPLOYEE table. The updated employee belongs to the department A00. So this row
is contained in the result set to be returned by the SELECT executed in the
DeadlockExampleConnection1 class. Then the program attempts to select a row from
the DEPARTMENT table that contains the department info for the updated row. This
happens to be the same row as the row updated by the first class. The SQL
statements executed by this program are shown below:
SELECT deptname FROM department WHERE deptno =
(SELECT workdept FROM employee WHERE empno = '000110')
ExecuteDeadlockExample.javaThis class contains the main
method that creates and starts the threads for the other two classes.
Both the DeadlockExampleConnection1 and DeadlockExampleConnection2
classes open with the isolation level set to READ COMMITTED. As previously
illustrated, a SELECT statement for this isolation level requires a short-lived
READ record-level lock. This lock is incompatible with the UPDATE lock obtained
by the other class for the UPDATE statement; thus, the SELECT request waits for
the UPDATE lock to be released.
In addition to the DSPRCDLCK command
discussed in the previous section, you can use the Lock Conflict Report to
monitor for excessive locking or deadlocks. For example, I used the following
steps to pin down the locking issues caused by the sample Java classes discussed
in this section:
- Start the performance trace from a CL prompt: STRPFRTRC
- Re-create the problem (run ExecuteDeadlockExample).
- End the performance trace (by default, the performance data is stored in the QPFRDATA library; you need to choose a member name, for example DLCKTST): ENDPFRTRC MBR(DLCKTST)
- Print lock conflict report: PRTLCKRPT MBR(DLCKTST) RPTTYPE(*TOD)
The Lock
Conflict Report shown in Figure 3 can be used to analyze the deadlock caused by
the cyclic use of the resources.
Figure 3: This example lock conflict report
shows a deadlock scenario.
In Figure 3, the highlighted report section
shows that job 214774 was waiting 5.2 seconds for row 1 in the table DEPARTMENT.
The lock was held by the job 214773. Conversely, job 214773 waited 3.7 seconds
for row 9 in the EMPLOYEE table. This row was locked at that time by job 214774,
hence the deadlock. DB2 resolved the deadlock by returning SQL code -913 to the
job 214774. The Java class running over the JDBC connection serviced by job
214774 reacted properly by rolling back, thus releasing all locks held. At that
point, job 214773 could finish the transaction, because row 9 in the EMPLOYEE
table became available. After a random amount of time, job 215774 repeated its
requests. This time, it succeeded, because job 214773 was done and no lock
conflicts were encountered.
As stated, you can avoid the deadlocks by
acquiring the resources in a consistent order. The conflict situation described
in this section can be easily eliminated by modifying the order of the SQL
statements in one of the classes. For example, you could change the
DeadlockExampleConnection2 class so that the SELECT statement is executed before
the UPDATE. That way, DeadlockExampleConnection2—similar to
DeadlockExampleConnection1—accesses the DEPARTMENT table first and then
the EMPLOYEE table. This eliminates the possibility of cyclic use of resources.
In fact, the Lock Conflict Report for the modified application proves that no
deadlocks were detected and there was no need to restart one of the
transactions. This is a desired behavior because less system resources are used
and the application response time also improves.
Avoiding Conflict
I hope that this article will help you develop more robust, lock-conflict-resilient Java applications that use DB2 for i5/OS as the back-end database server. Here are some things to keep in mind when designing and implementing such applications:
- DB2 for i5/OS implements ANSI isolation-level specs by locking. The higher the isolation level, the more restrictive are the locks obtained by the transaction manager.
- Additional locks may be obtained by the database to implement add-on functionality such as updateable cursors.
- In a complex application, lock conflicts will occur. A well-written application handles the exceptions caused by lock conflicts by restarting the failing transaction.
- Deadlocks are eliminated by avoiding cyclic use of resources.
To learn more about the DB2 for i5/OS topics covered
in this article, see "Using JDBC
in an iSeries Linux Partition."
Jarek
Miszczyk is the Senior Software Engineer, ISV Solutions Enablement, IBM
Rochester. He can be reached by email at
LATEST COMMENTS
MC Press Online