In today's high-tech IT world, it's common for shops to share
data on multiple platforms. For those trusted with the task of writing the
interfaces to share the data, there are many annoyances. For example, who hasn't
sent data from one platform to an export file, updated flags in the database to
signal the data has been sent, and then discovered that the export file never
made it to its final destination on the remote system? Worse, even when the data
does make it to the remote system, it's common to encounter an error resulting
in partial data updates, which often makes restarting the entire process a
mess.
Fortunately, for those sharing data between Microsoft's SQL Server
and the iSeries, there's a splendid aid at your disposal: distributed
transactions (DTs). DTs are functionally similar to local database transactions
in that they have a beginning boundary, data modification statements, and an
ending boundary whereupon the data changes that occurred are either committed or
rolled back. However, distributed transactions extend the concept further by
allowing data modification statements to occur against databases on multiple
platforms.
Think of how moving data between disparate systems would be
simplified with distributed transactions:
1. A transaction boundary is created.
2. Data is moved from the source platform to the destination platform.
3. The source platform marks its data as sent.
4. If everything is successful, all of the changes are committed on both systems.
5. If there is a failure, all of the changes are rolled back. When the error condition is fixed, the process can easily resume.
Since transactions
involve the "all or nothing" concept, the programmer is assured the data is
successfully changed on both platforms or on neither. Never again need we fuss
over where to pick back up in the multiplatform processing cycle or reset flags
to send data again!
The SQL
Server documentation gives a good introduction to DTs and explains how
they work. This article covers the basics of performing a DT using SQL Server's
Transact SQL (T-SQL).
Setup
To set up a SQL Server/iSeries environment capable of performing a DT, you need to configure the following items:
- iSeries files to participate in a DT must be journaled.
- The Client Access V5R1 ODBC driver must be installed on the SQL Server machine.
- A linked server to the iSeries must be configured.
- The SQL Server Distributed Transaction Coordinator (DTC) must be started.
The first requirement is that the iSeries physical
files to be modified must be journaled. Tables created in a schema (library)
built with the CREATE SCHEMA statement are automatically journaled. To verify
whether a physical file is journaled, use the Display File Description (DSPFD) command.
If it is not journaled, use the Start Journal Physical File (STRJRNPF) command
to start journaling. If you need help with iSeries journaling concepts
(journals, receivers, etc.) see chapters 19 and 20 of the Backup
and Recovery Guide.
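If journaling is not yet active for a file, a short CL sequence takes care of it. Here is a minimal sketch; the receiver, journal, and file names are illustrative:

/* Create a receiver, create the journal, then start journaling the file */
CRTJRNRCV JRNRCV(TESTDATA/ORDRCV01)
CRTJRN    JRN(TESTDATA/ORDJRN) JRNRCV(TESTDATA/ORDRCV01)
STRJRNPF  FILE(TESTDATA/ORDERS) JRN(TESTDATA/ORDJRN) IMAGES(*BOTH)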
The second requirement involves installing the
Client Access ODBC Driver (V5R1 or higher with the latest service pack) on the
SQL Server machine. OS/400 has to be at V5R1 or higher as well. (Starting with
V5R2, Client Access has been renamed to iSeries Access, but I will refer to it
as Client Access here.) In case you're wondering, the Client Access OLE DB
provider IBMDA400 does not currently support distributed transaction processing
and therefore cannot be used.
Once the Client Access ODBC Driver is
installed, configure an ODBC data source to the iSeries under the System Data
Source Name (DSN) tab. For this article, I named my DSN "ISERIES" and used the
default options.
The third requirement is to configure a linked server
(requires SQL Server 7.0 and above--SQL Server 2000 is used here). A linked
server definition allows SQL Server to access tables from a remote database as
though they were part of its own local database.
To configure the linked
server, start the SQL Server Enterprise Manager. Navigate the tree hierarchy and
select the server you want to work with. Expand the server and then expand the
"Security" node. Right-click on "Linked Servers" and choose "New Linked Server."
In the linked server name box, enter ISERIES again, for consistency with the
ODBC DSN. This linked server name will be used to refer to iSeries tables when
working with T-SQL.
Under server type, choose "Other Data Source" and
select the "Microsoft OLE DB Provider for ODBC Drivers" in the provider name
combo box. Under "Product Name" enter "DB2 for iSeries." In the "Data Source"
box, enter a valid iSeries DSN (if following along with this example, enter
ISERIES.) In the "Provider String" box, you may optionally enter any DSN
overrides. For example, to make the iSeries library TESTDATA the default
library, enter DBQ=TESTDATA, where DBQ is the Client Access ODBC Driver's
keyword to override the library list.
Next, you need to establish the
security credentials for the linked server. Click on the Security tab of the
"Linked Server Properties" window. In this window, SQL Server gives the option
to define a login cross-reference to link the credentials of a specific SQL
Server user to a specific iSeries user, but for simplicity, I will not use this
feature in this example. In the bottom half of the window, there are options for
login definitions not specified in the cross-reference list. Choose the "Be made
using this security context" option (SQL Server 7.0's option is "They will be
mapped to") and enter a valid iSeries user name and password in the boxes below.
Whenever SQL Server attempts to talk to the iSeries linked server, it will use
the login information specified here. The linked server has now been configured.
Click the OK button.
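If you prefer scripting to the GUI, the same definition can be built with SQL Server's system stored procedures. This sketch mirrors the settings described above; the user name and password are placeholders:

EXEC sp_addlinkedserver
    @server     = 'ISERIES',
    @srvproduct = 'DB2 for iSeries',
    @provider   = 'MSDASQL',          -- Microsoft OLE DB Provider for ODBC Drivers
    @datasrc    = 'ISERIES',          -- the ODBC DSN created earlier
    @provstr    = 'DBQ=TESTDATA'      -- optional DSN overrides

EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = 'ISERIES',
    @useself     = 'false',
    @locallogin  = NULL,              -- applies to all local logins
    @rmtuser     = 'MYUSER',          -- placeholder iSeries user profile
    @rmtpassword = 'MYPWD'            -- placeholder password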
The last step involves starting the Microsoft SQL
Server DTC service. The DTC, which can be started from the SQL Server Service
Manager utility, is responsible for handling DT processing across multiple
database servers.
Accessing Data on a Linked Server
To verify that the linked server is set up correctly,
run a distributed query (DQ). A DQ is a T-SQL query that accesses data on a
linked server. One way to run a DQ is to specify a four-part table name in the
FROM clause of a SELECT. Specifically, for an iSeries-linked server, the
four-part table name is specified as follows:
FROM <linked server>.<RDB name>.<schema name>.<table name>
For example, if your linked server is named "ISERIES," your
iSeries' relational database (RDB) name is S1024000 (it's usually the same as
your system name), your schema (library) is LIVEDATA, and your table is ORDERS,
you would enter the following to retrieve the table's data:
FROM ISERIES.S1024000.LIVEDATA.ORDERS
This will allow SQL Server to query the ORDERS table on your iSeries as
though it were local to SQL Server. Start the Query Analyzer utility, and try
it! In fact, using the four-part syntax shown above, you can place an iSeries
table in the FROM, JOIN, subquery, or nested select portion of a SELECT
statement. The better news is that linked server tables can also participate in
UPDATE and DELETE statements (provided the linked server's ODBC or OLE DB
drivers are capable, which is the case with the Client Access ODBC
driver.)
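For instance, the following statements work directly against the remote table (SHIPFLAG and ORDERID are hypothetical columns used for illustration):

SELECT * FROM ISERIES.S1024000.LIVEDATA.ORDERS

UPDATE ISERIES.S1024000.LIVEDATA.ORDERS
   SET SHIPFLAG = 'Y'     -- hypothetical column
 WHERE ORDERID = 1001     -- hypothetical column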
Another way to run a DQ is to use the OPENQUERY function.
OPENQUERY submits a passthrough query to the backend database engine for
processing and returns the results as though it were a SQL Server table.
OPENQUERY requires two parameters: a linked server name and an SQL statement.
The following is an example of how to use OPENQUERY:
FROM OPENQUERY(ISERIES,'Select * From LiveData.Orders')
The main difference between the two examples is performance: with the four-part
table name syntax, SQL Server queries less efficiently than it does with OPENQUERY.
OPENQUERY avoids much of SQL Server's overhead by submitting a SQL statement
directly to the linked server's database engine. To do this, however, the SQL
statement supplied to OPENQUERY must conform to the linked server's SQL dialect.
In other words, you can't submit a T-SQL statement to an iSeries linked
server.
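For example, the inner statement in the following sketch uses DB2 UDB for iSeries' FETCH FIRST clause, which is not valid T-SQL; OPENQUERY hands it to the iSeries untouched (the table name is carried over from the earlier example):

SELECT *
FROM OPENQUERY(ISERIES,
    'SELECT * FROM LIVEDATA.ORDERS FETCH FIRST 10 ROWS ONLY')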
Many DQ performance considerations are beyond the scope of this
article. For some of the iSeries-specific performance considerations, see
"Running Distributed Queries with SQL/400 and SQL Server 7.0" in the
September/October 2000 issue of AS/400 Network Expert. For more
information on DQs, see the SQL Server T-SQL documentation on the OPENQUERY,
OPENROWSET,
and distributed
query topics.
Running a Distributed Transaction
Now, we're at the heart of the topic. For this
demonstration, on the SQL Server side, I'll be using the NORTHWINDCS sample
database, which is included with Office XP (you could also use the sample
database called NORTHWIND that comes with Office 2000). I'll focus on a
particular table called Products, which is the Product Master table.
For
this example, assume that an identical Products table exists on the iSeries and
that these two tables need to be synchronized at five-minute intervals. The
structure of the Products tables for each platform is shown in Figure 1.
Figure 1: These are the Products tables from the NORTHWIND database as
they exist within SQL Server and the iSeries. The Synchronized column was added
to both for tracking an item change.
For simplicity, assume that the
synchronization will flow in only one direction. The Products table on the SQL
Server side is the "master"--that is, changes to the Products table have to be
done through a SQL Server application. Further, changes to the iSeries table
will only be those resulting from the synchronization process.
To try this scenario, open Query Analyzer, select the NorthwindCS database, and
issue an SQL statement along these lines (a sketch, assuming a bit column that
defaults to 0) to add a "synchronized" flag to the Products table:
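ALTER TABLE Products
    ADD Synchronized bit NOT NULL DEFAULT 0   -- 0 = not yet sent to the iSeries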
Next, create a schema (library) on your iSeries called NORTHWIND using
the "CREATE SCHEMA NORTHWIND" SQL statement. Create the Products table in schema
NORTHWIND using the second CREATE TABLE statement shown in Figure 1 (remember to
use the appropriate SQL naming convention). This table will be journaled
automatically. Finally, copy the Products table data from SQL Server to the
iSeries using the distributed query shown in Figure 2.
Figure 2: This distributed query will insert data into the iSeries
Products table from the SQL Server Products table.
Look at Figure 2's
INSERT statement. The four-part table name syntax is specified as the table to
receive the data. The SELECT portion consists of the SQL Server Products table
with a subselect to the iSeries Products table again, to make sure a duplicate
record isn't inserted (of course, all records will be inserted the first time
through.)
In the subselect, though, the iSeries Products table is
embedded in the OPENQUERY function instead of the four-part table name syntax.
In this case, the reason for using OPENQUERY instead of the four-part table name
has to do with performance.
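In sketch form, the INSERT looks something like this; the RDB name S1024000 and the identical column layouts are assumptions carried over from the earlier examples:

INSERT INTO ISERIES.S1024000.NORTHWIND.PRODUCTS
SELECT p.*
  FROM Products p
 WHERE NOT EXISTS
       (SELECT 1
          FROM OPENQUERY(ISERIES,
               'SELECT PRODUCTID FROM NORTHWIND.PRODUCTS') r
         WHERE r.PRODUCTID = p.ProductID)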
Now that the tables are synchronized,
subsequent inserts, changes, and deletes to the SQL Server table have to be
tracked and moved to the iSeries table. Figure 3 shows a complete T-SQL stored
procedure to do this.
Figure 3: This stored procedure will propagate adds, updates, and deletes
from the SQL Server Products table to the iSeries Products
table.
Notice that XACT_ABORT is set to ON. This is done to
prevent nested transactions, which the iSeries ODBC driver does not allow. By
default, SQL Server processes all statements inside a default transaction so
that partial rollbacks can occur. Starting another explicit transaction using
BEGIN TRANSACTION actually starts a nested transaction, which will cause the CA
ODBC driver to error out. Setting XACT_ABORT to ON turns off the default initial
transaction boundary. By implication, this setting will also prevent SQL Server
from doing partial rollbacks.
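In T-SQL, the setting is a single statement at the top of the procedure:

SET XACT_ABORT ON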
The first code section is a repeat of the
code already shown in Figure 2. An INSERT statement is used to move all new
records from the SQL Server Products table to the iSeries table.
The
second section involves reflecting all changes to the products in the SQL Server
table on the iSeries. A cursor is opened against the local Products table to
select all products that have changed. Inside the loop, the BEGIN DISTRIBUTED
TRANSACTION statement is executed to start a transaction for each item. In this
case, each product update will be treated as a single transaction. If your
situation requires either all or none of the Product updates to occur, you can
specify the BEGIN and COMMIT transaction boundaries outside of the
loop.
Inside the loop, an UPDATE is issued against the iSeries table for
each field. After the update is completed, SQL Server's SYNCHRONIZED column is
set to true to indicate that the two tables are in sync for the given ProductID.
After the second update is completed, the transaction is committed or rolled
back, depending on whether an error occurred. This is where the power of the DT
shines: The SQL Server synchronized flag will not be set to True unless the data
is successfully placed on the iSeries.
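Here is a condensed sketch of this section; the names come from the earlier examples, and the error check is simplified (per the XACT_ABORT discussion above, a linked server error will abort the batch before the ROLLBACK is ever reached):

DECLARE @ProductID int

DECLARE ChangedProducts CURSOR FOR
    SELECT ProductID FROM Products WHERE Synchronized = 0

OPEN ChangedProducts
FETCH NEXT FROM ChangedProducts INTO @ProductID

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN DISTRIBUTED TRANSACTION

    -- push the changed row to the iSeries table
    UPDATE r
       SET r.PRODUCTNAME = p.ProductName   -- one assignment per column in the real procedure
      FROM ISERIES.S1024000.NORTHWIND.PRODUCTS r
      JOIN Products p ON p.ProductID = r.PRODUCTID
     WHERE p.ProductID = @ProductID

    -- flag the local row as in sync
    UPDATE Products SET Synchronized = 1 WHERE ProductID = @ProductID

    IF @@ERROR = 0
        COMMIT TRANSACTION
    ELSE
        ROLLBACK TRANSACTION

    FETCH NEXT FROM ChangedProducts INTO @ProductID
END

CLOSE ChangedProducts
DEALLOCATE ChangedProducts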
The third and final section
deletes all products from the iSeries table that no longer exist in the SQL
Server table. Again, the four-part table name is specified, and an EXISTS clause
is used to see if the ProductID on the iSeries still exists in the SQL Server
Products table. You probably realized that the INSERT and DELETE statements were
not embedded inside of a BEGIN DISTRIBUTED TRANSACTION block. This is because DT
processing isn't required here, since data is being updated on only one
platform.
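In sketch form, with the same assumed names as above, the delete step looks like this:

DELETE r
  FROM ISERIES.S1024000.NORTHWIND.PRODUCTS r
 WHERE NOT EXISTS
       (SELECT 1 FROM Products p
         WHERE p.ProductID = r.PRODUCTID)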
Writing that stored procedure was relatively painless--it's
hardly different from a procedure written to synchronize two local tables!
However, there are still two additional requirements to make the synchronization
take place. The first requirement is to set the Synchronized flag to False (0)
whenever a product is changed. You can do this through either the application
program or an update trigger. The second necessity is to schedule this stored
procedure to run at regular intervals using SQL Server Agent or some other
scheduling mechanism.
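For the first requirement, a minimal update trigger might look like this sketch; the trigger name is illustrative:

CREATE TRIGGER trgProductChanged ON Products
AFTER UPDATE
AS
IF NOT UPDATE(Synchronized)     -- skip when the sync job itself sets the flag
    UPDATE p
       SET p.Synchronized = 0
      FROM Products p
      JOIN inserted i ON i.ProductID = p.ProductID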
Does It Really Work?
If you're still following along in this example, you
can now see for yourself how this works. Open the NorthwindCS.ADP Client/Server
sample database with Microsoft Access. Go to the database window, choose the
Tables tab, and double-click on the Products table to open it. Delete a few
records, insert a few new records, and change a few records. For the changed
records, set the synchronized flag to False (0). (To delete existing records,
you will have to remove the referential integrity constraint between the Order
Details and Products tables.) Issue the CREATE PROCEDURE statement shown in
Figure 3, then execute it as follows:
GO
EXEC spSynchronizeProductsTable
When you query the data on the iSeries, all of your modifications to the
SQL Server table should be reflected.
iSeries-Side Cursors
In case you need to create a DT involving a cursor
on the iSeries, Figure 4 shows how to do this.
Figure 4: This T-SQL shows how to use an updateable cursor on the iSeries
within a distributed transaction.
The major difference between this
code and the code in Figure 3 (other than the table reversal) is that the
transaction boundary has to be placed before the cursor declaration. This means
that all of the records will be involved within the transaction boundary. To
have an updateable cursor on a linked server, SQL Server requires that the
isolation level be set to repeatable read or serializable. These locking levels
are restrictive in terms of record locking, so use updateable cursors sparingly.
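A skeletal version of Figure 4's approach might look like this sketch; the column being changed and the KEYSET option are placeholders and assumptions, not the article's exact listing:

DECLARE @ProductID int

SET XACT_ABORT ON
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN DISTRIBUTED TRANSACTION   -- must come before the cursor declaration

DECLARE rmtProducts CURSOR KEYSET FOR
    SELECT PRODUCTID
      FROM ISERIES.S1024000.NORTHWIND.PRODUCTS
       FOR UPDATE

OPEN rmtProducts
FETCH NEXT FROM rmtProducts INTO @ProductID

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE ISERIES.S1024000.NORTHWIND.PRODUCTS
       SET DISCONTINUED = 1      -- placeholder change
     WHERE CURRENT OF rmtProducts

    FETCH NEXT FROM rmtProducts INTO @ProductID
END

CLOSE rmtProducts
DEALLOCATE rmtProducts

COMMIT TRANSACTION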
The one other thing to be aware of is that I had to modify the ODBC DSN
with a default commitment control level of *NONE. Without this setting, I would
erratically get error messages stating that the required transaction isolation
level could not be achieved.
Trials and Tribulations of New Technology
Even though DTs are extremely useful and will continue to grow in popularity, there are still pitfalls. While the end product looks easy enough, it takes quite a bit of fiddling to get everything to work correctly. Listed below are some of the major things I battled with:
Linked Server Errors Cause Processing to Halt
Even though the code shows a tidy Commit and Rollback, the fact is that, when a linked server error occurs, the entire procedure stops with an error severity of 16. As far as I can tell, there is no way to trap these errors. (If someone knows a way around this, please let me and everyone else know by posting a note to the forum associated with this article.) If, for example, a record on the iSeries is locked so that it can't be changed, the procedure will just stop instead of allowing a programmatic response to the condition. This is the worst drawback I encountered.
Case-Sensitive Names
Be careful when entering four-part table names because the RDB name, schema, and table names should be entered in uppercase. In a few cases, when I used an iSeries-side cursor, column names seemed to be case-sensitive as well.
Unique Indexes
If you need an updateable iSeries-side cursor, SQL Server requires that the table have a unique index. If for some reason your base table can't have a unique index, you can use a read-only cursor with individual UPDATE statements to change the data.
Service Pack Levels
This is the real killer. I toyed around with various Client Access levels and service packs and received varied results. Here is the exact configuration I tested with:
- OS/400 V5R1 with Group Database Fix SF99501-04
- Windows XP Professional with Service Pack 1
- SQL Server 2000 (with no service pack and Service Packs 1 and 3)
- Client Access V5R1 SI05361 and SI06804
- iSeries Access V5R2 SI07675, SI06631 (SI05853 didn't work)
Things are a little too fragile for my liking.
Unfortunately, it seems that the CA ODBC driver's ability to work with DQs and
DTs changes from service pack to service pack. For instance, I had complete
success with everything shown in this article using CA V5R2 SI07675. However,
SI05853 was a complete flop. The V5R1 SI06804 did everything except for the
iSeries-side updateable cursor.
My only reason for sharing this
information is that it was frustrating trying to find the right combination of
software levels to make the thing work!
Ensuring the Veracity and Timeliness of Shared Data
As the requirements for sharing data between
platforms in real time increase, so will the popularity of DTs. Their ease of
use and ability to guarantee the "all or nothing" concept among multiple
database servers make them an ideal candidate for fulfilling many of the
cross-platform interface requirements.
Michael
Sansoterra is a programmer/analyst at SilverLake Resources, an IT services
firm based in Grand Rapids, Michigan. You can reach him at