02
Thu, Jan
0 New Articles

Accessing i Data from MS SQL Server

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

It's one thing to access SQL Server data from the System i, but it's another thing to access System i data from SQL Server.

 

Heterogeneous data is one of the hallmarks of today's IT. Whether due to disparities in disk price or processing power, data in many shops has become fragmented across not only multiple physical machines, but multiple platforms and databases. One of the primary jobs of any IT department is allowing seamless access to those different databases.

 

If you're a programmer, you can usually find the tools you need to access data on any machine in your network. It's a bit harder to do that without programming--that is, by using the user-oriented data access tools that are prevalent on today's desktops. In fact, as you'll read here, sometimes its just plain impossible.

Getting at Data Is Easy for a Programmer

Whether it's a Type 4 Java Database Connectivity (JDBC) driver or an Object Linking and Embedding Database (OLE DB) provider, objects typically exist to allow a programmer on one platform to access databases on another. This means that, with just a bit of effort, a competent programmer can access data from multiple data sources in the network and provide meaningful information to the user.

 

Perhaps the simplest route is to use Java and JDBC. JDBC has progressed significantly since its original incarnation. When JDBC first appeared, it was primarily as a wrapper around Open Database Connectivity (ODBC). ODBC was Microsoft's API for allowing standardized access to all databases, and it involved a "native" piece written for the operating system on which the client ran. In the Microsoft world, that meant primarily Windows, so Windows drivers for various databases soon became available. This in turn meant JDBC access for those databases.

 

This, however, didn't translate to access from the i. While there was plenty of support for ODBC drivers to access DB2 data, it was hard to find someone interested in writing OS/400- or i5/OS-specific ODBC drivers. Thus, it was easy to access DB2 data from Windows yet hard to access SQL Server data from the System i. This changed with JDBC Type 4 drivers, in which the entire driver is written in Java, with no native driver required. With this type of driver, all you need is a JVM, and the i has a very good one. So, now it's relatively easy to access SQL Server data and DB2 data using a Java program.

 

However, by definition this involves someone writing a program; it doesn't provide the sort of ad hoc access that many users have come to expect. The ability for a user to be able to easily write an SQL statement and import the results of that statement into a desktop document has become a commonplace requirement. I'll resist the urge here to expound on why I don't like direct user access to rows and columns in the database, because that's outside the scope of this article. Instead, I'll assume that you understand the risks, and I'll instead focus on how to make this happen.

Non-Programmatic Access

What is needed is a way for users to write simple queries that access data on two databases simultaneously. Two ways exists to do this: either use a tool that knows how to join multiple databases, or convince the SQL engine on one platform to talk with data on another platform.

 

Tools for the former exist, although they're usually relatively expensive. For example, IBM has its DB2 Information Integrator. This is a powerful tool that allows you to create queries over widely disparate data sources (not just RDBMS), but it's not cheap. The last pricing I saw was in the realm of $20K per server and $10K per database. That was a long time ago, but it gives you an idea of the value placed on these tools.

 

Another option is to get a Windows-based package that understands the various databases, such as UniDirect .NET. I haven't used this product; I only bring it up because it really highlights the biggest issue when dealing with System i data: most people in the "outside" world, if they've heard of DB2, only know of it from the small-server world (that would be DB2 LUW, the Linux/UNIX/Windows version). Occasionally, they know the mainframe version (DB2 for z/OS). In any case, they tend to provide support for "DB2," but that doesn't always mean DB2 for the i.

DB2 Data and SQL Server Data Side by Side

The other option is to somehow convince one SQL engine to look at data from another data source. For example, to get SQL Server and DB2 to work together at the SQL engine level, you have two choices: get DB2 to talk to SQL Server or get SQL Server to talk to DB2.

 

For the first option, IBM uses Distributed Relational Database Architecture (DRDA). DRDA was originally an IBM architecture. It's been around for a decade or so and is currently under the auspices of The Open Group. However, while a number of RDBMS products will participate in a DRDA environment, Microsoft SQL Server is not one of them (surprise, surprise).

Linked Servers in SQL

The SQL Server approach is called a linked server. Michael Sansoterra wrote about this in some detail, and while the article is a couple of years old now, it still has great information. But the basic concept can be boiled down pretty quickly: if your database has an ODBC or OLE DB connector, then it can (probably) be used as a linked server. The "probably" indicates one of two cautions: first, not every feature may be supported by your driver, and second, even if something is supported, the performance may be so bad that you need to find a different approach after all.

 

Ignoring those caveats for the time being, let's take a quick look at the linked server approach. As I said, Michael's article goes into this subject in depth, and it really covers the topic well from a technical perspective, so all I really have to do here is hit the high points. First, you define your server using some special Transact-SQL (T-SQL) statements that identify the server type and location.

 

Once that's done, you can use the new server (and its schemas and tables) in your basic SQL statements using a four-part naming syntax that includes the server definition name and schema in the table name. This isn't a perfect solution. The operations you can perform are pretty limited, especially by today's standards. Basically, you're stuck with what I consider to be first-generation SQL statements: SELECT, UPDATE, INSERT. You can also execute basic database management functions like DROP TABLE and SELECT INTO. The latter is a powerful way to move data from SQL Server to another database (or vice versa).

 

However, linked servers have no support for parameter markers, and stored procedure support is nonexistent for remote databases other than SQL Server. To use those advanced SQL features, you need to use either the EXEC AT feature or Data Transformation Services (DTS), which actually has been replaced by SQL Server Integration Services (SSIS).

DTS Is Itself Transformed, to SSIS

If your data needs include access to stored procedures, Data Transformation Services (DTS) was the route that had been available since version 7 of SQL Server. DTS basically allowed you to create sort of an SQL batch job that was designed to move data from one database to another. Because it was the only way to do certain things, DTS was often used for jobs other than data transformation, but the idea remained the same: you created a set of SQL statements and then invoked them as needed. Note that these statements depend on the capabilities of the driver used, but they usually provide enough capability to make DTS the tool of choice for transformations between different databases.

 

However, with the introduction of SQL Server 2005, DTA was replaced by SQL Server Integration Services (SSIS), which was available only in the enterprise edition of the product. With the release of SQL Server 2008, SSIS has been broken up into a variety of components with differing availabilities. So, if you plan to use SSIS, you need to take a careful look at your server edition to see if the required features are available.

 

It's not surprising that Microsoft doesn't include SSIS in all editions of its database (it seems that Redmond is taking a page from IBM and unbundling technology with new releases); the functionality is pretty extensive, including support for things like XML and allowing complex transformations and scheduled data conversions. It really is now much more of a data mining and warehousing utility, even a rudimentary form of business intelligence. However, that unbundling does mean that you have to budget for which components you will need in your environment.

EXEC AT

As of SQL Server 2005, the other option is to use the EXEC AT capability of T-SQL. This is sort of like RCMD in FTP; it passes a statement directly through to the host database and executes it there. In its base form, the EXEC (or EXECUTE) operation in T-SQL lets you execute an arbitrary character string. Similar to prepared statements in embedded SQL languages, this is the T-SQL method of executing dynamic SQL. You can build an entire SQL statement character by character and then execute it.

 

One of the lesser-known capabilities of this technique is to take that dynamic statement and transmit it for execution at a remote database that has been previously defined as a linked server. The data is literally sent to the other machine and executed there. This very powerful capability has some caveats, however. First, the syntax for the call has to be in the syntax of the remote database. While this makes sense, it can be somewhat confusing to see non-Microsoft syntax in a T-SQL program. Also, you have to take into account character-coding issues; character parameters are subject to translation.

Performance

So it's not impossible to manipulate databases remotely and to copy data from one database to another using simple SQL syntax. The linked server approach in SQL Server makes that syntax relatively easy, and the enhancements to the EXEC AT operation allow even more complex vendor-specific instructions to be passed through to the remote database.

 

The last issue is performing operations that actually join two databases and, specifically, the performance issue that can occur. One of the primary reasons that SQL is becoming so prevalent is the fact that the query optimization capabilities of the SQL engines make them very fast. Because an SQL engine has a deep understanding of its database, including statistics on the data itself, the engine has the ability to make decisions that improve performance.

 

Such understanding is lost when two databases are joined, especially two completely different databases such as SQL Server and DB2. The performance statistics gathered by the DB2 database are not available to SQL Server, so it must take the most pessimistic options for every decision. Add to that the pure overhead of having to go across the wire just to get a record, and it's clear that queries can quickly become very complex. As important as database and query design are to standard single-database applications, they are that much more important in multi-database environments.

 

Another hurdle to the multiple-database environment is the fact that it is often very hard to find appropriate test data. It's difficult enough to get a realistic snapshot of data on one machine, much less get snapshots on two machines, especially if the data needs to be somehow related. With cross-database queries, you need to test using data whose keys are going to coincide in order to properly exercise your join criteria. You have to understand both databases in order to extract the appropriate test information as well as have a place to put it. These issues are magnified in smaller shops with limited resources; it's hard enough finding extra disk space for Windows machines, much less extra room on an i.

 

Because of the difficulty of setting up proper test data, this is the type of situation where programmers may be less stringent in their testing procedures. However, testing is more rather than less important in this situation, especially load testing. More issues factor into the equation. In addition to raw data size, you have to take into account system load on both of the machines as well as network congestion. Before putting any query into production, it is essential to try it out on production quantities of data when network traffic will be the same as the time that the statements will be executed.

 

Multi-database queries are not to be taken lightly. For online transactions, they can add a measurable delay to response time. For batch operations, they can add an unwelcome level of overhead. Depending on the business requirement, you may actually be better off with an alternate approach, such as staging data from one machine to another during off-peak hours.

Final Note and Denouement

I noted above that both the EXEC AT and the DTS approaches require that you use DB2 syntax for your SQL code rather than SQL Server's T-SQL syntax. This allows me a quick reference to the fact that SQL dialects differ from one vendor to another; the differences can be subtle and a little tricky. Syntactical differences in stored-procedure syntax can be even trickier. So moving back and forth is not always easy.

 

Back in my early days of SQL, I used a tool called SwisSQL to convert between various flavors of SQL query syntax. The tool is one of the best I've ever used, although the pricing has risen considerably over the years. I think the basic query converter is $395, while the stored procedure converter is $995. For awhile, AdventNet had free versions of the tools available, but now I think they offer a 30-day, limited-use trial.

 

Still, you might get a kick out of looking at what happens. What's important is that you realize that while it's probably reasonable to think about allowing users to create ad hoc queries that join disparate databases, as soon as you start talking about more advanced features such as stored procedures, or even trying to do large-scale inserts or updates, you may be better off attacking the problem programmatically.

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: