
Data Access, Database Engines, and Persistence

Remember when the biggest problems of database design were how to convert internally described files to externally described files and how to pack dates into the smallest possible space? Yeah, that last one bit us, but back in the '70s and '80s, who knew? We didn't worry about Y2K; we worried about whether to key our physical files or whether to use OPNQRYF instead of logical files. Life was much simpler then.

A lot of that was due to the fact that the AS/400 was a tightly knit platform, and all of its languages accessed the same database (which we now know as DB2/400) via the same sorts of operations: RPG opcodes like SETLL, READE, and CHAIN or their COBOL counterparts. But as the AS/400 has transmogrified to the iSeries and now the i5, it has become ever more open and more supportive of other standards, including other languages and other database access methodologies.

In this article, I'm going to compare the various database access techniques, including ODBC and its Java cousin, JDBC (including a little bit about some of the major JDBC vendors). Next, I'll delve into the Java world more deeply and give you a look at some of the acronyms currently being touted as the "next great thing" in persistence. And finally, I'll introduce you to a few other "nontraditional" persistence packages (although as recent a participant as Java is in the persistence game, I find it a little questionable to label things as traditional and nontraditional).

As Usual, a Little History

You know me. I can't pass up an opportunity to put things into historical perspective. This subject is no different. A long time ago I bought a TI-99/4A for my son. It was a pretty barebones machine; it didn't even have a tape drive. In order to play a game, he had to type in the entire program in hexadecimal. It was pretty cool the first time he did it, but when he shut the computer off, that meant that the next time he turned it on, he had to re-key the entire program.

In order for a business computer to serve any meaningful purpose, the information you enter must be accessible in the future, even after you shut off the computer. The TI's answer to that was a cassette tape drive. However, tape drives of any form had an inherent problem: you could only read data off the tape sequentially, so finding a specific block of data meant physically reeling or unreeling the tape until you got to the right point. Remember all those awful sci-fi pictures of the '60s and '70s that showed the giant cabinets full of whirling tape drives stuttering back and forth? That behavior was the tape drives trying to find blocks of data, even when those blocks were fairly close together on the tape.

And thus the disk drive was born. First came floppy disks, or diskettes (for the record, there were non-Winchester "hard disk" drives in the '50s and '60s, but those technologies were quite expensive and have long since gone the way of Betamax and 45 RPM records). For the TI, you could get a single-sided, single-density disk controller that supported diskettes with up to 90 KB of storage. Next came a revolutionary concept from IBM, the Winchester hard disk drive. The first one I ever saw was an 8-inch Winchester the size of a DeskJet printer; it held a whopping 5 MB of data and sold for about $2,000.

Well, we know what happened next: mass production, innovation, manufacturing technology. All of these combined to move the price point of disk drives from $400/MB to today's cost of less than a dollar per gigabyte--a 400,000-fold decrease in price, even before inflation. And with more data came more choices.

Data Access

To start, we had relatively simple data access. Typically, data was stored in fixed-length files that we sorted into the appropriate sequence and then processed. For example, we might sort data by customer and then by item in order to print a report showing what items were purchased by a particular customer. On the other hand, we might sort by item and then by customer to print a report showing which customers purchased specific items. In either case, the data was physically sorted, which meant you needed not only the space for the original file, but also enough space on the disk for a temporary copy of the file containing the sorted records.

For a while, we used network and hierarchical databases, in which records pointed to one another using specific record numbers. While these designs provided excellent performance, they were complex, fragile, difficult to maintain, and prone to corruption. Losing one link in a hierarchical database could result in a serious loss of database integrity and a lengthy rebuild process.

Key Files (Native DB2/400)

Later, someone realized it might be quicker to create "key files." A key file (which in OS/400 terms is a "logical" file) was a list of the key values for a given record and its record number in the actual file (or as we call it, the "physical" file). This way, when a record was added, you simply updated the key file. You could keep multiple "views" of the same physical file without having to sort it over and over. I won't get into a deep discussion about this, but in general if you wanted to access a file in a given sequence, you did it by reading the appropriate key file and then getting the record identified by the key file. While the names have changed to protect the innocent, including terms like ISAM and VSAM, the concept of accessing data through a separately maintained key file is still the most powerful technique known to programming.

Examples of databases using key files include everything from mainframe systems (VSAM, IMS) to midrange (DB2/400) to PCs (dBase, Clipper, Paradox, Btrieve). And as we'll see in a moment, in order to provide reasonable performance, even so-called relational databases in reality rely on key files.
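If the key file concept is new to you, here is a tiny, purely conceptual sketch in Java of what a key file boils down to: a sorted map from key values to record numbers in the physical file, with SETLL-style positioning by key. The class and method names are mine, invented for illustration; this is not any OS/400 API.

import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: a "key file" is, at heart, a sorted map from key
// value to record number in the physical file. All names here are invented.
public class KeyFileSketch {
    // key value -> relative record number in the physical file
    private final TreeMap<String, Integer> index = new TreeMap<>();

    void addRecord(String key, int relativeRecordNumber) {
        index.put(key, relativeRecordNumber);              // maintain the "logical file"
    }

    // SETLL-style positioning: find the first record at or after the given key
    Integer setLowerLimit(String partialKey) {
        Map.Entry<String, Integer> e = index.ceilingEntry(partialKey);
        return e == null ? null : e.getValue();            // record number to retrieve
    }

    public static void main(String[] args) {
        KeyFileSketch custByName = new KeyFileSketch();
        custByName.addRecord("ACME", 3);
        custByName.addRecord("BETA", 1);
        custByName.addRecord("ZEUS", 2);
        System.out.println(custByName.setLowerLimit("B")); // prints 1 (BETA's record)
    }
}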

Relational Models

But what about SQL? Why is SQL so prevalent? Well, for one thing, it's relatively easy to use, especially for simple queries. In fact, that's what the name stands for: Structured Query Language. Second, SQL is the de facto standard for accessing relational databases (also known as relational database management systems, or RDBMSs).

"Whoa, Joe! You mean SQL isn't the only language for accessing an RDBMS?"

"That is correct, Grasshopper! Relational databases are defined by the 12 rules of Dr. E.F. Codd. And in his 12 rules, he simply states that there must be a common language for all operations, from creating databases to updating data. He did not specify what that language is, although SQL has been the standard ever since it was spawned from the original RDBMS language, IBM's SEQUEL."

So while SQL is certainly a standard, that's not because it was the language created by the designers of the RDBMS concept or because it was the chosen language for the first true RDBMS. In fact, there is no language for "true" RDBMSs; a little-known fact is that there are no true relational databases, at least none that fully support Dr. Codd's 12 rules.

Not only that, but as I stated in the previous section, most RDBMSs make use of key files in order to perform adequately. If you regularly access a table by a given key, the database will advise you to create an index over that key in order to boost performance. What is an index? It's a logical file without any record selection criteria. To select specific records, you use record selection parameters to create views.
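To make the index-versus-view distinction concrete, here is a minimal sketch of the two SQL statements you would combine to get what a single select/omit logical file gives you. The table, column, and object names are invented, the JDBC connection string is just a placeholder, and I'm using Java/JDBC only because it's handy; the SQL is the point.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Illustration only: ORDERS, CUSTNO, ITEMNO, and STATUS are invented names,
// and the connection URL is a placeholder for whatever driver you use.
public class IndexVersusView {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:db2:*LOCAL");
             Statement stmt = con.createStatement()) {

            // An index is an access path over the table: the SQL analog of a
            // keyed logical file with no record selection criteria.
            stmt.execute("CREATE INDEX ORDERS_BY_CUST ON ORDERS (CUSTNO, ITEMNO)");

            // A view supplies the record selection but carries no access path;
            // you need both to match a select/omit keyed logical file.
            stmt.execute("CREATE VIEW OPEN_ORDERS AS " +
                         "SELECT * FROM ORDERS WHERE STATUS = 'OPEN'");
        }
    }
}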

Comparing SQL and DDS

To get the functionality of a DDS logical file in SQL, you must combine a view and an index. For ad hoc inquiries, SQL does have benefits, much like those we have enjoyed in the past using OPNQRYF. SQL also has more flexibility, especially when joining multiple tables. In general, I would say SQL has more benefits in situations where you would normally use OPNQRYF, but fewer where you would use a regular logical file. Logical files also have the added benefit of keyed positioning: you can position your cursor anywhere within the file by key. SQL's lack of this capability is almost impossible to overcome in some instances.
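For what it's worth, the closest SQL comes to that keyed positioning is restarting a cursor with a keyed WHERE clause and an ORDER BY, as in this small sketch. The file and field names are invented, and the connection string is a placeholder.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the usual SQL workaround for SETLL/READE-style processing:
// re-open the cursor at a key value. CUSTMAST, CUSTNO, and CUSTNAME are invented.
public class KeyedPositioning {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:db2:*LOCAL");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT CUSTNO, CUSTNAME FROM CUSTMAST " +
                 "WHERE CUSTNO >= ? ORDER BY CUSTNO")) {

            ps.setString(1, "B");                          // "set lower limit" at key 'B'
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {                        // read forward in key order
                    System.out.println(rs.getString("CUSTNO") + " " +
                                       rs.getString("CUSTNAME"));
                }
            }
        }
    }
}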

Object Databases

In order to round out the list, I need to include object databases (object database management systems, or ODBMSs). In general, the two distinctions between ODBMSs and RDBMSs are that ODBMSs tend to support more complex data, especially when it comes to BLOBs, and that the objects are easier to use in the data retrieval process; you can actually call methods in the objects themselves.

While not prevalent today, the ever-growing popularity of object-oriented languages is likely to make OO databases the Holy Grail of database design, at least for certain applications. And while I think we can assume that the language will bear a strong resemblance to SQL, I think you should realize that when it comes to advanced database access, SQL may not be all you need in the long run.

ODBC and JDBC

Here I have a real hot button. I'll try to be as impartial as possible, but this idea simply makes me crazy. ODBC, or Open Database Connectivity, is a standard originally published by Microsoft back in the early 1990s, but its roots are in the SQL Access Group (SAG), a vendor consortium whose work was later taken over by X/Open, which in turn merged into the Open Group. SAG produced the call-level interface specification upon which ODBC is based, and that specification became SQL/CLI, one of the parts of the ANSI/ISO SQL standard. So while we often think of ODBC as a Microsoft standard, it is really built on an ANSI/ISO standard that defines how any compliant application can communicate with any compliant database.

And I have no problem with a standard linkage that allows a program to access a database. Personally, I think RPG's use of the externally described file is the best way to do it, because defining such things at compile time leads inevitably to performance advantages. But I can see situations where it's nice to be able to simply send a structured statement to the database engine and have it return some data. The problem, to me, is when this standard is used in place of distributed processing, which is exactly what has happened. Because ODBC lets programmers avoid defining their own message protocol, client applications end up sending ODBC requests directly to database engines on servers. The problem here is the binding of table and column names between the client and the server.

I find it especially ironic that the people who champion ODBC also claim that one of the distinct advantages of SQL is that you don't have to change your applications when your database changes. That is perhaps true when you add a field, but it's certainly not true when you change a field's type, rename a field, delete a field, move a field from one file to another, or change how a field works. Those changes require changes to the original application. This dependence on the database layout is multiplied when ODBC is used as a transaction distribution agent, because now, rather than simply changing the applications on the host, you have to change the client code and then distribute it to all the clients. That is a cost I find particularly difficult to justify just to save a few bucks on creating your own messaging mechanism.

And since JDBC is essentially ODBC's Java counterpart (the early JDBC-ODBC bridge even ran directly on top of it), the same issues apply. If the Java application is running on the same machine as the database, I have no real problem with it, but if the program is running on a client machine, then you have bound the client and the server--to the detriment of your long-term architectural growth.
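To see why I call this binding, consider a minimal client-side sketch (every name in it is invented): the client program knows the table name, the column names, and the column types, so any change to that layout means changing and redistributing every copy of the client.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustration of the binding problem: rename ORDTOTAL or split ORDERS into two
// files, and every deployed copy of this client breaks. Host, user, and all
// table/column names are invented placeholders.
public class BoundClient {
    public static BigDecimal orderTotal(String orderNumber) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:as400://MYHOST", "MYUSER", "MYPASSWORD");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT ORDTOTAL FROM ORDERS WHERE ORDNO = ?")) {
            ps.setString(1, orderNumber);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBigDecimal("ORDTOTAL") : null;
            }
        }
    }
}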

Wait, Wait, Back Up a Moment

Is this meant to imply that there is another way? Absolutely. The communication between client and server should be at the business level, not the database level. Messages should be passed between the two boxes, and those messages' relationship to the physical data on disk should be coincidental at best. How the messages are formatted is secondary to the fact that the database can change completely without affecting the application at all. In fact, the message itself can change without the application being affected, simply by including a version number in the message. Older messages can be supported for as long as needed and can even indicate to the application that they are getting "stale" (for example, a warning might pop up saying that the user needs to update the application within the next 30 days). This handles the problem of "all at once" distribution of client changes.
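As a rough sketch of what I mean, here is a hypothetical business-level message that carries its own version number; none of these names come from any particular product.

// Hypothetical sketch: the client asks a business question ("what is the total
// for order X?") and never sees tables or columns. The version number lets the
// server keep honoring old clients while warning them that they're getting stale.
public class OrderTotalRequest {
    public static final int CURRENT_VERSION = 3;

    private final int version;         // which layout of this message the client speaks
    private final String orderNumber;  // a business identifier, not a database key

    public OrderTotalRequest(int version, String orderNumber) {
        this.version = version;
        this.orderNumber = orderNumber;
    }

    public int getVersion()        { return version; }
    public String getOrderNumber() { return orderNumber; }

    // True when the server should warn the client to upgrade within, say, 30 days.
    public boolean isStale() {
        return version < CURRENT_VERSION;
    }
}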

EJB, JDO, and SDO

Take a good look at the Java standards and you'll see that, outside of the definition of the language itself (which is quite good), the standards lack much substance. It's as if the Java committees decided they would do only 80% of any given standard, leaving the hard part to the vendors. Not only that, but many portions of these standards are "optional," meaning the standards are really little more than recommendations. Part of this may have been pragmatic: it kept interest in Java high by allowing companies to skip certain pieces and to incorporate technologies that were less than perfect. But the result is that the Java committees haven't produced anything truly useful for application developers for some time now.

Maybe this is backlash from the horrible experience of Enterprise JavaBeans (EJB). That standard was so over-engineered yet so poorly implemented that it can't be used by any but the most seriously Java-centric of shops, yet it was acclaimed by many an early adopter of Java as a business application development language. When it became clear that EJB was a failure, Java Data Objects (JDO) became the next great standard. JDO was an attempt at a trimmed-down version of EJB, but once again the theoretical standard was never run through the most basic practical design question: how would it help existing developers? With no practical way to retrofit existing databases into the JDO methodology, JDO became one more Java-only exercise, all but useless for the real-world goal of developing applications in multiple languages and combining legacy code with new development.

Service Data Objects (SDO) is one more entry that just misses the point. The basic concept is absolutely phenomenal: Objects are generated from the database and can be either strongly or loosely typed. You can create queries with very little specific information about the target database by returning very loosely typed data. At the same time, in a business application, your information can be as strongly typed as you need, allowing you to perform required business functions. All that is needed is the detailed information about the object itself. Brilliant! The problem with the SDO standard is that the source of that information is the JDBC metadata, which means you're once again tied to the rows and columns of the database. ARGH!

I'll need some more time to analyze this particular buzzword to see if there's any meat behind it. It would be nice if the SDO folks were smart enough to create an abstract object information class that didn't know what JDBC was and then provide an adapter to convert JDBC metadata to object information. You could then supply other adapters, such as one for XML documents, and two applications could sync up with one another through a quick exchange of XML and be up and running.
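Just to make the idea concrete, here is a hypothetical sketch of that abstraction: an object-information interface that knows nothing about JDBC, plus one adapter that happens to be fed by JDBC metadata. None of this is part of the actual SDO specification; the names are mine.

import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the application sees only ObjectInfo; JDBC is just one possible
// source of it. A second adapter could build the same ObjectInfo from an XML
// document, letting two applications sync up by exchanging XML.
interface ObjectInfo {
    List<String> propertyNames();
}

class JdbcObjectInfoAdapter implements ObjectInfo {
    private final List<String> names = new ArrayList<>();

    JdbcObjectInfoAdapter(ResultSetMetaData md) throws SQLException {
        for (int i = 1; i <= md.getColumnCount(); i++) {
            names.add(md.getColumnLabel(i));   // column labels become property names
        }
    }

    @Override
    public List<String> propertyNames() {
        return names;
    }
}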

Persistence Engines

I promised an introduction to persistence engines. I won't bore you with a ton of details, just an overview. Like so many things in Java, the developers come upon something we've been doing for decades, but it's new to them, so they rechristen it. What we call "updating the database," Java developers call "persistence." And many of the things we've taken for granted over the years are such "radical" concepts to the Java folks that they end up hand-coding them into their persistence engines.

A good example of this is write caching. While OS/400 has evolved to the point where write caching is handled by the operating system so transparently that we don't even think about it, performance on individual write operations through JDBC is so poor that the Java folks have had to come up with their own write caching. This is one of the things that makes the Hibernate persistence framework so fast. Unfortunately, one of the problems with this approach is that the caching lives entirely in the Hibernate software, so if the Java Virtual Machine (JVM) gets canceled, you stand an unacceptably high chance of losing data.
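For those who haven't run into it, the write caching I'm describing looks roughly like the following sketch: queue the writes in memory and flush them as a single JDBC batch. The table and column names are invented, and note the trade-off I just mentioned: anything still sitting in the queue is lost if the JVM dies before the flush.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

// Sketch of application-level write caching over JDBC. CUSTMAST, CUSTNO, and
// CUSTNAME are invented names.
public class WriteCache {
    private final Connection con;
    private final List<String[]> pending = new ArrayList<>(); // {custno, custname}

    public WriteCache(Connection con) { this.con = con; }

    public void queue(String custno, String custname) {
        pending.add(new String[] { custno, custname });        // cache the write in memory
    }

    public void flush() throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO CUSTMAST (CUSTNO, CUSTNAME) VALUES (?, ?)")) {
            for (String[] row : pending) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
            }
            ps.executeBatch();                                  // one trip to the database
            pending.clear();
        }
    }
}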

Other approaches have their own issues. A quick search will turn up quite a number of persistence engines, and you'll see that they fall into a couple of basic types. The older engines simply provide O/R (object-to-relational) mapping, often driven by XML configuration. Another significant segment of the solutions is JDO implementations. One particularly interesting package I found was something called JDBCPersistence, which claims to create the required objects on the fly by generating the appropriate Java bytecode. That's something I'd like to see!

Next Steps

Now that you've looked into the world of persistence, especially as seen from the Java perspective, it's time for you to review your own data access needs. I highly recommend that you create some layer that exists between your application objects (whether they're Java objects or arrays of data structures in RPG) and your database. This layer can be used either in local applications on your host or in client applications. By doing this, you'll be able to better handle changes to your database.
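As a starting point, that layer can be as simple as an interface expressed in business terms, with the database access hidden behind whatever implementation you choose today (and a different one tomorrow). The names below are invented for illustration.

import java.math.BigDecimal;

// Sketch of the insulating layer: application code depends only on these types,
// expressed in business terms. Whether the implementation uses JDBC, record-level
// access, or messages to another box is hidden behind the interface.
public interface CustomerRepository {

    Customer findByNumber(String customerNumber);

    void save(Customer customer);
}

// A business-level customer object; no row or column names leak out of the layer.
class Customer {
    private final String number;
    private final String name;
    private final BigDecimal creditLimit;

    Customer(String number, String name, BigDecimal creditLimit) {
        this.number = number;
        this.name = name;
        this.creditLimit = creditLimit;
    }

    String getNumber()          { return number; }
    String getName()            { return name; }
    BigDecimal getCreditLimit() { return creditLimit; }
}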

Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been working in the field since the late 1970s and has made a career of extending the IBM midrange, starting back in the days of the IBM System/3. Joe has used WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. Joe is also the author of E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step.
