
The Road to ClusterProven

The first step on the road to becoming ClusterProven is understanding what ClusterProven means. ClusterProven is an IBM registered trademark, not a generic description. IBM grants a license to use the trademark to independent software vendors, or ISVs, whose products conform to rigorous clustering specifications.

ClusterProven and Advanced ClusterProven programs exist for each of the IBM server lines. For the iSeries 400, the PartnerWorld for Developers/iSeries 400 group administers the program.

For ClusterProven branding, iSeries 400 applications must meet the following criteria:
• Switch to backup resources automatically if a primary system becomes unavailable

• Provide sufficient information to the cluster environment to enable automatic application configuration and resilient resources

• Supply an exit program that can coordinate the application’s restart using resilient data

IBM also awards a higher-level trademark, Advanced ClusterProven, to ISVs whose iSeries 400 products meet all of the ClusterProven criteria and then go even further toward ensuring continuous operations. One essential criterion for Advanced ClusterProven applications is that they must use commitment control or some internal application checkpoint process to recover to the last transaction, and they must minimize user disruption in the event of a system failure.

Value Added

The primary value of these two trademarks is the confidence they instill in customers who purchase a high-availability clustering solution. An application that bears neither brand may still meet all of the same cluster-enabling criteria, but the ClusterProven and Advanced ClusterProven marks are proof that IBM has verified its clustering capabilities.

The trademarks also benefit software buyers by removing the need to verify the cluster-readiness of an application under evaluation; the certification tests have already proven it. ISVs benefit as well, because the brands lend immediate credibility to their products' clustering capabilities.

Candidate Applications

Some applications are more ready for clustering than others. The amount of work required to achieve ClusterProven status for an existing application depends on the application’s functionality and design. There are several issues that must be considered.

Do You Use Journaling?

Journaling is a prerequisite for clustering. In traditional environments, many organizations have feared turning journaling on because of the potential impact on system performance. However, the magnitude of that impact has not always been clear, and IBM continues to make significant improvements in journaling performance.

Because the journal keeps a record of all changes applied to the database, journaling can cause a dramatic increase in the amount of disk I/O. This may or may not impact overall system performance, depending on where bottlenecks occur.

For example, if a system is primarily CPU-bound, adding a significant volume of disk I/O may not affect overall performance. Because the outcome depends on the workload, you should turn journaling on and measure its impact on overall system performance before implementing it. If performance is unacceptable, consult IBM's documentation for journaling tips and techniques. If performance is still inadequate, some application tuning may be required before journaling is used in a production environment. With proper configuration and tuning, it is possible to journal in virtually any application environment with minimal overhead.

How Does the Application Recover Data?

If replication software from an IBM AS/400 High Availability Business Partner (HABP) is already used to maintain replicated data on a backup system, part of the road to ClusterProven status has already been traveled.

AS/400 clustering uses a “shared nothing” architecture. This means nodes in the cluster do not share resources. Instead, the cluster uses HABP replication software to maintain duplicates of all resources—data, applications, user profiles, and other system objects—on all backup nodes.
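To make the "shared nothing" idea concrete, here is a minimal Python simulation. It assumes a simplified model in which every node keeps its own private copy of the data, the primary journals each change, and the backup applies those journal entries strictly in sequence. The names JournalEntry, Node, write, and replicate are hypothetical; they are not IBM or HABP interfaces, and real replication products are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournalEntry:
    seq: int              # journal sequence number
    key: str              # key of the record that changed
    value: Optional[str]  # new value; None records a delete


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)     # this node's private copy; nothing is shared
    journal: list = field(default_factory=list)  # entries written while this node is primary
    last_applied: int = 0                        # highest sequence number applied to data

    def write(self, key: str, value: Optional[str]) -> None:
        """Change a record on the primary and journal the change."""
        entry = JournalEntry(len(self.journal) + 1, key, value)
        self.journal.append(entry)
        self.apply(entry)

    def apply(self, entry: JournalEntry) -> None:
        """Apply one journal entry; entries must arrive in sequence."""
        if entry.seq != self.last_applied + 1:
            raise ValueError(f"{self.name}: expected {self.last_applied + 1}, got {entry.seq}")
        if entry.value is None:
            self.data.pop(entry.key, None)
        else:
            self.data[entry.key] = entry.value
        self.last_applied = entry.seq


def replicate(primary: Node, backup: Node) -> int:
    """Ship the journal entries the backup has not yet applied; return how many."""
    pending = [e for e in primary.journal if e.seq > backup.last_applied]
    for entry in pending:
        backup.apply(entry)
    return len(pending)


primary, backup = Node("PRIMARY"), Node("BACKUP")
primary.write("ORD001", "open")
primary.write("ORD002", "open")
primary.write("ORD001", None)
replicate(primary, backup)
print(backup.data)  # {'ORD002': 'open'}
```

Because the backup holds a complete copy rather than sharing disks with the primary, it can take over even if the primary is lost entirely; the price is that every change must be journaled and shipped.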

How Does the Application Recover Users?

A simple "litmus test" question can set the stage for determining how well prepared the application is for clustering. Simply ask yourself, "How well does the application recover itself and its users after an abnormal failure in a single-system environment?" If an on-site visit by a software engineer is required to recover or repair data, then a significant amount of work will most likely be required to make the application design more robust. If, on the other hand, the application comes back online, quickly determines where it left off, and automatically deals with lost or damaged data, then the work to cluster-enable the application will be greatly diminished.

At a conceptual level, clustering on the AS/400 is not new. For some time, and to varying degrees, the HABPs have offered system-monitoring capabilities and facilities to automatically fail over or manually switch over to a backup system where the application is simply restarted. The application is not involved in the process and is said to be "cluster unaware."

The difference with iSeries clustering for high availability is that some of these functions are now performed by the AS/400 cluster engine, some are performed by the HABP middleware, and some are performed by the application-supplied exit program and related application resiliency changes. The biggest difference in this new clustering architecture is the active participation of a highly resilient cluster-aware application.

Exploited, Leveraged, or Enhanced?

Once an application is enabled to behave properly in a high-availability cluster, the degree to which user impact can be mitigated (or eliminated) is typically a function of how much application state information (if any) is maintained, how that information is used in a recovery/restart scenario, and the use of techniques such as commitment control. When these are used together, the highest level of cluster-enablement, Advanced ClusterProven, can be achieved.

Which One Do You Need?

Early in your clustering journey, you must answer a question: Is ClusterProven sufficient, or does your application require Advanced ClusterProven status?

A ClusterProven application can automatically restart on a backup node when the primary node becomes unavailable. However, it does not have to reposition users at the point of failure. Instead, they may be taken back to the last main menu screen they encountered, and they may then have to recreate some of their work.

An Advanced ClusterProven application goes further. All transactions managed by the application should ideally use commitment control. In the event of a failover to a backup system, host-centric applications must reposition users to the last transaction commit boundary or to a checkpoint boundary.
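The difference between the two levels can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not OS/400 commitment control: Transaction stands in for a commitment-control unit, and the user's screen position is saved in a hypothetical checkpoints table in the same commit as the data changes, so the two cannot disagree. A user with no checkpoint falls back to the main menu, which is the ClusterProven behavior described above; a user with one is repositioned at the last commit boundary.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    committed: dict = field(default_factory=dict)    # committed data (what a backup node would hold)
    checkpoints: dict = field(default_factory=dict)  # hypothetical table: user -> last committed position


class Transaction:
    """Stand-in for a commitment-control unit: changes stay pending until commit()."""

    def __init__(self, db: Database, user: str) -> None:
        self.db = db
        self.user = user
        self.pending: dict = {}

    def write(self, key: str, value: str) -> None:
        self.pending[key] = value  # not yet committed, so not yet recoverable

    def commit(self, position: str) -> None:
        # Data changes and the user's position are committed together.
        self.db.committed.update(self.pending)
        self.db.checkpoints[self.user] = position
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()  # what a failure does to uncommitted work


def restart_position(db: Database, user: str, main_menu: str = "MAIN MENU") -> str:
    """After failover, return the screen at which to reposition the user."""
    return db.checkpoints.get(user, main_menu)


db = Database()
txn = Transaction(db, "JSMITH")
txn.write("ORD001", "header entered")
txn.commit(position="ORDER ENTRY: detail lines")
txn.write("ORD001-1", "line 1")  # in flight when the primary fails
txn.rollback()
print(restart_position(db, "JSMITH"))    # ORDER ENTRY: detail lines
print(restart_position(db, "OTHERUSR"))  # MAIN MENU
```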

Advanced ClusterProven client/server applications are even more resilient than host-centric applications. Because the client manages the user interface, the user experiences a seamless failover with minimal service interruption when a primary server fails.

The best way to think about this is to view it as an "availability journey." It's something of a quest for the Holy Grail. Just about any application availability enhancement can deliver customer value, so it is quite reasonable to work your way incrementally toward a state-of-the-art application, from an availability perspective. It may make sense to simply enable the application to function properly in the clustering environment as a first step. Next, adding some simple application checkpointing (state information) so that you can reposition users after an application restart might be beneficial, and so on. The best advice is to listen to what your customers are asking for. They probably can't articulate it in terms such as ClusterProven or Advanced ClusterProven, but they can usually tell you how long they can afford to be down.

Batch Applications

Traditionally, batch applications did not use journaling or commitment control, because the easiest and most efficient recovery strategy was to take a backup before the batch job started. If the job failed, the database was restored from the backup and the job was rerun.

This batch recovery strategy is not acceptable in a clustered environment that requires around-the-clock availability. Journaling is a minimum requirement for ClusterProven and Advanced ClusterProven status. Advanced ClusterProven applications should also use commitment control. Write-intensive batch programs can present special problems when journaled; IBM addressed this by providing the Batch Journal Cache PRPQ.

One option for batch programs is to create an arbitrary commit point after every x records that the batch job processes. The most appropriate value of x depends on the nature of the batch application and the nature of the other applications that may be running simultaneously.
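A minimal sketch of this technique in Python follows. The commit callable stands in for the COMMIT operation under commitment control, and start_after assumes the job saves its record count as part of each commit (for example, in a hypothetical restart record), so a rerun after a failover skips committed work instead of restoring a backup. The function and parameter names are illustrative only.

```python
from typing import Callable, Iterable


def run_batch(records: Iterable, process: Callable[[object], None],
              commit: Callable[[int], None], commit_every: int,
              start_after: int = 0) -> int:
    """Process records, committing after every `commit_every` records.

    `start_after` is the record count saved at the last successful commit;
    on a rerun, records before that point are skipped, not reprocessed.
    Returns the total number of records processed and committed.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be at least 1")
    done = last_commit = start_after
    for index, record in enumerate(records):
        if index < start_after:
            continue  # committed before the failure
        process(record)
        done += 1
        if done - last_commit == commit_every:
            commit(done)  # arbitrary commit point
            last_commit = done
    if done > last_commit:
        commit(done)  # commit the final partial group
    return done


# First run commits after records 4 and 8, then the remaining 2.
commits = []
run_batch(range(10), process=lambda r: None, commit=commits.append, commit_every=4)
print(commits)  # [4, 8, 10]

# Rerun after a failure that struck once 8 records were committed.
redone = []
run_batch(range(10), process=redone.append, commit=lambda n: None,
          commit_every=4, start_after=8)
print(redone)  # [8, 9]
```

A smaller commit_every means less work to redo after a failure but more commit overhead; a larger one holds record locks longer, which matters when other applications run at the same time.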

Replication

Clustering in the AS/400 environment is a cooperative solution that includes replication software from an HABP. Because the backup nodes must be ready to take over at any time, replication and journaling must be active at all times, including when batch jobs are running.

For clustering to succeed, all of the data, programs, security objects, and other system objects used by an application must be replicated. A complete inventory of these items is, therefore, critical. An HABP can provide services and tools to help you identify all relevant resources.
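Checking the inventory against what the replication product is actually configured to copy can be automated. The Python sketch below assumes a simple LIBRARY/OBJECT naming convention and generic selections written with * (matched with the standard fnmatch module); the function name and these conventions are illustrative and are not the format of any particular HABP product.

```python
from fnmatch import fnmatchcase


def unreplicated(inventory: list[str], selections: list[str]) -> list[str]:
    """Return inventory objects that no replication selection covers.

    Objects are named LIBRARY/OBJECT; selections may use * as a generic
    wildcard (e.g. "APPLIB/*"). Both conventions are assumptions.
    """
    return sorted(obj for obj in inventory
                  if not any(fnmatchcase(obj, pat) for pat in selections))


inventory = ["APPLIB/ORDHDR", "APPLIB/ORDDTL", "APPLIB/ORDPGM",
             "QSYS/APPUSER",      # a user profile, easy to forget
             "APPDTA/CUSTMAST"]
selections = ["APPLIB/*", "APPDTA/CUST*"]
print(unreplicated(inventory, selections))  # ['QSYS/APPUSER']
```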

To receive ClusterProven status, you could write all the required software on your own, but the easiest, and probably the safest, solution is to establish a relationship with an HABP. Even if you plan to leave the choice of HABP up to your customers, a replication solution and a cluster management interface from one of the HABPs should still be used in the testing process that leads to ClusterProven certification.

Exit Program

Each application must supply an exit program to be called when a cluster event occurs. Examples of cluster events are a node failure, the addition of a node to a cluster, or the removal of a node.

One generic exit program may serve multiple applications, and it can be written in any ILE programming language. When a cluster event occurs, the exit program is called on all nodes in the recovery domain. It must be able to handle all of the relevant action codes shown in Figure 1.

A generic exit program may be sufficient to achieve the lowest level of recoverability required for ClusterProven status. As part of its service to cluster-enable your application, an HABP may use an automated tool to help you create the program. However, a more sophisticated exit program might be required. It may be necessary, for example, to start a sequence of programs rather than just the failed program. Or the exit program may play a role in repositioning the user in the restarted application.
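The shape of a generic exit program can be sketched as a dispatch on the action codes listed in Figure 1. This Python model is hypothetical: the real exit program is an ILE program whose parameters are defined by IBM's cluster resource services, and the Action and Role enumerations, the node dictionary, and the APPJOB job name are stand-ins invented for illustration. The outer try/except plays the part of the exception handlers, turning any error into a failed success indicator rather than an abnormal end.

```python
from enum import Enum


class Action(Enum):
    INITIALIZE = "INITIALIZE"
    START = "START"
    RESTART = "RESTART"
    END = "END"
    DELETE = "DELETE"
    REJOIN = "REJOIN"
    CHANGE = "CHANGE"
    DELETE_COMMAND = "DELETE COMMAND"
    FAILOVER = "FAILOVER"
    END_NODE = "END NODE"


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


def _dispatch(action: Action, role: Role, node: dict) -> bool:
    if action is Action.INITIALIZE:
        # Related program objects and required data must exist on this node.
        return bool(node.get("programs") and node.get("data"))
    if action in (Action.START, Action.RESTART):
        if not node.get("data_areas"):
            return False  # required data areas are missing
        if role is Role.PRIMARY:
            node.setdefault("jobs", []).append("APPJOB")  # hypothetical application job
        return True
    if action is Action.END:
        if role is Role.PRIMARY:
            node["jobs"] = []  # end jobs this exit program started
        return True
    if action in (Action.DELETE, Action.REJOIN, Action.CHANGE, Action.DELETE_COMMAND):
        return True  # nothing to do but report success
    if action in (Action.FAILOVER, Action.END_NODE):
        # Runs on backup nodes: the replicated data must be present here.
        return bool(node.get("data"))
    return False  # unrecognized action code


def exit_program(action: Action, role: Role, node: dict) -> bool:
    """Hypothetical generic exit program; the return value is the success indicator."""
    try:
        return _dispatch(action, role, node)
    except Exception:
        return False  # exception handler: report failure instead of ending abnormally


backup = {"programs": True, "data": True, "data_areas": True}
print(exit_program(Action.FAILOVER, Role.BACKUP, backup))  # True
```

A more sophisticated exit program would replace these branch bodies, for example to start a sequence of programs or to help reposition users, while keeping the same dispatch structure.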

Data for Automated Installation

AS/400 clusters share application information in a standard format through the automated installation data area. You must set up and initialize this area for each application that runs on the cluster. It must exist on every node in the application’s recovery domain.

The data for automated installation consists of three components: the input data area (QCSTHAAPPI), the output data area (QCSTHAAPPO), and the object specifier file.

The input data area contains information about the application, the application’s resiliency requirements, and its data and object replication requirements. This includes the application name, release level, and identification information; the associated exit program and information required by it; information about the cluster resource group; and information about any associated data areas and journals.

The output data area reflects the results of setting up the application resiliency environment. This includes an IP takeover name (if appropriate), the participating data resource group names, and the various status indicators.

The object specifier file describes the format used to identify objects replicated by the HABP solution. This includes information such as country and language codes and path information for replicated objects. Because this information is stored in a standard format, once set up, it can be used by any of the HABP solutions.

If you work with an HABP to cluster-enable your application, it may employ software tools that can help build the data for automated installation.

Certification Process

The project team involved in ClusterProven testing may vary. However, the application vendor, the HABP, or a combination of the two usually manages the project. IBM typically plays a monitoring role.

The certification process begins with the HABP loading the necessary objects on the backup node and populating the data in the automated installation data area. A series of scripted tests is then run to simulate system failures. The failover activity and resulting application processing on the backup node are observed to ensure that they conform to the ClusterProven or Advanced ClusterProven specifications.

The testing process generally lasts less than one week. The paperwork must then pass through the legal process before the ISV is allowed to use the ClusterProven or Advanced ClusterProven trademark.

It’s All About Credibility

The IBM ClusterProven and Advanced ClusterProven trademarks help ISVs gain immediate credibility for their products’ clustering capabilities—even before they have any reference accounts. The length and difficulty of the journey to ClusterProven status can vary greatly, depending on both the nature of the application and the level of application resiliency required.

When you start the journey, you cannot travel alone. AS/400 clustering is a cooperative venture among the ISV application, OS/400, and replication and cluster management software from an HABP. Since you must prove that your application is cluster-enabled before receiving a license to use the ClusterProven trademark, a relationship with at least one HABP is a mandatory step in the process.

When choosing an HABP to partner with, consider three factors: its replication and cluster management solution, its experience in cluster-enabling applications, and the tools it can bring to bear on the necessary tasks. An HABP that is strong in all three areas can greatly accelerate your trip on the road to ClusterProven or Advanced ClusterProven. Good luck on your journey!

Figure 1: Exit program action codes

• INITIALIZE: Ensure that the related program objects exist on the relevant nodes in the cluster. Validate the existence of the required data. Set the exit program success indicator.

• START, RESTART: Ensure that the required data areas are available on the primary and backup nodes. Initiate handlers for exception and cancel conditions on primary node. Set the exit program success indicator if a failure occurs on primary node. Set the exit program success indicator on the backup node.

• END: Set the success indicator on the primary and backup nodes. End any jobs that were started by the exit program on the primary node.

• DELETE, REJOIN, CHANGE, DELETE COMMAND: Set the exit program success indicator on the primary and backup nodes.

• FAILOVER, END NODE: Initiate handlers for exception and cancel conditions on first backup node. Validate the existence of the required data on first backup node. For a failure condition, set the exit program success indicator on first backup node. Validate the existence of the required data on other backup nodes. For a failure condition, set the exit program success indicator on other backup nodes.
