Unlocking Your ROI for Power Systems Disaster Recovery


Organizations often view an investment in DR as insurance, but that perspective may blind them to the returns available from investments in advanced DR solutions.

Expenditures on disaster recovery (DR) solutions are frequently considered a cost of doing business, not an investment. Or they may be viewed as insurance policies that, hopefully, will never be called on to pay out claims. From this perspective, it's difficult to justify more than the minimum expenditure that will provide "good enough," but not necessarily optimal, protection against losses due to a disaster.

 

If it were true that DR solutions are merely insurance policies that pay out only when a disaster strikes, then the "good enough" strategy might, indeed, be appropriate. To understand why, first consider the definition of "disaster."

The "Insurance" Value of DR

From the DR insurance viewpoint, a disaster is an event that causes a level of destruction that forces you to restore data and applications from backup media and/or transfer system operations to another site. These situations are exceptionally rare. Most companies will go for many years without experiencing such a calamity, and some organizations may never experience one.

 

Economists generally suggest that the best way to value an uncertain outcome is with expected value theory. Put simply, the expected value is the value of the possible outcome times the probability of it occurring. Consider, for example, the following scenario (all numbers are hypothetical and likely bear little relationship to your circumstances):

 

  • Disaster Angst Corp. (DAC) is considering an investment in DR technologies. With the technology in place, the improvement in disaster recovery time and completeness will reduce the cost of a disaster by $1 million compared to the status quo.
  • DAC uses a five-year planning horizon for its technology investments.
  • The probability of one and only one disaster occurring within DAC's planning horizon is 0.1 percent (= 0.001).
  • The probability of two and only two disasters occurring is 0.05 percent (= 0.0005).
  • The probability of three and only three disasters occurring is 0.005 percent (= 0.00005).
  • The probability of more than three disasters occurring within DAC's planning horizon is small enough to be ignored.


Under the above scenario, the expected benefit of the DR technology is ($1,000,000 * .001) + ($1,000,000 * .0005) + ($1,000,000 * .00005) = $1,550, and its expected net value is that amount minus the cost of the technology. Using these numbers, an investment in DR technology would be a losing proposition if the technology costs more than $1,550.
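
The arithmetic in the DAC scenario can be reproduced with a short Python sketch. The probabilities and the $1 million saving come from the hypothetical scenario; the function names and the $10,000 technology cost are assumptions introduced only for illustration.

```python
# Hypothetical figures taken from the DAC scenario; they are not real-world data.
SAVINGS = 1_000_000  # dollars saved in any scenario where the technology is used

# Probability of exactly k disasters within the five-year planning horizon.
PROBABILITIES = {1: 0.001, 2: 0.0005, 3: 0.00005}


def expected_benefit(savings, probabilities):
    """Expected value: the saving times the probability, summed over scenarios."""
    return sum(savings * p for p in probabilities.values())


def net_expected_value(technology_cost, savings=SAVINGS, probabilities=PROBABILITIES):
    """Expected benefit minus the up-front cost of the technology."""
    return expected_benefit(savings, probabilities) - technology_cost


print(f"Expected benefit: ${expected_benefit(SAVINGS, PROBABILITIES):,.0f}")
# Expected benefit: $1,550

# A hypothetical $10,000 solution has a negative net expected value on this view.
print(net_expected_value(10_000) < 0)  # True
```

Changing any probability or the saving shows how sensitive the result is to the two forecasts on which it depends.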

 

Even this scenario overestimates the value because it does not account for the time value of money. The hardware and software costs associated with implementing DR solutions are incurred up front, whereas the "insurance" benefits will be received only when and if a disaster occurs sometime down the road. A discounted cash flow calculation would, therefore, be more appropriate. This calculation would further reduce the expected value, but it is beyond the scope of this article.
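
Although a full discounted cash flow analysis is out of scope, a rough sketch shows the direction of the effect. It assumes, purely for illustration, that the $1,550 expected benefit is spread evenly across the five-year horizon, that each year's share arrives at year-end, and that the discount rate is 8 percent. None of these assumptions comes from the scenario itself.

```python
def present_value(annual_benefit, discount_rate, years):
    """Discount a level stream of year-end benefits back to today."""
    return sum(
        annual_benefit / (1 + discount_rate) ** year
        for year in range(1, years + 1)
    )


EXPECTED_BENEFIT = 1_550  # undiscounted expected benefit from the DAC scenario
YEARS = 5                 # DAC's planning horizon
DISCOUNT_RATE = 0.08      # hypothetical rate, chosen only for illustration

pv = present_value(EXPECTED_BENEFIT / YEARS, DISCOUNT_RATE, YEARS)
print(f"Discounted expected benefit: ${pv:,.0f}")
```

Under these assumptions the discounted figure comes to roughly $1,238, which is lower than the undiscounted $1,550. A higher discount rate or a later expected disaster date would lower it further.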

 

As can be seen, even when the payout from the insurance aspect of DR is large, because the probability of a disaster is so low, expected value theory suggests that a large investment in a DR solution is unwarranted.

 

There are a few problems with the insurance view of DR. For one thing, the expected value calculations are dubious. Their accuracy depends on the accuracy of two component forecasts: the probability of disasters and the cost of disaster-related downtime and data losses. Both of these estimates are typically fraught with error.

 

History generally provides the only readily available estimate of the probability of disasters, but history is, at best, a weak predictor. The problem is a paucity of data points. Disasters happen at random intervals, and they occur only very rarely. Thus, historical averages rest on too few observations to be statistically reliable.

 

What's more, the historical data used must be restricted to companies in similar circumstances. Some geographic areas never see hurricanes, but others are at a high risk during hurricane seasons. The presence of nearby tectonic fault lines determines the probability of earthquakes. Forest fires, tornadoes, and wars are, likewise, more prevalent in some locations than in others. Clearly, a worldwide disaster frequency average would produce a poor disaster frequency prediction for a specific company. Yet restricting the data to just companies in circumstances similar to yours forces you to base your forecast on a small subset of an already small set of data.

 

The costs of downtime and data losses can be forecast much more accurately than the disaster frequency. Nevertheless, most companies significantly underestimate these costs. Unless organizations perform a rigorous analysis, the true costs are likely multiples of their "back-of-the-envelope" estimates.

 

What's more, even when companies undertake a comprehensive analysis of their potential disaster-related costs, they often overlook one gloomy statistic. A sizeable proportion of companies that incur a cessation of operations lasting more than a couple of days go bankrupt within a few years or never reopen at all. Thus, unless you have DR solutions that allow you to recover effectively and rapidly, the true cost of a disaster may be the full value of the business.

 

And, in some industries, such as the financial sector, the choice may be taken out of the hands of the business. Regulations in critical industries make an investment in business continuity technologies a minimum cost of doing business.

The "Insurance" Legacy of DR

DR became viewed as insurance because the traditional DR technology, which is still the primary or only DR technology used in many companies, was incapable of acting as any more than that. Tape-based backups are a very cumbersome and time-consuming way to recover data, particularly if the tapes have been sent to an offsite location. As a result, backup tapes are typically used as only a last resort.

 

Yet tape-based backups don't even provide an especially good insurance policy. In addition to being a comparatively slow medium, they are more fallible than disk. Hence, it is possible that, when you try to recover data from tape, you will find that the most recent version is unusable, forcing you to rely on backup tapes that are up to 48 hours old.

 

Even when the most recent backup tapes are usable, they do not allow complete data restoration. Backup tapes are typically created once every 24 hours, usually in the middle of the night. Data that are added to or updated on a company's databases during the next day are not represented on any backup tapes. Consequently, if a disaster destroys the data center, including any online journals, data recovery from the backup tapes will omit up to 24 hours' worth of data, and possibly more if the most recent backup tapes have not been sent offsite at the time of the disaster.

 

Furthermore, it can take several hours or even a few days to fully recover a data center from backup tapes, particularly if those tapes have to be recovered from a vault some distance from the recovery site. As noted above, many companies would not survive such lengthy business outages.

 

Beyond Insurance

Because of the difficulty in justifying large expenditures based solely on DR's insurance value, many companies don't move beyond tape-based backups, despite the considerable liabilities of this approach. Nevertheless, moving DR beyond tape unlocks significant ROI potential in addition to what's available from the insurance value of DR. And, unlike "insurance," the realization of that potential is assured, which makes these advanced solutions easier to justify to a cautious CFO.

 

The options for moving beyond tape are many, but they can be distilled into two broad categories: geographically distributed high availability (HA) and continuous data protection (CDP). To be clear, most, if not all, companies that adopt technologies in one or both of these categories will not abandon tape as a backup medium. However, tape will be relegated to a last line of defense, to be used only when all else fails.

Geographically Distributed HA

HA technologies create and maintain real-time replicas of production servers, including fully redundant copies of all data. Because HA solutions can replicate data and objects over any distance, a backup server can be located far enough away from the primary site that a single disaster will almost certainly not affect both sites.

 

This can be classified as a DR solution because of the protection it offers from the IT-related consequences of disasters. However, thinking of it as such requires a mind shift because, unlike tape-based backup technologies, there are no data or objects to recover before normal operations can resume after a disaster. Instead, users are simply switched to the remote, hot-standby, backup server. Then, when the primary site becomes available again, the HA software can automatically resynchronize the two sites.

 

Unlike tape-based backups, which provide only an insurance benefit, an HA investment delivers an ROI that is much easier to predict. More importantly, you don't need to incur a disaster to earn the return.

 

Because switching to a redundant backup server can be done fairly quickly, you can use this option in a much broader range of circumstances than you can use tape-based recoveries. For example, when you need to perform maintenance on the primary server, rather than shut down operations until the maintenance work is finished, you can switch users to the backup system so they can continue their normal activities with minimal interruption.

 

Like tape-based DR, this option offers an insurance value, but it no longer derives its ROI primarily from its insurance value. Some maintenance is performed regularly, on a well-planned schedule. Other types of maintenance, such as hardware and software upgrades, occur at more random intervals, but they are sufficiently recurring that their frequency is predictable with a fair degree of accuracy.

 

In addition, companies have experience with most of the types of maintenance that will be required in the future. Thus, unlike disasters, with which most organizations have scant or no experience, organizations can accurately compute the cost of maintenance downtime. By measuring past costs and projecting them forward, it is possible to predict the value that will be contributed by this benefit with a high degree of precision.
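
As a rough illustration of that projection, the sketch below estimates the yearly value of switching users to the backup server during planned maintenance. Every input (12 maintenance windows a year, four-hour outages, a 15-minute switchover, $5,000 per hour of downtime) is a hypothetical placeholder to be replaced with your own measured history.

```python
def annual_maintenance_savings(windows_per_year, outage_hours,
                               switchover_hours, downtime_cost_per_hour):
    """Downtime cost avoided by switching users to the backup server."""
    hours_avoided = windows_per_year * (outage_hours - switchover_hours)
    return hours_avoided * downtime_cost_per_hour


# Hypothetical inputs; substitute figures from your own maintenance records.
savings = annual_maintenance_savings(
    windows_per_year=12,
    outage_hours=4.0,
    switchover_hours=0.25,
    downtime_cost_per_hour=5_000,
)
print(f"Annual maintenance downtime avoided: ${savings:,.0f}")
# Annual maintenance downtime avoided: $225,000
```

Because each input comes from recurring, measurable events rather than from rare disasters, the estimate is only as uncertain as the company's own records.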

 

Geographically distributed HA delivers other benefits that can also be forecast with reasonable accuracy. For example, in the past, it was necessary to take applications offline while backing up related data. Save-while-active technologies make this unnecessary, but backup jobs usually consume much of the available disk I/O bandwidth, while also hogging processor resources. As a result, while business applications may, technically, be able to run while performing backup tasks, response times may slow unacceptably.

 

Geographically distributed HA can eliminate the burden that backup jobs place on production operations. Because HA maintains a complete, up-to-date replica of all data and applications, backup tapes can be created on the remote server, thereby eliminating the impact on the production system.

 

The cost of the productivity that is lost when backup jobs are run on the production system is calculable. Furthermore, backup jobs run at very regular intervals, typically exactly once every 24 hours. Thus, when considering an investment in geographically distributed HA, the value that will be received from moving backup jobs off the production system can be forecast with considerable accuracy.

 

What's more, because the backup server contains a current copy of all data, it can also be used to run other read-only tasks, such as batch reporting. Shifting processing off the primary server in this way may defer the need for a server upgrade.

CDP

The need to recover data most often arises not from disasters, but from much more common and, seemingly, less significant events. An operator accidentally deletes an important file. A user corrupts data with an inappropriate update. Simultaneous disk failures overcome RAID protection and destroy a portion of the company's data. A computer virus deletes or corrupts data. A disgruntled employee destroys some data before departing. The list of such occurrences is almost endless.

 

In these circumstances, the job of the IT department is not to recover the entire data center but, rather, to restore the individual file or data item in question, preferably to the point immediately prior to when it was deleted or corrupted. HA software alone can't do this because the software will immediately copy the deletion or corruption to the backup server to ensure that it is always an exact replica of the production server.

 

Tape-based backups offer a partial solution, but they force you to restore data to its state as of when the backup tape was created, probably the previous night. Doing so may discard several updates that were performed on the data between then and when the corruption or deletion occurred.

 

In addition, recovering a single data item from tape can be a labor-intensive, lengthy process—particularly if the tape has already been sent offsite. Because these are relatively common occurrences compared to disasters, many companies keep the most recent backup tapes onsite so they won't have to be recalled when they're needed for these sorts of recovery operations. However, doing so reduces the insurance value of tape-based backups. Should a disaster destroy the data center, including the most recent backup tapes, the company will lose up to two full days' worth of data rather than only one.

 

CDP provides a solution by copying data inserts, updates, and deletes either continuously (True CDP) or batched at intervals (Near CDP), such as when a file is closed or saved, to an online data store that is usually some distance from the production server. Unlike HA, CDP does not attempt to maintain a replica of the production server. Instead, it stores information about each individual update. That information can then be used to restore one or more individual data items to their state at a time of an administrator's choosing, likely immediately before they were corrupted.
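
The idea of storing each update and restoring an item to a chosen point in time can be sketched as a simple in-memory change journal. This is a conceptual toy, not how any vendor's CDP product is implemented; the class, method, and key names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

DELETED = object()  # sentinel value that marks a delete operation


@dataclass(frozen=True)
class Change:
    timestamp: float  # when the change was made on the production server
    key: str          # identifier of the data item (hypothetical naming)
    value: Any        # the new value, or DELETED


class ChangeJournal:
    """Records every insert, update, and delete instead of keeping a replica."""

    def __init__(self):
        self._changes = []

    def record(self, timestamp, key, value):
        """Append one change; assumes changes arrive in timestamp order."""
        self._changes.append(Change(timestamp, key, value))

    def restore(self, key, as_of):
        """Return the item's value as of `as_of`, or None if absent or deleted."""
        state = None
        for change in self._changes:
            if change.timestamp > as_of:
                break  # later changes, including any corruption, are ignored
            if change.key == key:
                state = None if change.value is DELETED else change.value
        return state


journal = ChangeJournal()
journal.record(1, "order-42", "pending")
journal.record(2, "order-42", "shipped")
journal.record(3, "order-42", "###corrupted###")

# Restore the item to its state immediately before the bad update at time 3.
print(journal.restore("order-42", as_of=2))  # shipped
```

An HA replica would hold only the corrupted value after time 3, whereas the journal still holds every earlier state.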

 

Depending on the vendor, CDP may be sold as a standalone product or bundled with HA software. Typically, CDP stores data in a simple file structure, and, as a result, the CDP server usually does not have to use the same platform as the production server. Instead, it can often be a low-cost Windows- or Linux-based server.

 

The problems that CDP resolves happen randomly, but they are frequent enough that history provides a reasonable forecast of their frequency. In addition, it is easy to measure how much operator time is consumed in recovering data from tape as opposed to how long that task will take when using a CDP solution. The product of these two values (frequency and avoided cost) can be compared to the cost of the CDP solution to provide an estimate of the return on an investment in CDP.
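
The comparison described above reduces to a few lines of arithmetic. In this sketch all inputs are hypothetical placeholders (40 recovery incidents a year, six operator hours per tape recovery versus half an hour with CDP, $80 per operator hour, and a $10,000 annual CDP cost), and only operator time is counted as avoided cost.

```python
def cdp_return(incidents_per_year, tape_hours, cdp_hours,
               operator_cost_per_hour, cdp_annual_cost):
    """Return (avoided cost, ROI) for one year of CDP use."""
    hours_saved_per_incident = tape_hours - cdp_hours
    avoided_cost = incidents_per_year * hours_saved_per_incident * operator_cost_per_hour
    roi = (avoided_cost - cdp_annual_cost) / cdp_annual_cost
    return avoided_cost, roi


# Hypothetical inputs; replace them with figures from your own incident history.
avoided, roi = cdp_return(
    incidents_per_year=40,
    tape_hours=6.0,
    cdp_hours=0.5,
    operator_cost_per_hour=80,
    cdp_annual_cost=10_000,
)
print(f"Avoided cost: ${avoided:,.0f}, ROI: {roi:.0%}")
# Avoided cost: $17,600, ROI: 76%
```

Adding the cost of user downtime while data is unavailable, which this sketch omits, would raise the estimated return.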

 

CDP also provides value when a disaster occurs.

 

The CDP backup server does not contain a complete copy of all of an organization's data. Thus, when CDP, but not HA, is in place, the IT department begins a disaster recovery operation by first restoring data from the most recent backup tapes. The CDP database is then used to bring data up to date by applying the data updates that were made after the backups were created and up to the point of the disaster.
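
This two-step recovery (restore from tape, then roll forward from the CDP store) can be sketched as follows. The data layout is an assumption made for illustration: the tape backup is modeled as a dictionary, and each CDP update as a `(timestamp, key, value)` tuple in which a value of `None` stands for a delete.

```python
def recover(tape_snapshot, snapshot_time, cdp_updates, disaster_time):
    """Rebuild data from the last tape backup, then apply later CDP updates."""
    data = dict(tape_snapshot)  # step 1: restore from the most recent tapes

    # Step 2: replay, in time order, the updates made after the backup was
    # created and up to the point of the disaster.
    for timestamp, key, value in sorted(cdp_updates, key=lambda u: u[0]):
        if snapshot_time < timestamp <= disaster_time:
            if value is None:
                data.pop(key, None)  # the update was a delete
            else:
                data[key] = value
    return data


nightly_backup = {"customer-1": "Acme", "customer-2": "Globex"}
updates = [
    (10, "customer-1", "Acme Corp."),  # updated after the backup
    (20, "customer-2", None),          # deleted after the backup
    (30, "customer-3", "Initech"),     # inserted after the backup
]

print(recover(nightly_backup, snapshot_time=0, cdp_updates=updates, disaster_time=40))
# {'customer-1': 'Acme Corp.', 'customer-3': 'Initech'}
```

Without the second step, the recovered data would stop at the state captured by the nightly backup.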

 

Because tape-based recovery is a necessary element in this scenario, IT's role in the recovery operation will probably take slightly longer than it would if recovery were from only backup tapes. However, because true CDP can be used to recover data right up to the point of failure, the organization will not have to manually restore data that is not on the backup tapes. Thus, the total recovery process across the whole organization usually takes considerably less time with CDP than without it.

Recognizing the Full ROI of DR

The above discussion is not intended to belittle the insurance value of DR. One buys insurance when the cost of the insured risk would be greater than one can bear should the threat come to pass. DR definitely fits this bill. And, in some industries, regulations demand that companies acquire DR technologies for this purpose.

 

The point is that, by viewing DR as only an insurance policy, organizations may blind themselves to the additional value that can be achieved through investments in advanced DR solutions. Those investments are well worth investigating, and they can often unlock returns that are far larger and more assured than the returns available through DR "insurance" alone.

JEFF ASHMAN
Jeff Ashman is the director of software development for Vision Solutions, Inc., the world's leading provider of high availability, disaster recovery, and systems/data management solutions for IBM Power Systems. With a portfolio spanning the industry's most innovative and trusted HA technologies from iTERA, MIMIX, and ORION Solutions, Vision keeps critical business information continuously protected and available. Affordable and easy to use, Vision products ensure business continuity, increase productivity, reduce operating costs, and satisfy compliance requirements. Vision also offers advanced cluster management and systems management solutions along with support for IBM i, Windows, and AIX operating environments. As IBM's largest high availability Premier Business Partner (NYSE: IBM), Vision Solutions oversees a global network of partners and professionals to help customers achieve business goals. Privately held by Thoma Bravo, Inc., Vision Solutions is headquartered in Irvine, California, with offices worldwide. For more information, visit www.visionsolutions.com or call 800.957.4511.