
The Linux Letter: Regular Expressions for Everyone

I took the Linux leap in December of 1999. Actually, I didn't leap. I was pushed--shoved--by Barry Kline. A couple of years, dozens of Linux installs, and a published book later, here I am, standing in for Barry on his "Linux Letter." Barry decided to take a long Christmas break and bypass this month's column, so, while he stays in an expensive place getting room service, I'm filling in for him.

I certainly am a Linux advocate. I use it on my laptop, and several of my iSeries clients have it on their iSeries. I started out first with a dual boot laptop, but now my machine only boots to Linux. I do use Windows 98 (using Win4Lin from Netraverse), but I have it available only so I can test my Web application delivery to Internet Explorer.

I'm the Java guy, or so I try to be when I present at COMMON, write articles and books, and do consulting. I use Linux as my development platform. And even though I have WebSphere Studio Application Developer (WSAD, the new Java GUI from IBM), I prefer to use the Linux standard editor--vi. Actually, that's vim, or vi improved. I use Ant to automate my compiles and to do other things like creating JavaDocs, FTPing files, running unit tests, and signing applets. For source control, I use the open-source Concurrent Versions System (CVS), which is also used to track the code for most open-source products themselves. But there's another technology, one with a Unix heritage, that I've been using heavily lately, and that is the subject of this month's column--regular expressions.

Today, regular expressions (which, you'll see, are anything but regular) are used all over the place, including on non-Unix platforms. Regular expressions are used in JavaScript in Internet Explorer and Netscape Navigator. They are used in Jakarta Struts (the leading Java-based Web application framework). The Apache Foundation's Jakarta project has a Java package called Jakarta ORO that provides regular expressions for JDK 1.2 and 1.3. In fact, regular expressions are so heavily used that Sun saw fit to add support for regular expressions to JDK 1.4.

On the Linux side, regular expressions can be used from the vi editor, the ubiquitous Perl programming language, and the Unix sed utility.

Regular Expressions in Five Minutes

Regular expressions are arguably a complete language. They consist of special characters interspersed with sets of literal characters, and the resulting string is used as a mask against strings in files and HTML entry fields. The regular expression engine compares a line of text with your regular expression mask. The engine can either simply return a Boolean saying whether your text string matched the mask or, optionally, update characters in that string. The following, for instance, is a regular expression that can be used to validate phone numbers.

/^\(\d\d\d\) \d\d\d-\d\d\d\d$/


That regular expression can be used in JavaScript to test an input field:

function checkPhoneNumber(phoneNo) {
  var phoneRE = /^\(\d\d\d\) \d\d\d-\d\d\d\d$/;
  if (phoneNo.match(phoneRE)) {
    return true;
  } else {
    alert("The phone number entered is invalid!");
    return false;
  }
}


But that regular expression requires a parenthesized area code followed by a space, and a hyphen between the exchange and the four-digit number. The following accepts an optional area code (with optional parentheses), a three-digit exchange with one space or no space after the area code, and a four-digit number with a single space, a hyphen, or no space between it and the exchange:

/^(\(|)(\d{3})?(\)|)( |-|)(\d{3})( |-|)(\d{4})$/
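A quick way to see what the more permissive mask accepts is to run it against a handful of inputs. The following is only a sketch: it assumes the escaped form of the pattern (backslashes before the literal parentheses and before each d), and the helper name and sample numbers are made up for illustration.

```javascript
// Permissive phone mask: optional area code (with optional parentheses),
// then exchange and number separated by a space, a hyphen, or nothing.
const phoneRE = /^(\(|)(\d{3})?(\)|)( |-|)(\d{3})( |-|)(\d{4})$/;

// Hypothetical helper: true when the whole string fits the mask.
function isPhoneNumber(text) {
  return phoneRE.test(text);
}

// Made-up sample inputs; the last one is one digit short.
const samples = ["(603) 555-1212", "603 555 1212", "6035551212", "555-1212", "555-121"];
for (const sample of samples) {
  console.log(sample, "->", isPhoneNumber(sample));
}
```

Because each parenthesis is optional on its own, this mask also lets an unbalanced input such as "(603 555-1212" through, which is worth knowing before you rely on it.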


Regular expressions have a number of special characters in them to control how the mask works. The caret (^), at the beginning of the mask, anchors the match to the beginning of the string. The dollar sign ($), at the end of the mask, anchors the match to the end of the string. The escape-d, a backslash (\) followed by the lowercase letter d, says to match a digit. The vertical bar symbol (|) is the regular expression Boolean "or" character. The caret, when it is the first character inside a bracketed character set such as [^0-9], is the Boolean "not" character. The forward slash (/) is the commonly used delimiter for the complete mask. In tools that allow it (sed and Perl, for example), it can be replaced with another character if necessary--say, for instance, if you are validating a URL, which, itself, contains forward slashes.
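To make those control characters concrete, here is a small JavaScript sketch with one mask per feature. The masks and sample strings are invented for illustration. JavaScript literals cannot swap the slash delimiter, so the URL mask is built from a string with new RegExp() instead.

```javascript
// ^ and $ anchor the mask to the start and end of the string.
const zipRE = /^\d{5}$/;              // exactly five digits, nothing else

// | is "or": either alternative may match.
const yesNoRE = /^(yes|no)$/;

// ^ as the first character inside brackets negates the character set.
const noDigitsRE = /^[^0-9]+$/;       // one or more characters, none a digit

// A mask full of slashes reads better when it is not delimited by slashes.
const urlRE = new RegExp("^https?://[^/]+/");

console.log(zipRE.test("03801"), zipRE.test("038011"));
console.log(yesNoRE.test("no"), yesNoRE.test("maybe"));
console.log(noDigitsRE.test("abc"), noDigitsRE.test("ab1"));
console.log(urlRE.test("http://example.com/index.html"), urlRE.test("ftp://example.com/"));
```

Each console.log line prints true followed by false.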

If this is your first exposure to regular expressions, I'm sure I've lost you. Just be aware that regular expressions are cryptic yet very powerful. You could do the same checks with code, but your code would become lengthy and far more error-prone than regular expressions.

Updating with Regular Expressions

But regular expressions can do more than simply check strings. The language supports updating strings as well. As I mentioned earlier, I use vim as my Java editor rather than an editor from IBM's Eclipse Java GUI product (WDSc, WSSD, WSAD) even though I make money training people to use IBM's Java IDE. One of my biggest reasons for using vim is regular expressions. Regular expressions essentially give vi scan/replace but with superior capabilities. For instance, as I wrote this document in vi, I noticed that I had uppercased the first letters of the string "regular expression." To change them to lowercase, I used the following in vim:

:%s/\(\sRegular\)\(\sExpression\)/\L\1\L\2/g


Let me explain that vi command. The percent symbol (%) says to operate on the whole file. The first s says to substitute. The search mask "\(\sRegular\)\(\sExpression\)" is followed by a replacement mask of "\L\1\L\2," which says to lowercase (with the backslash-L) the string that matches the first parenthetical expression (\(\sRegular\), as identified with the shorthand backslash-1 notation) and then the string that matches the second (backslash-2). In vi, the grouping parentheses are themselves escaped with backslashes, and \s matches a whitespace character. The trailing g says the replace operation is to be global, that is, applied to every match on a line rather than only the first. Such obscure syntax often scares programmers away from using regular expressions, but, more often than not, you can go mining the Internet for regular expressions that fit your need and, after playing around with them for a while, you begin to really appreciate the mini-language.

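The same group-and-replace idea carries over to JavaScript. JavaScript has no lowercase escape in its replacement mask, so this sketch passes a function to replace() instead; the function name and the sample sentence are invented for illustration.

```javascript
// Lowercase every " Regular Expression" in the text.
// (\sRegular) and (\sExpression) are capture groups 1 and 2; the g flag
// makes the replacement global rather than first-match-only.
function lowercasePhrase(text) {
  return text.replace(
    /(\sRegular)(\sExpression)/g,
    (match, first, second) => first.toLowerCase() + second.toLowerCase()
  );
}

const sample = "Every Regular Expression is a mask; a Regular Expression can also update text.";
console.log(lowercasePhrase(sample));
```

Because the mask starts with a whitespace character, a "Regular Expression" at the very start of the text is left alone.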
Sed Again


Although this example use of regular expressions within vi was contrived, I regularly use the same strategy when I do my Java programming. I became so accustomed to the use of regular expressions that I wanted a way of globally replacing Java code in all source files of my app. That's where the Unix sed utility comes into play. The sed utility takes an input file and runs all its text through a regular expression. What I do is write a quick shell script that runs files in a directory (or directories) recursively through sed. I used the following, for instance, to convert a client's JavaServer Pages from the syntax of JSP 0.91 to JSP 1.1:

# JSP 0.91 to JSP 1.1 converter

# create="yes/no" to blank
# <%@ import to <%@ page import
# <%@ isErrorPage= to <%@ page isErrorPage=
# type="com. to class="com.
# > to />
mkdir -p convertedjsp
for jsp in *.jsp
  do
    # The tag-rewriting substitutions (including "> to />") are truncated
    # in this listing and are left out here.
    sed -e 's/create="no"//g' \
        -e 's/create="yes"//g' \
        -e 's/type="com\./class="com./g' \
        -e 's/<%@ import/<%@ page import/g' \
        -e 's/<%@ isErrorPage=/<%@ page isErrorPage=/g' \
        "$jsp" > "convertedjsp/$jsp"
  done
# end of sed script
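The idea behind that script, an ordered list of substitutions applied to every line of a file, can be sketched in JavaScript as well. This is an illustration rather than a port: it covers only the substitutions named in the script's comments, and the function name and sample input are hypothetical.

```javascript
// Each rule is [mask, replacement]; rules are applied in order, the way a
// chain of sed substitutions would be.
const rules = [
  [/create="(yes|no)"/g, ""],
  [/type="com\./g, 'class="com.'],
  [/<%@ import/g, "<%@ page import"],
  [/<%@ isErrorPage=/g, "<%@ page isErrorPage="],
];

// Hypothetical helper: run one chunk of JSP source through every rule.
function convertJsp(source) {
  return rules.reduce(
    (text, [mask, replacement]) => text.replace(mask, replacement),
    source
  );
}

// Made-up input line, for illustration only.
console.log(convertJsp('<%@ import="java.util.*" %>'));
```

Keeping the rules in a list makes it easy to add, reorder, or test a single substitution without touching the loop.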


Note, however, that I've recently begun to use Perl, with its integrated support for regular expressions, rather than shell scripts and sed. Here's a handy Perl script that I use to verify my regular expressions (regardless of where I will be using them--Java, JavaScript, Perl, or otherwise):

#!/usr/bin/perl -w
use strict;
while (<>) {
    chomp;
    # replace the regular expression with the
    # one you want to test
    if (/^\(\d\d\d\) \d\d\d-\d\d\d\d$/) {
        print "Matched: |$`<$&>$'|\n";
    } else {
        print "No match.\n";
    }
}
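If you would rather try masks from JavaScript, a similar tester can be sketched as follows. It shows the text before the match, the match in angle brackets, and the text after it, as the Perl script does. The function name and sample input are invented, and it assumes the mask is passed without the g flag.

```javascript
// Returns "Matched: |before<match>after|" or "No match." for one line of text.
// Assumes re has no g flag, so exec() always searches from the start.
function describeMatch(re, line) {
  const m = re.exec(line);
  if (m === null) {
    return "No match.";
  }
  const before = line.slice(0, m.index);
  const after = line.slice(m.index + m[0].length);
  return `Matched: |${before}<${m[0]}>${after}|`;
}

// Replace the mask with the one you want to test.
const testRE = /\d{3}-\d{4}/;
console.log(describeMatch(testRE, "call 555-1212 today"));
console.log(describeMatch(testRE, "no digits here"));
```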

Struttin' My Regular Expressions Stuff

I've been using regular expressions in JavaScript for a while now, but, with the advent of Struts 1.1, I now use them in my server-side Java Web applications. Struts 1.1 added the ability to use declarative edits for HTML input fields. The declarations are placed in an XML file called validator.xml. The following is a validator.xml snippet that declares edits for the input form called visits:



    
      
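<!-- "..." marks field names that are not shown -->
<form-validation>
  <formset>
    <form name="visits">
      <field property="..." depends="mask">
        <var>
          <var-name>mask</var-name>
          <var-value>^[a-zA-Z]*$</var-value>
        </var>
      </field>
      <field property="..." depends="mask">
        <var>
          <var-name>mask</var-name>
          <var-value>^(\(|)(\d{3})?(\)|)( |-|)(\d{3})( |-|)(\d{4})$</var-value>
        </var>
      </field>
      <field property="..." depends="mask">
        <var>
          <var-name>mask</var-name>
          <var-value>^.+@.+\..{2,3}$</var-value>
        </var>
      </field>
    </form>
  </formset>
</form-validation>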
    


Note that Struts will automatically edit the qualified fields on the server. But Struts will also, as an option, add JavaScript code to the JSP input form that performs the same regular expression edits that are performed on the server via Java. Selecting that client-side edit option improves responsiveness because invalid input is caught without a round trip to the server. And you don't even have to write the JavaScript code.
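To give a feel for what a declarative mask edit amounts to, here is a hand-written sketch of the idea: a table of field names and masks, checked in a loop. It is not the code Struts generates. The field names (name, phone, email) are assumptions, and the masks are the three from the snippet in their escaped form.

```javascript
// Hypothetical field-to-mask table mirroring the three declared edits.
const masks = {
  name: /^[a-zA-Z]*$/,
  phone: /^(\(|)(\d{3})?(\)|)( |-|)(\d{3})( |-|)(\d{4})$/,
  email: /^.+@.+\..{2,3}$/,
};

// Returns the names of the fields whose values do not fit their masks.
function validateForm(form) {
  const errors = [];
  for (const [field, mask] of Object.entries(masks)) {
    const value = form[field] || "";   // treat a missing field as empty
    if (!mask.test(value)) {
      errors.push(field);
    }
  }
  return errors;
}

console.log(validateForm({ name: "Don", phone: "(603) 555-1212", email: "don@example.com" }));
console.log(validateForm({ name: "Don2", phone: "555-121", email: "don@example" }));
```

The first call returns an empty list; the second returns all three field names. The name mask uses *, so an empty name passes it; requiring a value would be a separate edit.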

Regular Expressions, Linux, and Windows

This column may not have talked you into loading Linux at your shop, but I hope it persuaded you to look into using regular expressions for your Web applications. If you are dabbling with Linux, it behooves you to use regular expressions, if not in vi, at least with shell scripts via the sed utility or in Perl programs. Java programmers should look into using Struts (for far more than just the benefit of regular expressions). As I said earlier, regular expressions are directly supported in JDK 1.4. But don't wait until you are using JDK 1.4--you can use Jakarta's ORO package today with JDK 1.2 and 1.3.

If you want to learn more about regular expressions, try the following books. Each has several chapters on them: JavaScript: The Definitive Guide, 4th Edition by David Flanagan, O'Reilly; Learning Perl, 3rd Edition by Randal Schwartz and Tom Phoenix, O'Reilly; or if you really want to get into depth, Mastering Regular Expressions, 2nd Edition by Jeffrey Friedl, O'Reilly.

Don Denoncourt is the co-author of Understanding Web Hosting on Linux, along with Barry Kline.

Don Denoncourt

Don Denoncourt is a freelance consultant.

