The Linux Letter: Frying Spam

Spam. Also known as unsolicited commercial email (UCE). We all hate it and despise those who create it, but as sys admins, we're all forced to contend with it. I receive more than 2,000 email messages every day (mostly from mailing lists), and my statistics show that fully 25 percent of them are spam. That's not surprising, since my email address appears on public Web sites and in USENET newsgroups, where it has undoubtedly been harvested many times over. Furthermore, my address has been the same since AT&T purchased IBM's Internet business, which makes it at least a decade old, so who knows how many spamming lists it appears on.

When the spammers started filling my mailbox, I was determined not to change my address (the standard method many use to curtail spam) but to instead investigate means to cut it off at the pass. Most of you already have anti-spam measures in place, but for those who don't or for those who want additional protection, I'm going to describe some basic theory and some open-source tools you can employ to build a spam/virus filtering appliance. In fact, some commercial appliances use the techniques and open-source software that I'm going to describe herein, but they hide that fact behind a custom interface. You can either purchase one of these or cut out the middleman and roll your own. Even if you decide not to build an appliance, you may garner some ideas that can be applied to your existing email scrubber to enhance its effectiveness. Let's dive in.

Bare Bones First

I build all of my mail servers and smart host appliances (which sit in front of your mail servers) with Red Hat Enterprise Linux or CentOS, a Red Hat derivative. I chose these two because I want my systems to be built for an extended lifetime, which both of these provide with long-term support, and I want popular distributions so that I can readily find pre-built software packages bundled in RPM format. If you're a Novell shop, you will find similar capabilities with SuSE Linux. This isn't to say that any of the other multitudes of Linux distros (or *BSDs) aren't capable, only that, for me, an enterprise-quality Linux distro is preferred.

As for the mail transport agent (MTA), I eschew Sendmail and instead use Postfix. Both can be loaded during installation, and by using the system-switch-mail command, I can configure the system to use Postfix. Without getting into a great religious argument, I chose Postfix because its author, Wietse Venema, successfully designed it to be extremely secure and extensible. Postfix has UCE controls built in, and Wietse's choice of plain-text configuration files earned it extra points during my selection, too. Sendmail gurus are equally adept at doing what I propose using their favorite MTA and ancillary software, but if you compare a Sendmail configuration file with a Postfix one, you'll quickly see why I chose the latter.

There is plenty of documentation on how to install Linux as well as how to do a basic configuration of Postfix, so I won't repeat that information here. I would, however, encourage you to configure a bare bones Postfix instance and get it successfully accepting/sending and forwarding mail prior to tweaking it for sentry duty. Doing so will make your life so much easier.

Barbarians at the Gate

Having configured our basic Postfix server, we can now turn our attention to the customizations that will turn it into a high-performance, spam-eating machine.

The best way to fight spam is simply not to accept it in the first place, thus stopping the barbarians at the gate and minimizing the impact on your mail server. But how does one accomplish this? Separating spam from ham (the term for legitimate email) involves a delicate balancing act. On the one hand, you want to be as aggressive as possible in eliminating spam so that your users don't have to wade through excessive amounts, yet on the other hand, you don't want ham messages getting erroneously rejected, thus potentially costing you business. Fortunately, many spammers make it easy to tip the balance in our favor by ignoring the specifications laid out in RFC 2821, Simple Mail Transfer Protocol. (For those unaware, the Internet Requests for Comments, or RFCs, are the blueprints for the protocols that make the Internet go. A quick Google search will turn up further information.)

Try Again Later

We all know that the email system was designed to be forgiving of unreachable servers and that an email sent today may be held by your MTA until it can be delivered. The first line of defense that I implement is called "greylisting." For that, I use a package called Postgrey. I'm sure that everyone is familiar with the concepts of blacklisting (if you're on the list, I don't accept your email) and whitelisting (if you're on the list, I accept your email). Greylisting is a technique that takes advantage of the fact that a properly configured MTA will make multiple attempts to deliver an email before returning it as undeliverable.

The process is very simple. An MTA connects to my Postfix instance and starts the transmission, giving Postfix the email header information. Postfix hands this information off to Postgrey, which checks to see if it has ever seen the triplet client_ip/sender/recipient before. If it has, or if the sender or domain is Postgrey-whitelisted, then Postgrey returns an "Okay" message back to Postfix, instructing it to accept the message. If the triplet is new, Postgrey will cache the triplet along with a time stamp and then tell Postfix to reject the message with a temporary failure. Postgrey will continue to reject the message until the admin-configurable length of time has passed. A legitimate MTA will hold the email and try again later, but an illegitimate (or improperly configured) MTA will not. This is the case with the majority of home users' computers that have been turned into spam-sending zombies. The simple MTA running on these machines will simply move along to the next victim on the list, leaving your system alone.
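The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Postgrey's actual code: the class name, the in-memory dictionary, and the 300-second delay are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Toy in-memory greylist keyed on (client_ip, sender, recipient)."""

    delay_seconds: int = 300  # hypothetical value; the real delay is set by the admin
    first_seen: dict = field(default_factory=dict)
    whitelist: set = field(default_factory=set)  # whitelisted senders or domains

    def check(self, client_ip, sender, recipient, now):
        """Return 'accept' or 'defer' for one delivery attempt at time `now` (seconds)."""
        sender = sender.lower()
        domain = sender.rpartition("@")[2]
        if sender in self.whitelist or domain in self.whitelist:
            return "accept"
        triplet = (client_ip, sender, recipient.lower())
        if triplet not in self.first_seen:
            # New triplet: remember when it first appeared and ask the client to retry.
            self.first_seen[triplet] = now
            return "defer"
        if now - self.first_seen[triplet] < self.delay_seconds:
            # Known triplet, but the retry came back before the delay expired.
            return "defer"
        return "accept"


greylist = Greylist()
attempt = ("192.0.2.10", "alice@example.org", "bob@example.com")
first = greylist.check(*attempt, now=0)    # first contact
early = greylist.check(*attempt, now=60)   # retry inside the delay window
later = greylist.check(*attempt, now=600)  # retry after the delay has passed
```

A real MTA keeps retrying and eventually lands in the last case; a zombie that never retries stops after the first.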

What's nice is that Postgrey will automatically whitelist sending hosts that it has seen deliver mail successfully several times, so messages from the servers with which you correspond regularly will be delivered without delay. Should such a host not contact you for an extended (admin-configurable) period of time, its entry will be removed from the whitelist. You also may add users or domains to a permanent whitelist so that mail from them will be accepted immediately, if you so desire.

I'm sure that as this technique gains in popularity, the zombie writers will make their programs more sophisticated so that they act like real MTAs (as required by the RFC). Until that happens, I revel in the relative calm that greylisting has given me.

RFC Violations

If the SMTP server trying to deliver spam to my server is persistent enough to get past Postgrey, it still has some hoops to jump through before the message will be delivered. Postfix has a bevy of anti-UCE capabilities that can easily be switched on under the configuration option "smtpd_recipient_restrictions." Through their use, I can ensure that Postfix will thoroughly inspect the transaction between the two servers and that the transaction, and subsequent envelope information, doesn't "smell fishy." To that end, I configure Postfix to ensure...

that the envelope sender and recipient (From and To) information contains fully qualified domain names. That allows it to verify that the recipient is a valid user for the domain(s) that we serve and that if there is a problem, there is a way to bounce the message back to the sender. This is done via the "reject_non_fqdn_recipient" and "reject_non_fqdn_sender" arguments.
that the domain names provided are valid, through a DNS lookup. This is done via the arguments "reject_unknown_sender_domain" and "reject_unknown_recipient_domain." A spammer can use bogus domain names (and many do) to pass the first test. This one at least requires them to make an effort to use real domain names.
that any client connecting provides a fully qualified and valid host name using the "reject_non_fqdn_hostname" and "reject_invalid_hostname" arguments. While I can't rely on the host name having a DNS record (thus thwarting any attempt to validate the host), an absence of the host name could be indicative of a spambot, or an improperly configured mailer. The same applies to a host name that contains invalid characters. Since implementing the anti-UCE measures, I've had only one instance of legitimate mail being blocked because of a lack of a host name. I had a chat with that sender's system administrator, who fixed his configuration.
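To make the syntactic side of these checks concrete, here is a small Python sketch. It is not how Postfix implements the restrictions named above; the function names are invented for illustration, and the DNS lookups behind the "unknown domain" checks are deliberately left out.

```python
import re

# One DNS label: letters, digits, and hyphens, 1-63 characters,
# with no hyphen at the start or the end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """True if every dot-separated label is syntactically valid."""
    labels = name.rstrip(".").split(".")
    return all(_LABEL.fullmatch(label) for label in labels)


def is_fqdn(name: str) -> bool:
    """True if the name is valid and has at least two labels (host plus domain)."""
    return is_valid_hostname(name) and "." in name.rstrip(".")


def envelope_domain_is_fqdn(address: str) -> bool:
    """True if an envelope address has a local part and a fully qualified domain."""
    local, at, domain = address.rpartition("@")
    return bool(local) and bool(at) and is_fqdn(domain)
```

With these definitions, "mail.example.com" passes both hostname tests, "localhost" is valid but not fully qualified, and "bad_host.example.com" fails because of the underscore.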

Most SMTP servers (and by default, Postfix) are tolerant of some garbage envelope entries as long as they can deduce what was requested. In addition to the aforementioned RFC requirements, I made the decision to enable the Postfix "strict_rfc821_envelopes" switch, which makes it intolerant of mailers whose email envelope (From, To, etc.) information doesn't strictly conform to the RFC. The Postfix documentation warns that many mailers produce junk envelopes and that enabling strict envelopes will cause their mail to get rejected. I considered the implications and decided that I shouldn't have to tolerate junk mail as a result of poor design, thus my decision to adopt this draconian policy. I have yet to have any complaints from my users. My guess is that as a result of the UCE scourge, I'm not the only one taking this posture, thus authors of email software are finally toeing the line and following the specs.

It's amazing to me how many spam messages get blocked using just these techniques. I extracted a couple of hours' worth of email logs and did some quick analysis. Of the 25,258 entries, 515 were rejection messages. Of those, 283 were greylist rejections, and 139 were rejected because of RFC violations. Some of the sources for the remainder are discussed later.
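For a sense of proportion, the figures above can be turned into percentages. The short Python calculation below uses only the numbers quoted in the text; keep in mind that a log entry is not the same thing as a message, so the first figure is a share of log lines, not of mail.

```python
total_log_entries = 25_258
rejections = 515
greylist_rejections = 283
rfc_rejections = 139

# Share of all log entries that were rejection messages.
rejected_pct = round(100 * rejections / total_log_entries, 1)

# Share of the rejections attributable to each technique.
greylist_pct = round(100 * greylist_rejections / rejections, 1)
rfc_pct = round(100 * rfc_rejections / rejections, 1)
combined_pct = round(100 * (greylist_rejections + rfc_rejections) / rejections, 1)

print(rejected_pct, greylist_pct, rfc_pct, combined_pct)
```

In other words, greylisting and the RFC checks together account for roughly four out of every five rejections in this sample.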

Real-Time Black Listing

Even if the email envelope conforms to the RFCs, it doesn't mean that I'm willing to accept the message. Many emails pass the RFC tests but are delivered by known spam hosts and can therefore be rejected. The use of blacklists can be controversial (at least for those who have ended up on the list), but I find the one run by the Spamhaus Project to be very good.

Configuring Postfix to avail itself of those resources is simple. Just add another argument, "reject_rbl_client sbl-xbl.spamhaus.org," to the "smtpd_recipient_restrictions" directive, and Postfix will check the inbound server's IP against the list. If it's found, the message will be rejected. If not, the message will pass unmolested. All of this for the cost of a single line in my configuration file and one DNS lookup. Since I also run a caching DNS server on my spam appliances, subsequent attempts to send mail from the same IP will be rejected without any DNS requests even leaving the box. In my impromptu statistics, 98 messages were rejected because they were coming from known spammers.
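The "one DNS lookup" works because a DNS blacklist is queried by reversing the octets of the client's IPv4 address and appending the list's zone name; an answer means the address is listed, and a "no such name" response means it is not. The Python sketch below shows the mechanics. The function names are made up for this example, and it makes no assumption about whether any particular zone currently answers queries.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blacklist zone returns an address record for this client IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The name did not resolve, so the IP is treated as not listed.
        return False
    return True


# 192.0.2.99 is a documentation address; the zone is the one named in the text.
query = dnsbl_query_name("192.0.2.99", "sbl-xbl.spamhaus.org")
```

Here `query` is "99.2.0.192.sbl-xbl.spamhaus.org". A local caching DNS server answers repeat lookups of the same name without going back out to the network, which is the effect described above.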

That Personal Touch

Postfix allows you to use regular expressions and hash tables to further define what is acceptable or not. In other words, you can fine-tune Postfix's behavior to your company's requirements. For example, you may not want some or all of these rules to apply if you're sending outbound or intra-office mail. For instance, you may not want to spend the CPU cycles and network bandwidth to do DNS lookups if the email is outbound.

Furthermore, you may want to inspect subject lines or domain names and cut messages off at the pass if you find them objectionable. As an example, we have a couple of companies that continue to send us unwanted email, in spite of being notified that it's unwanted. Sure, there's the CAN-SPAM Act (pronounced "Can Spam!") that I suppose I could invoke to get the mail stopped, but quite frankly, it's much easier to add one line to a config file than it is to jump through hoops to get the government to take action.
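The two kinds of lookup mentioned in this section, exact matches in a hash table and pattern matches with regular expressions, can be illustrated with a short Python sketch. It does not use Postfix's table syntax; the domain names, patterns, and function name are hypothetical and chosen only for the example.

```python
import re

# Exact-match table: sender domains we no longer want to hear from (hypothetical).
BLOCKED_SENDER_DOMAINS = {"pushy-vendor.example", "newsletter.example"}

# Pattern table: subject lines we consider objectionable (hypothetical).
SUBJECT_PATTERNS = [
    re.compile(r"\bact now\b", re.IGNORECASE),
    re.compile(r"\bfree\s+offer\b", re.IGNORECASE),
]


def decide(sender: str, subject: str, outbound: bool = False) -> str:
    """Return 'REJECT' or 'OK' for a message based on the two tables."""
    if outbound:
        # Skip the checks for mail we are sending ourselves.
        return "OK"
    domain = sender.rpartition("@")[2].lower()
    if domain in BLOCKED_SENDER_DOMAINS:
        return "REJECT"
    if any(pattern.search(subject) for pattern in SUBJECT_PATTERNS):
        return "REJECT"
    return "OK"
```

Adding another unwanted sender is then a one-line change to the table, which is the convenience described above.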

There's More

Much of this article touched on Postfix's UCE control options, but there's more. As I mentioned earlier, Wietse did a great job designing this MTA to be extensible, so you can add functionality to it.

Even with all of the controls I have in place, the spam still sometimes gets through. To combat it, I took advantage of the extensibility. Once Postfix has given its blessing to an email, it pipes it to DSPAM (another open-source project), which categorizes the mail as ham or spam. DSPAM was one of the first to use heuristics to categorize spam. It doesn't simply look at headers or content for obvious giveaway terms but, instead, looks at all of the terms and their relationships. DSPAM learns what to look for based on the initial training you give it (with a corpus of ham and spam) and by subsequent corrections you provide to it.

Once the message has been accepted and categorized, it has one final trial: It's piped through two anti-virus packages (F-Prot and the open-source ClamAV) before finally being delivered to the user. You can customize this ad infinitum, both from within Postfix and with external programs, providing as much content-filtering as your situation requires.

A Great Payoff

I have been very pleased with the performance of our email system. The cost to provide all of this functionality was minimal—and certainly less than the cost of a commercial appliance. What is more important to me than the initial cost, however, is my ability to reconfigure and tweak this thing as our needs change. Right now, the system we're running is more than enough to handle our load, and the nature of all of the software makes it possible to distribute the workload across many servers, if need be.

You're welcome to visit all of the links I've provided to see what you're getting into. For those who want to hop into the express line, I recommend The Book of Postfix: State-of-the-Art Message Transport by Ralf Hildebrandt and Patrick Koetter. It contains all the information you'll need, with much of it in cookbook form.

If you want a fun project with a great payoff to your company, I'd suggest that you consider building one of these appliances. You'll learn quite a bit about how the Internet email system is supposed to work, and you'll end up with a versatile system that will support your electronic correspondence needs for the long term. Enjoy!

Barry L. Kline is a consultant and has been developing software on various DEC and IBM midrange platforms for over 23 years. Barry discovered Linux back in the days when it was necessary to download diskette images and source code from the Internet. Since then, he has installed Linux on hundreds of machines, where it functions as servers and workstations in iSeries and Windows networks. He co-authored the book Understanding Linux Web Hosting with Don Denoncourt.

