Practical SQL: Listing Based on Group Conditions

SQL has many features to group and summarize, but you need a little know-how to find the detail under the summaries.

If you know me, you know I love Common Table Expressions (or CTEs). In today's article, we're going to take advantage of CTEs and aggregate functions to learn how to find data that matches group conditions. Group conditions come up in many situations, whether it's looking for duplicate items, finding completed orders, or identifying mismatches on denormalized data. I'll show you how to use those group conditions to retrieve the underlying data.

Why Is This an Issue?

Think of it this way: in order to find group conditions, you group your data rows by key values. You then use aggregate functions such as COUNT and SUM to find the group conditions. In so doing, you by definition lose any other values in those rows. Let's give a simple example and the traditional solution:

select CUSTNO from CUSTMAST
group by CUSTNO having COUNT(*) > 1
order by CUSTNO

Not much to it, but it really underscores the issue. If I give the results of this query to users in order to help ascertain the problem, they're going to immediately want to at the very least know the name of the customer that's duplicated. (And yes, I know that you can avoid this sort of thing with unique keys, but not all systems have their data models set up that completely, and not all group conditions lend themselves to database constraints.) In the past, we've been able to easily enough give them at least one of the names this way:

select CUSTNO, max(CUSTNAME) from CUSTMAST
group by CUSTNO having COUNT(*) > 1
order by CUSTNO

Ah, very clever! We use another aggregate function, MAX, in order to at least select one of the customer names. Quick quiz question: what happens if you don't use MAX and instead just try to include the CUSTNAME field in the list? Answer: you get a "Column or expression in select list not valid" error! But you knew that already. If you think about it, though, the technique above is pretty limited. It only gets you one value, and if you want to add more fields, you have to do MAX on all of them, which means you might be getting values from different rows. All in all, not an optimal solution. Enter the CTE:

with CustDups as
(select CUSTNO from CUSTMAST
group by CUSTNO having COUNT(*) > 1)
select * from CUSTMAST
where CUSTNO in (select CUSTNO from CustDups)
order by CUSTNO, CUSTNAME

There we go! Now I get all columns of every row that has a duplicate. I can, of course, limit the columns for the user by specifying a list of fields rather than the asterisk in my SELECT list.
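To see why the MAX workaround falls short, here is a minimal sketch on a throwaway table. The CUSTCITY column, the column types, and the sample rows are assumptions added for illustration; only CUSTNO and CUSTNAME appear in the queries above.

```sql
-- Hypothetical sample data: CUSTCITY and all rows are made up for illustration.
create table CUSTMAST (
  CUSTNO   integer,
  CUSTNAME varchar(30),
  CUSTCITY varchar(30)
);

insert into CUSTMAST values
  (100, 'Acme Supply',  'Zion'),
  (100, 'Baker Supply', 'Akron'),
  (200, 'Carter Tools', 'Chicago');

-- MAX workaround: each MAX is evaluated on its own column, so customer 100
-- comes back as 'Baker Supply' paired with 'Zion', a combination that does
-- not exist in any single row of the table.
select CUSTNO, max(CUSTNAME) as CUSTNAME, max(CUSTCITY) as CUSTCITY
from CUSTMAST
group by CUSTNO having count(*) > 1
order by CUSTNO;

-- CTE approach: returns both real rows for customer 100, unchanged.
with CustDups as
  (select CUSTNO from CUSTMAST
   group by CUSTNO having count(*) > 1)
select CUSTNO, CUSTNAME, CUSTCITY from CUSTMAST
where CUSTNO in (select CUSTNO from CustDups)
order by CUSTNO, CUSTNAME;
```

Customer 200 has only one row, so it is filtered out by both queries.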

Other Syntaxes for Other Uses

I could have done this with a JOIN as well:

with CustDups as
(select CUSTNO from CUSTMAST
group by CUSTNO having COUNT(*) > 1)
select * from CUSTMAST join CustDups using (CUSTNO)
order by CUSTNO, CUSTNAME

It's a little more concise, but has its pros and cons. The biggest benefit of the JOIN approach makes itself felt when you're doing more analytical types of inquiries, because the columns computed in the CTE are available to the final SELECT:

with OrderSum as
(select ORDNO, SUM(QTYORD) Ordered, SUM(QTYRCV) Received
from ORDDTL group by ORDNO)
select * from OrderSum join ORDDTL using (ORDNO)
where Received >= Ordered
order by ORDNO, ORDLIN

This is a pretty simple case: I'm showing the detail lines for every order where the total received quantity is greater than or equal to the total ordered quantity. Note that this may or may not be a particularly useful query, since an over-received line might cancel out an under-received one. Rather than check an aggregate condition, you might want to test each line individually. For example, you might want to show all the lines for every order that has at least one over-received line. That would be cool!

You could do that with this query:

with OverOrders as
(select ORDNO from ORDDTL where QTYRCV > QTYORD
group by ORDNO)
select * from ORDDTL
where ORDNO in (select ORDNO from OverOrders)
order by ORDNO, ORDLIN

Slick, right? Although I know some of you might be saying we could replace the SELECT ... GROUP BY with a simple SELECT DISTINCT, like this (I'm only going to show the CTE definition, which is the part that defines the temporary result set OverOrders):

with OverOrders as
(select distinct ORDNO from ORDDTL where QTYRCV > QTYORD)

You'd be right. Just using the SELECT DISTINCT would allow me to reduce a little of the complexity. I love DISTINCT; it allows me to do quite a few things, one of which I'll show you in a second. But the GROUP BY syntax allows you a little finer-grained control. Let's imagine, for example, you needed to find orders where more than one line was over-received. You could do that with just a simple extension of the GROUP BY version:

with OverOrders as
(select ORDNO from ORDDTL where QTYRCV > QTYORD
group by ORDNO having COUNT(*) > 1)

Like before, the clause above is just the definition of the CTE that identifies the orders that meet the criteria, not the subsequent SELECT that brings in the detail data. You can see that using the GROUP BY syntax allows you to use the COUNT aggregate function and would, by extension, allow you to use any aggregate function, such as SUM or MAX. It's a very powerful technique.
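The three order conditions discussed so far can be compared side by side on a tiny ORDDTL table. This is only a sketch: the sample rows are invented and the column types are assumptions, chosen so that each query returns a different set of orders.

```sql
-- Hypothetical sample data, invented for illustration.
create table ORDDTL (
  ORDNO  integer,
  ORDLIN integer,
  QTYORD integer,
  QTYRCV integer
);

insert into ORDDTL values
  (1, 1, 10, 10),   -- order 1: every line received exactly
  (1, 2,  5,  5),
  (2, 1, 10, 12),   -- order 2: one line over, one under (20 ordered, 16 received)
  (2, 2, 10,  4),
  (3, 1,  5,  6),   -- order 3: two over-received lines
  (3, 2,  5,  7);

-- Aggregate condition: total received >= total ordered.
-- Returns the lines of orders 1 and 3. Order 2 drops out because its
-- under-received line outweighs its over-received one.
with OrderSum as
  (select ORDNO, sum(QTYORD) as Ordered, sum(QTYRCV) as Received
   from ORDDTL group by ORDNO)
select * from OrderSum join ORDDTL using (ORDNO)
where Received >= Ordered
order by ORDNO, ORDLIN;

-- Per-line condition: at least one over-received line.
-- Returns the lines of orders 2 and 3.
with OverOrders as
  (select ORDNO from ORDDTL where QTYRCV > QTYORD
   group by ORDNO)
select * from ORDDTL
where ORDNO in (select ORDNO from OverOrders)
order by ORDNO, ORDLIN;

-- Per-line condition plus HAVING: more than one over-received line.
-- Returns only the lines of order 3.
with OverOrders as
  (select ORDNO from ORDDTL where QTYRCV > QTYORD
   group by ORDNO having count(*) > 1)
select * from ORDDTL
where ORDNO in (select ORDNO from OverOrders)
order by ORDNO, ORDLIN;
```

Order 2 is the interesting case: it fails the aggregate test but passes the per-line test, which is exactly the cancelling-out effect described earlier.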

A Last Challenging Problem

OK, maybe it's not all that challenging, but it took me quite a while to figure it out. Now that I know the secret, though, I find myself using it all the time. The generic situation is simple: give me a list of detail for groups of records where some field in the row has more than one value within that group. The generic description is a little vague, so let me give you a more concrete example: give me a list of all the PO lines for items that we buy from multiple vendors. Yes, there may be a good reason to buy from multiple vendors, but there could also be an opportunity for savings by using a single source. So how do we do that? Simple:

with ItemVendor as
(select distinct POITEM, POVEND from PODTL),
MultiVend as
(select POITEM from ItemVendor
group by POITEM having COUNT(*) > 1)
select * from PODTL
where POITEM in (select POITEM from MultiVend)

This is a multi-step process. First, I use DISTINCT to create a list of item/vendor combinations. Then, I use GROUP BY to narrow that initial list to only those items purchased from more than one vendor. Why do I have to do this? Because I can't just get a list of all items with more than one line; the lines could all be from the same vendor. I first have to collapse all the lines for each item/vendor combination into a single row and then use those rows to identify the items that truly are purchased from multiple vendors. Finally, I take that list and use it to select all the detail lines from the original detail table. At the end, we have a list of all the PO lines for every item that we purchase from different vendors. And because we're going back to the underlying detail, the query includes things like date and price. That's the sort of analytics that can really help you make business decisions, and without having to go to an expensive BI solution!
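Here is a small sketch of that multi-step query on sample data. The rows, the PONO and POPRICE columns, and the column types are assumptions for illustration; only POITEM and POVEND come from the query above. The second query is an alternative rather than the approach described here: COUNT(DISTINCT ...) folds the two CTEs into one step.

```sql
-- Hypothetical sample data: PONO, POPRICE, and all rows are invented.
create table PODTL (
  PONO    integer,
  POITEM  varchar(10),
  POVEND  varchar(10),
  POPRICE decimal(9, 2)
);

insert into PODTL values
  (1001, 'BOLT', 'V01', 0.25),
  (1002, 'BOLT', 'V01', 0.27),   -- same vendor twice: BOLT is not multi-vendor
  (1003, 'NUT',  'V01', 0.10),
  (1004, 'NUT',  'V02', 0.08);   -- second vendor: NUT qualifies

-- Two-step version: DISTINCT collapses BOLT to a single item/vendor row,
-- so only the two NUT lines come back.
with ItemVendor as
  (select distinct POITEM, POVEND from PODTL),
MultiVend as
  (select POITEM from ItemVendor
   group by POITEM having count(*) > 1)
select * from PODTL
where POITEM in (select POITEM from MultiVend)
order by POITEM, POVEND;

-- Alternative single-step version (an assumption, not the method above):
-- COUNT(DISTINCT POVEND) counts the vendors per item directly.
with MultiVend as
  (select POITEM from PODTL
   group by POITEM having count(distinct POVEND) > 1)
select * from PODTL
where POITEM in (select POITEM from MultiVend)
order by POITEM, POVEND;
```

Both queries return the same two NUT lines on this data. The two-step form keeps the item/vendor list around as its own CTE, which is handy if you also want to report which vendors are involved.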

This should give you some ideas on how using aggregation can get you the data you need for your business. Happy data-diving!

Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been extending the IBM midrange since the days of the IBM System/3. Joe uses WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. He has written several books, including Developing Web 2.0 Applications with EGL for IBM i, E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. Joe performs onsite mentoring and speaks at user groups around the country.
