This article is an excerpt from Chapter 5, "Database Constraints," of Database Design and SQL for DB2 (MC Press, 2013).
Constraints, sometimes referred to as integrity constraints or integrity rules, are restrictions based on business rules and other business policies and procedures that ensure data in a database are acceptable and accurate.
A DBMS can provide features that can be used to specify integrity constraints. The features usually include range checks (e.g., "between 1 and 50") or allowable values (e.g., "equal to A or B or C"). More-complex constraints may be specifiable as well; examples include relationships between columns within the same table (e.g., an employee's hire date must be greater than his or her birth date) or inter-row relationships (a customer number value in an order table must exist as a customer number value in exactly one customer table). The DBMS, which handles all updating of the database, checks these constraints when database rows are inserted, updated, or deleted. If a constraint is not met, the DBMS blocks the operation and signals an error.
Three integrity rule categories allow the specification of important constraints that a relational DBMS can enforce automatically whenever a database update occurs:
- Data integrity
- Entity integrity
- Referential integrity
These rules protect not only the specific values in columns but also the identity and interrelationships of rows.
Data Integrity
Data integrity defines the possible values of a column. In a database system, data integrity is defined by
- Data type and length
- Null value acceptance
- Allowable values
- Default value
For example, if an age column in an EMPLOYEES table is defined as an integer, the value of every instance of that column must always be numeric and an integer. If this column is defined so that it must always be positive, a negative value is forbidden. The value of the column being mandatory indicates that the column can be NULL.
All these characteristics form the data integrity of this column. This type of data integrity warrants the following:
- The identity and purpose of the column are clear, and all the tables in which the column appears are properly identified.
- Column definitions are consistent throughout the database.
- The values of the column are consistent and valid.
- The types of modifications, comparisons, and operators that can be applied to the values in the column are clearly identified.
Each column in the model should be assigned attributes that include the following:
- Data Type—Basic data types are integer, decimal, and character. Most databases support variants of these types plus special data types for date and time.
- Length—The length is the number of digits or characters in the value, such as a value of 5 digits or 40 characters.
- Date Format—The date format specifies the format for date values, such as dd/mm/yy, mm/dd/yyyy, or yy/mm/dd.
- Range—The range specifies the lower and upper boundaries of the values that the column may legally have.
- Constraints—Constraints are special restrictions on allowable values. For example, the retire_date for an employee must always be greater than the hire_date for that employee.
- Null Support—This attribute indicates whether the column can have null values.
- Default Value (if any)—The default value is the value that a column will contain if no value is entered.
Entity Integrity
The second category of integrity essential to the relational database model is entity integrity. This is a fairly straightforward concept: Every row in a table represents an entity (i.e., an instance of an entity type) that must exist in the real world; therefore, every row must be uniquely identifiable. It follows that no completely duplicate rows (all column values identical) can exist in a table; otherwise, the unique existence of entities is not represented in the database.
From this property of uniqueness is derived the principle that there exists in every table some set of columns (possibly the entire table's columns) whose values are never duplicated entirely in any two rows in the table. If the set of columns includes no superfluous columns, or ones not needed to guarantee uniqueness, the set of columns can serve as the table's primary key. More than one possible set of columns may meet the criteria for a primary key; each of these is referred to as a candidate key, and one is picked arbitrarily as the primary key.
The primary key is a set of columns whose values are unique for all rows in a table. Because of this fact, the primary key forms the only means of addressing a specific row in the relational database model. A consequence of the requirement for unique primary key values is that none of the values in a row's primary key columns can be null; that is, none can be missing or unknown. None of the primary key columns can be null because otherwise this row's primary key value could not be guaranteed to be unequal to some other row's primary key value. In the relational model, null is essentially a placeholder that means the value of this column is unknown. There is no way to know that the null value is not also present in the same primary key column of some other row. Thus, if employee_id is the column serving as the primary key in the EMPLOYEES table and there exists a row with employee_id = NULL, there is no guarantee that another row might have a null value for employee_id. A similar argument holds for primary keys made up of more than one column, known as composite primary keys, which require all column values to guarantee uniqueness and hence cannot have null for any of the columns in the primary key.
Referential Integrity
The third, and final, category of integrity fundamental to the relational database model is referential integrity. Referential integrity is a database concept that ensures that relationships between tables remain reliable. That is, when a table contains a foreign key associated with another table's primary key, the concept of referential integrity states that a row containing the foreign key may not be added to the table unless a matching value exists in the primary key column of the parent table. In addition, referential integrity includes the concepts of cascading updates and deletes, which ensure that changes made to the table with the foreign key are reflected in the primary table.
Constraint Types
In a DBMS, database constraints provide a way to guarantee that rows in a table have valid primary or unique key values, that rows in a dependent table have valid foreign key values that reference rows in a parent table, and that individual column values are valid. All constraints are enforced when rows are inserted or updated in the table. This chapter discusses five constraints, identified in Table 5-1.
Constraint |
Description |
PRIMARY KEY |
Identifies which column is the unique identifier or primary key for each row in the table. The values in the primary key column must be unique for every row in the table. Since the primary key must be unique, it cannot be NULL. |
FOREIGN KEY |
Is what makes the relational database work. For every one-to-many (parent-child) relationship in the database, a foreign key constraint is added to the child (many) table. The foreign key in the child (many) table links to the parent (one) table. Thus, if a row is added to the child table, the value entered in the foreign key column must already exist as a primary key in the parent table. |
UNIQUE |
Identifies a column as containing unique values for the UNIQUE column in each row in the table. The UNIQUE constraint differs from a primary key in that it allows NULL values. |
CHECK |
Enforces a business rule on a column. Before a value can be entered into a CHECK column, the condition (business rule) specified in the CHECK constraint must be true. |
NOT NULL |
Ensures that a column identified as NOT NULL will not contain a NULL value. |
TABLE 5-1: Constraint types
LATEST COMMENTS
MC Press Online