.. include:: Summary of Date's book Datebase in Depth ---------------------------------------- Tuples and Relations ~~~~~~~~~~~~~~~~~~~~ tuple or tuple value ^^^^^^^^^^^^^^^^^^^^ Definition: Let *T1, T2, ..., Tn* (*n* ≥= 0) be type names, not all necessarily distinct. Associate with each type *Ti* a distinct attribute name, *Ai*; each of the *n* attribute-name:type-name pairs that results is an *attribute*. Associate with each attribute a value *vi* of type *Ti*; each of the *n* attribute:value combinations that results is a *component*. The set of all *n* components thus defined, *t* say, is a *tuple value* (or just *tuple* for short) over the attributes *A1, A2, ..., An*. The value *n* is the *degree* of *t*; a tuple of degree one is *unary*, a tuple of degree two is *binary*, a tuple of degree three is *ternary*, ...., and more generally a tuple of degree *n* is *n-ary*. The set of all attributes is the *heading* of *t*. To summarize, a component is an attribute plus its value. The set of all n components is a *tuple* or *tuple value*. The set of all *n* attributes is the *Heading*. *n* is the *degree* of the tuple. In TUTORIAL D a *tuple type* name is written *TUPLE {H}*, where *{H}* is the heading. If we have, for example, two attributes [STATUS: INTEGER] and [SUPPLIER\_NAME: NAME], then the tuple type would be written as :: TUPLE { STATUS INTEGER, SUPPLIER_NAME NAME } Notice, we are putting the attribute's name first followed by its type. This is the practice throughout. A sample tuple (tuple value) might be :: { 20, NAME("General Electric Corporation") } An attribute's type does not have to be a SQL type (such as VARCHAR). In an attribute such as [EMPLOYEE: NAME], where NAME is the type and EMPLOYEE is the attribute name, NAME would no doubt be implemented in our SQL database as two VARCHAR columns, first\_name and last\_name, and NAME equality would involve comparing both first\_name and last\_name. Atomicity ^^^^^^^^^ The value of an attribute should be atomic: it should contain only one useful piece of information and must never require further parsing. Selectors ^^^^^^^^^ A tuple value is returned or instantiated from a tuple *selector* invocation: :: TUPLE { STATUS INTEGER, SUPPLIER_NAME NAME} // type declaration TUPLE { STATUS 20, SUPPLIER_NAME NAME('Smith') } // tuple selector instantiates the tuple The attribute type is omitted in the selector because it can always be inferred from the type of the expression denoting the attribute value. A note on notation ^^^^^^^^^^^^^^^^^^ The keyword TUPLE does double duty in Tutorial D. It's used both to declare tuple *type names* and in tuple selector invocations. Extracting attribute values from tuples ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In Tutorial D the syntax from extracting an attribute value from a tuple is *attribute-name FROM tuple*. For example, :: TUPLE { STATUS 20, SUPPLIER_NAME NAME('Smith') } t; // selector invocation STATUS FROM t; // extraction of tuple attribute The empty tuple ^^^^^^^^^^^^^^^ A tuple with an empty heading has type TUPLE {} (it has no components). It is called the *0-tuple* to emphasize that it is of degree zero. It is also sometimes called an *empty tuple*. Tuple equality ^^^^^^^^^^^^^^ *Definition*: Tuples *t1* and *t2* are *equal* if and only if they have the same attributes *A1, A2, ..., An*. That is, they both are of the same tuple type. And for all *i (i = 1, 2, ..., n)*, the value *vi* of *Ai* in *t1* is equal to the value in *t2*. Relations and relation values ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *Definition:* Let *{H}* be a tuple heading and let *t1, t2, ..., tm* be distinct tuples with heading *{H}*. The combination, *r* say, of *{H}* and the set of tuples *{t1, t2, ..., tm}* is a *relation value* (or just a *relation* for short) over the attributes *A1, A2, ..., An,* where *A1, A2, ..., An* are the attributes in *{H}*; the *heading* of *r* is *{H}*; *r* has the same attributes (and hence the same attribute names, and types) and the same degree as that heading does. The *body* of *r* is the set of tuples *{ t1, t2, ...,tm}.* The value *m* is the *cardinality* of *r.* The relational model is so called because it deals with certain abstractions that we can think of as "tables" but are known, formally, as relations in mathematics. You define a relation's type in **TUTORIAL D** as **RELATION {H}**, where H is the heading. A *relation value* (or *relation* for short) is its set of unique tuples. An example of the definition of a relation type :: RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR} Every relation value is returned by some relation selector invocation; for example, the relation type above might have this value. :: RELATION { TUPLE {SNO SNO('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London'}, TUPLE {SNO SNO('S2'), SNAME NAME('Jones'), STATUS 15, CITY 'Miami'}, ... } Relations by definition never contain duplicate tuples, and tuples never contain NULLs. Relation are by definition in first normal form (1NF) because a tuple can have only one value for each of its attribute, which is the definition of 1NF. Both relations and their tuples contain no sense of ordering. Empty relations ^^^^^^^^^^^^^^^ A relation can be empty, not containing any tuples. For every relation type is there is exactly one empty relation (of that paricular relation type). Thus, two empty relations will both have equivalent empty bodies, but their Heading with be different. TABLE\_DUM and TABLE\_DEE ^^^^^^^^^^^^^^^^^^^^^^^^^ TABLE\_DUM and TABLE\_DEE are both relations of degree 0, meaning they have no attributes. They are of type RELATION {}. TABLE\_DUM has no tuples. TABLE\_DEE has one tuple, the 0-tuple or empty tuple. Relation Variables ~~~~~~~~~~~~~~~~~~ relation and relvar ^^^^^^^^^^^^^^^^^^^ A relation is shorthand for *relation value* in the same way that integer is shorthand for *integer value*. Relvar is shorthand for *relation variable*. A relvar is a holder for a representation of a relation value. Relvars are analgous to variables in a strongly typed language like, say, C++, and relations are likewise analogous to the values of those variables at a given point in time. Relvars can be characterised as *base relvars* or *derived relvars* also know as views. Here is the definition of a base relvars in Tutorial D. :: VAR S BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } Key { SNO }; Updating of relvars is a set-at-a-time operation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Unlike SQL, operations like INSERT, DELETE and UPDATE are, in the relational algebra, set-level operations not tuple-level opertaions: one relation value, say *r1,* is always entirely replaced by another relation value, say, *r2.* Candidate keys ^^^^^^^^^^^^^^ *Definition*: A subset of the heading of a **relvar** is a candidate key K if it satisfies two conditions: - The uniqueness property: no possible value of the relvar ever has two tuples with the same value for K. - The irreducibility property: no proper subset of the the candidate key has this uniqueness property. Keys are defined on relvars not relations. Keys imply certain integrity contraints, a uniqueness constraint in particular, and integrity contraints apply to variables not values. Key values are tuples. *which means what????*\ <-- Given this relation, which represents marriages :: RELATION { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE } if we assume no polygamy and that no couples marry each other more than once, then this relvar :: VAR MARRIAGE BASE RELATION { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE } would have three candidate keys: :: KEY {SPOUSE_A, SPOUSE_B} KEY {SPOUSE_A, DATE_OF_MARRIAGE} KEY {SPOUSE_B, DATE_OF_MARRIAGE} So if this relvar contained a TUPLE of :: TUPLE { SPOUSE_A NAME("Betty Smith"), SPOUSE_B NAME("John Smith") DATE_OF_MARRIAGE DATE("09-11-1955") } the value for the key {SPOUSE\_A, SPOUSE\_B } would be :: TUPLE { SPOUSE_A NAME("Betty Smith"), SPOUSE_B NAME("John Smith") } Primary key ^^^^^^^^^^^ Any one of the candidate keys can be chosen as the **primary key**. Usually, however, the simplest candidate key is chosen. See pp. 63-64. Foreign keys ^^^^^^^^^^^^ *Definition*: Let *R1* and *R2* be relvars, not necessarily distinct, and let *K* be a key for *R1*. Let *FK* be a subset of *R2* that, possibly after some attribute renaming, involves exactly the same attributes as *K*. Then *FK* is a *foreign key* if and only if, at all times, every tuple in *R2* has an *FK* value that is equal to the *K* value in some (necessarily unique) tuple in *R1* at the time in question. Using this example of two relvars BOOKS and BOOK\_CHAPTERS :: VAR BOOKS BASE RELATION { BOOK_ID ISBN, BOOK_TITLE CHAR, BOOK_PRICE FLOAT} KEY { BOOK_ID }; VAR BOOK_CHAPTERS BASE RELATION { CHAPTER_ID INTEGER, CHAPTER_NUMBER INTEGER, BOOK_ID ISBN, CHAPTER_NAME CHAR } KEY { CHAPTER_ID } FOREIGN KEY { BOOK_ID} REFERENCES BOOKS; **BOOK\_ID** in BOOK\_CHAPTERS is a foreign key. While the same BOOK\_ID value may occur in multiple tuples of BOOK\_CHAPTERS, there is one and only one tuple in BOOKS with that same BOOK\_ID value. The foreign key need not have the same name in the referencing relvar as in the referenced relvar. In Tutorial D this fact would be shown using the RENAME operator. For example :: VAR BOOKS BASE RELATION { BOOK_ID ISBN, BOOK_TITLE CHAR, BOOK_PRICE FLOAT } KEY { BOOK_ID }; VAR BOOK_CHAPTERS BASE RELATION { CHAPTER_ID INTEGER, CHAPTER_NUMBER INTEGER, BOOK_ISBN ISBN, CHAPTER_NAME CHAR } KEY { CHAPTER_ID } FOREIGN KEY { RENAME (BOOK_ISBN AS BOOK_ID) } REFERENCES BOOKS; The foreign key can even reference the same relvar. That is, *R1* and *R2* in the foreign key definition can be the same relvar. For example, :: VAR EMP BASE RELATION { ENO ENO, ...., MNO ENO, ...} KEY { ENO } FOREIGN KEY { RENAME ( MNO AS ENO ) } REFERENCES EMP ; Here attribue MNO denotes the employee number of the manager of the employee identified by ENO; thus the referencing relvar and the referenced relvar are the same. In order for the tuple equality comparison to be valid we rename the MNO attribute, so that it is of same type, since an attribute's name, like ENO, is part of its type. SQL supports certain referential actions such as CASCADE, which, though very useful, even essential, are not strictly part of the relational algebra. Base relvars and virtual relvars or views ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ base relation and base relvar. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A base relation is a relation of some type with an initial value. view or virtual relvar ^^^^^^^^^^^^^^^^^^^^^^ *Definition:* A *view* or *virtual relvar V* is relvar whose value at time *t* is derived by evaluting a certain relational expression at that time *t*. The expression is specified when *V* is defined and must mention at least one relvar. Virtual relvars can be used like base relvars because of *closure*. We can define further views on top of them. +---------------------------------------------+-------------------------------------------------------------------+ | Tutorial D | SQL | +=============================================+===================================================================+ | VAR LS VIRTUAL (S WHERE CITY = 'London'); | CREATE VIEW LS AS (SELECT S.\* FROM S WHERE S,CITY = 'London'); | +---------------------------------------------+-------------------------------------------------------------------+ Views can have candidate keys just like base relvars. They can have integrity constraints. We can update views (although SQL's support is weak for this). You can discover a MySQL VIEW's underlying SELECT statement with this command: :: mysql> CREATE VIEW LS AS (SELECT S.* FROM S WHERE S,CITY = 'London'); mysql> SELECT LS, is_updatable, view_definition FROM INFORMATION_SCHEMA.VIEWS; snapshot versus view ^^^^^^^^^^^^^^^^^^^^ A snapshot is not a view because it has its own separate copy of the data. Relvars and Predicates ~~~~~~~~~~~~~~~~~~~~~~ The heading of a relvar is meant to represent a certain predicate, a generic statement about the real world. It is the intended interpretation, the meaning, of the relvar, also called the *intention*. A predicate is a truth-valued function that returns TRUE or FALSE. This if relvar *R* has predicate *P*, then every tuple *t* in *R* at some given time can be regarded as a certain proposition derived by invoking *P* with the attribute values from *t* as arguments. Every proposistion *p* obtained by substituting a tuples's values always evaluates to TRUE. So a given relvar contains, at any given time, all and only the tuples that represent true propositions (instantiations that return TRUE) of the predicate. A database should be thought of as a collection of facts or true propositions, in which the heading of a relvar represents a description of something going on in the real world. This intended intrepretation, the real meaning of the relvar, is called the relvar's *Predicate*. The Predicate can be though of as a truth-value function that when given specific values returns TRUE or FALSE. So the heading of every relation should have an associated Predicate that describes the relation's real meaning. The Predicate is a function whose arguments are the names of the attributes in the heading of the relation, and a relvar's contents, its tuples, then become true propositions (at a given point in time). While the tuples may change over time, the Predicate does not. Formal Predicate Definition ^^^^^^^^^^^^^^^^^^^^^^^^^^^ *Predicate Definition:* Every relvar *R* has an associated Predicate *P*. *P* is the *intended interpretation* or *intension* for *R*. It does not change over time. When the values of a tuple *p* are substitued in the revlar's predicate *P*, the resulting proposition is always true. A relvar, at any given time, contains *all* and *only* the tuples that represent true propostions, true instantions of the predicate. *Extension of a Predicate* If *P* is the predicate of some relvar *R* whose value at some given time is *r*, then the body of *r* constitutes the *extension* of *P* at that time. The extension varies over time, but the intention, the Predicate, does not. Sample predicates ^^^^^^^^^^^^^^^^^ Types are things in the real world we can talk about; relations are true statements about those things. Types are to relations as nouns are to sentences. If we take, as an example, the relvar for the suppliers relation (on pp. 11, 61 in Date's book) :: VAR S BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY {SNO }; the Predicate would be the following sentence (recalling that the attribute's name comes first in the declaration above, then its type) : *SNO* is a **supplier part number** for **supplier name** *SNAME* with **status** of *STATUS*, which means ..., and is located in *CITY* **city** Here is another example of the predicate for an email contacts relvar. :: VAR EMAIL_CONTACTS BASE RELATION { CONTACT_ID INTEGER, NAME NAME, EMAIL EMAIL, OPT_OUT BOOLEAN } KEY {CONTACTID}; The Predicate, the sentence describing the meaning of each attribute, would be *CONTACT\_ID* is the **unique identifier** of someone **named** *NAME* whose **email address** is *EMAIL* and whose **opt status** is *OPT\_OUT*. The current value of EMAIL\_CONTACTS consists of all tuples that currently satisfies this Predicate. The principle of Orthogonality ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Database Constraints should not be easy to violate because duplicate information, dulicate tuples or subsets of tuples which duplicate part of a larger tuple, could result. See the `principle of orthogonal design <#orthogonality>`__ Relational Algebra ~~~~~~~~~~~~~~~~~~ The relational algebra operators are defined on relations and not strictly speaking on relvars, although their application to relvars is no different. Relation operators are *generic*: they apply, in effect, to all possible relations. They are also *read-only*. They read their operations and return a result. INSERT, UPDATE and DELETE (and relation assignment), while they are relational operators, aren't technically part of the relational algebra. Some differences between SQL and Tutorial D ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - When a relational algebra operations, like UNION or JOIN, requires a correspondence between operand attributes to be established, Tutorial D requires the attributes to have the same name. - Tutorial D sometimes needs to rename attributes to avoid naming clashes or mismatches. SQL usually doesn't - SQL requires most queries to conform to SELECT — FROM — WHERE template. Tutorial D has no analogous requirement. Tutorial D RENAME operator takes one relation as input and returns a relation identical to the input except that one of the attributes has a different name. For example +----------------------------+-----------------------------------------------------------+ | **Tutorial D RENAME** | **SQL equivalent** | +============================+===========================================================+ | S RENAME (CITY AS SCITY) | SELECT S.SNO , S.NAME, S.STATUS, S.CITY AS SCITY FROM S | +----------------------------+-----------------------------------------------------------+ RESTRICT ^^^^^^^^ Let *bx* be a boolean expression involving zero or more attribute names such that all of the attributes mentioned are atrributes of the same relation *r*. Then the *restriction* of *r* according to *bx*: :: r WHERE bx is the relation with a heading the sameas that of *r* and a body consisting of all typles of *r*\ for which *bx* evaluates to TRUE. For example, +--------------------------+---------------------------------------------+ | S WHERE CITY = 'Paris' | SELECT S.\* FROM S WHERE S.CITY = 'Paris' | +--------------------------+---------------------------------------------+ will not contain any duplicate tuples. PROJECT ^^^^^^^ Let relation *r* have attributex X, Y,...Z (and possibly others). Then the *projection* of *r* on *X, Y,...., Z*: :: r { X, Y, ..., Z } is a relation with (a) a heading derived from the heading of *r* by removing all attributes not mentioned in the set *{X, Y, ..., Z}* and (b) a body consistiting of all typles *{X x, Y y, ..., Z z}* such that a tuple appears in *r* with *X* values *x*, *Y* values *y*, ..., and *Z* value *z*. For example, +----------------------------+----------------------------------------------------+ | Tutorial D | MySQL | +============================+====================================================+ | S { SNAME, CITY, STATUS} | SELECT DISTINCT S.SNAME, S.CITY, S.STATUS FROM S | +----------------------------+----------------------------------------------------+ DISTINCT is needed in the SQL formulation to insure no duplicates occurs. The Tutorial D forumlation can also be expressed in terms of the attributes to be discarded. Thus :: S { SNAME, CITY, STATUS } is equivalent to :: S { ALL BUT CITY } PROJECT has a higher precedence than other (all?) operators, so :: S JOIN P { PNO, CITY } is equivalent to :: S JOIN ( P { PNO, CITY } ) JOIN ^^^^ *Definition:* Let relations *r* and *s* jave attributes *X1, X2, ..., Xm, Yq, Y2, ..., Yn, Z1, Z2, ..., Zp,* respectively; in other words, the *Y's* (*n* of them) are the common attributes, the *X's* (*m* of them) are the other attributes of *r*, and the *Z's* (*p* of them) are the other attributes of *s*. We can assume without loss of generality that none of the *X's* has the same name as any of the *Z's*, thanks to the availability of RENAME. Now let the *X's* taken together be denoted just *X*, and similarly for the *Y's* and the *Z's*. Then the *natural join*\ (*join* for short) of *r* and *s*: :: r JOIN s is a relation with (a) a heading that is the (set-theoretic) union of the heading of *r* and *s* and (b) a body consisting of of the set of all tuples *t* such that *t* is the (set -theoretic) union of a tuple appearing in *r* and a tuple appearing in *s*. In other words the heading is *(X, Y, Z)* and the body consists of all tuples *(X x, Y y, Z z)* such that a tuple appears in *r* with *X* value *x* and *Y* value *y* and a tuple appears in *s* with *Y* value *y* and *Z* value *z*. Less formally, JOIN first locates tuples with the same values for the shared attribues. It then concatenates, joins, the unshared attribute values of those same tuples to produce the result. The closest SQL equivalent to the Tutorial D expression P JOIN S would be SELECT \* FROM P NATURAL JOIN S, though not all SQL products support this syntax; otherwise, one would have to list all the attributes :: SELECT P.PNO, P.PNAME, P.COLOR, P.WIEGHT, P.CITY, S.SNO, S.SNAME, S.STATUS FROM P, S WHERE P.CITY = S.CITY; *Intersection* and *cartesian product* are special cases of JOIN. If *m = p = 0* (meaning there are no *X's* and not *Z's*, and *r* and *s* are thus of the same type), then *r JOIN s* degenerates to *r INTERSECT s*. If *n = 0*, meaning there are no *Y's* and *r* and *s* thus have no common attributes, *r JOIN s* degenerates to *r TIMES s*. +-------------------+-------------------------------------+-------------------------------------------------------------------------------------+ | Tutorial D JOIN | SQL equivalent | SQL equivalent | +===================+=====================================+=====================================================================================+ | S JOIN SP | SELECT \* FROM S NATURAL JOIN SP; | SELECT DISTINCT P.PNO, P.NAME, P.COLOR, P.WEIGHT, P.CTY, S.SNO, S.SNAME, S.STATUS | | | | FROM S,SP WHERE S.SNO = SP.SNO; | +-------------------+-------------------------------------+-------------------------------------------------------------------------------------+ NATURAL JOIN does not require the use of DISTINCT because NATURAL JOIN will only show one SNO column, and since { SNO, PNO } is the primary key of SP, we know the tuples will be distinct, and for a given SNO value, the PNO values will always differ. SQL supports several types of join: NATURAL JOIN, JOIN, INNER JOIN, LEFT (OUTER) JOIN, and RIGHT (OUTER) JOIN. They are explained at `SQL JOINS `__. The MySQL JOIN syntax is explained at `MySQL JOIN SYNTAX `__. Here are some sample MySQL joins queries. SELECT \* FROM S NATURAL JOIN SP; +-------+---------+----------+----------+-------+-------+ | SNO | SNAME | STATUS | CITY | PNO | QTY | +=======+=========+==========+==========+=======+=======+ | S1 | Smith | 20 | London | P1 | 300 | +-------+---------+----------+----------+-------+-------+ | S1 | Smith | 20 | London | P2 | 200 | +-------+---------+----------+----------+-------+-------+ | S1 | Smith | 20 | London | P3 | 400 | +-------+---------+----------+----------+-------+-------+ | S1 | Smith | 20 | London | P4 | 200 | +-------+---------+----------+----------+-------+-------+ | S1 | Smith | 20 | London | P5 | 100 | +-------+---------+----------+----------+-------+-------+ | S1 | Smith | 20 | London | P6 | 100 | +-------+---------+----------+----------+-------+-------+ | S2 | Jones | 10 | Paris | P1 | 300 | +-------+---------+----------+----------+-------+-------+ | S2 | Jones | 10 | Paris | P2 | 400 | +-------+---------+----------+----------+-------+-------+ | S3 | Blake | 30 | Paris | P2 | 200 | +-------+---------+----------+----------+-------+-------+ | S4 | Clark | 20 | London | P2 | 200 | +-------+---------+----------+----------+-------+-------+ | S4 | Clark | 20 | London | P4 | 300 | +-------+---------+----------+----------+-------+-------+ | S4 | Clark | 20 | London | P5 | 400 | +-------+---------+----------+----------+-------+-------+ SELECT DISTINCT S.\* FROM S NATURAL JOIN SP; +-------+---------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+=========+==========+==========+ | S1 | Smith | 20 | London | +-------+---------+----------+----------+ | S2 | Jones | 10 | Paris | +-------+---------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+---------+----------+----------+ | S4 | Clark | 20 | London | +-------+---------+----------+----------+ SELECT \* FROM S LEFT JOIN SP USING(SNO); +-------+----------+----------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | PNO | QTY | +=======+==========+==========+==========+==========+==========+ | S1 | Smith | 20 | London | P1 | 300 | +-------+----------+----------+----------+----------+----------+ | S1 | Smith | 20 | London | P2 | 200 | +-------+----------+----------+----------+----------+----------+ | S1 | Smith | 20 | London | P3 | 400 | +-------+----------+----------+----------+----------+----------+ | S1 | Smith | 20 | London | P4 | 200 | +-------+----------+----------+----------+----------+----------+ | S1 | Smith | 20 | London | P5 | 100 | +-------+----------+----------+----------+----------+----------+ | S1 | Smith | 20 | London | P6 | 100 | +-------+----------+----------+----------+----------+----------+ | S2 | Jones | 10 | Paris | P1 | 300 | +-------+----------+----------+----------+----------+----------+ | S2 | Jones | 10 | Paris | P2 | 400 | +-------+----------+----------+----------+----------+----------+ | S3 | Blake | 30 | Paris | P2 | 200 | +-------+----------+----------+----------+----------+----------+ | S4 | Clark | 20 | London | P2 | 200 | +-------+----------+----------+----------+----------+----------+ | S4 | Clark | 20 | London | P4 | 300 | +-------+----------+----------+----------+----------+----------+ | S4 | Clark | 20 | London | P5 | 400 | +-------+----------+----------+----------+----------+----------+ | S5 | Admans | 30 | Athens | *NULL* | *NULL* | +-------+----------+----------+----------+----------+----------+ SELECT DISTINCT S.\* FROM S LEFT JOIN SP USING(SNO); +-------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+==========+==========+==========+ | S1 | Smith | 20 | London | +-------+----------+----------+----------+ | S2 | Jones | 10 | Paris | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S5 | Admans | 30 | Athens | +-------+----------+----------+----------+ SELECT DISTINCT S.\* FROM S LEFT JOIN SP USING(SNO) WHERE SP.SNO IS NOT NULL; +-------+---------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+=========+==========+==========+ | S1 | Smith | 20 | London | +-------+---------+----------+----------+ | S2 | Jones | 10 | Paris | +-------+---------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+---------+----------+----------+ | S4 | Clark | 20 | London | +-------+---------+----------+----------+ SELECT DISTINCT S.\* FROM S LEFT JOIN SP USING(SNO); +-------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+==========+==========+==========+ | S1 | Smith | 20 | London | +-------+----------+----------+----------+ | S2 | Jones | 10 | Paris | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S5 | Admans | 30 | Athens | +-------+----------+----------+----------+   SEMIJOIN ^^^^^^^^ Many queries that require JOIN really require an extended form of the operator called SEMIJOIN. The *semijoin* of *r* and *s* is the join of *r* and *s*, projected back on the attributes of *r*. So, for example, S SEMIJOIN SP is equivalent to (S JOIN SP) { SNO, SNAME, STATUS, CITY }. An application of SEMIJOIN would be the query "Get suppliers who supply at least one part". **Tutorial D SEMIJOIN** **SQL equivalents** S SEMIJOIN SP SELECT DISTINCT S.\* FROM S NATURAL JOIN SP; SELECT DISTINCT S.\* FROM S,SP WHERE S.SNO = SP.SNO; The output would look like this. +-------+---------+----------+----------+ | SNO | SANME | STATUS | CITY | +=======+=========+==========+==========+ | S1 | Smith | 20 | London | +-------+---------+----------+----------+ | S2 | Jones | 10 | Paris | +-------+---------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+---------+----------+----------+ | S4 | Clark | 20 | London | +-------+---------+----------+----------+ A more user-friendly spelling of SEMIJOIN is MATCHING: S MATCHING SP. Finally, observe that if the heading of *s* is a subset of *r* (that is, *p = 0*. See the definition of JOIN), then *r* JOIN *s* degenerates to *r* MATCHING *s*. Likewise, if *m = 0*, *r* JOIN *s* degenerates to *s* MATCHING *r*. TABLE\_DEE is the identity with respect to JOIN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The concept pf JOIN can be extended to join several relvars, which would be written JOIN { r, s, ..., w} and means r JOIN s JOIN ... JOIN w. Thus *r* JOIN *s* can also be written JOIN { r, s }. The JOIN of no relations, written JOIN {}, is TABLE\_DEE. TABLE\_DEE is of type RELATION {}; i.e., it has no attributes; and it contains one tuple, the *0-tuple* or empty tuple. The join of any relation *r* with TABLE\_DEE is simply *r*. As a consequence, the join of no relations is TABLE\_DEE. Also *r* JOIN TABLE\_DEE is the same as TABLE\_DEE JOIN *r* because the result is simply the cartesian product. INTERSECT ^^^^^^^^^ Interest requires the relations or relvars to be of the same type. Tutorial D MySQL S { CITY } INTERSECT P { CITY } SELECT DISTINCT S.CITY FROM S INTERSECT SELECT DISTINCT P.CITY FROM P Note, DISTINCT is not strictly needed above. INTERSECT will remove duplicates. UNION ^^^^^ The types of the operands must be the same. +-------------------------------+---------------------------------+ | S { CITY } UNION P { CITY } | SELECT DISTINCT S.CITY FROM S | | | UNION DISTINCT | | | SELECT DISTINCT P.CITY | | | FROM P | +-------------------------------+---------------------------------+ Since the default for UNION is UNION DISTINCT, DISTINCT is not strictly needed. As a consequence, the DISTINCTs following the two SELECTs aren't needed (note: the default for SELECT is ALL rather than DISTINCT). MINUS and SEMIMINUS ^^^^^^^^^^^^^^^^^^^ Definition: If *r* and *s* are of the same type, then *r* MINUS *s* consists of all typles that appear in *r* but not in *s*. +-------------------------------+------------------+ | Tutorial D MINUS | SQL equivalent | +===============================+==================+ | S { CITY } MINUS P { CITY } | SELECT S.CITY | | | FROM S | | | EXCEPT | | | SELECT P.CITY | | | FROM P | +-------------------------------+------------------+ EXCEPT is the SQL equivalent of the relational algebra MINUS operator; however, not all SQL implementations support EXCEPT, including MySQL. An MySQL equivalent expression using a subquery is +-------------------------------+----------------------------------------------------------------------------+ | Tutorial D MINUS | MySQL equivalent | +===============================+============================================================================+ | S { CITY } MINUS P { CITY } | SELECT S.CITY FROM S WHERE S.CITY NOT IN (SELECT DISTINCT P.CITY FROM P) | +-------------------------------+----------------------------------------------------------------------------+ We extract those cities from the supplier relvar that are not in the parts relvar. MINUS can be done more efficiently in MySQL using LEFT JOIN, as `Doing Intersect and Minus in MySQL `__ explains. Using LEFT JOIN, the MySQL analogue of MINUS would be +----------------------------------------------------------------------------+ | MySQL equivalent of MINUS using LEFT JOIN | +============================================================================+ | SELECT DISTINCT S.\* FROM S LEFT JOIN SP USING(SNO) WHERE SP.SNO IS NULL | +----------------------------------------------------------------------------+ MINUS is just a special case of the more general SEMIMINUS. SEMIMINUS does not require both operands to be of the same type. The definition of *s* SEMIMINUS *r* is "*r* MINUS (*r* SEMIJOIN *s*)" which is the same as "*r* MINUS (*r* MATCHING *s*)". Consider the query "Get suppliers who supply no parts at all": +------------------------+------------------------+ | Tutorial D SEMIMINUS | SQL equivalent | +========================+========================+ | S SEMIMINUS SP | SELECT S.\* | | | FROM S | | | EXCEPT | | | SELECT S.\* | | | FROM S, SP | | | WHERE S.SNO = SP.SNO | +------------------------+------------------------+ Again, MySQL does not support EXCEPT. The MySQL analogue for SEMIMINUS would be MySQL analogues for SEMIMINUS | SELECT DISTINCT S.\* FROM S LEFT JOIN SP USING(SNO) | WHERE SP.SNO IS NULL | SELECT DISTINCT S.\* FROM S WHERE S.SNO NOT IN | (SELECT DISTINCT S.SNO FROM S NATURAL JOIN SP) Tutorial D does provide the alternative, more user-friendly syntax of  "S NOT MATCHING SP". Thinking in terms of "NOT MATCHING" makes the MySQL expressions above perhaps more understandable. DIVIDE ^^^^^^ In relational algebra, the divisor is another relation S, whose heading must be a subset of the heading of R. The division is over the common attribute(s), and the set of values used as the actual divisor are the values found in S. DIVIDE is perhaps easier to first illustrate. The figure below shows a simple example of dividing a binary relation R1 by a unary relation R2. The division is over the shared attribute I2. The divisor is the set {1,2,3}, these being the values found in the shared attribute in R2. Inspecting the tuples of R1, the value 'a' occurs in tuples such that their I2 values (that is, set of values for R1.I2) match the divisor (the set of values in the divisor). So 'a' is included in the result, but 'b' is not because there is no tuple with '3' as the I2 value and 'b' as the I1 value. |an image| *Definition*: Let *r* and *s* be such that the heading of *s* is a subset of the heading of *r*. Then the *division* of *r* by *s*, *r* DIVIDEDBY *s* is shorthand for the following: :: r { X } MINUS ( ( r { X } TIMES s) MINUS r ) { X } where X is the set-theoretic difference between the heading of *r* and that of *s*. Let's use a example from the part and suppliers database. Let's apply this definition to this example. :: SP { SNO, PNO } DIVIDEBY P { PNO } Using the definition, this would be re-written as :: SP { SNO } MINUS ( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) { SNO } If look at these steps one-by-one, we would see the following. SP { SNO } **TIMES** P { PNO } SP **{ SNO, PNO }** ( SP { SNO } TIMES P { PNO } ) **MINUS** SP { SNO, PNO } +-------+-------+ | SNO | PNO | +=======+=======+ | s1 | p1 | +-------+-------+ | s1 | p2 | +-------+-------+ | s1 | p3 | +-------+-------+ | s1 | p4 | +-------+-------+ | s1 | p5 | +-------+-------+ | s1 | p6 | +-------+-------+ | s2 | p1 | +-------+-------+ | s2 | p2 | +-------+-------+ | s2 | p3 | +-------+-------+ | s2 | p4 | +-------+-------+ | s2 | p5 | +-------+-------+ | s2 | p6 | +-------+-------+ | s3 | p1 | +-------+-------+ | s3 | p2 | +-------+-------+ | s3 | p3 | +-------+-------+ | s3 | p4 | +-------+-------+ | s3 | p5 | +-------+-------+ | s3 | p6 | +-------+-------+ | s4 | p1 | +-------+-------+ | s4 | p2 | +-------+-------+ | s4 | p3 | +-------+-------+ | s4 | p4 | +-------+-------+ | s4 | p5 | +-------+-------+ | s4 | p6 | +-------+-------+ +-------+-------+ | SNO | PNO | +=======+=======+ | s1 | p1 | +-------+-------+ | s1 | p2 | +-------+-------+ | s1 | p3 | +-------+-------+ | s1 | p4 | +-------+-------+ | s1 | p5 | +-------+-------+ | s1 | p6 | +-------+-------+ | s2 | p1 | +-------+-------+ | s2 | p2 | +-------+-------+ | s3 | p3 | +-------+-------+ | s4 | p2 | +-------+-------+ | s4 | p4 | +-------+-------+ | s4 | p5 | +-------+-------+ +----------------------+----------------------+ | SNO | PNO | +======================+======================+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p1**] | +----------------------+----------------------+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p2**] | +----------------------+----------------------+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p3**] | +----------------------+----------------------+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p4**] | +----------------------+----------------------+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p5**] | +----------------------+----------------------+ | [STRIKEOUT:**s1**] | [STRIKEOUT:**p6**] | +----------------------+----------------------+ | [STRIKEOUT:**s2**] | [STRIKEOUT:**p1**] | +----------------------+----------------------+ | [STRIKEOUT:**s2**] | [STRIKEOUT:**p2**] | +----------------------+----------------------+ | [STRIKEOUT:**s2**] | [STRIKEOUT:**p3**] | +----------------------+----------------------+ | s2 | p4 | +----------------------+----------------------+ | s2 | p5 | +----------------------+----------------------+ | s2 | p6 | +----------------------+----------------------+ | s3 | p1 | +----------------------+----------------------+ | [STRIKEOUT:**s3**] | [STRIKEOUT:**p2**] | +----------------------+----------------------+ | s3 | p3 | +----------------------+----------------------+ | s3 | p4 | +----------------------+----------------------+ | s3 | p5 | +----------------------+----------------------+ | s3 | p6 | +----------------------+----------------------+ | s4 | p1 | +----------------------+----------------------+ | [STRIKEOUT:**s4**] | [STRIKEOUT:**p2**] | +----------------------+----------------------+ | s4 | p3 | +----------------------+----------------------+ | [STRIKEOUT:**s4**] | [STRIKEOUT:**p4**] | +----------------------+----------------------+ | [STRIKEOUT:**s4**] | [STRIKEOUT:**p5**] | +----------------------+----------------------+ | s4 | p6 | +----------------------+----------------------+ <==> +-------+-------+ | SNO | PNO | +=======+=======+ | s2 | p4 | +-------+-------+ | s2 | p5 | +-------+-------+ | s2 | p6 | +-------+-------+ | s3 | p1 | +-------+-------+ | s3 | p3 | +-------+-------+ | s3 | p4 | +-------+-------+ | s3 | p5 | +-------+-------+ | s3 | p6 | +-------+-------+ | s4 | p1 | +-------+-------+ | s4 | p3 | +-------+-------+ | s4 | p6 | +-------+-------+   First, the cartesian product pairs the divisor P { PNO } — the set of unique PNO values from P — with each unique value of SNO in the dividend SP. Next MINUS finds potential attributes values that are in the final result. We can see this by noting that the set of tuples whose SNO value is 's1' have (collectively) a set of PNO values that matches the set of PNO values in the divisor. MINUS removes these 's1'-tuples completely. Next we project this intermediate result on SNO, which gives us. ( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) **{ SNO }** +-------+ | SNO | +=======+ | s2 | +-------+ | s3 | +-------+ | s4 | +-------+   These are the set of SNO values whose PNO values *do not* match those of the divisor. Finally, we substract these values from SP { SNO }, which gives us back the 's1' value in SP. SP { SNO } **MINUS** ( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) { SNO } +-------+ | SNO | +=======+ | s1 | +-------+   So we see that SP { SNO, PNO } DIVIDEBY P { PNO } is: +-------+ | SNO | +=======+ | s1 | +-------+   The result is loosely "supplier numbers for suppliers who supply all parts." This can be expressed SQL as :: SELECT DISTINCT SPX.SNO FROM SP AS SPX WHERE NOT EXISTS ( SELECT P.PNO FROM P WHERE NOT EXITS ( SELECT SPY.SNO FROM SP AS SPY WHERE SPY.SNO = SPX.SNO AND SPY.PNO = P.PNO ) ) We claim that this query finds suppliers who supply (ship) all parts. Notice we want suppliers who have shipped parts, i.e., those SP.SNO values *v*, such that there does not exist even one part, one P.PNO value, that is not found in those tuples of SP whose SP.SNO value is *v*. On the other hand, this query is loosely "supplier numbers for suppliers who do not supply all parts." :: SELECT DISTINCT SPX.SNO FROM SP AS SPX WHERE EXISTS ( SELECT P.PNO FROM P WHERE NOT EXITS ( SELECT SPY.SNO FROM SP AS SPY WHERE SPY.SNO = SPX.SNO AND SPY.PNO = P.PNO ) ) Notice here we want suppliers who have shipped parts, i.e., those SP.SNO values *v*, such that there does exist at least one part, one P.PNO value, that is not found in those tuples of SP whose SP.SNO value is *v*. Note, any query that uses DIVIDEBY can be reformulated using relational comparison, which has the advantage of making the query clearer. A query using relational comparison to find "supplier numbers for suppliers who supply all parts" would be :: WITH (SP RENAME ( SNO AS X ) ) AS R: S WHERE ( R WHERE X = SNO ) { PNO } = P { PNO }; WITH AS is explained below. How relational comparison, using WITH AS,is more straightforward (than the SQL analogue for) DIVIDEBY is explained. EXTEND and SUMMARIZE ^^^^^^^^^^^^^^^^^^^^ Loosely, *extend* supports computation across tuples and *summarize* supports computation down tuples. *Definition*: The *extension* of relation *r*, written :: EXTEND r ADD (exp AS X) is a relation with a heading equal to *r* extended with the attribute *X* and a body consisting of all tuples *t* such that *t* is a typle of *r* extended with a value for attribute *X* that is computed by evaluation *exp* on that tuple of *r*. *r* must not have an attribute name *X* and *exp* must not refer to *X*. For example, there are 454 grams to a pound. So we can extend a relation with a weight ( in lbs. ) attribute. +------------------------------------------+---------------------------------------------------+ | Tutorial D | SQL | +==========================================+===================================================+ | EXTEND P ADD { WEIGHT \* 454 AS GMWT } | SELECT P.\*, ( P.WEIGHT \* 454 ) AS GMWT FROM P | +------------------------------------------+---------------------------------------------------+ ADD is not addition. It means extend the heading of the relvar with an additional attribute, which follows AS. The value of the extended attribute is computed by evaluating the expression before AS. Some other examples using SQL. :: SELECT P.*, ( p.WEIGHT * 454 ) AS GMWT FROM P WHERE GMWT > 7000.0 Such a query can also be written closer in style to Tutorial D. :: SELECT TEMP.PNO, TEMPGMWT FROM ( SELECT P.PNO, ( P.WEIGHT * 454 ) AS GMWT FROM P ) AS TEMP WHERE TEMP.GMWT > 7000.0 SUMMARIZE ^^^^^^^^^ *Definition*: Let *r* and *s* be realtions such that *s* is of the same type as some projection of *r*, and let the attributes of *s* be *A1, A2, A3, ...*. Then the *summarization* :: SUMMARIZE r PER { s } ADD { summary AS X } is a relation whose heading is equal to the heading of *s* extended with the attribute *X*, and body consisting of all tuples *t* such that *t* is a tuple of *s* extended with a value for attribute *X*, where *X* is computed by evaluating *summary* over all tuples of *r* that have the same value for attributes *A1, A2, ..., An* as tuples *t* does. *X* cannot be an attribute of *s* and *summary* must not refer to *X*. The cardinality of the resulting relation is equal to that of *s*, and the degree is equal to that of *s* plus one. As an example take :: SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT() AS P_COUNT) whose result would be +-------+------------+ | SNO | P\_COUNT | +=======+============+ | S1 | 6 | +-------+------------+ | S2 | 2 | +-------+------------+ | S3 | 1 | +-------+------------+ | S4 | 3 | +-------+------------+ | S5 | 0 | +-------+------------+ While S { SNO } is not a projection of SP, it is of the same type as the projection SP { SNO }. The result will contain the same tuples as S { SNO }, the set of all suppliers, plus the attribute P\_COUNT, where P\_COUNT is the total number of suppliers in SP for that tuple's SNO value. SUMMARIZE thus goes down 'columns' doing *summary* per tuple value in *s*. Notice that this analogous SQL statement :: SELECT SP.PNO, COUNT(*) AS P_COUNT FROM SP GROUP BY SP.SNO will only return tuples for suppliers S1, S2, S3 and S4, so it is not equivalent to the SUMMARIZE expression. This SQL expression is equivalent :: SELECT SP.SNO, TEMP.PRODUCT_COUNT FROM S, LATERAL ( SELECT COUNT(*) AS PRODUCT_COUNT FROM SP WHERE SP.SNO = S.SNO ) AS TEMP does not contain a row ( tuple ) for S5, since S5 is not in SP. If *s* is a projection of *r* (not just "of the same type as" some projection of *r*) then the expression can be simplied slightly; instead of :: SUMMARIZE SP PER ( SP { SNO } ) ADD ( MAX { QTY } AS MAXQ, MIN ( QTY ) AS MINQ ) you can write :: SUMMARIZE SP BY ( SNO ) ADD ( MAX { QTY } AS MAXQ, MIN ( QTY ) AS MINQ ) Various types of summaries are supported in Tutorial D: COUNT, SUM, AVG, MAX, MIN, COUNTD, SUMD, AVGD ( where "D" stands for "eliminate redundant duplicate values before summarizing" ). :: SUMMARIZE SP PER ( SP { SNO } ) ADD ( MAX ( QTY ) AS MAXQ, MIN ( QTY ) AS MINQ ) In this example SUMMARIZE has no PER specification. :: SUMMARIZE ( S WHERE CITY = 'London' ) ADD ( COUNT ( ) AS N ) Since this summarize has no PER specification, the summarizing is done per TABLE\_DEE, i.e., it is shorthand for :: SUMMARIZE ( S WHERE CITY = 'London' ) PER ( TABLE_DEE ) ADD ( COUNT ( ) AS N ) Recall TABLE\_DEE is a relation with no attributes and one tupe (the 0-tuple). TABLE\_DEE fits the definitino of SUMMARIZE because it is a projection of the relation in question, S, on the empty set of attributes. The output of this SUMMARIZE therefore has on attribute and one type. Finally, there is a difference between :: VAR N INTEGER; N := COUNT ( S WHERE CITY = 'London' ); SUMMARIZE returns a relation, but the aggregate operator above returns a scalar. It's true that it might be thought of as "returning" one scalar value for each tuple in the PER relation, but that scalr value is then appended to that tuple to produce a tuple in the overall SUMMARIZE result. SQL does have something analogous to the BY form of SUMMARIZE, but not the more general PER form. The SQL analog of :: SUMMARIZE SP BY { SNO } ADD ( SUM ( QTY ) AS TQ ) is :: SELECT SP.SNO, SUM ( SP.QTU ) AS TQ Finally, here is an example of EXTEND expression that is logically equivalent to SUMMARIZE. The summarization expression of :: SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( ) AS NP ) is equivalent to this EXTEND expression, which uses the WITH AS operator introduced next. :: WITH ( SP RENAME ( SNO AS X ) AS R : EXTEND ( S { SNO } ) ADD ( COUNT ( R WHERE X = SNO ) AS NP ) WITH AS ^^^^^^^ On p. 104 the Tutorial D **WITH AS** operator's use is illustrated to "get suppliers who supply all parts." :: WITH (SP RENAME ( SNO AS X ) ) AS R: S WHERE ( R WHERE X = SNO ) { PNO } = P { PNO }; This is a restriction, S WHERE *exp1* = *exp2*, where *exp1* is, ( R WHERE X = SNO ) { PNO } and *exp2* is, P { PNO }. We examine each supplier, say, Sx in relvar S, comparing Sx's associated SNO value to the SNO values in SP. That is the meaning of :: ( R WHERE X = SNO ) For example, for the tuple in S with SNO value of 'S1', after doing :: ( R WHERE X = SNO ) the resulting set of PNO values from SP corresponding to the 'S1' value from S would be: { P1, P2, P3, P4, P5, P6 }. Next we project this result on PNO, which yields the same set of values, the set of unique part numbers supplied by 'S1'. Next we compare this result to P { PNO }, the set of all unique part numbers. :: ( R WHERE X = SNO ) { PNO } = P { PNO } Continuing the example using 'S1', the expression ( R WHERE X = SNO ) { PNO } = P { PNO } would evaluate to TRUE. So the tuple of S containing the SNO value of 'S1' would be in the final result. Thus, the resulting relation will be those tuples of S that represent suppliers who supply (have shipped parts to customers) all parts. In the sample database there is only one such suppliers. Thus, the result is :: RELATION { TUPLE { SNO SNO('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' } } So the forgoing query found "suppliers who supply all parts". And it is simplier than the earlier DIVIDEBY query, which was actually a relational formulation of the query, "Get supplier numbers for supplier who *supply at least one part and in fact* supply all parts." Here is another example a query using WITH AS that answers: "Get paris of supplier numbers, *Sx* and *Sy* say, such that *Sx* and *Sy* supply exactly the same set of parts." :: WITH ( S RENAME ( SNO AS SX ) { SX } AS RX, ( S RENAME ( SNO AS SY ) { SY } AS RY : ( RX JOIN RY ) WHERE ( SP WHERE SNO = SX ) { PNO } = ( SP WHERE SNO = SY ) { PNO } Again, this is a restriction of the result of a JOIN: the outermost operation is ( RX JOIN RY ) WHERE *exp1* = *expr2*. Appending "SX < SY" to the WHERE clause here would produce a slightly tidier result: it would eliminate pairs of the form (Sx, Sx) and ensure that the pairs (Sx, Sy) and (Sy, Sx) don't appear (since JOIN is effectively the cartesian product). Expression Transformation ^^^^^^^^^^^^^^^^^^^^^^^^^ Distributive, associate and commutative laws apply to various operations as mentioned on pp. 100-101. This allows an optimizer to rewrite queries in such a way the performance is optimal. INSERT, DELETE and UPDATE ^^^^^^^^^^^^^^^^^^^^^^^^^ Strictly speaking relational algreba has no updateing (assignment) or comparison operators and no notion relvars. Operations like UPDATE, INSERT and DELETE are shorthand for algebraic assigment operations. For example, given :: VAR PQ BASE RELATION { PNO PNO, QTY QTY } KEY { PNO }; Inserting into PQ :: INSERT PQ ( SUMMARIZE SP PER ( P { PNO } ) ADD ( SUM ( QTY ) AS QTY ) ); is equivalent to the longhand assigment of: :: PQ := PQ UNION ( SUMMARIZE SP PER ( P { PNO } ) ADD ( SUM ( QTY ) AS QTY ) ); A DELETE of :: DELETE S WHERE CITY = 'Athens'; is equivalent to :: S := S WHERE NOT CITY = 'Athens'; and an UPDATE of :: UPDATE P WHERE CITY = 'London' ( WEIGHT := 2 * WEIGHT, CITY := 'Oslo' ); has the longhand equivalent: :: P := WITH ( P WHERE CITY = 'London' ) AS R1, ( EXTEND R1 ADD ( 2 * WEIGHT AS NEW_WEIGHT, 'Oslo' AS NEW_CITY ) ) AS R2, R2 { ALL BUT WEIGHT, CITY } AS R3, R3 RENAME { NEW_WEIGHT AS WEIGHT, NEW_CITY AS CITY } AS R4, P MINUS R1 AS R5 : R5 UNION R4; First, R1 is the set of tuples to be updated, extended with the new weight and new city as R2. Then we throw away from R2 everything but the weight and the city. This is R3. Then we rename the new weight and city with the proper attribute names of WEIGHT and CITY. This is R4. Then we identify those tuples not to be updated (but to be retained). This is R5. Finally, the result is the union of R5 and R4. The basic SQL conceptual algorithm ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A SQL expression can be thought of as being implemented (at least conceptually) in three steps. #. The FROM clause ... #. Next, the WHERE clause ... .. todo:: To be done later...(See Simple SQL). Examples ^^^^^^^^ These are examples of Tutorial D and SQL analogues of various queries Get supplier numbers for suppliers who supply part P1. +--------------------------------------+------------------------------------------------------------+ | (SP WHERE SNO = PNO('P1')) { SNO } | SELECT DISTINCT SP.SNO FROM SP WHERE SP.PNO = PNO('P1'); | +--------------------------------------+------------------------------------------------------------+ Get suppliers with status in the range 15 to 25 inclusive. +---------------------------------------+-----------------------------------------------------------------------+ | S WHERE STATUS ≥ 15 AND STATUS ≤ 25 | SELECT DISTINCT S.STATUS FROM S WHERE S.STATUS ≥ 15 AND STATUS ≤ 25 | +---------------------------------------+-----------------------------------------------------------------------+ Get part numbers for parts supplied by a supplier in London. +-----------------------------------------------------+-------------------------------------+ | (SP MATCHING ( S WHERE CITY = 'London' )) { PNO } | SELECT DISTINCT SP.PNO FROM SP, S | | | WHERE S.CITY = CITY('London') | | | AND | | | SP.PNO = S.PNO; | +-----------------------------------------------------+-------------------------------------+ Comments: SP MATCHING ( S WHERE CITY = 'London' ) is SP JOIN S, with the result project back onto the attributes of SP. Since we only want PNO, JOIN would have worked just as well. Get part numbers for parts not supplied by any supplier in London. +---------------------------------------------------------+-----------------------------------------------+ | (SP NOT MATCHING ( S WHERE CITY = 'London' )) { PNO } | SELECT P.PNO FROM P | | | EXCEPT | | | SELECT SP.PNO | | | FROM SP, S | | | WHERE SP.SNO = S.SNO AND S.CITY = 'London'; | +---------------------------------------------------------+-----------------------------------------------+ Comments: NOT MATCHING is equivalent to SP SEMIMINUS (SP MATCHING (S WHERE CITY = 'London')), which is SP MINUS (SP MATCHING S). This query could also be expresse, if Tutorial D supports "not equal", as SP MATCHING (S WHERE CITY NOT = CITY('London'). Get city names for suppliers in which at least two two suppliers are located. +--------------------------------------------------------------+--------------------------------------------------------------------------+ | (SUMMARIZE S BY CITY ADD (COUNT() AS CNUM)) WHERE CNUM > 1 | SELECT TEMP.CITY | | | FROM (SELECT S.CITY, COUNT(\*) AS CNUM FROM S GROUP BY S.CITY) AS TEMP | | | WHERE TEMP.CNUM > 1 | +--------------------------------------------------------------+--------------------------------------------------------------------------+ Comment: For the Tutorial D, we do a restrict on the result of summarize. For the SQL analogue, we also do COUNT(\*) per city. GROUP BY specifies how the count should be done; withtout it, we would simply get a total count of cities in each row of the result. Get all pairs of part numbers such that some supplier supplies both of the indicated parts. +--------------------------------------------------------------------------+ | WITH SP { SNO, PNO } AS Z: | | ( (Z RENAME (PNO AS X)) | |      JOIN | |   (Z RENAME (PNO AS Y)) ) { X, Y } | | SELECT XX.PNO AS X, YY.PNO AS Y | | FROM SP AS XX, SP AS YY | | WHERE XX.SNO = YY.SNO; | +--------------------------------------------------------------------------+ Comments: From looking at relation SP it is clear that supplier S1 supplies every part listed in P; consequently, for any pairing of part numbers (Px, Py) there will always exist a supplier, namely S1, who supplies both part numbers. Now how does the query ensure this? The SQL query select pairs of part numbers, PNO, such that their suppliers are the same, XX.SNO = YY.SNO, i.e., these parts are both supplied by the same supplier. Note, DISTINCT is not needed because distinct values X and Y are being returned. In the Tutorial D query, we, in essence, join two copies of SP, SP1 JOIN SP2, where the PNO attribute has been renamed in each copy, so that the join occurs on SNO, the only remaining common attribute. The other two "PNO" columns are both included, as X and Y, giving pairs of part numbers (Px, Py) such that they share a common supplier value, a common SNO value. So for, say the SNO value of 'S2', we would have the pairs (P1, P1), (P1, P2), (P2, P1) and (P2, P2). For 'S4' we would have the pairs: (P2, P2), (P2, P4), (P4, P2) and (P4, P4). For SNO value of 'S1', we would have all possible combinations of P1 through P6. Finally, we take the join result and project it onto { X, Y }. This eliminates duplicate tuples, duplicate pairs. Get the total number of parts supplied by supplier S1. +-----------------------------------------------------+---------------------------------------------------------------------------+ | WITH (SP WHERE SNO = SNO('S1')) AS R: | SELECT SP.SNO, TEMP.TOTAL FROM SP, LATERAL | | SUMMARIZE SP PER R { SNO } ADD (COUNT() AS TOTAL) |    (SELECT COUNT(\*) AS TOTAL FROM SP WHERE SP.SNO = SNO('S1')) AS TEMP | +-----------------------------------------------------+---------------------------------------------------------------------------+ or with the count by itselfi +---------------------------------------------------------------+------------------------------------------------------------+ | EXTEND TABLE\_DEE ADD (COUNT(SP WHERE SNO = 'S1') AS TOTAL) | SELECT COUNT(\*) AS TOTAL FROM SP WHERE SNO = SNO('S1'); | +---------------------------------------------------------------+------------------------------------------------------------+ Comments: Tutorial D also allows for the later expression to be written :: COUNT (SP WHERE SNO = SNO('S1')) Get supplier numbers for suppliers with a STATUS lower than that of supplier S1. +----------------------------------------------------------------------------------+--------------------------------------------------------------------+ | (S WHERE STATUS < STATUS FROM ( TUPLE FROM (S WHERE SNO = SNO('S1') )) { SNO } | SELECT S.SNO FROM S WHERE STATUS < (SELECT S.STATUS FROM S WHERE | | | S.SNO = SNO('S1') ) )) | +----------------------------------------------------------------------------------+--------------------------------------------------------------------+ Comments: The Tutorial D "TUPLE FROM" operator extracts a tuple from the restriction. Then the FROM operator extracts the STATUS attribute from the tuple. So we restrict S to the tuples where STATUS is always less than the STATUS of any of the tuples whose SNO value is S1.Integrity Constraints An integrity constraint, or just constraint, is just a formal name for *business rules*. Contraints guarantee consistentency. They cannot quarantee correctness, which involves insuring that every relvar's predicate, it's fundamental meaning, has never been violated. Type Constraints ^^^^^^^^^^^^^^^^ Type constraints are simply the set of legal values for the type. They are checked when the type's selector is invoked. In Tutorial D a type contraint looks like this :: TYPE POINT POSSREP CARTESIAN { X NUMERIC, Y NUMERIC CONSTRAINT SQRT (X ** 2, Y ** 2) ≤ 100.0 }; POSSREP here stands for "possible representation". If omitted, the POSSREP defaults to the same name as the type. The constraint is that points must lie within circle whose radius is 100. SQL does not support type constraints. THE\_Operators ^^^^^^^^^^^^^^ The THE\_operators (there is one for each component) return the components of the type. Given this POINT :: POINT P(NUMERIC(5), NUMERIC(5)); THE\_X(P) returns the X component's value, and THE\_Y(P) returns the Y component's value. Database constraints ^^^^^^^^^^^^^^^^^^^^ Database constraints are constraints on the values that can appear in a given database. They should be checked at the end of any statement that assigns a value to some relvar in the database: in SQL at any INSERT or UPDATE. Attributes have a built-in type constaint because the attribute is always of a certain type. Together with type constraints they form the business rules of an application. A relvar constraint that involves just one relvar is a single-relvar constraint. In SQL database constaints are expressed by means of CREATE ASSERTION statements. This single-relvar constaints checks that "status values must be in the range 1 to 100 inclusive": +----------------------------------------------------------------+---------------------------------------------------------------------------+ | Tutorial D | SQL | +================================================================+===========================================================================+ | CONSTRAINT C1 IS\_EMPTY (S WHERE STATUS < 1 OR STATUS > 100) | CREATE ASSERTION C1 CHECK | | | (NOT EXISTS (SELECT S.\* FROM S WHERE S.STATUS < 1 OR S.STATUS > 100)); | +----------------------------------------------------------------+---------------------------------------------------------------------------+ This constraint involves just a single attribute of a single relvar. The constraint "supplies in London must have status 20" would be written +----------------------------------------------------------------------+---------------------------------------------------------------------------------+ | Tutorial D | SQL | +======================================================================+=================================================================================+ | CONSTRAINT C1 IS\_EMPTY (S WHERE CITY = 'London' AND STATUS ≠ 20); | CREATE ASSERTION C1 CHECK | | | (NOT EXISTS (SELECT S.\* FROM S WHERE S.CITY = 'London' AND S.STATUS <> 20)); | +----------------------------------------------------------------------+---------------------------------------------------------------------------------+ Although it involves two attributes, this is still a single-relvar constraint. Here is constraint that involves two revlars, a multi-relvar constraint: "no supplier with status less than 20 can supply part P6." +--------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------+ | Tutorial D | SQL | +================================================================================+========================================================================================================+ | CONSTRAINT C1 IS\_EMPTY ((S JOIN SP) WHERE STATUS < 20 AND PNO = PNO('P6')); | CREATE ASSERTION C1 CHECK | | | (NOT EXISTS (SELECT S.\* FROM S, SP WHERE S.SNO = SP.SNO AND S.STATUS < 20 AND SP.PNO = PNO('P6'))); | +--------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------+ All the constraints (AND'ed together) involving a relvar R is the "relvar constraint" for R. The databse constraint for a given database DB is the AND of all of the relvar constraints for the relvars in DB. Constraint examples ^^^^^^^^^^^^^^^^^^^ All red parts must weight less than 50 lbs. +---------------------------------------------------------+------------------------------------------------------------------------------------------+ | Tutorial D | SQL | +=========================================================+==========================================================================================+ | CONSTRAINT C1 IS\_EMPTY (P WHERE COLOR = COLOR('Red') | CREATE ASSERTION C1 CHECK (NOT EXISTS (SELECT P.\* FROM P WHERE P.COLOR = COLOR('Red') | | AND WEIGHT ≥ WEIGHT(50.0)); | AND P.WEIGHT ≤ 50)); | +---------------------------------------------------------+------------------------------------------------------------------------------------------+ Every London supplier must supply part P2. +-------------------------------------------------------+----------------------------------------------------------------------------------------------------+ | Tutorial D | SQL | +=======================================================+====================================================================================================+ | CONSTRAINT C1 IS\_EMPTY | CREATE ASSERTION C1 CHECK (NOT EXISTS | | (WITH (SP RENAME (SNO AS X)) AS R: | (SELECT \* FROM S WHERE S.CITY = 'London' AND NOT EXISTS (SELECT \* FROM SP WHERE SP.SNO = S.SNO | |  S WHERE CITY = 'London' |  AND SP.PNO = PNO('P2'))  ); | |  AND | | |  TUPLE {PNO PNO('P2')} ∉ (R WHERE X = SNO) {PNO} ); | | +-------------------------------------------------------+----------------------------------------------------------------------------------------------------+ For further examples, see the answers to 6-16. Database Design Theory ~~~~~~~~~~~~~~~~~~~~~~ The intent of DB design theory is to reduce redundancy so that the database never is in an inconsistent state. While database design is ultimately a subjective endeavor, there are certain formal principles to keep in mind. Functional Dependency ^^^^^^^^^^^^^^^^^^^^^ A functional dependency is a special type of single-relvar integrity constraint important in database normalization. If an attribute's value depends on the value of another attribute, it is functionaly dependent on the other attribute. The formal definition is: *Definition:* Let *A* and *B* be subsets of the heading of relvar *R*, then *R* satisfies the functional dependency (*FD*) *A* → *B* (read "*B* is functionally dependent on *A*" or "*A* functionaly determines *B*") if (and only if), in every relation that is a legal value for *R*, whenever two tulpes have the same value for *A*, they also have the same value for *B*. *A* and(or) *B* can be sets of attributes. As an example of a functional dependency integrity constraint suppose we require that if two suppliers are in the same city, then they must have the same status, or { CITY } → { STATUS }. +-------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+==========+==========+==========+ | S1 | Smith | 20 | London | +-------+----------+----------+----------+ | S2 | Jones | 30 | Paris | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S5 | Admans | 30 | Athens | +-------+----------+----------+----------+ Table: S   In Tutorial D, this FD constraint can be expressed as: :: CONSTRAINT C1 COUNT(S { CITY }) = COUNT(S { CITY, STATUS }); That is, whenever two cities are identical, the pair will always identical. A functional dependency (FD) clearly continues to remains valid if you add attributes to *A* (to the left side of the functional dependency) or substract them from *B* (the right side of the functional dependency). Obviously, every relvar candidate key represents a functional dependency constraint (from the key to the set of all the attributes of *R*, as well as from the key to every subset of attributes). As just noted, this functional dependency remains valid if we add elements to the key. Adding attributes to a key creates what is called a *superkey*. A *superkey* has the uniqueness property (as only one distinct tuple contains the superkey) that all keys must have, but it does not have the irreducibility property that a key must also have. Boyce/Codd Normal Form ^^^^^^^^^^^^^^^^^^^^^^ *Definition:* Relvar *R* is in BCNF if and only if, for every non-trivial functional dependency *A* → *B* satisfied by *R*, *A* is a superkey for *R*. Trivial functional dependencies are the obvious sorts of FDs, in which the attributes on the left are a superset of the attributes on the right, like these: :: { CITY } → { CITY } { SNO }  → { SNO } { CITY, STATUS } → { CITY} The functional dependencies of a relvar in Boyce/Codd Normal Form (besides the trivial functional dependencies) are always *from* superkeys: a key or a set of attributes containing a key. To decompose a relvar *R* into BCNF, we decompose it into smaller relvars with fewer attributes. To ensure that no information is lost (which ultimatly means the predicate remains valid), we must restrict the decomposition to projections of *R* that when joined return the original relvar. So we use project as the decomposition operator for achieving Boyce-Codd NormalForm, and join as the recomposition operator. We can decomposed RS, with its FD { CITY } → { STATUS }, into the following two projections: +-------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+==========+==========+==========+ | S1 | Smith | 20 | London | +-------+----------+----------+----------+ | S2 | Jones | 30 | Paris | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S5 | Admans | 30 | Athens | +-------+----------+----------+----------+ Table: RS | | => +-------+----------+----------+ | SNO | SNAME | CITY | +=======+==========+==========+ | S1 | Smith | London | +-------+----------+----------+ | S2 | Jones | Paris | +-------+----------+----------+ | S3 | Blake | Paris | +-------+----------+----------+ | S4 | Clark | London | +-------+----------+----------+ | S3 | Blake | Paris | +-------+----------+----------+ | S4 | Clark | London | +-------+----------+----------+ | S5 | Admans | Athens | +-------+----------+----------+ Table: SNC +----------+----------+ | CITY | STATUS | +==========+==========+ | London | 20 | +----------+----------+ | Paris | 30 | +----------+----------+ | Athens | 30 | +----------+----------+ Table: CS   SNC still has SNO as a key. Its only functional dependencies are from its key. The FD { CITY } → { STATUS } makes CITY the candidate key for CS. So both SNC and CS are in BCNF. We have eliminated the redundant STATUS values from RS, and we have not lost any information: RS = SNC JOIN CS. As another example, suppose we started with a suppliers relvar that included quanity: +-------+----------+-------+-------+ | SNO | STATUS | PNO | QTY | +=======+==========+=======+=======+ | S1 | 20 | P1 | 300 | +-------+----------+-------+-------+ | S1 | 20 | P2 | 200 | +-------+----------+-------+-------+ | S1 | 20 | P3 | 400 | +-------+----------+-------+-------+ | S1 | 20 | P4 | 200 | +-------+----------+-------+-------+ | S1 | 20 | P5 | 100 | +-------+----------+-------+-------+ | S1 | 20 | P6 | 100 | +-------+----------+-------+-------+ | S2 | 10 | P1 | 300 | +-------+----------+-------+-------+ | S2 | 10 | P2 | 400 | +-------+----------+-------+-------+ | S3 | 30 | P2 | 200 | +-------+----------+-------+-------+ | S4 | 20 | P2 | 200 | +-------+----------+-------+-------+ | S4 | 20 | P4 | 300 | +-------+----------+-------+-------+ | S4 | 20 | P5 | 400 | +-------+----------+-------+-------+ Table: STP Its key is still { SNO, PNO }. But this new suppliers relvar satisfies a FD of { SNO } → { STATUS }, and therefore it has duplicate STATUS values. To eliminate these redundant status values we decompose it into these two projections: +-------+----------+-------+-------+ | SNO | STATUS | PNO | QTY | +=======+==========+=======+=======+ | S1 | 20 | P1 | 300 | +-------+----------+-------+-------+ | S1 | 20 | P2 | 200 | +-------+----------+-------+-------+ | S1 | 20 | P3 | 400 | +-------+----------+-------+-------+ | S1 | 20 | P4 | 200 | +-------+----------+-------+-------+ | S1 | 20 | P5 | 100 | +-------+----------+-------+-------+ | S1 | 20 | P6 | 100 | +-------+----------+-------+-------+ | S2 | 10 | P1 | 300 | +-------+----------+-------+-------+ | S2 | 10 | P2 | 400 | +-------+----------+-------+-------+ | S3 | 30 | P2 | 200 | +-------+----------+-------+-------+ | S4 | 20 | P2 | 200 | +-------+----------+-------+-------+ | S4 | 20 | P4 | 300 | +-------+----------+-------+-------+ | S4 | 20 | P5 | 400 | +-------+----------+-------+-------+ Table: STP | | => +-------+-------+-------+ | SNO | PNO | QTY | +=======+=======+=======+ | S1 | P1 | 300 | +-------+-------+-------+ | S1 | P2 | 200 | +-------+-------+-------+ | S1 | P3 | 400 | +-------+-------+-------+ | S1 | P4 | 200 | +-------+-------+-------+ | S1 | P5 | 100 | +-------+-------+-------+ | S1 | P6 | 100 | +-------+-------+-------+ | S2 | P1 | 100 | +-------+-------+-------+ | S2 | P2 | 400 | +-------+-------+-------+ | S3 | P2 | 200 | +-------+-------+-------+ | S4 | P2 | 200 | +-------+-------+-------+ | S4 | P4 | 300 | +-------+-------+-------+ | S4 | P5 | 400 | +-------+-------+-------+ Table: SP +-------+----------+ | SNO | STATUS | +=======+==========+ | S1 | 20 | +-------+----------+ | S2 | 10 | +-------+----------+ | S3 | 30 | +-------+----------+ | S4 | 20 | +-------+----------+ Table: SS   STP was not in BCNF because the FD { SNO } → { STATUS } was not from a superkey. In relvar SS the functional dependency is now from a superkey (in fact, a key), so SS is in BCNF, as is SP because { SNO, PNO } is the key. The decomposition is lossless: STP = JOIN (SP, SS). Lossles Decomposition and the Heath Theorem ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Decomposition is, as shown above, the process of taking projections of the original relvar. We must ensure, though, that information is not lost: the join of the resulting relvars should be the original relvar. For example, consider this decomposition of RS which loses information: +-------+----------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+==========+==========+==========+ | S1 | Smith | 20 | London | +-------+----------+----------+----------+ | S2 | Jones | 30 | Paris | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+----------+----------+----------+ | S4 | Clark | 20 | London | +-------+----------+----------+----------+ | S5 | Admans | 30 | Athens | +-------+----------+----------+----------+ Table: RS | | => +-------+----------+----------+ | SNO | SNAME | STATUS | +=======+==========+==========+ | S1 | Smith | 20 | +-------+----------+----------+ | S2 | Jones | 30 | +-------+----------+----------+ | S3 | Blake | 30 | +-------+----------+----------+ | S4 | Clark | 20 | +-------+----------+----------+ | S5 | Admans | 30 | +-------+----------+----------+ Table: SNS +----------+----------+ | CITY | STATUS | +==========+==========+ | London | 20 | +----------+----------+ | Paris | 30 | +----------+----------+ | Athens | 30 | +----------+----------+ Table: CS   We have lost information: RS ≠ JOIN(SNS, CS). We no longer know the city of the supplier. Is S2 located in Paris or Athens? Just like the first example we decomposed the orginal relvar using project. What condition(s) *must* be met in order for our projections to always be equal to their join, ensuring our decompositions lossless. The Heath Theorem provides a sufficient condition for lossless decomposition. It tells us, if we meet the theorem's conditions, the decomposition will be lossless. The theorem states: *Heath Theorem*: If we take subsets of the heading of *R*, say, *A*, *B*, and *C* such that their union is equal to the heading, and if *R* satisfies the functional dependency constraint *A* → *B*, then *R* is equal to the join of its projections on *AB* and *AC*, where *AB* is the union of *A* and *B* and *AC* is the union of *A* and *C*. If you think about it, this theorem is just a formal statement of something obvious: if you decompose a relvar into two projections such that both of the projections contain the "left side" of the functional dependency (the *A* in *A* → *B*), and one of the projections also contains the "right side" (the *B* in *A* → *B*) and the other also contains the remaining attributes, those on neither side of the FD, then join always gives you back the original relvar. The Heath theorem gives us a sufficient condition to ensure the decomposition is lossles. If the conditions of the Heath Theorem are met, we know that our decomposition is lossless. It does not give a necessary condition: it does not tell us however if other decompositions will be lossey or lossless. What the necessary condition is for lossless decomposition is discussed in the answers section of chapter 7. The RS and STP decomposition examples above are both instances where Heath's theorem tells us the decomposition will be lossless; we will not lose information. In the case of relvar STP, for example, STP satisfies the FD { SNO } → { STATUS }. { SNO } corresponds to *A* in the theorem, { STATUS } corresponds to B, and { PNO, QTY } to C. The union of A, B and C is clearly the heading of STP. { SNO, STATUS } corresponds to AB, the union of A and B. { SNO, PNO, QTY } corresponds to AC, the union of A and B. Finally, we applying the theorem, we see that :: JOIN(STP { SNO, STATUS }, STP { SNO, PNO, QTY }) = STP; Likewise, in the case of relvar RS, we have the FD { CITY } → { STATUS }. { CITY } is A, { STATUS } is B, { STATUS, CITY } is AB, and { SNO, SNAME, CITY } is AC. Again, we see that :: JOIN(RS { SNO, SNAME, CITY }, RS { STATUS, CITY }) = RS; Here is another decomposition example. We have a relvar in first normal form: all values are atomic. But we have a lot of redundancy. +--------------------+--------------+-----------------------------+--------------+------+ | questionID | answertype | answer | answercode | id | +====================+==============+=============================+==============+======+ | experience1 | radio | Very experienced | very | 1 | +--------------------+--------------+-----------------------------+--------------+------+ | experience1 | radio | Somewhat experienced | somewhat | 1 | +--------------------+--------------+-----------------------------+--------------+------+ | experience1 | radio | Little experience | little | 1 | +--------------------+--------------+-----------------------------+--------------+------+ | experience1 | radio | No experience | none | 1 | +--------------------+--------------+-----------------------------+--------------+------+ | experience2 | radio | Very experienced | very | 2 | +--------------------+--------------+-----------------------------+--------------+------+ | experience2 | radio | Somewhat experienced | somewhat | 2 | +--------------------+--------------+-----------------------------+--------------+------+ | experience2 | radio | Little experience | little | 2 | +--------------------+--------------+-----------------------------+--------------+------+ | experience2 | radio | No experience | none | 2 | +--------------------+--------------+-----------------------------+--------------+------+ | experience3 | radio | Very experienced | very | 3 | +--------------------+--------------+-----------------------------+--------------+------+ | experience3 | radio | Somewhat experienced | somewhat | 3 | +--------------------+--------------+-----------------------------+--------------+------+ | experience3 | radio | Little experience | little | 3 | +--------------------+--------------+-----------------------------+--------------+------+ | experience3 | radio | No experience | none | 3 | +--------------------+--------------+-----------------------------+--------------+------+ | easeofuse1 | radio | Very simple to use | vsimple | 6 | +--------------------+--------------+-----------------------------+--------------+------+ | easeofuse1 | radio | Somewhat simple to use | simple | 6 | +--------------------+--------------+-----------------------------+--------------+------+ | easeofuse1 | radio | Somewhat difficult to use | diff | 6 | +--------------------+--------------+-----------------------------+--------------+------+ | usefulness1 | radio | Very useful | very | 7 | +--------------------+--------------+-----------------------------+--------------+------+ | usefulness1 | radio | Somewhat useful | somewhat | 7 | +--------------------+--------------+-----------------------------+--------------+------+ | usefulness1 | radio | Not that useful | little | 7 | +--------------------+--------------+-----------------------------+--------------+------+ | usefulness1 | radio | Not useful at all | none | 7 | +--------------------+--------------+-----------------------------+--------------+------+ | relevance1 | radio | Very relevant | very | 8 | +--------------------+--------------+-----------------------------+--------------+------+ | relevance1 | radio | Somewhat relevant | somewhat | 8 | +--------------------+--------------+-----------------------------+--------------+------+ | relevance1 | radio | A little relevant | little | 8 | +--------------------+--------------+-----------------------------+--------------+------+ | relevance1 | radio | Not relevant at all | none | 8 | +--------------------+--------------+-----------------------------+--------------+------+ | understand1 | radio | Yes | yes | 9 | +--------------------+--------------+-----------------------------+--------------+------+ | understand1 | radio | No | no | 9 | +--------------------+--------------+-----------------------------+--------------+------+ | recommend1 | radio | Yes | yes | 10 | +--------------------+--------------+-----------------------------+--------------+------+ | recommend1 | radio | No | no | 10 | +--------------------+--------------+-----------------------------+--------------+------+ | compare1 | radio | More useful | more | 11 | +--------------------+--------------+-----------------------------+--------------+------+ | compare1 | radio | Equally useful | equal | 11 | +--------------------+--------------+-----------------------------+--------------+------+ | compare1 | radio | Less useful | less | 11 | +--------------------+--------------+-----------------------------+--------------+------+ | review1 | textarea | NULL | | 12 | +--------------------+--------------+-----------------------------+--------------+------+ | useremotelabs | radio | Yes | yes | 4 | +--------------------+--------------+-----------------------------+--------------+------+ | useremotelabs | radio | No | no | 4 | +--------------------+--------------+-----------------------------+--------------+------+ | whynotremotelabs | textarea | NULL | | 5 | +--------------------+--------------+-----------------------------+--------------+------+ We first notice that this relvar has the following functional dependencies: | { id } → { questionID }. | { questionID } → { answertype }. | { id } → { answertype }. The last dependency is due to the transitivity of these first two dependencies. We can normalize survey\_answers by splitting it into two relations: survey\_questions and survey\_answers. +--------------------+--------------+------+ | questionID | answertype | id | +====================+==============+======+ | experience1 | radio | 1 | +--------------------+--------------+------+ | experience2 | radio | 2 | +--------------------+--------------+------+ | experience3 | radio | 3 | +--------------------+--------------+------+ | easeofuse1 | radio | 6 | +--------------------+--------------+------+ | usefulness1 | radio | 7 | +--------------------+--------------+------+ | relevance1 | radio | 8 | +--------------------+--------------+------+ | understand1 | radio | 9 | +--------------------+--------------+------+ | recommend1 | radio | 10 | +--------------------+--------------+------+ | compare1 | radio | 11 | +--------------------+--------------+------+ | review1 | textarea | 12 | +--------------------+--------------+------+ | useremotelabs | radio | 4 | +--------------------+--------------+------+ | whynotremotelabs | textarea | 5 | +--------------------+--------------+------+ Table: survey\_questions survey\_answers answer answercode id Very experienced very 1 Somewhat experienced somewhat 1 Little experience little 1 No experience none 1 Very experienced very 2 Somewhat experienced somewhat 2 Little experience little 2 No experience none 2 Very experienced very 3 Somewhat experienced somewhat 3 Little experience little 3 No experience none 3 Very simple to use vsimple 6 Somewhat simple to use simple 6 Somewhat difficult to use diff 6 Very useful very 7 Somewhat useful somewhat 7 Not that useful little 7 Not useful at all none 7 Very relevant very 8 Somewhat relevant somewhat 8 A little relevant little 8 Not relevant at all none 8 Yes yes 9 No no 9 Yes yes 10 No no 10 More useful more 11 Equally useful equal 11 Less useful less 11 NULL NULL 12 Yes yes 4 No no 4 NULL NULL 5   The functional dependencies for the new survery\_questions revlar are still the same: | { id } → { questionID }. | { questionID } → { answertype }. | { id } → { answertype }. And since the candidate keys for the new survey\_questions relvar are { questionID } and {id}, this means survery\_questions is in BCNF: all its functional dependencies are *from* attributes that contain candidate keys. Functional dependencies can be from sets of attributes (more than one), but here the FDs are from sets containing only one attribute. What about the relvar survey\_answers? First, we notice there is an entity integrity problem with id values of 12 and 5, but we can just delete those tuples because id is not a candidate key here. It is a foreign key referring to the candidate key for survey\_questions. As long as there is an existing id in survey\_questions for every id in survey\_answers referential integrety will not be violated. So there is no danger in deleting these two tuples. The new survey\_answers appears to have the function dependency { answer } → { answercode}. Boyce Codd Normal Form requires all functional dependencies to be *from* supersets, but { answer } contains neither one of the two candidate keys for survey\_answers, {answer, id} and {answercode, id}, meaning our new survery\_answers is still not in BCNF. We need to decompose this survey\_answers relvar further. Consider this decomposition of survey\_answers into survey\_answer and answer\_choices: +--------------+------+ | answercode | id | +==============+======+ | very | 1 | +--------------+------+ | somewhat | 1 | +--------------+------+ | little | 1 | +--------------+------+ | none | 1 | +--------------+------+ | very | 2 | +--------------+------+ | somewhat | 2 | +--------------+------+ | little | 2 | +--------------+------+ | none | 2 | +--------------+------+ | very | 3 | +--------------+------+ | somewhat | 3 | +--------------+------+ | little | 3 | +--------------+------+ | none | 3 | +--------------+------+ | vsimple | 6 | +--------------+------+ | simple | 6 | +--------------+------+ | diff | 6 | +--------------+------+ | very | 7 | +--------------+------+ | somewhat | 7 | +--------------+------+ | little | 7 | +--------------+------+ | none | 7 | +--------------+------+ | very | 8 | +--------------+------+ | somewhat | 8 | +--------------+------+ | little | 8 | +--------------+------+ | none | 8 | +--------------+------+ | yes | 9 | +--------------+------+ | no | 9 | +--------------+------+ | yes | 10 | +--------------+------+ | no | 10 | +--------------+------+ | more | 11 | +--------------+------+ | equal | 11 | +--------------+------+ | less | 11 | +--------------+------+ | yes | 4 | +--------------+------+ | no | 4 | +--------------+------+ Table: survey\_answers +-----------------------------+--------------+ | answer | answercode | +=============================+==============+ | Very experienced | very | +-----------------------------+--------------+ | Somewhat experienced | somewhat | +-----------------------------+--------------+ | Little experience | little | +-----------------------------+--------------+ | No experience | none | +-----------------------------+--------------+ | Very simple to use | vsimple | +-----------------------------+--------------+ | Somewhat simple to use | simple | +-----------------------------+--------------+ | Somewhat difficult to use | diff | +-----------------------------+--------------+ | Somewhat useful | somewhat | +-----------------------------+--------------+ | Not that useful | little | +-----------------------------+--------------+ | Not useful at all | none | +-----------------------------+--------------+ | Very relevant | very | +-----------------------------+--------------+ | Somewhat relevant | somewhat | +-----------------------------+--------------+ | A little relevant | little | +-----------------------------+--------------+ | Not relevant at all | none | +-----------------------------+--------------+ | Yes | yes | +-----------------------------+--------------+ | No | No | +-----------------------------+--------------+ | More useful | more | +-----------------------------+--------------+ | Equally useful | equal | +-----------------------------+--------------+ | Less useful | less | +-----------------------------+--------------+ Table: answer\_choices   Both relvars appear to be in BCNF. The only candidate key for survery\_answers is its entire heading { answercode, id }. But since survery\_answers only has trivial functional dependencies, it is in BCNF. The only candidate key for answer\_choices is { answer }. Since answer\_choices has the FD { answer } → { answercode}, which is a dependency *from* a key, it also is in BCNF. But we now have a different soft of problem. If we tried to join these two relations together on the common foreign key answercode, we would have no idea whether id 1 corresponds to answer "Very experienced" or answer "Very relevant". We have lost information, making this a lossy decomposition. So we try yet again. +-----------------------------+------+ | answer | id | +=============================+======+ | Very experienced | 1 | +-----------------------------+------+ | Somewhat experienced | 1 | +-----------------------------+------+ | Little experience | 1 | +-----------------------------+------+ | No experience | 1 | +-----------------------------+------+ | Very experienced | 2 | +-----------------------------+------+ | Somewhat experienced | 2 | +-----------------------------+------+ | Little experience | 2 | +-----------------------------+------+ | No experience | 2 | +-----------------------------+------+ | Very experienced | 3 | +-----------------------------+------+ | Somewhat experienced | 3 | +-----------------------------+------+ | Little experience | 3 | +-----------------------------+------+ | No experience | 3 | +-----------------------------+------+ | Very simple to use | 6 | +-----------------------------+------+ | Somewhat simple to use | 6 | +-----------------------------+------+ | Somewhat difficult to use | 6 | +-----------------------------+------+ | Very useful | 7 | +-----------------------------+------+ | Somewhat useful | 7 | +-----------------------------+------+ | Not that useful | 7 | +-----------------------------+------+ | Not useful at all | 7 | +-----------------------------+------+ | Very relevant | 8 | +-----------------------------+------+ | Somewhat relevant | 8 | +-----------------------------+------+ | A little relevant | 8 | +-----------------------------+------+ | Not relevant at all | 8 | +-----------------------------+------+ | Yes | 9 | +-----------------------------+------+ | No | 9 | +-----------------------------+------+ | Yes | 10 | +-----------------------------+------+ | No | 10 | +-----------------------------+------+ | More useful | 11 | +-----------------------------+------+ | Equally useful | 11 | +-----------------------------+------+ | Less useful | 11 | +-----------------------------+------+ | Yes | 4 | +-----------------------------+------+ | No | 4 | +-----------------------------+------+ Table: survey\_answers +-----------------------------+--------------+ | answer | answercode | +=============================+==============+ | Very experienced | very | +-----------------------------+--------------+ | Somewhat experienced | somewhat | +-----------------------------+--------------+ | Little experience | little | +-----------------------------+--------------+ | No experience | none | +-----------------------------+--------------+ | Very simple to use | vsimple | +-----------------------------+--------------+ | Somewhat simple to use | simple | +-----------------------------+--------------+ | Somewhat difficult to use | diff | +-----------------------------+--------------+ | Somewhat useful | somewhat | +-----------------------------+--------------+ | Not that useful | little | +-----------------------------+--------------+ | Not useful at all | none | +-----------------------------+--------------+ | Very relevant | very | +-----------------------------+--------------+ | Somewhat relevant | somewhat | +-----------------------------+--------------+ | A little relevant | little | +-----------------------------+--------------+ | Not relevant at all | none | +-----------------------------+--------------+ | Yes | yes | +-----------------------------+--------------+ | No | no | +-----------------------------+--------------+ | More useful | more | +-----------------------------+--------------+ | Equally useful | equal | +-----------------------------+--------------+ | Less useful | less | +-----------------------------+--------------+ Table: answer\_choices   This time, not only are both relations in BCNF, but if you join them together on answer, you get back the exact relation we had before. This is a nonloss decomposition, and we note that this decomposition complies with the criteria of Heath's theorem. To confirm that both the new survey\_answers and answer\_choices are in BCNF, we see that Join dependencies ^^^^^^^^^^^^^^^^^ A join dependency or **JD** is a set of projections on a relation which, when joined together, returns the original relation. The formal definition of a join dependency is: *Definition*: Let *A*, *B*, *C*, ..., *Z* be subsets of the heading of a relvar *R*. Then *R* satisfies the *join dependency* (JD) :: ∗ { A, B, ..., Z } if and only if every relation that is a valid value for *R* is equal to the join of its projections on *A*, *B*, *C*, ..., *Z*, or :: JOIN ( R { A }, R { B }, ..., R { Z } ) = R; To say a relvar *R* can be losslessly decomposed into certain projections on *A*, *B*, *C*, ..., *Z* is to say that it satisfies the JD ∗ { A, B, ..., Z }. Join dependencies are just another way of describing nonlossless decompositions of a relvar. It also follows from the the Heath theorem that every FD is a JD. Take the revlar RS above. It has the FD { CITY } → { STATUS }. We first decomposed it into the relvars SNC and CS, with attributes of { SNO, SNAME, CITY } and { CITY, STATUS }, respectively. RS has thus satisfied the join dependency ∗ { { SNO, SNAME, CITY }, { CITY, STATUS } }. Likewise, when survey\_answers had three attributes—{id, answer, answercode}— it really had only one join dependency:  ∗ {{id, answer}, {answer, answer\_code}}. The existence of FDs and multi–valued dependencies (MVDs) implies that certain decompositions have lossless join. That is, FDs and MVDs imply corresponding JDs. JDs, therefore, constitute a generalization of FDs and MVDs. As such, while every FD and MVD can be stated as an equivalent JD, there can be JDs which are not FDs or MVDs. :: ∗ { {id, answer}, {answer,answercode} } This means that { id, answer } and { answer, answercode } were the only two subsets in the original survey\_answers that could be nonlossly decomposed into two separate relations. So, in order to satisfy Boyce/Codd Normal Form, that's just what we did. Fifth Normal Form ^^^^^^^^^^^^^^^^^ Superkeys always imply certain functional dependencies (from the superkeys). Since every FD is also a JD (according to the Heath Theorem), superkeys always imply certain join dependencies. Take our suppliers relvar SS for example. { SNO } is a superkey (actually a key). SS satisfies the JD of :: ∗ { {SNO, SNAME}, { SNO, STATUS }, { SNO, CITY } } The definition of 5th Normal Form (5NF) is: *Definition*: Relvar *R* is in 5NF if and only if every non-trivial join dependency satisfied by *R* is implied by the superkeys of *R* A trivial join dependency is a join dependency in which one of the subsets of the heading is the entire heading. Obviously every relvar has trivial join dependencies because a subset consisting of the entire header of a relvar can be joined with any other subset of the relvar. Also the same subset consisting of all the attributes of the header can be joined with TABLE\_DEE. Most relvars that are in BCNF are also in 5NF. The exceptions are small. The following theorem explains that the only time when a BCNF relvar could possibly not be in 5NF, could possibly have a nontrivial JD that does not consist entirely of superkeys, is when there are keys with more than one attribute. *Theorem*: Let *R* be a relvar in BCNF (or even just a 3NF relvar) and let *R* have no composite keys (that is, keys consisting of two or more attributes), then *R* is in 5NF. So if you can get to BCNF (which is easy enough) and there aren't any composite keys in your BCNF relvar, you don't have to worry about the complexities of general JDs and 5NF: you know without have to think about the matter any further that the relvar is simply in 5NF. What does 5NF mean? ^^^^^^^^^^^^^^^^^^^ If a relvar is in 5NF, the only nontrivial JDs are those implied by superkeys. The only lossless projection decompostions are ones in which every projection is on the attributes of some superkeys; each such projection includes some key of *R*. As a consequence, the corresponding "recomposition" joins are all **one-to-one (from one tuple to one tuple)**, and no redunancies are or can be eliminated by the decompostions ("joins are all one-to-one" means: "the recomposition join is a `bijection `__. In other words A JOIN B means A and B have some superkey in common (not necessarily just one attribute) and each tuple in A corresponds to exactly one tuple in B and vice versa." See `Dbforums reply `__:) 5NF does not necessarily mean that all possible redundancy has been eliminated, however. It says that further nonloss decomposition of a 5NF relvar *R* into projections, while it might be possible, won't eliminate any redundancies. But this does not mean that *R* is redundancy free. Certain types of redundancies cannot be reduced through projections. Consider the following 5NF relvar SPJ which suffers from redundancy. Its predicate is: Supplier SNO supplies part PNO to project JNO in quanity QTY. The sole key is { SNO, PNO, JNO }. +-------+-------+-------+-------+ | SNO | PNO | JNO | QTY | +=======+=======+=======+=======+ | S1 | P1 | J1 | 200 | +-------+-------+-------+-------+ | S1 | P3 | J4 | 700 | +-------+-------+-------+-------+ | S2 | P3 | J1 | 400 | +-------+-------+-------+-------+ | S2 | P3 | J2 | 200 | +-------+-------+-------+-------+ | S2 | P3 | J3 | 200 | +-------+-------+-------+-------+ | S2 | P3 | J4 | 500 | +-------+-------+-------+-------+ | S2 | P3 | J5 | 600 | +-------+-------+-------+-------+ | S2 | P3 | J6 | 400 | +-------+-------+-------+-------+ | S2 | P3 | J7 | 800 | +-------+-------+-------+-------+ | S2 | P5 | J1 | 100 | +-------+-------+-------+-------+ Table: SPJ We see the fact that supplier S2 supplies part P3 repeated several times. There are other another redundancies. The fact that part P3 is supplied to project J4 (JNO stands for project number) is repeated. So is the fact that J1 is supplied by supplier S2. The only nontrivial functional dependency satisfied by this relvar is this functional dependency: :: { SNO, PNO, JNO } → { QTY } which is an arrow out of a superkey. In other words, QTY depends on all three of SNO, PNO and JNO, and it can't appear in a relvar with anything less than all three. Hence, there is no nonloss decomposition that can remove the redundancies. The only JD the relvar satisfies is the trivial JD ∗ { {SNO, PNO, JNO, QTY} }. It is always possible to decompose a non-5NF. Because 5NF is the final normal form with respect to projection as the decomposition operator, it is sometimes called *projection/join* normal form to stress that the point so long as we limit ourselves to projection as the decomposion operator and join as the recomposition operator. Finally, keep in mind to say that a revlar is in BCNF does not mean that it is not in 5NF (though it usually is). Join dependency means that an table, after it has been decomposed into two or more smaller tables, must be capable of being joined again on common keys to form the original table. Stated another way, 5NF indicates when an entity cannot be further decomposed using projections. If we examine survery\_questions again, which was in BCNF +--------------------+--------------+------+ | questionID | answertype | id | +====================+==============+======+ | experience1 | radio | 1 | +--------------------+--------------+------+ | experience2 | radio | 2 | +--------------------+--------------+------+ | experience3 | radio | 3 | +--------------------+--------------+------+ | easeofuse1 | radio | 6 | +--------------------+--------------+------+ | usefulness1 | radio | 7 | +--------------------+--------------+------+ | relevance1 | radio | 8 | +--------------------+--------------+------+ | understand1 | radio | 9 | +--------------------+--------------+------+ | recommend1 | radio | 10 | +--------------------+--------------+------+ | compare1 | radio | 11 | +--------------------+--------------+------+ | review1 | textarea | 12 | +--------------------+--------------+------+ | useremotelabs | radio | 4 | +--------------------+--------------+------+ | whynotremotelabs | textarea | 5 | +--------------------+--------------+------+ Table: survey\_questions we notice that it has these join dependencies: | ∗{ {id, questionID}, {questionID,answertype} } | ∗{ {questionID, id}, {id,answertype} } survey\_questions could easily be further nonloss decomposed into two separate relations—in two different ways! So why don't we need to decompose it? Well, not only is survey\_questions already in BCNF, but it's also in 5th Normal Form. This means that each join dependency is satisfied by superkeys: {id, questionID} is a superkey, {questionID,answertype} is also a superkey, and so are the two attribute sets in the second JD. More on the meaning of 5NF ^^^^^^^^^^^^^^^^^^^^^^^^^^ What does a JD really mean intuitively? Take this simplify version of SPJ without the QTY attribute? The predicate for SPJ is: Supplier SNO supplies part PNO to project JNO. The sole key is { SNO, PNO, JNO }. +-------+-------+-------+ | SNO | PNO | JNO | +=======+=======+=======+ | S1 | P1 | J2 | +-------+-------+-------+ | S1 | P2 | J1 | +-------+-------+-------+ | S2 | P1 | J1 | +-------+-------+-------+ | S1 | P1 | J1 | +-------+-------+-------+ Table: SPJ Let's attach a real-world meaning to SPJ. Suppose SPJ tells us that all three of the following are true propostions: #. Smith supplies monkey wrenches to some project (think of S1 as Smith and P1 as monkey wrenches). #. Somebody supplies monkey wrenches to the Manhattan project (think of P1 as monkey wrenches and J1 as the Manhattan project). #. Something is supplied to the Manhattan project by Smith (think of J1 as the Manhattan project and S1 as Smith) Then is the following proposition true? 4. Smith supplies monkey wrenches to the Manhattan project (S1 supplies P1 to project J1). Normally, propositions 1, 2 and 3 would not imply proposition 4. We know that Smith supplies wrenches to *some* project (say, project *z*) from proposition 1, that some supplier (say, supplier *x*) supplies monkey wrenches to the Manhattan project (from proposition 2), and that Smith supplies some part (say, part *y*) to the Manhattan project—but we cannot validly infer, just on the basis of the three propositions, that *x* is Smith or *y* is monkey wrenches or *z* is the Manhattan project. Here is how a join dependency helps supply the answer. SPJ satisfies the join dependency ∗ { { SNO, PNO }, { PNO, JNO }, { SNO, JNO } }, which we will abbreviate as ∗ { SP, PJ, SJ }. This JD tell us that SPJ satisfies the following obvious constraint: :: IF ∈ SP AND ∈ PJ AND ∈ SJ THEN ∈ SPJ For example, the tuples , and appear in SP, PJ, and SJ, respectively, therefore the tuple will obviously appear in their join SPJ. If we look at it from a different slightly different angle, we see that the tuple appears in SP if (and only if) there exists some value *z* such that appears in SPJ. Likewise, the tuple appears in PJ if (and only if) there exists some value *x* such that appears in PJ, and the tuple appears in SJ if (and only if) appears in SJ for some *y*. So we see that the earlier constraint is logically equivalent to this one: :: IF for some x, y, z ∈ SPJ AND ∈ SPJ AND ∈ SPJ THEN ∈ SPJ Given SPJ above, tuples <**S1**,\ **P1**,J2>, and <**S1**,P2,\ **J1**> all appear in SPJ, and therefore so does <**S1**,\ **P1**,\ **J1** >. The join dependecy ∗ { SP, PJ, SJ } tells us that, given propositions 1, 2 and 3, proposition 4 is also true. We can validly infer proposition 4 from propositions 1, 2, and 3. This constraint has a cyclic nature (IF *s* is connect to *p* and *p* is connected to *j* and *j* is connected back to *s* again, THEN *s* and *p* and *j* must all be directly connected, in the sense that they must all appear together in the same tuple). Such cyclic constraints are instances when we might have a relvar that is in BCNF but not in 5NF. Such cyclic constraints are, though, quite rare in practice. Order of Normalization ^^^^^^^^^^^^^^^^^^^^^^ Well, unless we get a data set into First Normal Form, we aren't dealing with relations at all—so it must go first. And if a relation is in 5th Normal Form, it's also in BCNF, so theoretically we could try to skip BCNF altogether. However, generally it's much easier to deal with functional dependencies than join dependencies (can you imagine figuring out all JDs for the original survey\_answers?), so going for Boyce/Codd Normal Form is the natural progression. And once relations are in BCNF, most of the time they will also be in 5th Normal Form—in fact, if a BCNF relation has a primary key which contains only one attribute, you can bet on it. But still, it's good to use 5th Normal Form as a final check. Normalization and Common sense ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Normalization theory formalizes, explicitly describes, what are certain commonsense principles of design. Normalization and Dependency Preservation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The objective of reducing redundancy can conflict with the objective of dependency preservation. Consider the following relvar: :: ADDR { STREET, CITY, STATE, ZIP } Assume ADDR has these FDs: :: { STREET, CITY, STATE } → { CITY } { ZIP } → { CITY, STATE } Since ZIP is not a key ADDR is not in BNCF. If we apply Heath's theorem and decompose it into BCNF projections: :: ZCS { ZIP, CITY, STATE } KEY { ZIP } ZS { ZIP, STREET } KEY { ZIP, STREET } The original FD of { STREET, CITY, STATE } → { CITY } is gone. It is satisfied by the join of ZCS and ZS, but not by either of those projections. As a result, ZCS and CS cannot be independently updated. For example, if ZCS and CS have these values: +---------+------------+---------+ | ZIP | CITY | STATE | +=========+============+=========+ | 10003 | New York | NY | +---------+------------+---------+ | 10111 | New York | NY | +---------+------------+---------+ Table: ZCS +---------+------------+ | ZIP | STREET | +=========+============+ | 10003 | Broadway | +---------+------------+ Table: CS   If we attempt to insert the tuple <10111, Broadway> into ZS, we will violate the "missing" FD, which says there can only be on zip code for any given street in the same city and state. However, this fact can't be determined without examining projection ZCS as well as projection ZS. For precisely this kind of reason, the dependency preservation objective says: *don't split dependencies acrosss projections*. A FD is a single relvar constraint. When a FD is eliminated during decomposition, it changes to become a multi-relvar constraint. The dependency preservation principle is revelant when you need the restraint to remain a single-relvar restraint. Orthogonality ^^^^^^^^^^^^^ Normalization is intended to reduce redundancy *within* relvars. Orthogonality refers to avoiding redundancy *across* relvars, avoiding duplicate tuples in two or more distinct relvars. Duplicate tuples can occur when constraints overlap. For example, if relvar SA below consists of suppliers who are in Paris, and relvar SB is suppliers who either aren't in Paris or who have STATUS of 30, this overlap leads to duplicate tuples. +-------+---------+----------+---------+ | SNO | SNAME | STATUS | CITY | +=======+=========+==========+=========+ | S2 | Jones | 10 | Paris | +-------+---------+----------+---------+ | S3 | Blake | 30 | Paris | +-------+---------+----------+---------+ Table: SA +-------+---------+----------+----------+ | SNO | SNAME | STATUS | CITY | +=======+=========+==========+==========+ | S1 | Jones | 20 | London | +-------+---------+----------+----------+ | S3 | Blake | 30 | Paris | +-------+---------+----------+----------+ | S4 | Clark | 20 | London | +-------+---------+----------+----------+ | S5 | Adams | 30 | Athens | +-------+---------+----------+----------+ Table: SB   Tuple S3 must appear in both relvars. If it didn't it would violate their constraints. Obviously, if we never have relvars of the same type, we will never have duplicate tuples. But we must also take into account subsets of the attributes of a tuple. For example, +-------+---------+----------+ | SNO | SNAME | STATUS | +=======+=========+==========+ | S1 | Smith | 20 | +-------+---------+----------+ | S2 | Jones | 10 | +-------+---------+----------+ | S3 | Blake | 30 | +-------+---------+----------+ | S4 | Clark | 20 | +-------+---------+----------+ | S5 | Adams | 30 | +-------+---------+----------+ Table: SX +-------+---------+----------+ | SNO | SNAME | CITY | +=======+=========+==========+ | S1 | Smith | London | +-------+---------+----------+ | S2 | Jones | Paris | +-------+---------+----------+ | S3 | Blake | Paris | +-------+---------+----------+ | S4 | Clark | London | +-------+---------+----------+ | S5 | Adams | Athens | +-------+---------+----------+ Table: SY   If we take just the two attributes [SNO, SNAME] from each tuple of SX and SY, we see they are identical. This type of redundancy also leads to update anomalies. The principle of orthogonal design can be put this way: *The principle of Orthogonal Design:* If *A* and *B* are distinct relvars in the same database, then there must not exist nonlossless decompositions of *A* and *B* into *A1*, *A2*, ... *Am* and *B1*, *B2*, ... *Bm*, respectively, such that the relvar constraints for some projection *Ai* in the set *A1*, *A2*, ... *Am* and some projection *Bj* in the set *B1*, *B2*, ... *Bm*, permit the same tuple to appear in both. That is, such decompositions "must not exist" because they introduce redundacy across the resulting relvars. Since one "nonloss decomposition" is the identity projection (the projection of a relvar on all its attributes), this definition also covers the case illustrated by the first example of SA and SB. Like the principles of normalization, the principle of Orthogonal Design is basically just common sense. Normalization is intended to reduce redundancy within relvars, and the principle of orthogonality is intended to reduce redundancy across relvars. The example of SX and SY illustrates a lossless decomposition of S. SX is the projection S { SNO, SNAME, STATUS}, and SY is the projection S {SNO, SNAME, CITY }. Both SX and SY are in 5NF. Yet the decomposition is bad because SX and SY contain duplicate tuples. TO DO: ====== Add key points from lesson 8 and 9, having to do with PHP/SQL implementation issues: #. protecting against injection attacks #. ???