The Relational Database Model and Predicate Logic

After a couple of months of working with MySQL Databases and cowardishly ignoring the fact that I did not have a clue why table-based databases are called "relational", I finally gave in and heroically opened the Wikipedia entry "Relational database". It turns out that the explanation of the name is actually quite interesting - if you are insane. Anyways, here we go.

The relational model for databases is a predicate logic. But what is predicate logic again? Predicate logic / first order logic is a collection of formal systems with certain allowed features. And the relational model is such a formal system. So what are the rules of predicate logic and what does a database with a table have to do with it?

First, let me introduce some logics terms:

Proposition: formally, a function from the set of possible worlds to {True, False}. Eg. "The sky is yellow" - true in one world, false in another.
Formula: well-formed set of symbols (eg. `x \Rightarrow (y \wedge z)` as opposed to `\Rightarrow \wedge w \wedge \Leftarrow`)
sentence: closed formula / formula without free variables (eg. `x \wedge y` is not closed, `\exists_x \forall_y (x \wedge y)` is closed)
Predicate P(.): Can be written as a set P of objects x for which P(x) is true (example predicates: "is blue", "is larger than 5")
Finitary relation: " A finitary relation over a sequence of sets `X_1, ... X_n` is a subset of the cartesian product `X_1 \times ... \times X_n`" [wiki:finitary relation]. Eg. the relation 'x is divisible by y and z' contains the set of 3-tuples that make the sentence true. This is the predicate logic model for a database table. Predicates can be interpreted as relations.

What is an n-th order logic?

0th order logic / propositional logic includes propositions and compound propositions (conjunction, disjunction, implication, biconditional, negation). Also, it does not allow non-logical objects yet (like numbers or biologists). There is no "domain of discourse".
1st order logic / predicate logic: Here, we also allow variables over non-logical objects (ie. a domain of discourse, like the real numbers), predicates as subsets of this domain (and relations - also sets) and quantifiers (for all/it exists).
2nd order logic: While 1st order logic allows sets, 2nd order logic allows sets of sets / relations of relations. It can also quantify over predicates and relations (formulas) (variables don't have to be individuals). Eg. `P(x)` is first order logic, but `\exists_P P(b)` is not 1st but 2nd order logic because it uses a quantifier over predicates.

Back to relational databases

This is the definition of a relational database:

A database follows the relational model iff all data is represented as tuples (rows/entries) grouped into relations (tables)

Just kidding, here is the cool version:

From now on I will call a table in a database a relation, and because we already defined (finitary) relation, this completes the definition.

In more detail: In the relational model of databases, we think of tables as relations, meaning that the formula R(a,b) is true iff (a,b) is in R, where R is a relation. This makes sense because as we saw above, a relation in logics is formally written down as a set of tuples for which the relation is true. Now instead of saying "The row (a,b,c) is an entry in my table R.", we can arrogantly proclaim "R(x,y,z) is true, where R is a infinitary relation". Note that a relation is a predicate, which is one order above a proposition (in the sense of 1st vs. 0st order logic).

We now have connected the language of logics with the language of databases via the word "relation", and from this connection point we can equate other terms of the two languages.

Table R contains entry (a,b,c,...) = relation R is true for (a,b,c…) / R(a,b,c,...) is true
Row = tuple
Heading = set of attributes = set of propositions
Data type: domain (think of the datatype int as the domain of natural numbers)
Table (without data) = predicate variable / relational variable. The data model (a number of attributes with given datatypes), without any data filled into it, corresponds to a relation over these domains (datatypes), but where we have not yet filled the relation with the information over which tuples are "true".
Table (heading+body) = relation. Now with content, not just the data model.
Table (just content/body without heading): list of tuples for which the Relation is true. List of true propositions R(a,b,c…)
Relational database = collection of relational variables
Queries/constraints: predicates
Nr. of entries = cardinality
Nr. of columns = degree
Key: a subset of attributes whose values uniquely identify a tuple
Foreign key: subset of attributes in a relation that correspond with a key of another relation, such that each element of the first relation has an entry in the second relation with the same values for the key.

Now we have a formalization for relational databases by which we can write down concepts like the JOIN operation in a compact and easy to communicate manner.

So why do we need predicate (1st order) logic exactly?

0th order logic is not enough because 0th order logic has no sets, so no relations and therefore no tables. Also, in 0th order logic there would not be any real world domain, so it cannot talk about actual data. A zero order logic formula is just a combination of propositions with no non-logical objects. So no integers or strings are allowed.

And why not 2nd order logic? Because first order is enough for what we need for data modelling. Also, I guess you wouldn't want to run into any "`A \iff A` cannot be proven" troll queries by some guy called Kurt Gödel, that ends up deleting your fancy 2nd order logic database.

I think knowing all this doesn't help you at all when dealing with MySQL databases. But I think this formalization allows very concise and clear definitions of more complex concepts. In my opinion people should seriously consider documenting their database operations in the language of relational algebra.

Sources:

https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems

https://en.wikipedia.org/wiki/Second-order_logic

https://en.wikipedia.org/wiki/First-order_logic

https://en.wikipedia.org/wiki/Finitary_relation

https://en.wikipedia.org/wiki/Sentence_(mathematical_logic)

https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators

Search This Blog

electron brian

The Relational Database Model and Predicate Logic

Comments

Post a Comment

Popular posts from this blog

Nyar the Scientist