Published on Thu Dec 24 2020
Handling SQL Nulls with Two-Valued Logic
See More ...
The design of SQL is based on a three-valued logic (3VL), rather than the
familiar Boolean logic. 3VL adds a truth value unknown to true and false to
handle nulls. Viewed as indispensable for SQL expressiveness, it is at the same
time much criticized for unintuitive behavior of queries and being a source of
programmer mistakes. We show that, contrary to the widely held view, SQL could
have been designed based on the standard Boolean logic, without any loss of
expressiveness and without giving up nulls. The approach itself follows SQL's
evaluation, which only retains tuples for which conditions in WHERE evaluate to
true. We show that conflating unknown with false leads to an equally expressive
version of SQL that does not use the third truth value. Queries written under
the two-valued semantics can be efficiently translated into the standard SQL
and thus executed on any existing RDBMS. These results cover the core of the
SQL 1999 Standard: SELECT-FROM-WHERE-GROUP BY-HAVING queries extended with
subqueries and IN/EXISTS/ANY/ALL conditions, and recursive queries. We also
investigate new optimization rules enabled by the two-valued SQL, and show that
for many queries, including most of those found in benchmarks such as TPC-H and
TPC-DS, there is no difference between three- and two-valued versions.