Big News In Databases — Fall 2019

Twice a year, I review the news articles from the past six months and report on the most interesting ones…it’s that time again! Subscribe the newsletter to receive future issues.

The Successor to SQL?

In the previous issue I reported on how the SQL standard is currently being extended with new features to query property graph data, and that it might even get stream-processing capabilities. This time I’ll look at an attempt to replace SQL all together.

In August 2019 Amazon announced their new query language PartiQL (pronounced: particle) as the “one query language for all your data”. When taking a closer look, it turns out to be more of a rebranding of SQL++ than a new invention.

The story starts in 2014 when the paper about SQL++ was published. The original version of the paper was a comparison of the capabilities of various query languages such as SQL-on-Hadoop, NoSQL and NewSQL. As a means to an end, SQL++ was created as a hypothetical super-language that supports all of the features of the surveyed languages. The comparison itself was basically a discussion about how the analyzed languages support the SQL++ features.

It was only in later versions of the paper that SQL++ itself moved into the focus. One reason might have been that some vendors where showing interest in implementing that language. One of them was Couchbase—they considered aligning their own language N1QL with SQL++. In the meantime, Couchbase announced their SQL++ implementation N1QL for Analytics. This is not to be confused with N1QL for Query’s—formally N1QL—, which is not SQL++ compatible. Relevant XKCD comic.

At the same time, one of the authors of the SQL++ paper was working at Amazon to establish this super-language as a new norm, i.e. PartiQL. The changes compared to SQL++ are not so much in the language but rather in the ecosystem around it. One example is that there is an (incomplete) open-source reference implementation of PartiQL.

The question whether or not PartiQL can be established as the one query language for all data cannot be simply dismissed by noting that others have failed at this before. PartiQL is not as naive as other attempts in this field. One important aspect is that PartiQL is SQL compatible. Even though PartiQL covers only a small part of SQL-92, there are still many valid SQL queries that are at the same time valid PartiQL queries. That simplifies the adoption. Further, PartiQL cannot be as easily merged into the SQL standard as the examples mentioned above. The essential difference between PartiQL and SQL—dynamic vs. static typing—goes to the heart of SQL. The general adoption of dynamic typing for SQL is unimaginable to me. The only realistic integration of PartiQL into the SQL standard I can see is the introduction of “dynamic types” that are accessed like SQL arrays and SQL object, but don’t apply static typing. A scoped dynamic typing so to say. That could be the best of both worlds.

Cloud Wars: Punitive Tariff for the Switch to the Cloud

In a recent forecast, Gartner sees a 17% revenue growth for public cloud offerings in 2020. No wonder software vendors are generally simplifying the migration of existing on-premises installations into the cloud. It is therefore quite surprising that Microsoft recently did the opposite by introducing a new migration hurdle.

New Training Dates

A lot has happened since SQL-92! Checkout my new modern SQL training dates.

The most important thing first: The new hurdle only affects a very specific scenario. It is about using dedicated hardware in the cloud environments of Alibaba, Amazon, Google and also Microsoft itself. In that case, on-premises licenses that were bought from October 2019 onwards are not automatically eligible to be used in a “bring your own license” setup. For that, you now need to additionally buy Microsoft Software Assurance, which increases the yearly cost by 25%.

Technology and Science

At long last, MySQL supports the hash join algorithm! This algorithm is often best for large amounts of data and has been used by other products for decades. However, today I don’t want to discuss hash join; instead, I would like to provide some background and cover of why it was only recently introduced to MySQL.

The story begins in the 1980s—long before the first MySQL release. Towards the end of this decade the so-called Volcano model for evaluating queries was presented (paper). The key idea was that all operations should have the same interface so that they can be arbitrarily combined into a greater whole, a little like Lego bricks. The system has established itself and is known to many database users as the execution plan.

Initially, MySQL—and thus MariaDB—didn’t implement the Volcano model. Instead, assumptions about how the various operations can be combined sneaked into the code and limited its flexibility. The introduction of new operations, for example hash join, was rather difficult. Work-List Item 11785 has further details. Eventually Oracle decided to implement the Volcano model for MySQL. The new implementation was released with MySQL 8.0.18 in October 2019.

Introducing new operations is comparably simple now. The hash join is just the one example. Another one is EXPLAIN ANALYZE, which uses the common interface between the execution plan operators for performance instrumentation. The syntax as well as the output format were copied from PostgreSQL.

Another, unrelated publication about the Volcano model came from CockroachDB in late October 2019: They extended the interface between the operations of the Volcano-Model to pass multiple rows in one go. Once they also changed the operations to take advantage of that, they observed a 4-fold speed increase for some queries. The same concept was also applied to SQL Server 2019: There it is called batch mode, which was originally introduced for the columnar storage system. Starting with Version 2019, SQL Server can also use batch mode when accessing tables stored in a row-by-row fashion.

Finally, there is one more paper I’d like to mention: APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. It aims to automatically detect performance regression between database versions. After running the algorithm for two months, it has identified a total of 11 bugs in SQLite and PostgreSQL. Two of them have already been fixed, and five more were confirmed by the respective vendor.

New Versions

MySQL 8.0.17 - 8.0.18 (July 2010 - October 2019): I covered the implementation of the Volcano model and the consequential introduction of hash join and EXPLAIN ANALYZE above. Another noteworthy improvement: InnoDB multi-valued indexes (aka inverted index) that allow better indexing of JSON documents.
PostgreSQL 12 (October 2019): With release 12 PostgreSQL has again got some quite interesting extensions. My personal picks are SQL/JSON Path, nondeterministic ICU collations (case insensitive!) and a change to the with clause so it is not always an optimizer fence.
SQLite 3.30.1 (October 2019): The filter clause can now be used on regular aggregations, not just window functions. At the same time, nulls first and nulls last were added to the order by clause.
SQL Server 2019 (v15, November 2019): SQL Server 2019 is not dominated by a few large new features. It is rather made up of many small improvements to existing features. One of them is that SQL Server on Linux is now nearly 100% feature compatible with the Windows version.

A more detailed analysis of the new SQL functionality of these releases will be published on modern-sql.com (Twitter, Email, RSS).

New Articles, Slides and Recordings

New recording (45 min, YouTube): Devoxx.BE: More Than a Query Language: SQL in the 21st Century
New article: My Interview in “The Art of PostgreSQL”
New article: Neues in Oracle-Datenbank Version 19c
New article in the German Java Magazin (1.2020): SQL im 21. Jahrhundert
A translation of this article, entitled “SQL in the 21st Century”, will be published on modern-sql.com soon (Twitter, Email, RSS).

From Twitter, Keeping it Brief (follow me on Twitter)

@MarkusWinand: Just noticed that CouchBase now uses NoEQUAL on their logo. https://couchbase.com Is this they way out of NoSQL if you add SQL?
@MarkusWinand: One persons microservice is another persons monolith.
@KeithWHare: “The History of Time in Data Models” is a timely blog and well worth reading.
I should finally make a Markus Winand FAQ. It has only one entry because there is only one question that I get frequently. Q: What is your favorite database
A: I don’t have a favorite. Depending on the projects requirements, there is a best fit for each project. The requirements also include existing experience in the team and budget. However, if you don’t know the requirements yet PostgreSQL is the best default to start with.
@MarkusWinand: The next MySQL release will have EXPLAIN ANALYZE, like PostgreSQL. They even mimic the output format.
@MarkusWinand: One of the reasons why I’m excited about tomorrow that I have new stickers and coasters ;)
@MarkusWinand: In the past two weeks, I’ve spent quite a lot of time testing SQL/JSON path functionality of seven SQL databases. The reason was the upcomming PostgreSQL 12 release, which will introduce SQL/JSON path.
I created 230 test cases. It took me 16 footnotes, some of them quite long, to describe all the limitations I found. Here is the result in one picture. PostgreSQL still “red” and “n/a” because this picture is based on PostgreSQL 11.
All implementations are utterly incomplete and some of them have interesting bugs and conformance issues (of which I reported some—those where I believe it makes sense). I wonder what spec they were working against.
An then there is PostgreSQL 12 (beta 3). There is only one missing feature (datetime) which is being worked on. The only conformance issue I found is also being worked on. That’s it. Everything else if fine. The most complete and correct SQL/JSON path implementation I’ve seen.
On the ANSI blog: The SQL Standard – ISO/IEC 9075:2016 - There is no such thing as “ANSI Standards,”...
@MarkusWinand: The one and only Hello world! in SQL VALUES('Hello world!')
@richardfoote: Oracle 19c Automatic Indexing: Methodology Introduction (After Today) https://richardfoote.wordpress.com/2019/07/24/oracle-19c-automatic-indexing-methodology-introduction-after-today/
@daniel_abadi: New post on my blog when I discuss correctness/time-travel anomalies in serializable database systems: https://dbmsmusings.blogspot.com/2019/06/correctness-anomalies-under.html

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.

Big News In Databases — Fall 2019

The Successor to SQL?

Cloud Wars: Punitive Tariff for the Switch to the Cloud

New Training Dates

Technology and Science

New Versions

New Articles, Slides and Recordings

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

Connect with Markus Winand