Last year Jamie Brandon, an independent database researcher, wrote a blog post "Against SQL" which argued that SQL is fatally flawed and needs to be replaced. The most persuasive criticism of SQL in this post is that it isn't a library language. It has no ecosystem of reusable code solving common problems. In contrast, other languages like C++ and Java benefit from a huge array of open source libraries; it would be difficult to imagine writing the software we do today without the ability to "stand on the shoulders of giants" in the form of open-source code.
Instead, SQL implements reusable functionality through the standardization process. The major vendors of SQL databases get together and try to agree on new functions and syntax that they then all implement in a (hopefully) compatible way. Unsurprisingly, this process is extremely slow.
Why is it that so many programming languages support reusable open-source libraries, but SQL does not? There are two fundamental barriers to writing and sharing an open-source SQL library:
- Every SQL implementation provides a different, incompatible extension mechanism.
- There is no package manager for SQL and thus no way to share a library with others.
An interesting counterexample to all of this is PostGIS, a widely used open-source extension that adds support for geospatial data to Postgres. How did PostGIS overcome these two problems? It avoided the first problem by targeting a single database management system, Postgres. It solved the second problem by becoming so popular that major Postgres vendors like AWS and Azure simply pre-package PostGIS with their implementations and a user can activate it by running the command "CREATE EXTENSION postgis."
The example of PostGIS points to a potential solution to make SQL into a library language.
First, we have to accept the fact that different implementations of SQL are effectively different languages, and that open-source libraries will need to be built separately for each database management system. The good news is that, at least in the world of analytical databases, there are only a few popular systems. At Fivetran, we see the "big four" of Snowflake, BigQuery, Redshift and Databricks accounting for nearly all new users.
Second, the explosive popularity of dbt potentially gives us a way to distribute an open-source library. dbt includes a package manager, which is currently focused on distributing reusable data models, but which could in principle be used to distribute open-source libraries of user-defined functions.
Together, these ideas point to a way to develop an open-source library ecosystem for SQL, at least in the context of analytical workloads. This has already begun to happen: Fivetran has distributed over 50 data models to over 500 customers through dbt package hub. These data models apply commonly used transformations to schemas like Salesforce, Netsuite and Google Ads. In the future, we could imagine other types of packages being distributed through this mechanism: perhaps a library of common time series functions, or a library for analyzing unstructured text. The time has come for the open-source software revolution to reach SQL.
Read more articles by our co-founder and CEO on the Fivetran blog.