DuckDB is a new SQL execution engine that has been getting a lot of attention in the last year. There are many projects like it, so what makes DuckDB special? First, it's exceptionally simple to operate: it's a standalone C++ library with no dependencies, so it's easy to embed in larger projects. Second, it's fast: it incorporates columnar-vectorized execution, the key idea that makes modern analytical databases fast.
But just how fast is DuckDB? Fans often compare it to systems like Postgres or SQLite, but these are operational databases. They are not especially fast at analytical workloads. We would like to know how DuckDB compares to the best analytical databases running on similar hardware. Benchmarking is something of a hobby of mine, so I decided to find out for myself.
How did I configure DuckDB?
I ran DuckDB in 3 configurations on Google Cloud:
I used DuckDB's built-in tpcds extension to generate the TPC-DS dataset and queries. To understand how performance varies with data size, I generated three datasets for each configuration: small, medium, and large.
I used a Python script to generate the data and execute queries on DuckDB.
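The shape of such a script is simple enough to sketch. The SQL statements below are DuckDB's documented tpcds extension calls (`INSTALL tpcds`, `LOAD tpcds`, `CALL dsdgen(sf=N)`); the helper and the scale factor shown are my own illustration, not my exact script.

```python
import time

# SQL to generate TPC-DS data inside DuckDB. The scale factor (sf)
# controls the dataset size; sf=100 is an illustrative value here.
TPCDS_SETUP = [
    "INSTALL tpcds",
    "LOAD tpcds",
    "CALL dsdgen(sf=100)",
]

def run_benchmark(con, queries):
    """Time each (name, sql) pair against an open DuckDB connection.

    `con` is any DB-API-style connection exposing execute()/fetchall().
    Returns a dict mapping query name to elapsed seconds.
    """
    timings = {}
    for name, sql in queries:
        start = time.perf_counter()
        con.execute(sql).fetchall()  # fetch results to force full execution
        timings[name] = time.perf_counter() - start
    return timings
```

With a real connection (`con = duckdb.connect("tpcds.duckdb")`), you run the setup statements once, collect the 99 TPC-DS queries the extension ships, and pass them to the timing helper.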
How did I configure Database X?
For comparison, I used a commercial data warehouse. I did my best to configure it with similar hardware. I exported the same data generated by DuckDB and imported it into Database X.
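The export step is a one-liner per table in DuckDB: a `COPY ... TO ... (FORMAT PARQUET)` statement writes a table out as Parquet, which any major warehouse can ingest. A minimal sketch (the table name and file path are illustrative; the `COPY` syntax is DuckDB's):

```python
def export_sql(table: str, path: str) -> str:
    """Build a DuckDB COPY statement that writes `table` to a Parquet file."""
    return f"COPY {table} TO '{path}' (FORMAT PARQUET)"

# For each TPC-DS table (store_sales, customer, item, ...):
print(export_sql("store_sales", "store_sales.parquet"))
# -> COPY store_sales TO 'store_sales.parquet' (FORMAT PARQUET)
```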
What did I find?
DuckDB was faster for small datasets and small hardware. Database X was faster for larger datasets and larger hardware. In the plot below, each line represents a single configuration. You can see that, for a given number of CPUs, DuckDB is faster when the data is small but slows down dramatically as the data gets larger.
We can gain insight into what is happening by plotting the same data a little differently. Here, each line represents constant data per CPU. With Database X, if we simultaneously double the data size and the number of CPUs, performance remains constant. The same is not true of DuckDB. DuckDB is not as good at dividing large amounts of data across many CPU cores. Said another way, Database X scales linearly while DuckDB does not.
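This property has a standard name: weak scaling. Holding data-per-CPU constant while growing the CPU count, a linearly scaling system keeps query time flat. A small helper makes the measurement concrete (the numbers in the comment are made up for illustration, not taken from my benchmark):

```python
def weak_scaling_efficiency(times_by_cpus):
    """Given {cpu_count: elapsed_seconds}, measured with data size
    proportional to cpu_count, return {cpu_count: efficiency},
    where 1.0 means perfectly linear (flat) scaling."""
    base_cpus = min(times_by_cpus)
    base_time = times_by_cpus[base_cpus]
    return {cpus: base_time / t for cpus, t in times_by_cpus.items()}

# Hypothetical timings: a flat line means efficiency stays near 1.0.
# weak_scaling_efficiency({8: 100.0, 16: 100.0, 32: 110.0})
# -> {8: 1.0, 16: 1.0, 32: 0.909...}
```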
Nonetheless, this is extremely impressive performance for such a young database management system. We should expect DuckDB's ability to scale linearly to improve with time.
But Wait, There's More!
I also benchmarked DuckDB on my 2021 MacBook Pro, which has an M1 Pro CPU with 10 cores and 32 GB of RAM. I generated the TPC-DS data at 78 GB, 156 GB, and 313 GB scale, which approximately matches the data-per-CPU of the other tested configurations.
The MacBook Pro is impressive: it outperforms a server with 16 cores. If you're an analyst with a recent MacBook Pro and a small data warehouse, the laptop you use to write SQL queries might be faster than the data warehouse you run them on.
What does this mean for the data ecosystem?
To me, the most astonishing finding is just how fast the MacBook Pro is. For the data sizes people actually have, the computer on your desk is probably competitive with your data warehouse, especially if your data warehouse is shared by many users. And this trend is continuing: my MacBook is almost two years old! We have a huge opportunity to save money and improve the end-user experience by moving work to the client. Ironically, this would be a reversal of recent trends in business intelligence tools. But the increasing power of end-user machines is an inexorable force and an irresistible opportunity.
I made a small angel investment in MotherDuck, a startup building a product based on DuckDB.
Fivetran uses DuckDB as the execution engine in our data lake destination support.