Over the last two years, the major cloud data warehouses have been in a near-tie for performance. Redshift and BigQuery have both evolved their user experience to be more similar to Snowflake. The market is converging around two key principles: separation of compute and storage, and flat-rate pricing that can "spike" to handle intermittent workloads.
Fivetran is a data pipeline that syncs data from apps, databases and file stores into our customers’ data warehouses. The question we get asked most often is, “What data warehouse should I choose?” In order to better answer this question, we’ve performed a benchmark comparing the speed and cost of four of the most popular data warehouses:
- Amazon Redshift
- Google BigQuery
Benchmarks are all about making choices: What kind of data will I use? How much? What kind of queries? How you make these choices matters a lot: Change the shape of your data or the structure of your queries and the fastest warehouse can become the slowest. We’ve tried to make these choices in a way that represents a typical Fivetran user, so that the results will be useful to the kind of company that uses Fivetran.
A typical Fivetran user might sync Salesforce, JIRA, Marketo, AdWords and their production Oracle database into a data warehouse. These data sources aren't that large: A typical source will contain tens to hundreds of gigabytes. They are, however, complex: They contain hundreds of tables in a normalized schema, and our customers write complex SQL queries to summarize this data.
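To make "complex SQL over a normalized schema" concrete, here is a minimal sketch of the kind of summarizing query such a user might write. The table and column names (`accounts`, `opportunities`) are hypothetical, loosely modeled on a Salesforce sync, and `sqlite3` stands in for the warehouse:

```python
import sqlite3

# Hypothetical normalized tables, as a CRM sync might produce them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, industry TEXT);
    CREATE TABLE opportunities (
        id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(id),
        amount REAL,
        stage TEXT
    );
    INSERT INTO accounts VALUES (1, 'Acme', 'Retail'), (2, 'Globex', 'Tech');
    INSERT INTO opportunities VALUES
        (1, 1, 500.0, 'closed_won'),
        (2, 1, 250.0, 'open'),
        (3, 2, 900.0, 'closed_won');
""")

# Join the normalized tables and aggregate: total closed revenue by industry.
rows = conn.execute("""
    SELECT a.industry, SUM(o.amount) AS closed_revenue
    FROM accounts a
    JOIN opportunities o ON o.account_id = a.id
    WHERE o.stage = 'closed_won'
    GROUP BY a.industry
    ORDER BY closed_revenue DESC
""").fetchall()

print(rows)  # [('Tech', 900.0), ('Retail', 500.0)]
```

Real customer queries join many more tables and layers of subqueries, but the shape is the same: joins across a normalized schema, then aggregation.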
The source code for this benchmark is available at https://github.com/fivetran/benchmark.
What Data Did We Query?
We generated the TPC-DS data set at 1TB scale. TPC-DS has 24 tables in a snowflake schema; the tables represent web, catalog and store sales of an imaginary retailer. The largest fact table had 4 billion rows.
What Queries Did We Run?
We ran the 99 TPC-DS queries from February to September 2020. These queries are complex: They have lots of joins, aggregations and subqueries. We ran each query only once, to prevent the warehouse from caching previous results.
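The run-once rule can be sketched as a simple timing harness. This is an illustration, not the benchmark's actual code: the query texts are trivial stand-ins for the 99 TPC-DS queries, and `sqlite3` stands in for the warehouse client:

```python
import sqlite3
import time

# sqlite3 stands in for the warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# Trivial stand-ins for the 99 TPC-DS queries.
queries = {
    "q1": "SELECT COUNT(*) FROM t",
    "q2": "SELECT SUM(x) FROM t WHERE x % 2 = 0",
}

timings = {}
for name, sql in queries.items():
    start = time.perf_counter()
    conn.execute(sql).fetchall()  # executed exactly once: no warm-up, no repeats
    timings[name] = time.perf_counter() - start

print(timings)
```

Running each query exactly once means the warehouse never gets a chance to serve a cached result from an earlier run, so the timings reflect cold query execution.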
How Did We Configure the Warehouses?
We set up each warehouse in a small and large configuration for the 100GB and 1TB scales: