Cloud Data Warehouse Benchmark 2020: Redshift, Snowflake, Presto and BigQuery

The Fivetran data warehousing benchmark compares price, performance and differentiated features for Redshift, Snowflake, Presto and BigQuery
September 8, 2020

If you're interested in downloading this report, you can do so here.

Over the last two years, the major cloud data warehouses have been in a near-tie for performance. Redshift and BigQuery have both evolved their user experience to be more similar to Snowflake. The market is converging around two key principles: separation of compute and storage, and flat-rate pricing that can "spike" to handle intermittent workloads.

Fivetran is a data pipeline that syncs data from apps, databases and file stores into our customers’ data warehouses. The question we get asked most often is, “What data warehouse should I choose?” In order to better answer this question, we’ve performed a benchmark comparing the speed and cost of four of the most popular data warehouses:

  • Amazon Redshift
  • Snowflake
  • Presto
  • Google BigQuery

Benchmarks are all about making choices: What kind of data will I use? How much? What kind of queries? How you make these choices matters a lot: Change the shape of your data or the structure of your queries and the fastest warehouse can become the slowest. We’ve tried to make these choices in a way that represents a typical Fivetran user, so that the results will be useful to the kind of company that uses Fivetran.

A typical Fivetran user might sync Salesforce, JIRA, Marketo, Adwords and their production Oracle database into a data warehouse. These data sources aren’t that large: A typical source will contain tens to hundreds of gigabytes. They are complex: They contain hundreds of tables in a normalized schema, and our customers write complex SQL queries to summarize this data.

The source code for this benchmark is available at https://github.com/fivetran/benchmark.

What data did we query?

We generated the TPC-DS [1] data set at 1TB scale. TPC-DS has 24 tables in a snowflake schema; the tables represent web, catalog and store sales of an imaginary retailer. The largest fact table had 4 billion rows [2].

What queries did we run?

We ran 99 TPC-DS queries [3] from February to September 2020. These queries are complex: They have lots of joins, aggregations and subqueries. We ran each query only once, to prevent the warehouse from caching previous results.
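As a rough illustration of the methodology above, a single-pass timing harness might look like the following sketch. This is not the benchmark's actual code (which lives in the GitHub repo linked above): `execute` is a stand-in for whatever client call submits SQL to a given warehouse, and the geometric mean is one common way to summarize per-query runtimes.

```python
import math
import time

def time_queries(queries, execute):
    """Run each query exactly once and return per-query wall-clock seconds.

    Running a query only a single time avoids measuring the warehouse's
    result cache instead of its execution engine.
    """
    timings = {}
    for name, sql in queries.items():
        start = time.monotonic()
        execute(sql)  # submit to the warehouse under test (stand-in)
        timings[name] = time.monotonic() - start
    return timings

def geometric_mean(values):
    """Geometric mean of runtimes: less dominated by a few slow queries."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

With a real driver call substituted for `execute`, the 99 TPC-DS queries can be timed in one pass per warehouse.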

How did we configure the warehouses?

We set up each warehouse in a small and large configuration for the 100GB and 1TB scales:

                 Configuration         Cost / Hour [4]
  Redshift       5x ra3.4xlarge        $16.30
  Snowflake [5]  Large                 $16.00
  Presto [6]     4x n2-highmem-32      $8.02
  BigQuery [7]   Flat-rate 600 slots   $16.44
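Hourly prices only become comparable once converted to a cost per query. A quick back-of-the-envelope conversion (a sketch for illustration, not part of the benchmark; exact billing granularity varies by vendor, and flat-rate slots bill whether or not queries are running):

```python
def cost_per_query(cost_per_hour, runtime_seconds):
    """Convert an hourly rate to the cost of one query of the given duration."""
    return cost_per_hour / 3600 * runtime_seconds

# Example: a 10-second query on the Redshift configuration above
# ($16.30/hour) costs roughly $0.045.
print(round(cost_per_query(16.30, 10), 3))
```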

How did we tune the warehouses?

These data warehouses each offer advanced features like sort keys, clustering keys and date partitioning. We chose not to use any of these features in this benchmark [7]. We did apply column compression encodings in Redshift; Snowflake and BigQuery apply compression automatically; Presto used ORC files in HDFS, which is a compressed format.
