Historically, the ETL process (Extract-Transform-Load) made the most sense for data transformation because the costs of computation, storage, and bandwidth were all high – and transforming your data before loading it into the data warehouse reduced all three. However, in the last decade, cloud data warehouses such as Snowflake, Amazon Redshift, and Google BigQuery have become ubiquitous, driving down storage costs and exponentially increasing processing power.
This has made it much less of a concern to store raw data in the data warehouse – and opened up the potential for a different way of transforming data post-load, rather than pre-load. Known as ELT (Extract-Load-Transform), this post-load data transformation process has a number of advantages over traditional ETL.
1. Faster transformation times
In one recent survey, data professionals reported spending an average of 45 percent of their time getting data ready (loaded and cleaned) before they could use it to develop models and visualizations. Not only is this a huge time and resource demand on an organization’s data engineers – it also means it takes longer to realize value from the data.
By transforming data in the cloud data warehouse post-load, ELT can help companies streamline their data processes and reduce transformation time. Some of the efficiencies gained include:
- Faster load times: Because ETL uses a processing server and secondary system, it takes longer to load the data into the destination system. With ELT, the load process goes directly from your data sources to the destination system.
- Faster transformation times: In ETL, transformation is typically slower and depends on the size of the data set; a large data set can take a long time to transform. In ELT, transformation is faster, especially for large data sets, because data is loaded directly into the destination system and transformed in parallel.
- Shorter turnaround timelines and less engineer demand: Because analysts can perform transformations within the data warehouse environment without needing to rely on data engineers, it shortens the turnaround times for all analytics projects and allows for speedier delivery of insights.
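To make the post-load pattern concrete, here is a minimal ELT sketch in Python. An in-memory SQLite database stands in for a cloud warehouse, and the table and column names are illustrative assumptions, not any particular platform's schema – the point is only that raw rows land first and SQL does the transformation afterward, inside the destination system:

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse (Snowflake, Redshift, BigQuery);
# table and column names here are hypothetical.
warehouse = sqlite3.connect(":memory:")

# Extract + Load: copy raw source rows straight into the warehouse, untransformed.
raw_orders = [
    (1, "2024-01-05", "  alice ", 120.0),
    (2, "2024-01-06", "BOB", 80.0),
    (3, "2024-01-06", "alice", 40.0),
]
warehouse.execute(
    "CREATE TABLE raw_orders (id INTEGER, order_date TEXT, customer TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: done with SQL inside the warehouse, after loading.
# Cleaning (trim/lowercase) and aggregation happen in one post-load step.
warehouse.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT order_date, LOWER(TRIM(customer)) AS customer, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY order_date, LOWER(TRIM(customer))
    """
)
rows = warehouse.execute(
    "SELECT * FROM daily_revenue ORDER BY order_date, customer"
).fetchall()
print(rows)
```

Note that `raw_orders` survives alongside `daily_revenue`, which is exactly the property the next section relies on.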
2. Perpetual access to raw data
A key difference between ETL and ELT is where the raw data is stored. In ETL, the raw data remains stored in the production system from which it originated. The raw data copied from this production source is then transformed from a normalized schema to a dimensional schema.
Thus, the data being loaded into the destination system is no longer the same as the raw data initially copied from the production system. In ELT, however, the raw data from the production system is copied and loaded directly into the destination system before transformation. This is important for a number of reasons:
- Provides an auditable source of truth: Having the raw data stored in the destination system gives you an auditable source of truth. Sometimes it can be impossible to re-load the original raw data from the production system because it no longer exists.
- Reduces time to mine data for insights: You can mine the raw data for additional insights as needs come up without having to reload it every time.
- Eliminates need to re-source data: In some instances, if a transformation is found inadequate, the raw data will need to be re-transformed. In ETL the raw data source must first be re-loaded (assuming it’s still available) to the secondary system and then re-transformed. In ELT, the raw data is stored in the destination system, so it does not need to be reloaded and can be transformed again immediately, saving significant time.
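As a sketch of that last point, here is a hypothetical re-transformation in Python, again with an in-memory SQLite database standing in for the warehouse and made-up table names: because the raw rows still live in the destination system, a changed requirement becomes a drop-and-recreate, with no re-extract from the production source:

```python
import sqlite3

# sqlite3 stands in for the warehouse; raw_events is an illustrative raw table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_events (user TEXT, event TEXT, ts TEXT)")
warehouse.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("u1", "click", "2024-03-01"),
        ("u1", "buy", "2024-03-02"),
        ("u2", "click", "2024-03-02"),
    ],
)

# First transformation: overall event counts.
warehouse.execute(
    "CREATE TABLE event_counts AS "
    "SELECT event, COUNT(*) AS n FROM raw_events GROUP BY event"
)

# Requirements change: analysts now need per-user activity instead. Because the
# raw rows are still in the warehouse, we drop the old derived table and
# re-transform immediately -- no re-load from the production system.
warehouse.execute("DROP TABLE event_counts")
warehouse.execute(
    "CREATE TABLE user_activity AS "
    "SELECT user, COUNT(*) AS events FROM raw_events GROUP BY user"
)
rows = warehouse.execute("SELECT * FROM user_activity ORDER BY user").fetchall()
print(rows)
```

In an ETL pipeline, the equivalent change would mean re-extracting the source data (if it still exists) and rebuilding the transformation in the secondary system first.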
3. Greater flexibility
One reality of data analytics is that the same data source often needs to be mined for different purposes. In ETL, however, when your query needs change, you must rebuild your ETL pipelines. This is costly, time-consuming, and requires data engineering expertise.
With ELT, because the raw data is already loaded into the destination system, a data analyst can create the queries in real-time without engineering resources. This saves you re-engineering time and cost and gives you more flexibility to query the raw data as many times as you want.
Moreover, with the ELT process, data pipelines can be automated. This eliminates data engineering time not only in building custom pipelines but also in maintaining them. And it allows the entire process – from extraction to load to transformation – to be handled by a data analyst rather than requiring an engineer.
ELT is the future of data
ELT simplifies data integration, results in lower failure rates, allows for flexible scaling, and moves the transformation process into the warehouse, where analysts can apply familiar skills such as SQL to transform the data. All of this is to say that ELT is a simpler, faster and more affordable data pipeline process – and the better option for the vast majority of organizations looking to get more valuable, actionable insights from their data faster.