Use Fivetran to sync any unsupported source

Learn how Fivetran cloud function connectors can pull data when there is no prebuilt connector.
July 14, 2020

Fivetran is the industry leader in fully managed data integration from diverse sources to your data warehouse. We have over 150 prebuilt connectors that can deliver data from popular sources like Zendesk, NetSuite and Salesforce to your warehouse within minutes.

However, many organizations struggle to get data from obscure sources. In fact, nearly every company that takes its data seriously runs into this problem sooner or later. No matter which integration vendor you use, there won't be a prebuilt connector for every esoteric source you have.

This also happens to us, the internal data analytics team at Fivetran! We get to use all of the shiny connectors Fivetran builds, but even then, we sometimes need to pull data from an unsupported source. So what do we do?

There is a very easy answer here. We use the Fivetran Google Cloud Functions connector to pull data from any source – yes, any, even if Fivetran does not have a native prebuilt connector for it. How do we do it? Often with fewer than 100 lines of Python code.

Let me explain how you should think about data integration tools. There are three fundamental pieces: data reader, core processor and data writer:

  • Data reader. This is the piece that talks to the source API you are trying to get the data from.
  • Core processor. This is the piece that takes the output of the data reader and makes sense of it. It figures out what is an update versus what is an insert, recognizes new columns and new tables, and so on.
  • Data writer. This last piece takes the output from the core processor and loads it to the final destination – your data warehouse.

For a well-architected integration tool, the core processor and the data writer are agnostic to the data reader. They don’t really care what the source of the data is, as long as it is passed to them in a specific format. This is exactly where the Fivetran function connector comes in. You can write a short piece of code for the data reader piece, and connect it to the existing core processor and data writer that Fivetran uses behind the scenes.
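To make the separation concrete, here is a minimal sketch of the three pieces in plain Python. It is an illustration only, not Fivetran's internal code: the names `core_processor`, `data_writer`, `sync`, `hr_reader` and `crm_reader` are hypothetical, and a dictionary stands in for the warehouse. The point is that `sync` works with any reader that returns records in the agreed format.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def core_processor(
    records: Iterable[Record], existing_keys: set, key: str = "id"
) -> Tuple[List[Record], List[Record]]:
    """Split incoming records into inserts and updates by primary key."""
    inserts: List[Record] = []
    updates: List[Record] = []
    for record in records:
        if record[key] in existing_keys:
            updates.append(record)
        else:
            inserts.append(record)
    return inserts, updates


def data_writer(
    warehouse: Dict[object, Record],
    inserts: List[Record],
    updates: List[Record],
    key: str = "id",
) -> None:
    """Load processed records into the destination (a dict in this sketch)."""
    for record in inserts + updates:
        warehouse[record[key]] = record


def sync(
    reader: Callable[[], Iterable[Record]], warehouse: Dict[object, Record]
) -> Tuple[int, int]:
    """Run one sync. The processor and writer never look at where data came from."""
    inserts, updates = core_processor(reader(), set(warehouse))
    data_writer(warehouse, inserts, updates)
    return len(inserts), len(updates)


# Two hypothetical readers for two different sources, same output format.
def hr_reader() -> List[Record]:
    return [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]


def crm_reader() -> List[Record]:
    return [{"id": 2, "name": "Grace H."}, {"id": 3, "name": "Linus"}]
```

Swapping `hr_reader` for `crm_reader` changes nothing downstream, which is the property a function connector relies on: you supply only the reader.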

Here is a real-life example. We needed to pull data from the HR tool Namely, and Fivetran does not have a native connector for it. We wrote the data reader in fewer than 100 lines of code. You can find that code below, along with an example of the output format it produces, which is what the core processor needs to just “take it from there.”

Code: https://gist.github.com/gareginordyan/5240efcfacb175bb47192d109ef542e7

Output snippet:
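As a rough illustration of the shape involved, here is a minimal sketch of a data reader and the response it returns. It is not the Namely code linked above. The top-level fields `state`, `insert`, `schema` and `hasMore` reflect our understanding of the function connector response format and should be checked against the docs; the `profiles` table, the `cursor` key, the sample rows and the `fetch_profiles` helper are hypothetical.

```python
import json
from typing import Dict, List, Optional


def fetch_profiles(since: Optional[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a call to the source API.

    A real reader would make an HTTP request here; this returns canned rows
    and keeps only those updated after the `since` timestamp.
    """
    rows = [
        {"id": "p1", "email": "a@example.com", "updated_at": "2020-07-01T00:00:00Z"},
        {"id": "p2", "email": "b@example.com", "updated_at": "2020-07-10T00:00:00Z"},
    ]
    return [r for r in rows if since is None or r["updated_at"] > since]


def handler(request_body: dict) -> dict:
    """Build one sync response from the request that the connector sends."""
    state = request_body.get("state") or {}
    since = state.get("cursor")  # None on the very first sync
    rows = fetch_profiles(since)
    # Advance the cursor to the newest row seen; keep the old one if no rows.
    new_cursor = max((r["updated_at"] for r in rows), default=since)
    return {
        "state": {"cursor": new_cursor},
        "insert": {"profiles": rows},
        "schema": {"profiles": {"primary_key": ["id"]}},
        "hasMore": False,
    }


if __name__ == "__main__":
    # Print the response for a first sync, which starts with an empty state.
    print(json.dumps(handler({"state": {}}), indent=2))
```

The saved state is what makes the sync incremental in this sketch: each call receives the cursor returned by the previous call and fetches only rows updated since then.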

That’s all there is to it! You just host the Python function you wrote in your Google Cloud environment (or in AWS Lambda, or in Azure Functions) and then point the relevant Fivetran connector to it. By this time, your first coffee of the day is probably finished. Go get another cup, and your data will be waiting for you in your warehouse before you have a chance to finish it.

For more detailed instructions about setting up and configuring cloud functions, check out our previous posts on the subject, or look at the functions section in our docs.
