Learn
Learn

How to load data from Salesforce to BigQuery: The ultimate guide

How to load data from Salesforce to BigQuery: The ultimate guide

March 13, 2023
March 13, 2023
How to load data from Salesforce to BigQuery: The ultimate guide
In this blog post, we'll walk you through step-by-step instructions for setting up an automated pipeline for transferring data from Salesforce into Google's managed cloud solution: BigQuery.

Do you have data in Salesforce that you want to use and analyze in Bigquery? loading your data from Salesforce into BigQuery is a great way to centralize, store and analyze the valuable insights within your organization. It can be difficult to know how exactly how to do this, so we're here to guide you through the process of moving your information from one platform to another with ease. In this blog post, we'll walk you through step-by-step instructions for setting up an automated pipeline for transferring data from Salesforce into Google's managed cloud solution: BigQuery. So let's get started!

[CTA_MODULE]

What is Salesforce? 

Salesforce is a dynamic customer relationship management (CRM) platform that helps businesses streamline customer interactions and build relationships in real time. With Salesforce, organizations are able to store customer data securely and use it to build powerful sales automation tools that make the process of managing client information significantly more efficient. It also allows companies to customize campaigns to target specific customer segments, giving them the capability to exceed their customer’s expectations. 

What is Google BigQuery? 

Google BigQuery is a fully managed Cloud Data Warehouse, provided as a SAAS offering by Google. Not only can BigQuery ingest/store mammoth amounts of data, it can process petabyte-scale queries quickly to give you meaningful insights in a few seconds. Google BigQuery uses a column-based architecture to store all forms of data. So if your data is not already in the column based format, it needs to be converted to it. Columnar storage results in better storage utilization, faster scanning and querying of data, as compared to row based storage. 

BigQuery is well suited for OLAP (Online Analytical Processing), and provides easy manageability with built-in AI (Artificial Intelligence) & ML (Machine Learning) tools. BigQuery infrastructure uses massively parallel computing and a highly scalable secure storage engine, making it one of the world's most popular Data Warehouses.

Methods to load data from Salesforce to Bigquery

With Salesforce helping you generate reams of valuable data on leads, sales, and much more, it can be difficult to keep everything organized in one place. One way to deal with this is to collect the data in a central data warehouse such as bigquery. Having all your data collected together in this way allows you to use it strategically to maximize profits through forecasting sales trends and optimally allocating marketing and other expenses. To get your Salesforce data into bigquery, there are two methods available: manual or automated. With the manual approach, you can write up scripts for ETL (extract-transform-load) operations that handle this task and allow for secure integration between applications. Using an automated alternative such as Fivetran means fewer clicks and faster transfer time. Let's learn more about these methods.

Method 1: Salesforce to Bigquery using Fivetran tool 

Fivetran is an automated data movement platform, which can move data from and into various data sources. Data collected from your various sources like sensors, call logs, CRMs, ERPs, websites etc. can be transferred to a data lake or a data warehouse, to be processed and analyzed for competitive advantage and continued profits. Fivetran is a Cloud-based ETL tool that can pre-process and transform your data, to make it readily consumable by your analytical applications. 

Imagine a scenario where Salesforce holds your sales/customers/campaigns data, while there are other software/applications that hold your accounting/production/inventory/website data. To integrate all these data sources and to run analytics over all your data, you would need a platform like BigQuery to create a data warehouse which will hold your data in various formats, and allow point and click analytics. Fivetran can easily connect to your multiple data sources like Salesforce/databases, and pipeline the data into your BigQuery destination automatically, on its own. 

Fivetran provides an inbuilt connector that allows you to easily pull all your underlying data from Salesforce, into a SQL/Warehousing-enabled environment like BigQuery, to further allow you to run advanced analytics on your CRM data. You can find the Fivetrans Salesforce-BigQuery connector here:

https://console.cloud.google.com/marketplace/product/fivetran-bq-dts/salesforce 

This connector will effortlessly replicate your Salesforce data into BigQuery, continuously update it and keep it in sync, run analytics over it to gain useful actionable insights. Once your data is in BigQuery, you can use BigQuery ML to build and operationalize machine learning models, or use Data Studio to create meaningful visualisation, or use BigQuery geospatial to run geospatial analytics etc. Your SF data can easily be combined with data from other sources to create more holistic and in-depth reports. Your engineering resources can concentrate on their core competency tasks, rather than worrying about plumbing and upkeeping the data pipelines. 

Steps to connect Salesforce with Fivetran 

Step 1. Open the UI of Fivetran’s Salesforce-BigQuery connector 

Select Salesforce as the source and BigQuery as your data destination. Click Setup button 

Step 2. Next, specify your desired schema name and authorize your salesforce connection ( Fivetran will use your credentials to access your SF data, time to time.)

After authentication, click "Save and Test" Fivetran will now try to connect to your Salesforce and check if the related APIs are working. 

Step 3. Next, you will need to allow Fivetran to access your BigQuery cluster and create, update and manage tables for your data. 

Specify your BigQuery ProjectID(destination dataset), its access details/URL etc. 

Inside BigQuery, grant permissions to your Fivetran account to access and manage BigQuery tables. 

Next, add a new Principal, with your Fivetran service account  

And then select a role for your Fivetran account specified above, I used the "BigData user" role. 

Step 4. Next step is to go to your Fivetran Dashboard, go to destinations and specify BigQuery as your destination

Click + Add Destination, specify a name and then select BigQuery as the destination type. In the destination setup form, enter the Project ID you found in Step 3 above. 

Enter the Data location and Data processing location, based on your geography BigQuery will provide these locations from where its services will be made available to you. Lastly, specify your Timezone, and Click Save & Test. 

Step 5. Fivetran will now perform the following tests:- 

Connection test - to check your connection to your BigQuery data warehouse, and if it can retrieve a list of the datasets. 

Check Permissions test - to check if we have the required permissions on your data warehouse. 

Bucket Configuration Test - to make sure you have the Storage Object Admin permission on your data bucket if you are using your own data bucket to process the data.  

Step 6. Finally, after these tests are successfully done, you need to select objects to sync and hash. 

Select those Salesforce objects that you want to send to BigQuery here. Congratulations! your BigQuery warehouse is connected to and ready to be used with Fivetran. 

7. Finally, go to the Salesforce connection dashboard 

 

Check if everything looks fine and click "Start initial Sync". 

This will load your first(initial) dataset and then take care of future updations. So we have quickly set up the integration, just with a few clicks. Fivetran does all the heavy lifting, and relieves you from the intricacies of data transfer, data synchronisation, data pre-processing and pipeline maintenance. Next, let's discuss some powerful and versatile features of Fivetran.

[CTA_MODULE]

Salesforce to Bigquery: Advantages of using Fivetran

Fivetran's SF connector uses as lowest as possible API calls, making it fast and resource efficient. The connector uses an intelligent mix of Bulk SF API and REST API, while syncing your data, thereby increasing throughput. 

Since your SF account has a limited rolling 24-hour Bulk API quota, if the connector sees that 90% of the Bulk PAI quota is consumed, it shifts to using the REST API quota. If the SF REST API quota consumption also reaches 90%, then the syncs are automatically rescheduled by 1 hour, ensuring that there is always some SF API quota available for your other business critical applications.

So, you don't need to worry about your other processes choking over depletion of the day's API quota. There are many more capabilities of the Fivetran SF connector which are very useful, some important ones are listed below.  

Fivetran does one-to-one mapping of your Salesforce objects with your destination objects. Except for the computed/derived fields in SF, Fivetran maps your Salesforce objects in such a manner that the semantics within the data are maintained and usage of this data is eased. Below is a scale down ERD based depiction of the process. 

Note: Not all relationships are modelled here for the sake of brevity, but you can easily appreciate how Fivetran intelligently relates the objects to one another.  

@Source:https://docs.google.com/presentation/d/1sMmOXdKDkxJtRZa_ybyXtyrq22wE6phNo9EZ52RfV5Y/edit#slide=id.g3cb9b617d1_0_237 

  • Fivetran handles your computed formula fields on its own. 

The formula fields in Salesforce(SF), which get computed/derived based on current values in other SF fields, do not lend themselves well for incremental/linear synchronization with other fields. Fivetran automatically detects such formula fields and keeps them excluded from the tables synced to your destination. 

Fivetran translates the SF formulas to equivalent SQL formulas via its inbuilt formula translation capability, and applies downstream transformations to such fields, to keep them in sync with the other data. 

This ensures that these fields are replicated correctly in your destination, BigQuery in this case.  

  • Fivetran syncs your updated Salesforce data easily 

As you use Salesforce, your data inside SF will change and needs to be updated/synced with the new and changed data. Fivetran does incremental syncing for you, and updates only new or modified data, thereby keeping the update process lean and efficient. Fivetran maintains an internal set of progress cursors, which is used to track the exact point where our last successful sync left off. 

There are certain timestamps within SF, e.g. 

SystemModStamp/LastModifiedDate/CreatedDate/LoginTime etc., which Fivetran uses intelligently to detect which rows have changed since the last sync, and then runs incremental syncs. Still, if you wish to override this behavior, then you just have to make the "updateable"/"replicable" property of your SF table(s) to FALSE. In the above case, Fivetran will automatically re-import the whole tables marked with false 

  • Fivetran can easily sync between your AWS PrivateLink and your Salesforce Private Connect If you have a Salesforce Private Connect license to use AWS PrivateLink, Fivetran can move your data from Salesforce to an AWS destination. 
  • Fivetran offers Transparency and fine grained Audit 

Fivetran clearly shows what it did/does with your data, when the syncing updates were/would be running, rows processed/deleted, last time each table updated, any current issues you may be experiencing etc. 

To enable this level of audit and transparency, Fivetran adds extra columns to every table added to your destination, to give you insights into the progress/changes of your integration. e.g. for tables that do not have a primary key for each row, Fivetran adds fivetran_index (INTEGER) and fivetran_id (TEXT) to show the order of updates and avoid duplicates, respectively. 

The fivetran_audit table lets you see when each of your tables was updated, and any issues that were encountered. If there are multiple warnings on a single table, Fivetran creates a fivetran_audit_warning and lets you view all of them as a stacktrace/event chain, for easy detection and solving of issues.  You can also activate "history_mode" for a few/all of your tables, Fivetran will add some more columns to these tables like fivetran_start/fivetran_end/fivetran_active, for additional tracking. 

System tables inside Fivetran, like  FIVETRAN_QUERY/FIVETRAN_FORMULA_MODEL/FIVETRAN_FORMULA_HISTORY/FIVETRAN_API_CALL, provide still more transparency on how Fivetran is working on your data. The above transparency and audit features allow you to make better use of Fivetran and tune your SF data to work well with Fivetran. 

  • Plethora of connectors available in Fivetran. 

Fivetran provides more than 300 connectors for many different Databases/Marketing Analytics/Sales Analytics/Product Analytics/Finance & Ops Analytics/Engineering Analytics/SAAS offerings etc. So, whatever be your data stack, there is probably an inbuilt connector in Fivetran. If you have data residing in many different silos/data housing solutions, Fivetran could be your one stop solution for seamlessly integrating all of them. 

[CTA_MODULE]

Method 2: Salesforce to Bigquery ( manual integration )

You can export all your Salesforce data as CSV/Avro files and then import these files into BigQuery to set up Google BigQuery Salesforce Integration. 

Step 1 - Inside your Salesforce data explorer panel, choose the table(s) that you want to export. 

Step 2 - Select Export to Cloud Storage and choose the export format as CSV. 

Also choose the compression type as GZIP (GNU Zip) or accept the default value.  

Download this file. 

Step 3 - Inside your BigQuery, go to the Data Explorer section, and choose "import" and then select "Batch Ingestion".  

Select the file type to be CSV/Avro, remember, Avro file ingest can load data much faster to BigQuery as compared to CSV files. In the UI, you can enable schema auto-detection or specify it explicitly. After specifying details like the dataset and table name, click "Import". Though this method is simple, it can become tedious if you have many tables/files to import. Also, if you have multiple data sources with constantly changing data, this method is practically impossible to implement. 

Method 3: Salesforce to Bigquery by writing custom code 

Both Salesforce and BigQuery provide versatile and powerful APIs for data import and export. You can write your own code, in your preferred language like Java or Python, and achieve the above functionality. This method gives the maximum flexibility and power, as you have fine grained control over each data transfer operation. But, this method requires programming resources and a lot of time to get working.  Moreover, as your data sources and sinks evolve/change, you will need to maintain and update your programs as well. 

Salesforce to Bigquery: Disadvantages in writing custom code

  • You need to have multiple programming resources that understand both Salesforce and BigQuery internals. 
  • All intricacies of the two platforms will need to be addressed. 
  • You will also need a test and verification team to ensure that all the desired data has reached BigQuery and resides there in a format that lends itself to be amicably  consumed by the analytical processes. 
  • Your custom code will need to be frequently updated if there are any changes to your business processes OR to the underlying platforms. 
  • You will need to write multiple programs to transfer your data from disparate sources to BigQuery. ( Whereas Fivetran already has connectors for most popular data sources). 
  • Your budget spent on programming and testing would far exceed your investment in Fivetran, in just a matter of months. 

Conclusion

While there are multiple methods of transferring data from Salesforce to BigQuery, your focus should be on implementing the right analytics to get meaningful insights into your business processes and data. Your engineering resources should be focussing on your core tasks of analyzing the data and creating easy to comprehend visualizations, and not on creating data inflows/outflows. 

It's better not to waste a lot of resources and effort in plumbing and maintaining the data pipelines, which should rather be delegated to modern tools created specifically to address these challenges. Fivetran is a well tested tool coming from experienced developers, it is constantly evolving and changing for the better, to address current and future challenges. 

[CTA_MODULE]

Connect Salesforce to any data warehouse
Free Trial

Related posts

No items found.
How to load data from salesforce to snowflake: definitive guide 
Blog

How to load data from salesforce to snowflake: definitive guide 

Read post
How to load data from Google analytics to Bigquery
Blog

How to load data from Google analytics to Bigquery

Read post
How to load data from Salesforce to Redshift: A definitive guide
Blog

How to load data from Salesforce to Redshift: A definitive guide

Read post
How to load data from salesforce to snowflake: definitive guide 
Blog

How to load data from salesforce to snowflake: definitive guide 

Read post
How to load data from Google analytics to Bigquery
Blog

How to load data from Google analytics to Bigquery

Read post
How to load data from Salesforce to Redshift: A definitive guide
Blog

How to load data from Salesforce to Redshift: A definitive guide

Read post

Start for free

Join the thousands of companies using Fivetran to centralize and transform their data.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.