The signature benefit of Fivetran is automated data movement. As cloud-based applications and databases proliferate, Fivetran removes the need to build data pipelines connecting these data sources to destinations by hand, instead allowing users to integrate data by traversing a few menus and pushing a few buttons.
Our longstanding partnership with Databricks combines automated data integration with the power of the data lakehouse architecture. As the complexity of an organization’s data needs grows, data lakehouses make eminent sense as destinations for any datasets, combining the cost-effectiveness and scalability of data lakes with the ACID compliance, governance, concurrency support and other benefits of data warehouses.
What a serverless data lakehouse brings to the table
A serverless architecture is a logical next step in augmenting the capabilities of the data lakehouse. It offers instant and elastic compute, lower costs and less technical overhead, because resources scale up and down automatically with usage.
These attributes synergize extremely well with automated data integration, delivering a versatile, user- and budget-friendly package that provides the flexibility to send metadata to data catalogs, choose cloud-based, private cloud or on-premises deployments, control data and processing residency, choose cloud providers and more.
As a platform, Databricks also offers many services that complement the capabilities of the data lakehouse. These services include support for machine learning notebooks, SQL-based transformations and data visualizations, allowing you to easily construct front-to-back data integration using just Fivetran and Databricks. Conveniently, these features are all accessible through the Databricks Workspace UI.
Setting up Fivetran and Databricks
Starting the process consists of navigating a succession of menus, whether through Databricks Partner Connect or Fivetran directly.
From Databricks Partner Connect
If you already have a Databricks account but no Fivetran account:
- Start in the Databricks Workspace UI and select Partner Connect in the lower left.
- Select the appropriate button for Fivetran to initiate a trial sign-up. You may adjust your compute size as needed. Doing so passes your ID and SQL endpoint to Fivetran.
- You will be brought to Fivetran for a trial sign-up (or to log in if you are an existing user).
- From the Fivetran UI, you will be able to see and access your Databricks destination. Go to it.
From Fivetran
Alternatively, if you already have both a Fivetran account and a Databricks account, you can set up your destination from the Fivetran interface. Add your Databricks destination; you will need to supply credentials.
Whether you began from the Databricks or Fivetran interface, you are now prepared to configure your connectors and make your initial sync. You will need credentials for each source. Choose which columns and tables to block or hash in accordance with your business needs and the regulatory demands of your industry. Finally, set a sync schedule and begin your initial sync.
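To make the difference between blocking and hashing concrete, here is a minimal sketch in Python. It is an illustration only, not Fivetran's implementation: the function names, the choice of SHA-256 and the salt value are all assumptions made for the example. A blocked column never reaches the destination, while a hashed column arrives as a digest that hides the original value but can still be used for joins and distinct counts.

```python
import hashlib


def hash_value(value, salt):
    """Return a hex SHA-256 digest of salt + value; None is passed through.

    SHA-256 and the salt are assumptions for this sketch, not a statement
    about how Fivetran hashes columns.
    """
    if value is None:
        return None
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def apply_column_policy(row, blocked, hashed, salt):
    """Drop blocked columns and replace hashed columns with their digest."""
    out = {}
    for column, value in row.items():
        if column in blocked:
            continue  # blocked columns are excluded entirely
        out[column] = hash_value(value, salt) if column in hashed else value
    return out


# Hypothetical source row and policy
row = {"id": 1, "email": "a@example.com", "ssn": "000-00-0000", "plan": "pro"}
synced = apply_column_policy(
    row, blocked={"ssn"}, hashed={"email"}, salt="demo-salt"
)
```

In this sketch, `synced` keeps `id` and `plan` unchanged, omits `ssn` and carries a 64-character digest in place of `email`. Because the same input always produces the same digest, hashed columns remain usable as join keys.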
Querying, modeling and visualizing through Databricks
Once data from your initial sync is available, you will be able to access it through the Databricks Workspace UI. The Catalog Explorer will allow you to view and explore the contents of databases, schemas and tables just as you would through any database GUI.
From the SQL Editor, you can query, explore and analyze data. As on any SQL-based platform, you can use aggregate functions to create metrics. You can save queries or create new tables as needed. These saved queries and tables will form the basis for the visualizations and dashboards you will create shortly.
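As a rough sketch of this pattern, the example below builds a metrics table from an aggregate query. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in so that it runs anywhere; the `orders` and `customer_revenue` tables and their columns are hypothetical. In the Databricks SQL Editor you would write a similar `CREATE TABLE ... AS SELECT` statement against your synced tables, though syntax details may differ.

```python
import sqlite3

# In-memory SQLite database standing in for a synced schema (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)],
)

# Aggregate functions (COUNT, SUM) turn raw rows into per-customer metrics,
# and CREATE TABLE ... AS SELECT persists the result as a new table.
conn.execute(
    """
    CREATE TABLE customer_revenue AS
    SELECT customer,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_revenue
    FROM orders
    GROUP BY customer
    """
)

metrics = conn.execute(
    "SELECT customer, order_count, total_revenue "
    "FROM customer_revenue ORDER BY customer"
).fetchall()
```

A table like `customer_revenue` is the kind of object a dashboard visualization can then read from directly.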
From Dashboards, you can create dashboards and add visualizations by selecting queries. As with other BI platforms, you have numerous charts and parameters to choose from.
To see this process in more detail, consider watching our hands-on lab on the topic.