Accelerate GenAI apps with Fivetran, Google Cloud BigQuery and Vertex AI

Fivetran and Google Cloud synergize incredibly well, bringing generative AI to your fingertips.
April 2, 2024

Every organization is using GenAI now or will be soon. Your success with GenAI depends on how effectively, efficiently and securely your organization’s unique datasets can be used with foundational models and in GenAI apps.

[CTA_MODULE]

In this post, I’ll walk you through how Fivetran accelerates building GenAI applications for customer service. I’ll demonstrate the process using Google Cloud with BigQuery, Google Cloud Storage and Vertex AI. If you aren’t familiar with Fivetran, it’s a fully automated and fully managed data movement platform that supports delivering high-quality, usable, trusted data for any data workload in BigQuery. The integration between BigQuery and Vertex AI is special, and I’ll also run through how they work together.

End-to-end high-level architecture and data flow: Fivetran and Google Cloud

Vertex AI, part of Google Cloud, includes a studio experience, GenAI APIs, foundational models and several ways to quickly build applications. My focus will be using Fivetran to set up a variety of connectors to the sources that house my customer service data. In any enterprise, that data is typically scattered across multiple platforms, including operational databases, file systems and applications like Jira, Zendesk and Slack.

The Gen AI application flow including data sources, Fivetran, BigQuery, and Vertex AI

Using Fivetran, I’ll move that data into Google BigQuery and then quickly prototype a simple GenAI search app and chat app in Vertex AI. However, to build those GenAI apps, I need contextual, focused data specific to my organization. This is where Fivetran's automated data platform comes in.

Fivetran allows you to centralize data in a fully automated, fully managed way while modernizing your data infrastructure, achieving greater data self-service and building differentiating data solutions like GenAI apps. Everything in this post was achieved with zero code and zero maintenance. 

You can check out this Gen AI approach with Fivetran, BigQuery and Vertex AI in the following video.

So let's get our datasets flowing into BigQuery with Fivetran and start experimenting with GenAI.

Adding a new source connector with Fivetran

I have multiple data sources already flowing into Google BigQuery. You see SAP ERP, Workday, SQL Server, GA4 and S3, among others. 

These are the current sources that Fivetran is moving into Google BigQuery

Fivetran comes with over 500 source connectors out of the box, including support for PostgreSQL hosted on all major clouds. One of my customer support data sets lives on Google Cloud PostgreSQL. 

The Fivetran engineering, product and technical documentation teams do a fantastic job laying out the steps to get data flowing quickly into BigQuery. If you’d like to take a look outside of the Fivetran UI, read this: Google Cloud PostgreSQL Setup Guide

Google Cloud PostgreSQL Setup Guide in Fivetran docs

Additionally, for each source and destination, documentation includes details on version support, configuration support, feature support, any limitations, a sync overview, schema information, type transformations and much more. Here is the PostgreSQL source detail page.

Importantly, in the Fivetran UI, every source setup page is framed by the setup guide in the gray navigation on the right. It’s the fast path to ensuring you understand the options to connect quickly to any source with Fivetran.

I talk a lot about predictability, standardization and optionality. I can choose my schema prefix name, and I don't have to create any schemas or tables ahead of time in BigQuery. 

I’ll use the following prefix for this connector:

pg_bg_cs3

Google Cloud PostgreSQL setup page in Fivetran

From there, Fivetran needs the credentials to authenticate to the PostgreSQL database: hostname, username, password and the database I want to access in that PostgreSQL instance, which is the industry database. 

Fivetran handles the initial sync and incremental change data capture (CDC) automatically. Once the initial historical sync is complete, Fivetran performs incremental updates of any new or modified data from the PostgreSQL source database.
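Conceptually, that cursor-based pattern looks something like this. This is a simplified Python sketch, not Fivetran's implementation; the rows, timestamps and cursor format are made up for illustration.

```python
def incremental_sync(source_rows, cursor):
    """Return rows modified after the cursor, plus the new cursor position.

    Mimics cursor-based CDC: after the initial historical sync, only rows
    changed since the last sync point are fetched on each run.
    """
    new_rows = [r for r in source_rows if r["updated_at"] > cursor]
    new_cursor = max((r["updated_at"] for r in new_rows), default=cursor)
    return new_rows, new_cursor

# Hypothetical interactions rows with last-modified timestamps
rows = [
    {"id": 1, "updated_at": "2024-03-01T10:00:00Z"},
    {"id": 2, "updated_at": "2024-03-02T09:30:00Z"},
    {"id": 3, "updated_at": "2024-03-03T12:15:00Z"},
]

changed, cursor = incremental_sync(rows, "2024-03-01T23:59:59Z")
print([r["id"] for r in changed])  # only rows 2 and 3 are re-synced
```

After the initial historical load, only the rows changed since the saved cursor move again, and the cursor advances to the newest change.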

I also have multiple connection options for this PostgreSQL database. Those options include SSH, reverse SSH, VPN, Google Private Service Connect and a proxy agent.

I'm going to connect directly to this database source. Just like you have optionality with connection methods, you also have multiple options for change detection in the database. For Postgres, I can choose WAL, XMIN or Fivetran Teleport sync, depending on my use case and organizational requirements.

I’m going with Teleport sync today; it's a log-free change detection mechanism. From there, Fivetran runs connection tests to the database. Fivetran encrypts all data in motion using TLS, which provides both encryption and authentication for the connections Fivetran makes to the PostgreSQL database. Any data that sits ephemerally in the Fivetran service is also encrypted with AES-256.

Selecting the customer service interactions dataset to use with Google BigQuery and Vertex AI

Fivetran then connects to and fetches the schemas, tables and columns you can access in the PostgreSQL database. For this industry database, I have 13 schemas available. I can sync everything in the database for this connector or selectively define my dataset, which is what I want to do today.

I just need the interactions table in the customer service schema. I will block all the other schemas and tables from moving into BigQuery. 

Schema: customer_service

Table: interactions

Dataset selection (interactions) along with schema, table and column blocking (plus column hashing)

If I hover over any columns in the tables, I will be presented with an option for hashing at the column level. This allows for additional data privacy and anonymization on any PII data that I don’t want to move from my database source to BigQuery. Importantly, Fivetran’s hashing still enables that column to be used in my downstream GenAI application workflow. 
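Fivetran does this hashing inside the pipeline, but as a rough illustration of why a hashed PII column remains usable downstream, here is a sketch with Python's hashlib. The salt, column and email values are invented for the example.

```python
import hashlib

def hash_column(value, salt="demo-salt"):
    """Deterministically hash a PII value so the raw value never reaches
    the destination, while equal inputs still map to equal outputs
    (so joins and grouping on the column keep working)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

emails = ["ada@example.com", "grace@example.com", "ada@example.com"]
hashed = [hash_column(e) for e in emails]

# The raw email never lands in BigQuery...
assert "ada@example.com" not in hashed
# ...but identical customers still collapse to the same hashed key
assert hashed[0] == hashed[2] and hashed[0] != hashed[1]
```

That determinism is what lets a hashed column still participate in the downstream GenAI workflow without exposing the PII itself.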

Managing source changes and schema drift

Fivetran then needs to know how I want to handle incremental changes since the database will change in the future.

Fivetran handles any and all schema drift as part of the incremental change detection automation

I will “Allow all,” but I have many options here. Any and all DML and DDL changes are automatically captured by Fivetran and delivered to BigQuery—no coding is required. I can determine the polling frequency to the PostgreSQL source for the change detection and subsequent movement to BigQuery.
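To make "Allow all" concrete, here is a toy sketch of a destination schema absorbing a new source column automatically. This is not Fivetran internals; the column names, types and policy flag are hypothetical.

```python
def apply_drift(dest_schema, source_schema, policy="allow_all"):
    """Propagate new source columns to the destination when the policy allows.

    With "allow_all", a column added upstream simply appears downstream;
    with any blocking policy, the destination schema is left untouched.
    """
    if policy != "allow_all":
        return dict(dest_schema)
    merged = dict(dest_schema)
    for col, col_type in source_schema.items():
        merged.setdefault(col, col_type)  # new columns appear automatically
    return merged

dest = {"id": "INT64", "message": "STRING"}
source = {"id": "INT64", "message": "STRING", "sentiment": "STRING"}  # column added upstream

print(apply_drift(dest, source))
```

The point is that the DDL change upstream requires no pipeline code change downstream; the policy decides what propagates.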

Starting the initial sync from PostgreSQL to Google BigQuery

Once I have those selections saved and tested, I'm ready to start syncing my customer service interactions dataset from Postgres to BigQuery. Remember that not only will Fivetran move the interactions dataset we just selected during the initial historical sync, but CDC is automatically set up so that if there are any subsequent changes to that table or those columns, those changes will be persisted to BigQuery. No code is required, and the incremental changes are captured at your preferred schedule.

That’s it - I’m ready to start syncing my PostgreSQL interactions dataset to BigQuery

This was a small dataset, just a single table, and the initial sync was completed very quickly.

The initial sync was extremely fast for this dataset

Once Fivetran has moved the data and completed error checking, it doesn’t retain any of that data. A cursor is maintained at the sync point so the next incremental sync can capture changes.

For incremental syncs, Fivetran defaults to every six hours, but I can change that to an interval of one minute up to 24 hours, depending on my use case and the data freshness requirement for the downstream data product.

Fivetran SaaS connectors support incremental syncs from every 1 minute up to every 24 hours
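As a quick sanity check on that range, here is a small sketch that clamps a requested interval to the supported one-minute-to-24-hour window. The bounds and the six-hour default come from the text above; the function itself is mine.

```python
def clamp_sync_interval(requested_minutes, default=360):
    """Clamp a requested sync interval (in minutes) to the supported
    1 minute - 24 hour range; fall back to the six-hour default when
    nothing is requested."""
    if requested_minutes is None:
        return default
    return max(1, min(requested_minutes, 24 * 60))

print(clamp_sync_interval(None))    # six-hour default
print(clamp_sync_interval(0))       # below range -> 1 minute
print(clamp_sync_interval(2000))    # above range -> 1440 minutes (24 h)
```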

My PostgreSQL connector (pg_bg_cs3) is now in the list with all other connectors, and I have access to all those data sources now in BigQuery.

All connectors are active and persisting changes to any and all of these datasets to BigQuery

Check out the new dataset in BigQuery

You can see the other existing datasets in BigQuery that I can work from. There's my PostgreSQL schema as well, and there is the table that Fivetran moved over, the customer service interactions dataset that I wanted. The key is that Fivetran provides a faithful one-to-one representation of the interactions dataset source to BigQuery, and that data is ready for a BigQuery data workload.

Fivetran’s automated data movement platform provides a Gen-AI ready dataset in BigQuery
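Once the table lands, any BigQuery client or the console can query it. As a hedged sketch of how the fully-qualified name comes together from the connector prefix I chose earlier: the project ID below is a placeholder, and I'm only building the query string here, not running it against BigQuery.

```python
def qualified_table(project, schema_prefix, schema, table):
    """Build the fully-qualified BigQuery name that Fivetran's naming
    convention for database connectors produces:
    <project>.<schema_prefix>_<source_schema>.<table>"""
    return f"{project}.{schema_prefix}_{schema}.{table}"

table = qualified_table("my-gcp-project", "pg_bg_cs3", "customer_service", "interactions")

# A first look at the synced data (illustrative; 'category' is a
# hypothetical column in the interactions table)
query = f"SELECT category, COUNT(*) AS n FROM `{table}` GROUP BY category ORDER BY n DESC"

print(table)
```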

Adding a Slack dataset for messaging data

Before I create a GenAI app in Vertex AI, I want to show you how easy it is to build a complete set of focused, contextual customer service data for GenAI apps. I’ll quickly create several other datasets in BigQuery by connecting to other sources. 

Your AI apps will only be as good as your data, and for a production GenAI application, you'll need some additional contextual data sources.

Today, I will go with Slack, Zendesk and Jira. The Fivetran application connectors connect via an API, and to connect Slack to Fivetran to consolidate Slack information into BigQuery, you simply need access to an active Slack account. 

I want to give a big shout-out to Angel Hernandez at Fivetran. He provided me with a Slack demo account and routinely delivers incredible sources and destinations for demonstrations. 

I simply provide the schema name and authorize the API for the Slack connector, and Fivetran does the final connection test. That's it. I'll start the initial sync.

Slack setup page in Fivetran

Adding a Jira connector for issue-tracking data

Fivetran supports Jira both on premises and in the cloud, and how Fivetran connects depends on your Jira installation. For Jira Cloud, Fivetran needs my Jira hostname, port and a connecting user with relevant permissions.

I can use either OAuth or basic authentication. There's also an application link configured in Jira that links the apps together. I’ll also need a consumer key and public key in the Jira form; in this case, that Jira step was done for me. 

Lastly, I get to choose the issue sync mode. I can sync from all projects or select projects. Here, I'll choose a couple of sample projects. 

Jira setup page in Fivetran

Also, Fivetran has Quickstart data models available for the Jira connector, and I will ask Fivetran to automatically build those models for me in Google BigQuery using dbt.

Fivetran Quickstart transformations for the Jira connector

Adding one more connector - Zendesk Support for support interactions

Another dataset I'd like to add to BigQuery is my customer support information in Zendesk. Fivetran has multiple Zendesk connectors and setup is predictably very fast and very easy.

I only need my Zendesk support domain name and a Zendesk account with an administrator role. Fivetran has many out-of-the-box data models, and just like with Jira, the Quickstart data models are available for Zendesk, providing me with analytics-ready data in BigQuery.

I don't even need my own dbt project or any third-party tools to get this kicked off. Fivetran takes care of everything for me in the background. 

Zendesk setup page in Fivetran

All right, I feel really good about my customer service dataset now. I will jump back out to BigQuery and take a quick look at the dataset, which is now centralized and ready to go. 

Fivetran and BigQuery working together give me a modern, self-service approach to building Gen AI apps on those contextual datasets. If you'd like to use the same Bitext customer support training interactions dataset that I have in Postgres, you can find it on GitHub here. At this point, there are 20+ sources flowing into BigQuery, and you’ve seen how to set up four key sources.

Fivetran’s automated data movement platform persists a wide range of data sources to BigQuery

Some quick Vertex AI highlights

I will move to the right side of the architecture and start using Vertex AI and BigQuery to build a couple of simple Gen AI apps. I'll start with a search app, and then I'll set up a chat app as well. They'll both be focused on customer support.

Building chat and search apps with Vertex AI

Vertex AI enables you to build generative AI apps quickly. I also have a range of models to choose from, and Vertex AI automatically trains, tests and tunes predictive models for me within a single platform. This provides significant development acceleration, ultimately reducing training time, time to value and cost.

Some incredible capabilities are built into Vertex AI, but I’m only going to touch on a few of them today.

Vertex AI main dashboard and options

Building a GenAI search app with BigQuery and Vertex AI

I’ll continue my Google Gen AI journey in Vertex AI Search and Conversation. I've built a few apps, but I want to add a new one. I have four application type choices: search, chat, recommendations and generative playbook. Let’s test out a couple of those.

Vertex AI Search and Conversation app options

The declarative approach with Vertex AI Search and Conversation is good for someone like me who is not an ML or data engineer. I want to start with a new search app, and the configuration step is straightforward.

I simply name the app and provide a company name and location. 

Vertex AI Search App configuration and setup

From there, I'll let Vertex AI know which data store I want to use as the foundation for my app.

I've got some existing data stores available, but I actually want to access the customer service interactions dataset that I consolidated into BigQuery earlier with Fivetran. 

Selecting a data store for the Vertex AI Search App

I will search for my schema and then let Vertex AI know the type of data I'm using. In this case, it's going to be structured BigQuery tables.

Let Vertex AI know the type of data that you want to use from BigQuery

I give my data store a unique name and select it. Vertex AI will process that information, and I’ll be off to the races.

Building a GenAI chat app with BigQuery and Vertex AI

While my search app is building, I'm going to create a Gen AI chat app as well. 

Creating a chat app with Vertex AI follows the same process I just walked you through with a search app. This time though, Vertex AI will guide me through a conversational agent configuration.

Once I’m finished with the conversational chat app configuration, I choose my dataset. I’m creating a new data store from another version of the customer service interaction data that is in Google Cloud Storage in unstructured PDF format. 

I want to note that Fivetran can move any type of structured or semi-structured data into BigQuery including relational tables, CSV, JSON, XML, Parquet, Avro, you name it. This PDF, though, I dropped into GCS separately along with some of the semi-structured docs I'd moved into GCS using Fivetran. 

Take a look at the GenAI search app in Vertex AI

This was the structured form of the interactions dataset. I'm going to jump into the search preview page and run a couple of simple searches. I've done nothing other than use the standard configuration page.

If I want to format my search results, it's easy to do and Vertex AI provides some nice controls in the widget tab. 

Testing the GenAI chat app in Vertex AI

The model for my customer service chat app is still being trained, so I will use the CS Chat app I built earlier. It's the same dataset: the customer service interactions dataset.

For chat, Vertex takes me into the Dialogflow CX palette, and I can test my agent out. It's a relatively small dataset, but remember, it's focused and contextual, which is key.

I will run some tests in the simulator, and then I’ll add the agent to a simulated website in CodePen. As you might expect, I'll ask some typical customer service questions. I want to know about shipping options, order cancellation and order changes and see how the Vertex AI chat app responds to each question.

The model was trained quickly, and the app was built exceptionally fast. I like the responses that I’m getting so far.

It's very, very cool. I didn't have to do any additional configuration or setup, input any sample answers, test answers, questions or anything like that.

This is all happening in real time, and certainly, everything can be tuned further within Vertex AI. Okay, I'm happy with that app and I want to publish it now. Vertex AI auto-generates the code I need to add the agent to my website.

Adding the Vertex AI chat app to my CodePen website simulator

Since I don't have a website, I will use CodePen. If you haven't used it before, it's an online community for testing and showcasing simple apps and your own code snippets.

I'm going to name this pen Customer_Service_Chat. Then, all I've got to do is paste in the code that Vertex generated for me. I should see my agent ready to go, as indicated by the blue bubble popup in the corner. I also expect the behavior to be very similar to what I saw in the Vertex simulator.

My tests included a quick greeting to be friendly and then questions about shipping options and a couple of other things, just as we did earlier directly in the Vertex simulator.

Overall, I'm really happy with the chat app. I’m impressed with the ease of integrating the dataset, the build, the performance and the fact that I did not need a team of data engineers and ML engineers for 8, 10, 12 sprints or more.

While I’m here…a few more Vertex AI chat app capabilities

Vertex AI’s agent settings for this chat app are interesting.

In the Generative AI tab, you can see that I'm using Gemini Pro, but I have my choice of models.

If you want your chat app to respond to audio prompts, it is very quick to set up a phone gateway by picking an available phone number. When I tested the gateway, the chat app responses were almost identical to the responses provided in the web app.

If you need more data enrichment and transformation

I want to momentarily circle back to Fivetran Transformations. Fivetran provides seamless integration with dbt, including more than 20 Quickstart data models for two of the connectors I set up today, Jira and Zendesk.

Fivetran Quickstarts allow you to automatically produce analytics-ready tables using prebuilt data models. Data is transformed with no code and no additional dbt projects or any third-party tools required. You can also connect to your own dbt Core project with dozens of additional dbt packages that Fivetran has developed and made available.

Transformations include integrated scheduling. They automatically trigger model runs following the completion of Fivetran connector syncs, and you can check out the wide range of connectors that support Quickstart data models, dbt Core and now dbt Cloud. So you have three options for a range of transformation requirements.
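That integrated scheduling boils down to "run the models when the sync finishes." Here is a toy orchestration sketch of the idea; this is not Fivetran or dbt code, and while the model names echo Fivetran's Jira and Zendesk dbt packages, they are used here purely as labels.

```python
def on_sync_complete(connector, models, run_model):
    """Trigger a connector's downstream models once its sync completes.

    In Fivetran's integrated scheduling, model runs are chained to sync
    completion rather than run on an independent clock.
    """
    return [run_model(m) for m in models.get(connector, [])]

# Hypothetical connector-to-model mapping
models = {
    "jira": ["jira__issue_enhanced"],
    "zendesk": ["zendesk__ticket_metrics"],
}

ran = on_sync_complete("zendesk", models, run_model=lambda m: f"ran {m}")
print(ran)
```

Chaining the runs this way means the analytics-ready tables are refreshed exactly as often as the data beneath them, never staler and never wastefully more often.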

Get started now

Building search and chat Gen AI apps with Google BigQuery and Vertex AI was simple and fast. Fivetran ensures that any and all data movement to Google Cloud is standardized and automated across any data source with each fully automated and fully managed pipeline providing reliability, scalability, predictability, security and context for Gen AI apps. 

If you want to give Fivetran a spin for a Google Cloud Gen AI use case or any other data workload, Fivetran makes it easy with a 14-day free trial.

I would love to hear from you about any connectors, data workloads or use cases you’d like to see profiled next. Take care!

About the author

Kelly Kohlleffel leads the Fivetran Global Partner Sales Engineering organization, working with a broad ecosystem of technology partners and consulting services partners on modern data product and solution approaches. He also hosts the Fivetran Data Drip podcast where some of the brightest minds across the data community talk about their data journey, vision, and challenges. Before Fivetran, he spent time at Hashmap and NTT DATA (data solution and service consulting), Hortonworks (in Hadoop-land), and Oracle. You can connect with Kelly on LinkedIn or follow him on Twitter.

[CTA_MODULE]

Start for free

Join the thousands of companies using Fivetran to centralize and transform their data.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Data insights
Data insights

Accelerate GenAI apps with Fivetran Google Cloud BQ and Vertex AI

Accelerate GenAI apps with Fivetran Google Cloud BQ and Vertex AI

April 2, 2024
April 2, 2024
Accelerate GenAI apps with Fivetran Google Cloud BQ and Vertex AI
Fivetran and Google Cloud synergize incredibly well, bringing generative AI to your fingertips.

Every organization is using GenAI now or will be soon. Your success with GenAI depends on how effectively, efficiently and securely your organization’s unique datasets can be used with foundational models and in GenAI apps.

[CTA_MODULE]

In this post, I’ll walk you through how Fivetran accelerates building GenAI applications for customer service. I’ll demonstrate the process using Google Cloud with BigQuery, Google Cloud Storage and Vertex AI. If you aren’t familiar with Fivetran, it’s a fully automated and fully managed data movement platform that supports delivering high-quality, usable, trusted data for any data workload in BigQuery. Also, the integration with BigQuery and Vertex AI is special, and I’ll also run through how they work together.

End-to-end high-level architecture and data flow: Fivetran and Google Cloud

Vertex AI, part of Google Cloud, includes a studio experience, GenAI APIs, foundational models and several ways to quickly build applications. My focus will be setting up with Fivetran a variety of connectors to sources that house my customer service data. In any enterprise, that data is typically scattered across multiple platforms, including operational databases, file systems and applications like Jira, Zendesk and Slack.

The Gen AI application flow including data sources, Fivetran, BigQuery, and Vertex AI

Using Fivetran, I’ll move that data into Google BigQuery and then quickly prototype a simple GenAI search app and chat app in Vertex AI. However, to build those GenAI apps, I need contextual, focused data specific to my organization. This is where Fivetran's automated data platform comes in.

Fivetran allows you to centralize data in a fully automated, fully managed way while modernizing your data infrastructure, achieving greater data self-service and building differentiating data solutions like GenAI apps. Everything in this post was achieved with zero code and zero maintenance. 

You can check out this Gen AI approach with Fivetran, BigQuery and Vertex AI in the following video.

So let's get our datasets flowing into BigQuery with Fivetran and start experimenting with GenAI.

Adding a new source connector with Fivetran

I have multiple data sources already flowing into Google BigQuery. You see SAP ERP, Workday, SQL Server, GA4 and S3, among others. 

These are the current sources that Fivetran is moving into Google BigQuery

Fivetran comes with over 500 source connectors out of the box, including support for PostgreSQL hosted on all major clouds. One of my customer support data sets lives on Google Cloud PostgreSQL. 

The Fivetran engineering, product and technical documentation teams do a fantastic job laying out the steps to get data flowing quickly into BigQuery. If you’d like to take a look outside of the Fivetran UI, read this: Google Cloud PostgreSQL Setup Guide

Google Cloud PostgreSQL Setup Guide in Fivetran docs

Additionally, for each source and destination, documentation includes details on version support, configuration support, feature support, any limitations, a sync overview, schema information, type transformations and much more. Here is the PostgreSQL source detail page.

Importantly, in the Fivetran UI, any source setup pages are framed on the right by the setup guide in the right gray navigation. It’s the fast path to ensuring you understand the options to connect quickly to any source with Fivetran.

I talk a lot about predictability, standardization and optionality. I can choose my schema prefix name, and I don't have to create any schemas or tables ahead of time in BigQuery. 

I’ll use the following prefix for this connector:

pg_bg_cs3

Google Cloud PostgreSQL setup page in Fivetran

From there, Fivetran needs credentials to authenticate to the PostgreSQL database including hostname, user name, password info and the database I want to access in that PostgreSQL instance, which is the industry database. 

Fivetran handles the initial sync and incremental change data capture (CDC) automatically. Once the initial historical sync is complete, Fivetran performs incremental updates of any new or modified data from the PostgreSQL source database. I can choose WAL, XMIN or Fivetran Teleport (log-free change detection) to perform incremental updates.

I also have multiple connection options for this PostgreSQL database. Those options include SSH, reverse SSH, VPN, Google Private Service Connect and a proxy agent.

I'm going to connect directly to this database source. Just like you have optionality with connection methods, you also have multiple options for change detection in the database. For Postgres, I can choose WAL, XMIN or Fivetran teleport sync, depending on my use case and organizational requirements.

I’m going with Teleport sync for today. It's a log-free change detection mechanism. From there, Fivetran runs connection tests to the database. Fivetran encrypts all data in motion and uses TLS, allowing for both encryption and authentication of the connections that Fivetran makes to the PostgreSQL database. Any data that sits ephemerally in the Fivetran service is also encrypted with AES 256.

Selecting the customer service interactions dataset to use with Google BigQuery and Vertex AI

Fivetran then connects to and fetches the schemas, tables and columns you can access in the Postgres SQL database. For this industry database, I have 13 schemas available. I can sync everything in the database for this connector or selectively determine my dataset, which I want to do today.

I just need the interactions table in the customer service schema. I will block all the other schemas and tables from moving into BigQuery. 

Schema: customer_service

Table: interactions

Dataset selection (interactions) along with schema, table and column blocking (plus column hashing)

If I hover over any columns in the tables, I will be presented with an option for hashing at the column level. This allows for additional data privacy and anonymization on any PII data that I don’t want to move from my database source to BigQuery. Importantly, Fivetran’s hashing still enables that column to be used in my downstream GenAI application workflow. 

Managing source changes and schema drift

Fivetran then needs to know how I want to handle incremental changes since the database will change in the future.

Fivetran handles any and all schema drift as part of the incremental change detection automation

I will “Allow all,” but I have many options here. Any and all DML and DDL changes are automatically captured by Fivetran and delivered to BigQuery—no coding is required. I can determine the polling frequency to the PostgreSQL source for the change detection and subsequent movement to BigQuery.

Starting the initial sync from PostgreSQL to Google BigQuery

Once I have those selections saved and tested, I'm ready to start syncing my customer service interactions dataset from Postgres to BigQuery. Remember that not only will Fivetran move the interactions dataset we just selected during the initial historical sync, but CDC is automatically set up so that if there are any subsequent changes to that table or those columns, those changes will be persisted to BigQuery. No code is required, and the incremental changes are captured at your preferred schedule.

That’s it - I’m ready to start syncing my PostgreSQL interactions dataset to BigQuery

This was a small dataset, just a single table, and the initial sync was completed very quickly.

The initial sync was extremely fast for this dataset

Once Fivetran has moved the data and completed error checking, Fivetran doesn’t store any data. A cursor is maintained at the sync point for the next incremental sync to capture changes.

For incremental syncs, Fivetran defaults to every six hours, but I can change that to an interval of one minute up to 24 hours, depending on my use case and the data freshness requirement for the downstream data product.

Fivetran SaaS connectors support incremental syncs from every 1 minute up to every 24 hours

My PostgreSQL connector (pg_bg_cs3) is now in the list with all other connectors, and I have access to all those data sources now in BigQuery.

All connectors are active and persisting changes to any and all of these datasets to BigQuery

Check out the new dataset in BigQuery

You can see the other existing datasets in BigQuery that I can work from. There's my PostgreSQL schema as well, and there is the table that Fivetran moved over, the customer service interactions dataset that I wanted. The key is that Fivetran provides a faithful one-to-one representation of the interactions dataset source to BigQuery, and that data is ready for a BigQuery data workload.

Fivetran’s automated data movement platform provides a Gen-AI ready dataset in BigQuery

Adding a Slack dataset for messaging data

Before I create a GenAI app in Vertex AI, I want to show you how easy it is to build a complete set of focused, contextual customer service data for GenAI apps. I’ll quickly create several other datasets in BigQuery by connecting to other sources. 

Your AI apps will only be as good as your data, and for a production GenAI application, you'll need some additional contextual data sources.

Today, I will go with Slack, Zendesk and Jira. The Fivetran application connectors connect via an API, and to connect Slack to Fivetran to consolidate Slack information into BigQuery, you simply need access to an active Slack account. 

I want to give a big shout-out to Angel Hernandez at Fivetran. He provided me with a Slack demo account and routinely delivers incredible sources and destinations for demonstrations. 

I simply provide the schema name and authorize the API for the Slack connector, and Fivetran does the final connection test. That's it. I'll start the initial sync.

Slack setup page in Fivetran

Adding a Jira connector for issue-tracking data

Fivetran supports Jira both on premises and in the cloud. How Fivetran connects to Jira depends on your Jira installation. For Jira cloud, Fivetran needs my Jira hostname, port and a connecting user with relevant permissions.

I can use either OAuth or basic authentication. There's also an application link configured in Jira that links the apps together. I’ll also need a consumer key and public key in the Jira form; in this case, that Jira step was done for me. 

Lastly, I get to choose the issue sync mode. I can sync from all projects or select projects. Here, I'll choose a couple of sample projects. 

Jira setup page in Fivetran

Also, Fivetran has Quickstart data models available for the Jira connector, and I will ask Fivetran to automatically build those models for me in Google BigQuery using dbt.

Fivetran Quickstart transformations for the Jira connector

Adding one more connector - Zendeck Support for support interactions

Another dataset I'd like to add to BigQuery is my customer support information in Zendesk. Fivetran has multiple Zendesk connectors, and setup is predictably fast and easy.

I only need my Zendesk support domain name and a Zendesk account with an administrator role. Fivetran has many out-of-the-box data models, and just like with Jira, the Quickstart data models are available for Zendesk, providing me with analytics-ready data in BigQuery.

I don't even need my own dbt project or any third-party tools to get this kicked off. Fivetran takes care of everything for me in the background. 

Zendesk setup page in Fivetran

All right, I feel really good about my customer service dataset now. I will jump back out to BigQuery and take a quick look at the dataset, which is now centralized and ready to go.

Fivetran and BigQuery working together give me a modern, self-service approach to building GenAI apps on those contextual datasets. If you'd like to use the same Bitext customer support training interactions dataset that I have in Postgres, you can find it in GitHub here. At this point, there are 20+ sources flowing into BigQuery, and you've seen how to set up four key sources.
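A quick look at the centralized data might be a query like the one below, using the google-cloud-bigquery client. The project, dataset and table names are hypothetical; substitute the schema names you gave each Fivetran connector.

```python
# Sketch: peeking at the centralized customer service data in BigQuery.
# Project, dataset and table names are placeholders.
def support_interactions_query(project, dataset):
    """Build a simple rollup query over the synced support interactions."""
    return f"""
        SELECT category, intent, COUNT(*) AS interactions
        FROM `{project}.{dataset}.customer_support_interactions`
        GROUP BY category, intent
        ORDER BY interactions DESC
        LIMIT 10
    """

sql = support_interactions_query("my-gcp-project", "postgres_customer_service")

# To run it (requires google-cloud-bigquery and GCP credentials):
# from google.cloud import bigquery
# for row in bigquery.Client().query(sql).result():
#     print(row.category, row.intent, row.interactions)
```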

Fivetran’s automated data movement platform persists a wide range of data sources to BigQuery

Some quick Vertex AI highlights

I will move to the right side of the architecture and start using Vertex AI and BigQuery to build a couple of simple GenAI apps. I'll start with a search app, and then I'll set up a chat app as well. They'll both be focused on customer support.

Building chat and search apps with Vertex AI

Vertex AI enables you to build generative AI apps quickly. I also have a range of models to choose from, and Vertex AI automatically trains, tests and tunes predictive models for me within a single platform. This provides significant development acceleration, ultimately reducing training time, time to value and cost.

Some incredible capabilities are built into Vertex AI, but I’m only going to touch on a few of them today.

Vertex AI main dashboard and options

Building a GenAI search app with BigQuery and Vertex AI

I'll continue my Google GenAI journey at Vertex AI Search and Conversation. I've built a few apps, but I want to add a new one. I have four application type choices: search, chat, recommendations and generative playbook. Let's test out a couple of those.

Vertex AI Search and Conversation app options

The declarative approach with Vertex AI Search and Conversation is good for someone like me who is not an ML or data engineer. I want to start with a new search app, and the configuration step is straightforward.

I simply name the app and provide a company name and location.

Vertex AI Search App configuration and setup

From there, I'll let Vertex AI know which data store I want to use as the foundation for my app.

I've got some existing data stores available, but I actually want to access the customer service interactions dataset that I consolidated into BigQuery earlier with Fivetran. 

Selecting a data store for the Vertex AI Search App

I will search for my schema and then let Vertex AI know the type of data I'm using. In this case, it's going to be structured BigQuery tables.

Let Vertex AI know the type of data that you want to use from BigQuery

I give my data store a unique name and select it. Vertex AI will process that information, and I'll be off to the races.
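The console steps above can also be approximated programmatically with the Vertex AI Search (Discovery Engine) REST API: create a data store, then import the structured BigQuery table into it. The URL paths and field names below follow the Discovery Engine API but should be treated as an illustrative sketch; all project, data store, dataset and table IDs are placeholders, and you should verify the request shapes against the current API reference.

```python
# Sketch: creating a Vertex AI Search data store and importing a
# structured BigQuery table via the Discovery Engine REST API.
# All IDs are placeholders; verify fields against the API reference.
BASE = "https://discoveryengine.googleapis.com/v1"

def create_data_store_request(project, data_store_id, display_name):
    """URL and body for creating a data store in the default collection."""
    url = (f"{BASE}/projects/{project}/locations/global"
           f"/collections/default_collection/dataStores"
           f"?dataStoreId={data_store_id}")
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
    }
    return url, body

def import_bigquery_documents_request(project, data_store_id, dataset, table):
    """URL and body for importing structured BigQuery rows as documents."""
    url = (f"{BASE}/projects/{project}/locations/global"
           f"/collections/default_collection/dataStores/{data_store_id}"
           f"/branches/default_branch/documents:import")
    body = {
        "bigquerySource": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
    }
    return url, body

ds_url, ds_body = create_data_store_request(
    "my-gcp-project", "cs-interactions-ds", "Customer service interactions")
```

Each request would be POSTed with an OAuth bearer token; the console handles all of this behind the scenes.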

Building a GenAI chat app with BigQuery and Vertex AI

While my search app is building, I'm going to create a GenAI chat app as well.

Creating a chat app with Vertex AI follows the same process I just walked you through with a search app. This time though, Vertex AI will guide me through a conversational agent configuration.

Once I’m finished with the conversational chat app configuration, I choose my dataset. I’m creating a new data store from another version of the customer service interaction data that is in Google Cloud Storage in unstructured PDF format. 

I want to note that Fivetran can move any type of structured or semi-structured data into BigQuery: relational tables, CSV, JSON, XML, Parquet, Avro, you name it. This PDF, though, I dropped into GCS separately along with some of the semi-structured docs I'd moved into GCS using Fivetran.

Take a look at the GenAI search app in Vertex AI

The search app was built on the structured form of the interactions dataset. I'm going to jump into the search preview page and run a couple of simple searches. I've done nothing other than use the standard configuration page.

If I want to format my search results, it's easy to do and Vertex AI provides some nice controls in the widget tab. 

Testing the GenAI chat app in Vertex AI

The model for my customer service chat app is still being trained, so I will use the CS Chat app I built earlier. It's the same dataset - the customer service interactions dataset.

For chat, Vertex takes me into the Dialogflow CX palette, and I can test my agent out. It's a relatively small dataset, but remember, it's focused and contextual, which is key.

I will run some tests in the simulator, and then I'll add the agent to a simulated website in CodePen. As you might expect, I'll ask some typical customer service questions. I want to know about shipping options, order cancellation and order changes, and see how the Vertex AI chat app responds to each question.

The model was trained quickly, and the app was built exceptionally fast. I like the responses that I’m getting so far.

It's very, very cool. I didn't have to do any additional configuration or setup, or input any sample answers, test questions or anything like that.

This is all happening in real time, and certainly, everything can be tuned further within Vertex AI. Okay, I'm happy with that app and I want to publish it now. Vertex AI auto-generates the code I need to add the agent to my website.

Adding the Vertex AI chat app to my CodePen website simulator

Since I don't have a website, I will use CodePen. Here's the link to CodePen to check it out if you haven't used it before. It's an online community for testing and showcasing simple apps and your own code snippets.

I'm going to name this pen Customer_Service_Chat. Then, all I've got to do is paste in the code that Vertex generated for me. I should see my agent ready to go, as indicated by the blue bubble popup in the corner. I also expect the behavior to be very similar to what I saw in the Vertex simulator.

My tests included a quick greeting to be friendly and then questions about shipping options and a couple of other things like we did previously directly in the Vertex simulator.

Overall, I'm really happy with the chat app. I’m impressed with the ease of integrating the data set, the build, the integration, the performance and the fact that I did not need a team of data engineers and ML engineers for 8 or 10 or 12 sprints or more.

While I’m here…a few more Vertex AI chat app capabilities

Vertex AI’s agent settings for this chat app are interesting.

In the Generative AI tab, you can see that I'm using Gemini Pro, but I have my choice of models.

If you want your chat app to respond to audio prompts, it is very quick to set up a phone gateway by picking an available phone number. When I tested the gateway, the chat app responses were almost identical to the responses provided in the web app.

If you need more data enrichment and transformation

I want to momentarily circle back to Fivetran Transformations. Fivetran provides seamless integration with dbt, including more than 20 Quickstart data models for two of the connectors I set up today, Jira and Zendesk.

Fivetran Quickstarts allow you to automatically produce analytics-ready tables using prebuilt data models. Data is transformed with no code and no additional dbt projects or any third-party tools required. You can also connect to your own dbt Core project with dozens of additional dbt packages that Fivetran has developed and made available.

Transformations include integrated scheduling: they automatically trigger model runs following the completion of Fivetran connector syncs. You can check out the wide range of connectors that support Quickstart data models, dbt Core and now dbt Cloud, giving you three options for a range of transformation requirements.
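Because the Quickstart models run after connector syncs complete, kicking off an on-demand sync through Fivetran's REST API also kicks off the downstream transformations. A hedged sketch of that request, with a placeholder connector ID:

```python
# Sketch: triggering an on-demand sync via Fivetran's REST API.
# The connector ID and credentials are placeholders.
def sync_request(connector_id):
    """URL and body for an on-demand connector sync."""
    url = f"https://api.fivetran.com/v1/connectors/{connector_id}/sync"
    body = {"force": True}  # start a sync even if one is already scheduled
    return url, body

url, body = sync_request("my_jira_connector_id")

# To send it (requires a Fivetran API key/secret):
# import requests
# requests.post(url, json=body, auth=(API_KEY, API_SECRET))
```

Once the sync finishes, the scheduled Quickstart dbt models run against the fresh data automatically.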

Get started now

Building search and chat GenAI apps with Google BigQuery and Vertex AI was simple and fast. Fivetran ensures that any and all data movement to Google Cloud is standardized and automated across any data source, with each fully automated and fully managed pipeline providing reliability, scalability, predictability, security and context for GenAI apps.

If you want to give Fivetran a spin for a Google Cloud GenAI use case or any other data workload, Fivetran makes it easy with a 14-day free trial.

I would love to hear from you about any connectors, data workloads or use cases you’d like to see profiled next. Take care!

About the author

Kelly Kohlleffel leads the Fivetran Global Partner Sales Engineering organization, working with a broad ecosystem of technology partners and consulting services partners on modern data product and solution approaches. He also hosts the Fivetran Data Drip podcast where some of the brightest minds across the data community talk about their data journey, vision, and challenges. Before Fivetran, he spent time at Hashmap and NTT DATA (data solution and service consulting), Hortonworks (in Hadoop-land), and Oracle. You can connect with Kelly on LinkedIn or follow him on Twitter.

[CTA_MODULE]

Check out our webinar on this subject if you would rather experience this as a video
Watch here
Want to speak to our CTO's office about your GenAI use case?
Get in touch


