Assembling a RAG architecture using Fivetran

How to combine your enterprise data with generative AI.

[CTA_MODULE]

Artificial intelligence models are built on solid foundations of data. The most challenging part of building a viable AI model is accessing the large volumes of data necessary to train AI, feeding it to the model and ensuring that it stays up to date. 

Fivetran moves data from a source to a data repository through a simple, menu-driven workflow. Getting started takes no more than a few steps:

  1. Sign up for Fivetran
  2. Set up your first connector using credentials to a data source, choosing from one of more than 500 common SaaS, database, event streaming and file storage platforms
  3. Set up and assign a destination, either a data lake or data warehouse 
  4. Set a schedule and begin an initial sync
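Teams that prefer automation over the UI can script the same setup through Fivetran's REST API. The sketch below is illustrative rather than definitive: it assumes the `POST /v1/connectors` endpoint with basic authentication from Fivetran's public API documentation, and the field names (`group_id`, `service`, `config`) should be checked against the current docs for your connector type.

```python
import base64
import json
from urllib.request import Request

FIVETRAN_API = "https://api.fivetran.com/v1/connectors"

def build_connector_payload(group_id: str, service: str, schema: str, config: dict) -> dict:
    """Assemble the JSON body for creating a connector in a destination group."""
    return {
        "group_id": group_id,   # the destination group the connector syncs into
        "service": service,     # connector type, e.g. "zendesk" or "salesforce"
        "config": {"schema": schema, **config},
    }

def build_request(api_key: str, api_secret: str, payload: dict) -> Request:
    """Wrap the payload in an authenticated POST request (HTTP basic auth)."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return Request(
        FIVETRAN_API,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` (or any HTTP client) then creates the connector; the schedule and initial sync are configured the same way via follow-up API calls or the dashboard.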

This workflow quickly makes data available in a data lake or warehouse, where schemas from disparate sources can be combined, analyzed and modeled as necessary for all downstream purposes, including generative AI.

Building a generative AI model from scratch can cost many millions of dollars in infrastructure and expertise as well as months of processing time. A more practical option for most organizations is retrieval-augmented generation (RAG). RAG supplements an existing generative AI model – called a foundation model – with data that provides additional facts and context, producing outputs that are more accurate and relevant to an organization’s needs. Well-known foundation models include DALL-E and the GPT series by OpenAI, Midjourney by Midjourney, Llama by Meta and Claude by Anthropic.

There are many different kinds of foundation models specializing in different media or use cases. Among foundation models, large language models (LLMs), which are trained on massive corpora of text, are particularly promising for general-purpose use because they can mimic human language and comprehension. Pairing RAG with an LLM enables organizations to leverage the power of off-the-shelf generative AI for their proprietary data, saving money and time.

Generative AI depends on unstructured data in the form of all kinds of media, such as text, images, audio, video and more. Currently, Fivetran does not directly support the integration of raw media files. However, a number of Fivetran data sources include large bodies of text within structured data. These include:

  • Ada
  • Drift
  • GitHub
  • Intercom
  • Slack
  • Salesforce
  • Zendesk

These sources, normally difficult to access behind complex APIs and schemas, are readily supplied through Fivetran in a RAG architecture for generative AI. 

RAG architecture for generative AI

The following architecture demonstrates how data is supplied for RAG. Using Fivetran, a team begins by extracting and loading data from sources to a central data repository.

There are several benefits to this intermediate repository:

  • It puts all of your data in one place, allowing you to govern, observe and model it, before moving it into a vector database where it can no longer be transformed
  • This platform can form the backbone for other analytics uses, such as conventional reporting, dashboards and business intelligence
  • Most importantly, it adds modularity to your architecture. There are many reasons you may need to change vector embeddings and chunking strategies; a staging area allows you to do so without having to resynchronize all the data from your sources
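To make the modularity point concrete: chunking — splitting documents into pieces small enough to embed and retrieve — is one of the strategies you are most likely to revisit. A hypothetical re-chunking helper like the one below can be re-run over staged text in the repository whenever you change window size or overlap, with no need to pull data from the sources again.

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows for embedding.

    Overlap preserves context that would otherwise be cut at chunk boundaries.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

Changing `max_chars` or `overlap` and re-running this over the staging tables is a cheap operation; re-syncing every source API to get the same text back is not.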

For AI applications, governed data lakes are the best type of repository. They are best suited to handling both storage at very large scales and unstructured or semi-structured data.

Once data lands in a data lake or data warehouse, it must undergo additional processing to be made useful for generative AI. One approach is to catalog and assign semantic, real-world meaning to data, constructing a referenceable knowledge graph that illustrates the relationships between entities, events and concepts.
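At its core, a knowledge graph is a collection of (subject, predicate, object) facts. The toy triple store below — a deliberately minimal sketch, not a production graph database — shows the idea with hypothetical helpdesk entities:

```python
from collections import defaultdict

class KnowledgeGraph:
    """A toy triple store: (subject, predicate, object) facts with subject lookup."""

    def __init__(self):
        self.triples = []
        self.by_subject = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record a fact and index it by subject for fast retrieval."""
        triple = (subject, predicate, obj)
        self.triples.append(triple)
        self.by_subject[subject].append(triple)

    def facts_about(self, subject: str) -> list[tuple[str, str, str]]:
        """Return every fact whose subject matches, e.g. for grounding a prompt."""
        return list(self.by_subject[subject])

kg = KnowledgeGraph()
kg.add("Ticket-42", "filed_by", "Customer-7")
kg.add("Ticket-42", "concerns", "billing")
```

Real deployments would use a dedicated graph store and an ontology, but the retrieval pattern — look up everything known about an entity and hand it to the model — is the same.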

The other, more pertinent approach is to embed raw data – text, images, video and other media – as numerical representations called vectors in a vector database. Vectorized data can be attached to a request made by a user, called a prompt, and sent to a foundation model as additional context, providing more accurate and relevant answers.
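The mechanics of vector retrieval can be illustrated without any external service. In the sketch below, a bag-of-words counter stands in for a real embedding model (which would be a trained neural network) and a brute-force cosine-similarity scan stands in for a vector database's index — both stand-ins are assumptions for illustration, not how production systems embed text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector. Real systems use a trained model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query — what a vector database does at scale."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]
```

A vector database performs this same nearest-neighbor search over millions of embeddings using approximate indexes, then returns the top matches to attach to the prompt.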

Putting it all together

Suppose you have some text from an application that you want to use to augment an automated helpdesk chatbot. Following the RAG architecture we just described, the first leg of this workflow is quite simple: set up a connector from your source to your destination using Fivetran.

The next legs of the process involve some engineering work. You will need to accomplish the following steps:

  1. Build a pipeline to extract, transform and load the relevant data from your data repository to the vector database. Notable vector databases include ChromaDB, Pinecone and Weaviate. You can do this once, persist the results and sync again after enough of the underlying data has changed.
  2. Set up a user interface that can accept prompts and combine them with the relevant context retrieved from the vector database.
  3. Have the retrieval model send the augmented prompts to the foundation model and generate a response.
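The steps above reduce to a short loop: retrieve context, fold it into the prompt, call the model. The sketch below wires that loop together with pluggable `retrieve` and `generate` functions; the prompt template and function names are illustrative, and in practice `generate` would call a foundation model API.

```python
def build_augmented_prompt(question: str, context_chunks: list[str]) -> str:
    """Prepend retrieved context to the user's question before calling the model."""
    context = "\n\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question: str, retrieve, generate) -> str:
    """RAG loop: retrieve context, augment the prompt, call the foundation model.

    `retrieve` queries the vector database; `generate` calls the model API.
    """
    chunks = retrieve(question)
    return generate(build_augmented_prompt(question, chunks))
```

Because retrieval and generation are injected as functions, you can swap vector databases or foundation models without touching the loop itself — the same modularity argument made for the staging repository earlier.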

With the help of common languages, libraries and APIs for foundation models, it can be fairly straightforward to put together a simple application like a chatbot that leverages your unique, proprietary data. Ask your engineers, analysts and product professionals for ideas. There are many promising possibilities to explore just from combining your text-rich application data with foundation models.

[CTA_MODULE]

February 23, 2024
