Ever since generative AI burst into the public consciousness with the release of ChatGPT, we at Fivetran have sought to use data integration to make generative AI practical. Retrieval-augmented generation (RAG), the core architecture that supports most current commercial implementations of generative AI, is a potent but accessible way to turn proprietary data into useful, intelligent products.
A RAG architecture requires two kinds of data integration. The first is the core Fivetran functionality – moving data from sources like applications and operational systems to data warehouses and data lakes. The second is turning data into embeddings that can be read by a vector database.
Without RAG, large language models still serve well as general-purpose productivity aids, working much like a search engine. With RAG, organizations have an opportunity to augment a foundation model with proprietary data from text-rich systems such as CRMs and customer support applications. These systems can be applied to a variety of both internal- and external-facing use cases, assisting staff and customers alike. Concretely, such applications tend to take the form of chatbots.
The architecture
Our original concept was an external sales chatbot for prospective customers. We ultimately chose to build an internal chat tool instead, for two reasons. First, as the proof of concept continued to show promise, we realized we could improve productivity across the organization: streamlining sales and support workflows, decreasing operational costs for customer support and maintaining a knowledge base that anyone in the company could query in natural language. Second, an internal tool ruled out the possibility of inadvertently exposing sensitive, customer-related data.
The architecture diagram below illustrates a generic RAG architecture. From left to right, data moves from a source through a pipeline to a destination. The destination is part of a broader cloud ecosystem with native AI/ML capabilities to support generative AI, but more on that later!
[CTA_MODULE]
Data integration – Source, pipeline and destination
Since our goal was to build a knowledge base, we extracted and loaded data from every text-rich source closely involved with our product and operations. We used our own connectors for the following sources:
- Zendesk – All tickets
- Slab – Knowledge base documents
- Slack – Public conversations
- Height – Tickets
- Salesforce – Conversations with prospects
- Google Drive – Additional knowledge base documents and other documentation
We also referred to our public documentation and a selection of documents and slideshows from Google Docs. Extracting and loading unstructured data of this kind is not (yet) supported through Fivetran and involves some finagling with custom connectors.
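As a rough illustration of that finagling, the sketch below uses the Google Drive API to export Google Docs as plain text so they can be loaded alongside the structured sources. The credentials file and the `load_to_destination` helper are hypothetical placeholders, not part of any Fivetran product.

```python
# Minimal sketch: export Google Docs as plain text for loading.
# Assumes service-account credentials; load_to_destination() is a
# hypothetical stand-in for whatever writes rows to your warehouse.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
drive = build("drive", "v3", credentials=creds)

def fetch_doc_text(file_id: str) -> str:
    # Google Docs can be exported directly as text/plain.
    data = drive.files().export(fileId=file_id, mimeType="text/plain").execute()
    return data.decode("utf-8")

def load_to_destination(rows):
    # Hypothetical loader that writes rows to the destination.
    ...

# List Google Docs and load their text content.
resp = drive.files().list(
    q="mimeType='application/vnd.google-apps.document'",
    fields="files(id, name, modifiedTime)",
).execute()
rows = [
    {"id": f["id"], "title": f["name"],
     "modified": f["modifiedTime"], "text": fetch_doc_text(f["id"])}
    for f in resp.get("files", [])
]
load_to_destination(rows)
```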
Once the data is loaded into the destination, dbt transformations denormalize it into text-rich tables. The process runs on a fixed schedule: our connectors currently sync every six hours, so in principle, new data created in any of the sources listed above is accessible to the chatbot within six hours.
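Our production transformations are dbt models, but the gist of the denormalization is easy to show. The Python sketch below, with entirely hypothetical table and column names, joins a ticket header to its comments and collapses each thread into one text-rich document per row, which is roughly the shape the retrieval layer wants.

```python
import pandas as pd

# Illustration only: our production transformations are dbt models.
# Hypothetical schema: ticket headers plus a comments table.
tickets = pd.DataFrame({
    "ticket_id": [101, 102],
    "subject": ["Connector paused", "Schema question"],
})
comments = pd.DataFrame({
    "ticket_id": [101, 101, 102],
    "body": ["It stopped syncing.", "Re-enabled, resolved.",
             "Does each connector get a schema?"],
    "created_at": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-03"]),
})

# Concatenate each ticket's comments chronologically...
threads = (
    comments.sort_values("created_at")
    .groupby("ticket_id")["body"]
    .apply("\n".join)
    .rename("thread")
    .reset_index()
)
# ...and denormalize onto the ticket header: one document per row.
documents = tickets.merge(threads, on="ticket_id")
documents["document"] = documents["subject"] + "\n" + documents["thread"]
print(documents[["ticket_id", "document"]])
```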
Retrieval model – Cloud AI suite, chat API and foundation model
Although we trialed a number of off-the-shelf generative AI services, we found that vendors generally lacked data pipelining, transformation and other essential capabilities. We would have to assemble our own retrieval model. Luckily, there were many off-the-shelf tools and technologies we could connect to each other.
One major advantage of modern destinations such as Databricks and Snowflake is that they are bundled with an entire ecosystem featuring native AI tools. Tools like Snowflake Cortex and Databricks Mosaic AI support both knowledge graphs and vector databases in their backends. For this project, we found that the knowledge graph, despite promising more deterministic answers on paper, did not produce noticeably better responses than the vector database, while being slower and more computationally intensive.
Native AI services like Cortex and Mosaic often obviate the need to build separate embedding or vectorization pipelines. Instead, you select tables and columns in your destination through the tool and it will automatically embed them. You can also label each table and column with metadata as needed.
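To make concrete what these services handle for you, here is a minimal sketch of the underlying mechanics: embed documents and a query, then rank by cosine similarity. The OpenAI embedding model named here is an assumption for illustration; with Cortex or Mosaic AI, the service chooses and runs the embedding model itself.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    # Model choice is illustrative; managed services like Cortex
    # select and run the embedding model for you.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = [
    "Fivetran creates one schema per connector in Snowflake.",
    "Sick leave and PTO are tracked separately.",
]
doc_vecs = embed(docs)

query_vec = embed(["Does each connector get its own schema?"])[0]
# Cosine similarity = normalized dot product; the highest score wins.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])
```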
The retrieval application is a simple, single-node, Python-based Slack application. We anticipated that our users would get the most mileage and convenience from interacting with our chatbot through Slack. Our foundation model is GPT-4o, although most AI services let you swap in any of the widely used foundation models.
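Our application code isn't public, but its general shape is easy to sketch. The example below wires a Slack Bolt listener to GPT-4o, with a hypothetical `retrieve()` function standing in for the vector search call exposed by the cloud AI service.

```python
import os
from openai import OpenAI
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
llm = OpenAI()

def retrieve(question: str) -> list[str]:
    # Hypothetical stand-in for the vector search exposed by the
    # cloud AI service (e.g. a Cortex or Mosaic AI search call).
    return ["(retrieved context chunks would go here)"]

@app.event("app_mention")
def answer(event, say):
    question = event["text"]
    context = "\n\n".join(retrieve(question))
    resp = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    # Reply in-thread so the channel stays readable.
    say(text=resp.choices[0].message.content, thread_ts=event["ts"])

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```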
Overall, the engineering effort required to set up our chatbot was relatively modest, consisting largely of writing middleware between the various tools and technologies involved. The bulk of the technical effort went to testing and iterating on the responses produced by the chatbot, which required indexing and curating the data properly as well as tuning prompts and retrieval methods.
This engineering effort is ongoing. We encourage users to leave feedback, which allows us to debug search results, explore fine-tuning and pursue other improvements.
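One lightweight way to capture that feedback, extending the Slack sketch above (and an assumption on our part, not necessarily how FivetranChat does it), is to log emoji reactions on the bot's answers so each rating can be joined back to the original question:

```python
# Assumption for illustration: log thumbs-up/down reactions on the
# bot's answers so ratings can be joined back to the question asked.
import csv
import time

@app.event("reaction_added")
def record_feedback(event):
    if event["reaction"] not in ("+1", "-1"):
        return
    with open("feedback.csv", "a", newline="") as f:
        csv.writer(f).writerow([
            time.time(),
            event["item"]["ts"],  # timestamp of the rated message
            event["reaction"],    # "+1" or "-1"
            event["user"],
        ])
```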
How people use it
Internally, FivetranChat has largely taken the place of ChatGPT or Google. Instead of hoping that an external product has crawled publicly accessible Fivetran documents, everyone in our organization can ask questions about our product, operations and company policies and expect accurate answers.
The following is a sampling of FivetranChat questions:
- “How does table re-sync work?”
- “Does Fivetran create new schemas in Snowflake for every connector?”
- “What version of Java SDK should I install?”
- “When is the deadline to book flights for Camp Fivetran (our company retreat)?”
- “Can I use sick leave for PTO?”
Questions concerning the product’s functionality are the most prevalent, usually asked by technical, customer-facing members of our organization, such as sales engineers, solution architects and customer support engineers. Other questions, usually about errors encountered during development, are most commonly asked by software engineers.
Since the initial rollout of FivetranChat in July 2024, both unique users and questions per month have grown considerably. As of November 2024, about one third of the company’s overall headcount actively participates in the dedicated Slack channel, averaging nearly three hundred queries per day. User satisfaction hovers around 85% – not perfect, but more than usable, especially at a cost of $0.10 per question. More importantly, FivetranChat makes it increasingly unnecessary to manually sift through documents or ask for another person’s time.
The future of RAG and Fivetran
As a proof of concept, FivetranChat conclusively demonstrated the following:
- By leveraging AI/ML services native to a cloud data platform and its ecosystem, you can set up RAG data architecture without a separate vector database.
- Similarly, by using off-the-shelf technologies, you can eliminate much of your software engineering burden. Aside from relatively simple middleware to ensure that the different pieces of the architecture can communicate with each other, most of the technical effort of building a RAG product goes toward testing and validating results.
- Unstructured data isn’t the only data that matters; text-rich fields from structured data also play an important role in RAG.
- Your staff, especially customer-facing people, likely have many routine questions about the complex goods and services your organization offers. RAG implementations that answer these questions can be relatively simple yet extremely practical.
In the process of creating FivetranChat, our team also developed tools and technologies that we intend to turn into public-facing products, including connectors for unstructured data and Quickstart data models for RAG.
Most importantly, FivetranChat is now a full-fledged internal tool with extensive, routine usage that produces considerable value. That alone is worth noting: as things currently stand, many generative AI efforts never reach production.
What FivetranChat demonstrates, above all, is that we do not have to accept the outcomes Bain reports above. Building a practical, useful generative AI product can be remarkably straightforward, and as new products emerge, practical generative AI will only become more accessible across use cases.
[CTA_MODULE]