Most discussions of generative AI presume that the data it will learn from and generate is unstructured, consisting of raw text, images, video, audio and other media. To train or augment a generative AI model, unstructured data is converted into high-dimensional numerical representations called vectors (also known as embeddings), which are then used to measure similarity between entities of all kinds. These vectors are stored in specialized databases called vector databases.
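To make the idea of vector similarity concrete, here is a minimal sketch using cosine similarity, one common way to compare vectors. The three-dimensional vectors and their names are made up for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
invoice = [0.9, 0.1, 0.0]
billing = [0.8, 0.2, 0.1]
hiking = [0.0, 0.1, 0.9]

# Related concepts should score closer to 1 than unrelated ones.
related = cosine_similarity(invoice, billing)
unrelated = cosine_similarity(invoice, hiking)
print(related > unrelated)
```

A vector database performs essentially this comparison at scale, usually with approximate indexes so that it does not have to compare a query against every stored vector.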
One common way to use generative AI is as an answer machine akin to a search engine. The foundation models that we are accustomed to interacting with are often trained on huge corpora of raw text or other media scraped from across the public web. Likewise, the first instinct of many organizations when adopting generative AI is to build a retrieval-augmented generation (RAG) chatbot to consolidate their knowledge base, answer questions about their operations that would otherwise be difficult or tedious to answer and serve as a general-purpose productivity aid.
Unstructuring structured and semi-structured data
Much of the data containing details about an organization’s operations is produced by SaaS applications and operational databases. Such data is typically stored in a structured, tabular format, not as media files. What if you want to ask your chatbot about specific details of your business operations?
Tables produced from your SaaS applications and databases may still contain rich textual information that can be used to power chatbots and other products that depend on knowledge about your organization’s operations. This data is semi-structured, being stored in a tabular format but containing free-form text.
Extracting text from text-rich tables
The existence of free-form text in your tables gives you an opportunity to turn structured data into unstructured data. You can directly extract and vectorize the contents of text-rich fields from tables.
Depending on which other fields are available, you could also concatenate several of them to build a single text-rich field, and then extract and vectorize its contents. You could even use this opportunity to add context based on your knowledge of your particular business domain.
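The extraction and concatenation steps described above can be sketched in a few lines. This is an illustrative example only: the support-ticket columns, the `row_to_document` helper and the domain note are all hypothetical, and the resulting text would still need to be passed to an embedding model of your choice.

```python
def row_to_document(row, text_fields, context_fields=(), domain_note=""):
    """Flatten one table row into a single block of text for vectorization.

    row            -- dict mapping column names to values
    text_fields    -- free-form text columns, included verbatim
    context_fields -- short structured columns, rendered as "label: value"
    domain_note    -- optional sentence of business context to prepend
    """
    parts = []
    if domain_note:
        parts.append(domain_note)
    for field in context_fields:
        value = row.get(field)
        if value not in (None, ""):
            parts.append(f"{field.replace('_', ' ')}: {value}")
    for field in text_fields:
        value = row.get(field)
        if value:
            parts.append(str(value).strip())
    return "\n".join(parts)


# Hypothetical row from a support-ticket table.
ticket = {
    "ticket_id": 101,
    "priority": "high",
    "product_area": "billing",
    "subject": "Invoice total is wrong",
    "description": "  The March invoice double-counts one line item.  ",
    "resolution": None,
}

document = row_to_document(
    ticket,
    text_fields=["subject", "description", "resolution"],
    context_fields=["priority", "product_area"],
    domain_note="Customer support ticket.",
)
print(document)
```

Rendering the structured columns as labeled lines keeps useful signals such as priority next to the free-form text, while empty fields are skipped so they do not add noise to the vector.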
In short, while structured, normalized data is extremely useful in conventional analytics, it can be adapted to serve the needs of generative AI as well. It requires a transformation that removes, rather than imposes, structure.
This is great news not only because Fivetran is primarily a provider of structured data integration but also because most of the practical, analysis-ready data that companies will generate through their activities will remain structured for the foreseeable future.
In addition, many providers of modern data platforms offer tools that neatly and seamlessly integrate into a common ecosystem. For a detailed (yet relatively easy) walkthrough of how to augment a large language model with structured data, check out this piece, based on a successful hands-on lab we have hosted at conferences and trade shows.
Another approach to generative AI and structured data
For reporting, reviewing metrics and analytical queries more generally, the approach outlined above may not be especially appropriate. That doesn’t mean that generative AI has no role to play. Traversing the vast volumes of numerical and categorical data about your customers and prospects, team performance, business outcomes and more using a business intelligence platform or SQL can be a painful exercise for analysts and stakeholders. There is a strong case for the ability to interact with this data using natural language.
This is where text-to-SQL (or even text-to-Python and other languages) comes into play. Business intelligence platforms increasingly leverage generative AI to convert natural language into queries or scripts, which can then be used to produce charts, tables and metrics as needed for reporting.
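As a rough sketch of what a text-to-SQL flow involves, the example below builds a prompt from a table schema and a question, then runs the resulting query only if it is a single SELECT statement. Everything here is an assumption for illustration: the `deals` table, the helper names and the prompt wording are hypothetical, and the "generated" SQL is hard-coded where a real system would call a language model.

```python
import sqlite3


def build_prompt(schema, question):
    """Combine the table schema and the user's question into a model prompt."""
    return (
        "Translate the question into a single SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_single_select(sql):
    """Naive guard: accept one SELECT statement, reject anything else."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def run_generated_sql(conn, sql):
    """Execute model-generated SQL only if it passes the read-only guard."""
    if not is_single_select(sql):
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(sql).fetchall()


# Hypothetical table standing in for warehouse data.
SCHEMA = "CREATE TABLE deals (region TEXT, amount REAL)"
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO deals VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 70.0)],
)

prompt = build_prompt(SCHEMA, "What is the total deal amount by region?")

# Stand-in for the model's reply; a real system would send `prompt` to an LLM.
generated_sql = (
    "SELECT region, SUM(amount) FROM deals GROUP BY region ORDER BY region"
)
rows = run_generated_sql(conn, generated_sql)
print(rows)
```

The guard is deliberately conservative and far from complete. In practice, running generated queries under a read-only database role is a more reliable safeguard than string checks alone.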
Whether the data is structured or unstructured, generative AI creates many opportunities to work with data in unprecedented ways. To experience for yourself how Fivetran can help you jumpstart your GenAI initiatives, consider a demo. If you would like to speak with our CTO’s office about your GenAI use case, please get in touch with us.