The case for using structured and semi-structured data in generative AI

Generative AI doesn’t just work with unstructured data.
August 8, 2024

Most discussions of generative AI presume that the data a model learns from and generates is unstructured: raw text, images, video, audio and other media. To train or augment a generative AI model, unstructured data is converted into high-dimensional numerical representations called vectors, which are then used to measure similarity between entities of all kinds. These vectors are typically stored in specialized systems called vector databases.
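To make the idea of vector similarity concrete, here is a minimal sketch using cosine similarity, one common similarity measure. The four-dimensional vectors and the phrases attached to them are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example only.
EMBEDDINGS = {
    "invoice overdue": [0.9, 0.1, 0.0, 0.2],
    "payment is late": [0.8, 0.2, 0.1, 0.3],
    "team offsite agenda": [0.0, 0.9, 0.7, 0.1],
}


def most_similar(query, embeddings):
    """Return the other key whose vector is closest to the query's vector."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), key)
        for key, vec in embeddings.items()
        if key != query
    ]
    return max(scored)[1]
```

With these toy values, most_similar("invoice overdue", EMBEDDINGS) picks "payment is late" over "team offsite agenda". A vector database performs the same kind of comparison, but over millions of vectors and with indexes that avoid scanning every one.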

One common way to use generative AI is as an answer machine akin to a search engine. The foundation models that we are accustomed to interacting with are often trained on huge corpora of raw text or other media scraped from across the public web. Likewise, the first instinct of many organizations is to build a retrieval-augmented generation (RAG) chatbot that consolidates their knowledge base, answers questions about their operations that would otherwise be difficult or tedious to answer and serves as a general-purpose productivity aid.

Unstructuring structured and semi-structured data

Much of the data containing details about an organization’s operations is produced by SaaS applications and operational databases. Such data is typically stored in a structured, tabular format, not as media files. What if you want to ask your chatbot about specific details of your business operations?

Tables produced from your SaaS applications and databases may still contain rich textual information that can be used to power chatbots and other products that depend on knowledge about your organization’s operations. This data is semi-structured, being stored in a tabular format but containing free-form text.

Extracting text from text-rich tables

The existence of free-form text in your tables gives you an opportunity to turn structured data into unstructured data. You can directly extract and vectorize the contents of text-rich fields from tables.
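As a rough sketch of direct extraction, the example below reads a free-text column from a table and pairs each row with a vector. Everything specific here is an assumption made for illustration: the support_tickets table, its columns and the toy_embed function, which is a hashed bag-of-words stand-in for a real embedding model.

```python
import sqlite3
import zlib


def toy_embed(text, dims=8):
    """Stand-in for a real embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dims
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    return vec


def extract_and_vectorize(conn):
    """Read the free-text column and map each row id to a vector."""
    rows = conn.execute(
        "SELECT id, description FROM support_tickets "
        "WHERE description IS NOT NULL AND description != ''"
    )
    return {row_id: toy_embed(text) for row_id, text in rows}


# Hypothetical table with one text-rich field (description).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE support_tickets "
    "(id INTEGER PRIMARY KEY, status TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO support_tickets VALUES (?, ?, ?)",
    [
        (1, "open", "Sync to the warehouse fails after schema change"),
        (2, "closed", "Customer asks how to rotate API keys"),
        (3, "open", None),  # rows without text are skipped
    ],
)
vectors = extract_and_vectorize(conn)
```

Keeping the row id alongside each vector means a retrieved passage can always be traced back to the structured record it came from.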

Depending on the additional fields that are available, you could also concatenate them to build text-rich fields, and then extract and vectorize the contents. You could even use this opportunity to add context based on your knowledge of your particular business domain.
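One way to picture this concatenation is a small function that flattens a record into a labeled, sentence-like string before it is embedded. The column names, labels and the domain note below are hypothetical, and this is only a sketch of the idea rather than a prescribed format.

```python
def row_to_document(row, fields, domain_note=""):
    """Flatten one structured record into a labeled text string.

    fields is a list of (column, label) pairs; empty values are skipped.
    """
    parts = [
        f"{label}: {row[column]}"
        for column, label in fields
        if row.get(column) not in (None, "")
    ]
    if domain_note:
        # Extra business context that is not stored in the table itself.
        parts.append(f"Context: {domain_note}")
    return ". ".join(parts) + "."


# Hypothetical record and field labels.
FIELDS = [
    ("account_name", "Account"),
    ("plan", "Plan"),
    ("region", "Region"),
    ("latest_note", "Latest note"),
]
row = {
    "account_name": "Acme",
    "plan": "Enterprise",
    "region": None,
    "latest_note": "Asked about SSO",
}
document = row_to_document(
    row, FIELDS, domain_note="Enterprise plans include a dedicated support contact"
)
```

Labeling each value ("Plan: Enterprise" rather than just "Enterprise") preserves some of the meaning that the column names carried in the original table.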

In short, while structured, normalized data is extremely useful in conventional analytics, it can be adapted to serve the needs of generative AI as well. Doing so requires a transformation that removes, rather than imposes, structure.

This is great news not only because Fivetran is primarily a provider of structured data integration but also because most of the practical, analysis-ready data that companies will generate through their activities will remain structured for the foreseeable future.

In addition, many providers of modern data platforms offer tools that neatly and seamlessly integrate into a common ecosystem. For a detailed (yet relatively easy) walkthrough of how to augment a large language model with structured data, check out this piece, based on a successful hands-on lab we have hosted at conferences and trade shows.

Another approach to generative AI and structured data

For reporting, reviewing metrics and analytical queries more generally, the approach outlined above may not be especially appropriate. That doesn’t mean that generative AI has no role to play, however. Traversing the vast volumes of numerical and categorical data regarding your customers and prospects, team performance, business outcomes and more using a business intelligence platform or SQL can be a painful exercise for analysts and stakeholders. There is a strong case for the ability to interact with this data using natural language.

This is where text-to-SQL (or even text-to-Python and other languages) comes into play. Business intelligence platforms increasingly leverage generative AI to convert natural language into queries or scripts, which can then be used to produce charts, tables and metrics as needed for reporting.
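The general shape of a text-to-SQL flow can be sketched as follows: give the model the schema and the question, receive a query, check it, then run it. This is an illustrative outline, not how any particular business intelligence platform works. The orders table is hypothetical, fake_model is a stand-in that returns a canned query where a real language model call would go, and the read-only check is deliberately naive.

```python
import sqlite3


def schema_prompt(conn, question):
    """Build a prompt containing the table definitions and the question."""
    ddl = [
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    ]
    return (
        "Schema:\n" + "\n".join(ddl)
        + f"\n\nWrite one SQLite SELECT statement answering: {question}"
    )


def fake_model(prompt):
    """Hypothetical stand-in for a language model call; returns a canned query."""
    return (
        "SELECT region, SUM(amount) AS revenue FROM orders "
        "GROUP BY region ORDER BY revenue DESC"
    )


def run_generated_sql(conn, sql):
    """Execute model output only if it looks like a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()


# Hypothetical reporting table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("AMER", 200.0), ("EMEA", 50.0)],
)

prompt = schema_prompt(conn, "What is total revenue by region?")
result = run_generated_sql(conn, fake_model(prompt))
```

In practice the generated query should also run under a database role with read-only permissions, since a string check like the one above is easy to get around.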

Whether the data is structured or unstructured, generative AI creates many opportunities to relate to data in unprecedented ways. To experience for yourself how Fivetran can help you jumpstart your GenAI initiatives, consider a demo. If you would like to speak with our CTO’s office about your GenAI use case, please get in touch with us.
