The AI data quality conundrum

Visionaries from Capgemini, Databricks and Fivetran lay out the data quality imperative for implementing enterprise AI applications.
December 8, 2023

Human history is marked by milestones. The discovery of fire. The Industrial Revolution. The Internet. Each one served to propel mankind forward at an exponential rate, unimaginable by previous generations.

Now, artificial intelligence is redefining everything we thought we knew about the world. Again. 

Fivetran recently hosted a conversation with Databricks and Capgemini to discuss AI and what businesses need to do right now to take advantage of it. 

[CTA_MODULE]

The webinar, “Data Integration for the AI Era: Unleashing the Potential of Data,” examines the challenges organizations face when implementing AI and how it can be deployed practically across business use cases.

The panel includes a trio of enterprise IT and AI experts:

  • Mark Van de Wiel, Field Chief Technology Officer at Fivetran
  • Anindita Mahapatra, Lead Solution Architect at Databricks
  • Paul Intrevado, GenAI North America Delivery Lead and San Francisco Data Science Practice Lead at Capgemini

Gen AI runs on quality data

Reliable, timely and high-quality data are essential for any successful generative AI project.

But integrating data from various sources like SaaS applications and on-premises systems across the enterprise frequently leads to “dirty data” — unorganized, inconsistent and containing missing values or errors — that requires extensive cleansing to ensure consistency. 

If you’re expecting high-impact outputs, you need high-quality inputs.
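The cleansing step above can be sketched in a few lines. This is a minimal, illustrative example (the record shape and field names are invented, not from the webinar): it normalizes inconsistent values, drops records missing required fields and removes duplicates, which are exactly the kinds of problems “dirty data” introduces.

```python
# Illustrative "dirty data" cleansing: normalize casing and whitespace,
# drop records with missing required fields, deduplicate on a key field.
RAW_RECORDS = [
    {"email": "ana@example.com", "country": "us"},
    {"email": "ANA@EXAMPLE.COM", "country": "US"},   # duplicate, different casing
    {"email": None, "country": "DE"},                # missing required field
    {"email": "bo@example.com", "country": " de "},  # stray whitespace
]

def clean(records, required=("email",)):
    seen, cleaned = set(), []
    for rec in records:
        # Drop records missing any required field
        if any(rec.get(f) is None for f in required):
            continue
        # Normalize: trim whitespace, lowercase email, uppercase country code
        rec = {
            "email": rec["email"].strip().lower(),
            "country": rec["country"].strip().upper(),
        }
        # Deduplicate on the normalized email
        if rec["email"] in seen:
            continue
        seen.add(rec["email"])
        cleaned.append(rec)
    return cleaned

print(clean(RAW_RECORDS))  # two clean, unique records survive out of four
```

Real pipelines do this at scale with transformation tooling rather than hand-written loops, but the logic is the same: garbage filtered out before it reaches a model.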

It all begins with sources: applications, databases, event streams and files that provide a footprint of all your operations, including text-rich records such as correspondence and customer feedback. Fivetran is a powerful, secure, yet effortless tool for moving and integrating such data from a wide range of sources into a central repository, such as the Lakehouse.

Databricks’ Lakehouse architecture combines data lake and data warehouse features to provide a single source of truth for business intelligence (BI) and AI use cases. The reference architecture blueprint includes stages for data ingestion, transformation, processing and serving — everything needed to convert raw data into actionable insights or the foundation for training AI models. 

Running data through layers of curation prepares it for a host of AI and BI use cases, from dashboard creation to deep learning.
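Those layers of curation are often described in medallion terms: raw data lands as-is, gets cleaned and typed, then gets aggregated into a serving-ready shape. Here is a deliberately tiny, pure-Python sketch of that flow; the order data, field names and three-stage split are illustrative, not taken from the Databricks blueprint itself.

```python
# Medallion-style curation sketch: raw ("bronze") -> cleaned ("silver")
# -> business-ready aggregate ("gold").
bronze = [  # raw ingest: kept exactly as received
    {"order_id": "1", "amount": "19.99", "region": "EMEA"},
    {"order_id": "2", "amount": "bad",   "region": "EMEA"},  # malformed amount
    {"order_id": "3", "amount": "5.00",  "region": "AMER"},
]

def to_silver(rows):
    """Cleaned, typed records; malformed rows are filtered out."""
    out = []
    for r in rows:
        try:
            out.append({**r, "amount": float(r["amount"])})
        except ValueError:
            pass  # a real pipeline would route this to a quarantine table
    return out

def to_gold(rows):
    """Business-level aggregate ready for a dashboard or a model feature."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

gold = to_gold(to_silver(bronze))
print(gold)  # revenue per region, with the malformed row excluded
```

The point is the separation of concerns: each layer has one job, so a bad record caught at the silver stage never contaminates the numbers a dashboard or model sees.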

How generative AI is transforming technology

Improved decision-making and insights 

Generative AI represents a significant shift away from traditional data mining and machine learning practices that involve extensive data preparation. In the past, data analysts spent considerable time preparing data for modeling, only for those models to become obsolete shortly after implementation. 

With generative AI, it is possible to achieve powerful results by inputting data with far fewer intermediate steps, and the results can be revolutionary.

Reshaping industries 

While many people are (rightfully) concerned about job displacement, many more are bullish on the promise AI holds for transforming life as we know it, similar to how the internet reshaped the world. Finding cures for diseases, addressing climate change and improving education worldwide are a few of the most impactful possibilities enterprises can help achieve with AI. 

For example, AI recently helped researchers identify an existing drug to treat idiopathic multicentric Castleman disease (iMCD), a rare and life-threatening disorder that involves hyperactivation of the body’s immune system, which causes uncontrolled organ dysfunction. 

The patient at the center of the case had run out of treatment options, but AI-powered proteomics (the study of proteins) solutions helped doctors successfully create a novel application of an existing drug and helped the patient achieve remission.

Data security

It’s crucial to strike a balance between innovation and security to ensure that AI is deployed responsibly and with significant human oversight. 

In an era of heightened awareness around data privacy and security, implementing generative AI requires deep discussions around security protocols, excluding sensitive information, role-level access and privacy standards. It’s essential to thoroughly evaluate and control these technologies before implementation.

When AI systems have access to information they otherwise shouldn’t and lack safety and ethics guardrails, the results can be disastrous.

In one example, Capgemini’s Intrevado shared the story of a grocery store in New Zealand whose generative AI solution recommended poisonous sandwiches and drinks that, when their ingredients were combined, would produce chlorine gas.

Cost

Many organizations are prioritizing safeguarding their critical data and maintaining on-premises control as part of their AI implementation strategy. However, they also need to strike a practical balance between security and how much it costs to do so.

Databricks’ Mahapatra says that cost reductions are increasingly built into the base models themselves and that the cost of fine-tuning has dropped in the last few months. Even creating foundation LLMs of their own is no longer out of reach for large enterprises.

In fact, the explosion of open-source LLMs and enhancements in “legacy” LLMs continue to drive costs down, providing the opportunity for enterprises to maximize both control over their data and their expenses while pushing the innovation envelope. 

[CTA_MODULE]

Practical business applications for generative AI

Sci-fi-level transformation may be a ways off, but AI’s practical business applications are here today. 

Machine learning can support multiple lines of business 

One primary use case is spreading machine learning and AI modeling across multiple lines of business. Combining individual lines of business like finance and HR with a data science Center of Excellence can be highly impactful for many organizations.

This hybrid model involves subject matter experts within each business unit who act as liaisons between the business units and the Center of Excellence, effectively bridging the gap between technical knowledge and business needs.

For example, ML can help predict possible scenarios and assess how likely an employee is to leave a company. With the help of data science experts, the finance team can leverage these scenarios to analyze the potential cost of employee churn — in terms of lost productivity and the cost of a replacement — and use those figures to more accurately forecast the impact on budgets and the bottom line. 
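The finance side of that collaboration can be reduced to a simple expected-value calculation: a churn model supplies a leave probability per employee, and finance converts it into a dollar forecast. The sketch below is back-of-the-envelope only; the probabilities and cost figures are made up for illustration.

```python
# Expected churn cost: P(leave) x (lost productivity + replacement cost),
# summed across the team. Probabilities would come from an ML churn model.
def expected_churn_cost(employees):
    return sum(
        e["p_leave"] * (e["lost_productivity"] + e["replacement_cost"])
        for e in employees
    )

team = [
    {"name": "A", "p_leave": 0.10, "lost_productivity": 20_000, "replacement_cost": 30_000},
    {"name": "B", "p_leave": 0.40, "lost_productivity": 25_000, "replacement_cost": 35_000},
]

print(f"Expected churn cost: ${expected_churn_cost(team):,.0f}")
```

Even this crude model makes the hybrid structure concrete: the data science Center of Excellence owns the probabilities, the business unit owns the cost assumptions, and the forecast is only as good as both inputs.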

Boost productivity

Concerns over potential data breaches compel large enterprises to prioritize internal solutions to enhance employee productivity. Mahapatra says that companies in the insurance industry, for instance, are already using AI to bridge knowledge gaps and to correct performance pitfalls. 

“There's tons and tons of information for a new employee, so there's a significant knowledge gap between a person who's about to retire and a person who has just come into the workforce,” Mahapatra adds. “Gen AI capabilities can augment their (employees’) day-to-day activities with a smart agent that can help them with their job by understanding things like: ‘Only this portion of the document is very relevant,’ which saves time wading through pages of information. Now, that's significant productivity gains.”

Shifting job functionalities

Gen AI can help offload repetitive or time- and resource-intensive tasks to free human assets for higher-value activities. For example, Databricks offers an assistant feature that can generate and explain code, making coding tasks more efficient.

In call centers, large language models (LLMs) can help customer service by providing context for customer inquiries and automatically delivering answers or intelligently routing incoming requests to the right team and resource. 
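The routing pattern is simple enough to sketch end to end. In a real deployment an LLM would classify each inquiry; in this toy version a keyword lookup stands in so the flow stays runnable, and the team names and keywords are invented.

```python
# Toy inquiry router: classify an incoming message and return the team
# that should handle it. An LLM call would replace the keyword lookup.
ROUTES = {
    "billing":   {"invoice", "charge", "refund", "payment"},
    "technical": {"error", "crash", "login", "bug"},
}

def route(inquiry, default="general"):
    words = set(inquiry.lower().split())
    for team, keywords in ROUTES.items():
        if words & keywords:  # any keyword match assigns the team
            return team
    return default

print(route("I was charged twice, please refund me"))  # billing
print(route("The app shows an error after login"))     # technical
print(route("How do I export my data?"))               # general
```

Swapping the lookup for an LLM classifier changes the quality of the decision, not the shape of the system: something still has to map each inquiry to a team and a resource.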

Marketing teams across industries use generative AI for creative work, enhancing text and generating content, including images and videos, for greater relevance and impact. 

Realizing the true value of AI

Generative AI has the potential to revolutionize industries, automate tasks and unlock creativity. However, it is vital to recognize that this powerful technology comes with its own set of implementation challenges, particularly around data privacy, ethics and security. Check out the full webinar for more details.

Schedule your free live Fivetran demo and see how easy it is to centralize data for all your AI and ML implementations.

Start for free

Join the thousands of companies using Fivetran to centralize and transform their data.
