How to secure AI data: Lessons from Fivetran’s CISO

Learn the strategies for managing AI risk and compliance in regulated industries.
January 2, 2025

Data governance and security have emerged as major challenges for C-suite leaders pursuing AI. According to a recent MIT Technology Review Insights survey of 300 enterprise executives, 44% cited these issues as major obstacles to their AI efforts. These challenges are particularly pressing in regulated sectors like government and financial services, where the stakes for compliance and data protection are especially high.

As an Information Security executive, I see much of the complexity stemming from a growing reliance on system-to-system integrations. Gartner projects that by 2026, more than 80% of enterprises will have used generative AI APIs or deployed generative AI-enabled applications, underscoring the increasing need to manage data across interconnected systems.

Robust security and governance frameworks are more important than ever. If organizations don’t treat these mechanisms as business critical, their AI ambitions will likely fall short. Let’s take a look at why.

[CTA_MODULE]

Understanding AI data security risks

Many large enterprises are centralizing their operational data into data lakes or warehouses to extract insights. While this approach has its merits, setting appropriate data access controls for downstream AI use cases carries non-trivial risk. The source system’s access controls might comply with SOX regulations, which restrict access in part to prevent insider trading. But what about the downstream systems?

Sensitive data residing in both the original and centralized systems can double the workload for security teams, complicating access reviews and governance. When AI needs to access that data, replicating the existing permission structures is extremely challenging because the control mechanisms are entirely different: a blob-based data lake in S3 lacks the row-level access controls of traditional data management systems.
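
To make that mismatch concrete, here is a minimal sketch using the boto3 AWS SDK (the bucket name and role ARN are hypothetical). The finest-grained control S3 offers natively is the object key, so the closest analogue to a row filter is scoping readers to a key prefix:

```python
import json

import boto3  # standard AWS SDK for Python; credentials assumed configured

# Hypothetical names for illustration only.
BUCKET = "example-ai-data-lake"
READER_ROLE = "arn:aws:iam::111122223333:role/ai-training-readers"

# S3 has no concept of a row: the finest native grain is the object key.
# The best a bucket policy can do is scope readers to a key prefix, so
# every row inside each object under that prefix is all-or-nothing.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PrefixScopedRead",
            "Effect": "Allow",
            "Principal": {"AWS": READER_ROLE},
            "Action": "s3:GetObject",
            # Readers see everything under approved/, and nothing else.
            "Resource": f"arn:aws:s3:::{BUCKET}/approved/*",
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

Reproducing a source system’s row-level rules on top of this usually means either partitioning data into prefixes by sensitivity before it lands, or enforcing filters in a query layer with row-level policies rather than in storage itself.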

Compliance officers must ensure the right tools and controls are in place to monitor and manage both data and its usage. For most enterprises, this is an entirely new challenge.

Once the data is in AI, it needs to be protected

A critical misconception in AI development is the assumption that data fed into AI models doesn’t require additional security measures. This can lead to significant vulnerabilities, particularly when using third-party AI services. Data used to train models can be exposed through queries, especially if the models lack proper anonymization or operate in less secure environments. The risk is heightened in sectors like healthcare, where breaches can violate regulations like HIPAA.

The attack vector is similar to classic SQL injection attacks. Using simple prompts, researchers have been able to extract sensitive training data from ChatGPT, including email addresses, names and phone numbers. Imagine if this happened with AI trained on electronic health records (EHRs) at a major healthcare institution.
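
One mitigation implied above is scrubbing obvious identifiers before data is ever used for training or sent to a third-party model. The following is a deliberately minimal sketch; the regexes and placeholder labels are toy simplifications, and production systems should rely on vetted PII-detection tooling rather than hand-rolled patterns:

```python
import re

# Toy patterns for common identifiers (US-style phone numbers, etc.);
# real deployments would use a vetted PII-detection library instead.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach Jane at jane.doe@example.com or 415-555-0199."))
# -> Reach Jane at [EMAIL] or [PHONE].
```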

Gartner emphasizes the importance of focusing on AI trust, risk and security management (AI TRiSM), which includes ensuring data protection, model interpretability and resistance to adversarial attacks. Failing to manage these risks can result in security breaches, financial loss and damage to an organization’s reputation. We saw an example of this with Salesforce in 2018 when an API error exposed sensitive data.

Strategies for robust AI governance

To mitigate these risks, implementing robust data governance frameworks is essential. Set clear policies for data usage, access and storage, and enforce these consistently across all AI initiatives. Techniques like tokenization or differential privacy, which protect individual data points during model training, help reduce the risk of data leakage.
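
To make the differential privacy idea concrete, here is a minimal sketch of the Laplace mechanism applied to a single count query (the epsilon value and cohort are placeholders, and real deployments also track a cumulative privacy budget across queries):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for a reproducible demo

def dp_count(records, epsilon: float) -> float:
    """Epsilon-differentially-private count via the Laplace mechanism.

    A count query has sensitivity 1: adding or removing one person's
    record changes the true answer by at most 1, so Laplace noise with
    scale 1/epsilon suffices for epsilon-differential privacy.
    """
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return len(records) + noise

cohort = ["record"] * 1042            # stand-in for a sensitive cohort
print(dp_count(cohort, epsilon=0.5))  # roughly 1042, give or take; never exact
```

The noisy answer stays useful for aggregate analysis, but no single query can reveal whether any one individual is in the data.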

CISOs should spend their time on complex problems, like helping their teams clean and secure data. Don’t waste it on manual data integration when commercially available solutions offer far more leverage. The success of AI initiatives relies on a secure, compliant data foundation. Focus on solving data challenges first; the rest will follow, accelerating the path to AI success.

Fivetran helps prepare data for AI and ML by delivering clean, secure and governed data to its destination. The following are just a few examples of how Fivetran protects data after it’s arrived in a data destination, whether for building AI models or analytics reporting.

  • Governance and auditability: Granular access control makes it easy to democratize data movement to power AI and ML through methods that are fully governed and auditable.
  • Metadata sharing and access logging: Metadata sharing enables root cause and impact analysis, as well as auditing and lineage, helping determine the origin and access history of data used in machine learning.
  • Automated data quality: Automated, idempotent replication ensures that normalized, deduplicated data is clean and error-free when it reaches its destination.
  • Data protection mechanisms: Column blocking and hashing allow sensitive data to land in an anonymized but still queryable state, or to be omitted from the sync entirely when it is not needed for AI and ML workloads (see the sketch after this list).
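
Column blocking and hashing are Fivetran features; the sketch below is not Fivetran’s implementation but a generic illustration of the pattern, with hypothetical column names and a keyed hash so that values stay joinable without ever landing in the clear:

```python
import hashlib
import hmac

# Hypothetical key; in practice this comes from a secrets manager and rotates.
HASH_KEY = b"replace-me-with-a-managed-secret"

BLOCKED = {"ssn"}    # column blocking: never leaves the source
HASHED = {"email"}   # column hashing: lands anonymized but still queryable

def hash_value(value: str) -> str:
    """Keyed SHA-256: deterministic, so joins and GROUP BYs still work,
    but the raw value never reaches the destination."""
    return hmac.new(HASH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def protect_row(row: dict) -> dict:
    out = {}
    for column, value in row.items():
        if column in BLOCKED:
            continue  # omit the column from the sync entirely
        out[column] = hash_value(value) if column in HASHED else value
    return out

print(protect_row({"id": "42", "email": "a@b.com", "ssn": "123-45-6789"}))
# -> {'id': '42', 'email': '<64 hex chars>'}  (ssn dropped, email hashed)
```

Because the hash is deterministic per key, the same email always maps to the same token, so analysts can still count, join and deduplicate on it downstream.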

Gartner’s forecast that more than 80% of enterprises will be using generative AI APIs by 2026 highlights the urgent need for CISOs to develop comprehensive governance frameworks in response to rapid AI advancements. Continuous monitoring and adaptation are key to ensuring that governance frameworks remain effective and aligned with both regulatory requirements and organizational goals.

The better control you have, the lower your risk, though zero risk is unattainable. Organizations must balance AI’s economic potential with the practical challenges of data security and governance. While it’s tempting to rush AI adoption for competitive advantage, keep in mind Gartner’s prediction that nearly one-third of generative AI projects will be abandoned after the proof-of-concept stage by the end of 2025 due to rising costs and unclear business value. Without a solid data foundation, AI efforts are unlikely to succeed.

[CTA_MODULE]
