Since the launch of ChatGPT in late 2022, the limelight has been on generative AI. By generating text, code, images, audio and other media in response to prompts, generative AI can multiply the output of creative and intellectual work of all kinds.
What generative AI can (and can’t) do
All organizations produce a huge corpus of text through contracts, blogs, call transcripts, chats, project management tools, emails and documentation of all kinds. With large-scale pattern recognition abilities and full access to your organization’s accumulated data, a generative AI model can become the most knowledgeable member of your organization, assisting with analytics, customer assistance, sales and marketing content. It can generate code to accelerate the software engineering process and even rapidly brainstorm and prototype new products and concepts.
The main limitation of generative AI is that, although it can simulate human-like intelligence, it isn't human and has no semantic, contextual understanding of the data it trains on. It's limited by the patterns within the data, which means that ensuring the data is trustworthy is of utmost importance.
To support the effective, safe and ethical use of generative AI, you'll need a solid foundation of high-quality data underpinned by governance. A strong data foundation helps ensure that data scientists and engineers can focus on prototyping, bringing to production and continuously improving highly sophisticated AI models.
Garbage in, garbage out
The trustworthiness of data—and therefore, the quality of a generative AI model—depends on the quality, volume and freshness of data. Without any semantic, human-like ability to separate truth from fiction, generative AI can hallucinate and produce other nonsense results. Although avoiding nonsense results is partly a matter of retraining and tuning a model, it's more fundamentally a matter of ensuring that the underlying data is accurate, consistent, complete, valid and, above all, accessible.
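The quality dimensions above—completeness, validity and so on—are concrete enough to check in code. As a minimal sketch (the field names and rules here are illustrative assumptions, not any particular product's checks):

```python
# Minimal data-quality checks for records destined for a training corpus.
# The field names and rules below are illustrative assumptions.

def check_record(record: dict) -> list[str]:
    """Return a list of quality problems found in one record."""
    problems = []
    # Completeness: required fields must be present and non-empty.
    for field in ("id", "text", "updated_at"):
        if not record.get(field):
            problems.append(f"missing {field}")
    # Validity: the text field must actually be a string.
    if "text" in record and not isinstance(record["text"], str):
        problems.append("text is not a string")
    return problems

records = [
    {"id": 1, "text": "Contract renewal terms...", "updated_at": "2023-05-01"},
    {"id": 2, "text": "", "updated_at": "2023-05-02"},  # incomplete
]

# Only records that pass every check reach the model.
clean = [r for r in records if not check_record(r)]
print(len(clean))  # 1
```

In practice these rules live in the pipeline itself, so every load is screened automatically rather than by ad hoc review.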
Automation is accessibility
The accessibility of data depends on a suite of tools called the modern data stack: a data platform that can store structured and semi-structured data, a data pipeline that extracts and loads data from critical systems to that platform and tools to orchestrate the transformation and modeling of data.
Automation is a key feature of these tools. Without it, data engineers, analysts and scientists must either build and maintain these systems themselves or manually combine data into usable models. Both approaches can produce poor results: they divert valuable data talent into manual, repetitive tasks and lead to long turnaround times and business decisions made on the basis of stale data.
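The extract-load-transform pattern these tools automate can be sketched with Python's standard library. In this toy version, sqlite3 stands in for a cloud warehouse and the "source system" is hardcoded; the table and function names are assumptions for illustration:

```python
import sqlite3

# A toy extract-load-transform flow. sqlite3 stands in for a cloud
# warehouse; the source system is a hardcoded list for illustration.

def extract():
    # In a real pipeline this would call a SaaS API or read a source database.
    return [("alice@example.com", 120), ("bob@example.com", 80)]

def load(conn, rows):
    # Land the raw data in the warehouse unchanged.
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (email TEXT, amount INT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)

def transform(conn):
    # Model the raw data into an analytics-ready table.
    conn.execute("""
        CREATE TABLE orders_by_customer AS
        SELECT email, SUM(amount) AS total FROM raw_orders GROUP BY email
    """)

conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
print(conn.execute("SELECT COUNT(*) FROM orders_by_customer").fetchone()[0])  # 2
```

A modern data stack runs this loop continuously on a schedule, so the modeled tables—and anything trained on them—stay fresh without manual effort.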
Know and protect data
Accessibility must be combined with a strong framework for data governance, allowing an organization to know, protect and manage access to its data. To know data, data teams must be able to track the provenance and lineage of data models to ensure they're valid and replicable. They also need metadata about when and how data has been accessed and altered.
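Lineage tracking can be as simple as recording, for every derived dataset, which inputs and transformation produced it, then walking that record back to the raw sources. A minimal sketch (the dataset and transformation names are hypothetical):

```python
import datetime

# A toy lineage log: each derived dataset records its inputs and the
# transformation that produced it. Dataset names are hypothetical.

lineage = []

def record_lineage(output: str, inputs: list[str], transform: str):
    lineage.append({
        "output": output,
        "inputs": inputs,
        "transform": transform,
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

record_lineage("orders_by_customer", ["raw_orders"], "group by email, sum amount")
record_lineage("training_corpus", ["orders_by_customer", "support_chats"],
               "join and deduplicate")

def upstream(name: str) -> set[str]:
    """Trace provenance: which raw sources ultimately feed this dataset?"""
    sources = set()
    for entry in lineage:
        if entry["output"] == name:
            for inp in entry["inputs"]:
                sources |= upstream(inp) or {inp}
    return sources

print(sorted(upstream("training_corpus")))  # ['raw_orders', 'support_chats']
```

With a record like this, a team can answer the governance question that matters for AI: exactly which source data shaped a given model input, and when.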
To protect data, businesses must be able to mask sensitive information like PII (personally identifiable information) and control data residency. They must also be able to manage permissions and access as data use becomes increasingly democratized across an organization. Data governance is increasingly critical both for regulatory compliance and for better customer outcomes.
Generative AI requires a solid data foundation
As the generative AI hype cycle continues, more and more open-source generative AI models pre-trained on large repositories of public data are becoming available, and they can be further trained on your proprietary data. These models will likely only continue to grow in power, sophistication and ease of use. But to make full use of them, you must first ensure the integrity, accessibility and governance of your own data.