How to make generative AI a reality
Fivetran CEO George Fraser, former Snowflake CEO Bob Muglia and Capgemini EVP of GenAI Steve Jones explore the path from AI concepts to full-scale implementation.
More about the episode
Execution beats strategy in every field, including generative AI. In this discussion, George Fraser, CEO of Fivetran, Bob Muglia, former CEO of Snowflake, and Steve Jones, EVP of Capgemini, explore the key challenges and solutions for developing mature, production-ready generative AI models. The focus isn’t just on clever algorithms or massive data volumes — it’s on effective data curation and robust data management practices.
The future of generative AI remains open, but clear trends are emerging. As Jones notes, “The technology is going to keep evolving. Every week there's a new model. Every day there's a new variation and a new piece. But one thing I can predict is if a company can rely on AI to really make decisions and forecasting and simulation ahead of the market, they're going to win against somebody who can’t.”
Muglia also highlights the growing importance of knowledge graphs: “A knowledge graph is simply a digital representation of human created knowledge that can be understood by a machine. We're still at the point where that is quite nascent in technology, but I think it's going to become much more important over the next couple of years.”
Highlights from the conversation include:
- The future of storing structure and semantic meaning in knowledge graphs
- How organizations can master data curation and other data management practices
- Understanding the limitations of relying on a single AI model for all analytics needs
Watch the episode
Transcript
George Fraser (00:01)
Hi. I'm George Fraser, CEO of Fivetran. I'm joined today by Steve Jones, EVP of Data-Driven Business at Capgemini, where Steve helps companies turn the promise of AI into reality for their business, and by Bob Muglia. Bob has had a series of roles as a technology executive over the course of his career, including being the CEO of Snowflake during its formative years. He now sits on boards, including Fivetran's board.
We're going to be talking about AI today. It's a hot topic, and we have a lot of content to cover. I would like to start by asking what people are doing with AI right now. There are a lot of proofs of concept out there.
Steve Jones (00:41)
Yeah.
George Fraser (00:42)
There's a lot of talk about AI, but what workloads are people doing, in reality, today?
Steve Jones (00:51)
I think one of the things to recognize is the — let's call it historical — AI: what people did with machine learning and deep learning that's been going on for years. Forecasting, predictive analytics — those things are all out there. Then when you start to get into the gen AI space, where people are starting to use content and the idea of natural language interrogation, the gap, as you say, between POC and production is huge. So actually a lot of the time people are doing fantastic POCs — great business opportunity — but not being able to transition them into production.
One of the big things that we've been focusing on a lot at Capgemini has been the fact that, here in San Francisco, there's a huge number of companies that will help you do a POC. A huge number of companies will help you pretend you have the data. A huge number of companies will therefore have a phenomenal demo. But look at what it takes to actually put a gen AI solution into a call center that can be trusted and deal directly with your customers: those solutions are going into production, but only with companies that really look at the quality of the data, the provisioning of the data and its availability, to ensure the whole digital business model is defined and constrained. There really isn't an area where you can look and say that there aren't examples of people putting things live into production. There's a large CPG company we work with where pretty much everything that they produce on e-commerce channels is generated using generative AI in 30+ languages. There are people using generative AI to respond to customers from an acquisition perspective, from a sales perspective and from a retention perspective.
We have an example involving one of the world's largest pharmaceutical companies. If you speak Portuguese and it's 2:00 AM and you are being treated for breast cancer, you can speak to a digital avatar that'll give you a zero-hallucination response and information about your treatment. Are you taking the right drugs? Are your symptoms what you would expect? It obviously won't do any diagnosis because that's not what it's there for and it's very closely bound, but you can do that with the current generation of AI.
The answer right now is that people are doing everything with AI, but the people who are winning are the ones who are getting it into production. I think one of the biggest pieces right now is that the gap between a good demo and production has never been bigger. One of the biggest things around that is, have you got the data you can trust for that solution so that you can trust its decisions? For a lot of companies, they just do not have that.
Bob Muglia (03:22)
I’d probably say one of the most successful applications right now is for developers — things like copilots that can help developers write code more efficiently. A lot of companies are benefiting substantially from that and getting more productivity out of their developers. In general, content generation, summarization, transcription and translation are more mature technologies. I think what is next on the list — which has still not hit maturity, to the point you're both making — is creating knowledge bases within companies and allowing these bots to answer questions for either end users within the organization or customers.
The potential for that is incredible, and I don't think we're very far away from solutions becoming reasonably mature. People are doing a lot with retrieval augmented generation: taking knowledge bases, using vectorization and vector databases to store that content, and taking that in conjunction with models to allow people to answer questions. But it's still a little early and the tools are still very nascent. It's still very much a do-it-yourself environment, and I think that'll change substantially in the next six to twelve months.
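For readers unfamiliar with the pattern Bob is describing, here is a minimal sketch of retrieval augmented generation. The embed() and build_prompt() functions are toy stand-ins for an embedding model, a vector database and an LLM completion call, not any particular vendor's API, and the sample knowledge base is invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words vector. A real pipeline would call an
    # embedding model and keep the vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank the knowledge-base chunks by similarity to the question.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # The retrieved chunks become context for the model; the completion call
    # itself (whichever hosted or local model you use) is left out here.
    return "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

knowledge_base = [
    "To connect to SAP HANA, supply the host, port, user and password.",
    "History mode keeps every prior version of a row with valid-from timestamps.",
]
question = "How do I connect to HANA?"
print(build_prompt(question, retrieve(question, knowledge_base)))
```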
George Fraser (04:30)
One of the things we've noticed in our attempts to create an internal knowledge base search tool at Fivetran — and Fivetran, if you think about it, is a very good use case for this: as a support agent or a sales engineer or an account executive or a software engineer at Fivetran, you need to have mastery of this extreme breadth of information. We have 500 connectors. You might get asked a question by a customer about any one of them, and you might have to be conversant in the internal details of the SAP HANA database or 499 other things. One of the things we've found is that a lot of the work is really in just curating the data — getting it together and ready for your language model. The DIY layer you put on top of that is not so bad. That layer can be relatively thin. A lot of vendors are building a lot of tools to help accelerate that, but the long pole in the tent does seem to be collecting and curating the data that feeds the application. Would you agree with that, Steve?
Steve Jones (05:37)
I completely agree. I actually wrote something this week about gen AI washing. I think it’s great technology, but there's never been a technology that's so easy to implement badly. The demo may go quite well for the first two seconds, but then you get into the dimensionality of the data — because we're talking about adding dimensionality to unstructured data, data that, a lot of the time, has not been written well. A lot of support documents have been written by engineers, and that's not always the best quality. It's more like a thought pattern on a page sometimes.
With call centers, for instance, I was working with clients and the first thing we identified is that they had phenomenal data for bad calls — calls that didn't solve problems — because people document the excuses. People say, “Well I wasn't able to fix it because of this.” The ones who could fix it, fixed it because they were KPI-ed on call time.
The first thing we had to do with that was build something to do summarizations of existing calls back to the call people to say, “Is this what you did?”
This involved integrating with the code base and integrating with the things they used in their job, because otherwise all we had on the successful calls was "fixed." Then we have to look at how to actually represent that information to an LLM so that it actually makes sense, because there's a lot of hype about inferring intelligence from documents, and these documents didn't have any in them.
The way you vectorize the information fundamentally tells the LLM how you want it to be presented. You can't just take a 20,000-word document and say, “I'm just going to chunk it up into bits.” That's not what you need to do. You need to start looking for the thread that makes sense for this type of question. Take the HANA piece: if somebody phones up asking how to connect to it, then there's probably something in there about connection and passwords and authentication and those pieces. If somebody's asking, "I've just got this data back from the HANA database and I don't understand what this field is,” that might be in the same document, but if I just chunked it based on 500 words or 500 characters, it wouldn't actually help me in any way to answer the questions.
I have to sit back and think, how does this data represent the business in a way to the LLM so now I can ask the question? Curation of the data, particularly unstructured data, is fundamentally a new skill that businesses have to learn to make LLMs more than just incredibly creative hallucination engines.
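To make Steve's chunking point concrete, here is a small sketch contrasting arbitrary fixed-size chunks with chunks that follow the document's own structure. Headings are used as the boundary purely as an assumed convention; a real system might split on any marker of a coherent "thread."

```python
def fixed_size_chunks(text: str, size: int = 500) -> list[str]:
    # The approach Steve warns against: arbitrary fixed-character windows that
    # can cut a topic in half.
    return [text[i:i + size] for i in range(0, len(text), size)]

def heading_chunks(text: str) -> list[str]:
    # Chunk on section boundaries, so a question about "connection" retrieves
    # the whole connection section rather than a fragment of it.
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = """# Connecting to HANA
Provide the host, port, user and password...
# Field reference
EKPO.MENGE is the ordered quantity..."""

print(len(fixed_size_chunks(doc, 60)), "arbitrary chunks vs", len(heading_chunks(doc)), "topic chunks")
print(heading_chunks(doc))
```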
George Fraser (08:15)
One of the things we immediately noticed when we were working on this at Fivetran was that a really important data source is our internal knowledge base, which is basically a wiki. We're replicating that using Fivetran by connecting to the underlying CMS. What we found is that how the information is structured in the articles in that internal knowledge base can make the LLMs perform a lot better or worse.
Bob Muglia (08:39)
Oh yeah.
George Fraser (08:40)
Where we went next with that was really interesting to me. Everyone's reaction was that it might be easier to just change the structure of our knowledge base a bit to make it more LLM friendly. Rather than trying to build some layer of pre-processing, let’s just move the articles around a little bit. That is a feedback loop that is going to happen once you go live. I think it's going to be really interesting to see to what degree that starts to happen. Will we start to build knowledge bases, not for humans, but for LLMs?
Steve Jones (09:08)
One of the pieces we've worked on a lot regarding code is how to redefine your coding standards so an LLM can then come back and do support and maintenance. Now comments mean a lot more than just putting the code in, because the comments are what describe the code to the LLM. The old-school problem we have with maintenance is where somebody's fixed the bug but not the comment. The LLM is going to get confused by that. It’s those sorts of practices where you put an audit in place to ensure the comments in a file actually describe the code that follows them. That's something an LLM is actually quite good at doing. You can put that into your DevOps cycle, so as you're checking in the code, you’re checking that the comments match the code you just changed. If not, you can go change the comment.
That's the sort of practice you're going to have to put in place moving forwards. Otherwise, the LLM is going to start trusting the comment more or trusting the code more, and you're going to end up with the significant problem of it hallucinating what your code base actually does.
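As a rough sketch of the kind of check-in audit Steve describes — does each comment still describe the code beneath it — the scaffolding might look like the following. The ask_llm() function is a hypothetical placeholder for whichever model the team actually uses, and this version only inspects Python docstrings.

```python
import ast
import sys

def ask_llm(comment: str, code: str) -> bool:
    # Placeholder: a real check would send both strings to a model and ask
    # "does this comment still accurately describe this code?" Until that is
    # wired up, the check passes everything.
    return True

def check_file(path: str) -> int:
    source = open(path).read()
    stale = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            body = ast.get_source_segment(source, node)
            if doc and body and not ask_llm(doc, body):
                print(f"{path}:{node.lineno} docstring may no longer match {node.name}()")
                stale += 1
    return stale

if __name__ == "__main__":
    # Run against the files changed in a commit, e.g. from a pre-commit hook.
    sys.exit(1 if any(check_file(p) for p in sys.argv[1:]) else 0)
```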
Bob Muglia (10:18)
People and machines have to be able to work together in a way that each understands the other. You talked about people making changes, and I think as time goes on and the models get better, they'll be able to actually generate more of what you can think of as knowledge graphs from data that we have, whether it's documents or different kinds of systems. We’re beginning to really understand the structure of these things.
One of the areas where we still have immaturity is the ability to take the structure of human-generated information and transcribe it into a format that the models can really understand. I think knowledge graphs are going to play a very key role and companies will begin to think more about how to establish their business process and rules in the form of knowledge graphs that can be understood by both people and by machines.
George Fraser (11:12)
You love knowledge graphs?
Bob Muglia (11:14)
I do. I'm a fan of them.
George Fraser (11:15)
You can't have a conversation with Bob for more than 15 minutes without knowledge graphs coming up.
For someone who's not familiar, what is the tip of the iceberg? What's a knowledge graph? How is it different from a folder filled with documents or a table with rows and columns?
Bob Muglia (11:31)
Well, a folder filled with documents has the foundation of a knowledge graph in it. When people create a document, they create structure, and that structure has meaning. It has a lot of meaning. Look at a table within a document. It’s not necessarily just straight rows and columns; tables often have sub-columns and things like that, all of which have meaning that we have learned to infer over time. How do you translate that into a format that the machines can understand?
The way to do this is to essentially create a structure that reflects what the human would understand, and that turns out to be a knowledge graph. A knowledge graph is simply a digital representation of human-created knowledge that can be understood by a machine. We're still at the point where that is quite nascent in technology, but I think it's going to become much more important over the next couple of years.
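As a toy illustration of Bob's definition, one minimal machine-readable form of human-created knowledge is a set of subject-predicate-object statements. Real systems use graph databases or RDF stores rather than a Python list, and the entities below are invented for the example.

```python
# Each triple is a statement about the business that a machine can traverse.
triples = [
    ("EMEA sales", "sells_to", "wholesalers"),
    ("LATAM sales", "sells_to", "direct consumers"),
    ("invoice", "requires_approval_by", "finance"),
    ("finance", "part_of", "lead-to-cash"),
]

def facts_about(entity: str):
    # Everything the graph asserts about one entity, in either direction.
    return [t for t in triples if entity in (t[0], t[2])]

print(facts_about("finance"))
```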
George Fraser (12:27)
Nascent is a good word in this context. It’s early days for this new wave of AI powered by giant matrix multiplication. There are a lot of unknowns. How does an organization prepare for the unknown?
Steve Jones (12:45)
If you are going to say, “In future 50% of our decisions are going to be AI enabled, either supported or directly made by AI,” then the first question is, “Do we have a representation of our business in a digital form that could make that happen?”
Even if the answer is, “It's going to take five years to get there,” then the precursor is, “Do we have a digital representation of our business?”
One of the biggest problems right now in most organizations — as Bob just said in terms of documents — is that you can apply that to everything. You apply that to the decisions. Is there a digital description of the decisions and the decision-making in the organization? No, it's embedded within the organizational hierarchy and cherished beliefs and cultural and social norms. Therefore, I can't rely on an AI to make a decision.
I think one of the biggest things that businesses need to look at is this isn't a technology change. I was speaking to the head of AI at one of our largest financial services clients the other week, and they actually made a great point, which is that it’s an organizational change. This is going to change the organization more than it changes technology. Technology is going to keep evolving. Every week there's a new model. Every day there's a new variation and a new piece. What's really changing is the idea that the business is going to delegate authority for decisions to AI like they've never done before.
Therefore, if they don't have that digital description of their business, then they're going to lose. I think the one thing I can predict is if a company has a digital description and can rely on AI to really make decisions and do forecasting and simulations ahead of the market, they're going to win against somebody who can't. I think it's that sort of retail mall versus Amazon moment. We're going to get to a stage where some businesses and sectors can manage themselves digitally and trust AI to make decisions, and others can't and it will not be a fair competition. I think that's the thing right now: as a business, are you in control of your operational reality? Do you have a digital definition of your business?
Bob Muglia (14:55)
Nobody really does right now.
Steve Jones (14:57)
I agree.
Bob Muglia (14:59)
All that information is scattered all over the place. It's in documents; it's in people's heads. A lot of it is in code embedded within applications, many of which you don't even own, because they're SaaS applications that you have a subscription to. The interesting thing is how you actually create that digital description, which, to me, is a knowledge graph. By the way, when you say digital description, I'll say knowledge graph. I think they're basically the same.
Steve Jones (15:21)
Can I say hypergraph?
Bob Muglia (14:59)
I think one of the really interesting things I've learned over the last few years is that people are not very good at taking complex business processes and creating those digital descriptions. I do think this is where models are going to help. They're going to help to curate that, but it's going to require this combination of models being able to pull information out of whatever sources exist, being curated by people and validated by people, to ensure that the business process is accurately reflected in the digital description. If it's not accurately reflected, the model can't possibly make the right decisions.
Steve Jones (15:59)
We believe, particularly in tech, that business processes are real. When you start modeling businesses, they work much more as a collaborative network than they do as a business process, but we've encoded those business processes into applications.
My favorite one is the idea that lead-to-cash is real. There's a sales team and there's a finance team, and they hate each other for very good reasons. Finance is there to control risk, and the sales team wants to sell anything they can. The dynamic of that collaboration we encode within lead-to-cash. Then when we have data problems, we pretend it's a business process issue.
I think we're going to find that the models start learning that those business processes aren't real. It's the collaborations and the organizational networks, which is the graph, that actually drive the business and its operations rather than what we've encoded in those transactional systems. That’s all about that data view and that collaborative view of the organization. For most people today, that just lives in Excel spreadsheets and emails, and that's a massive problem.
George Fraser (12:27)
There are a lot of vendors who are trying to help people tackle these problems, but AI is a paradigm shift and sometimes paradigm shifts blow up existing vendors. What do we think is going to happen? Are the existing vendors going to adapt and encompass these workloads or is this going to turn the apple cart over and allow someone else to come along whose data platform is based on knowledge graphs or whatever it might be that is going to serve these new workloads?
Bob Muglia (17:37)
So far, AI has very much benefited the existing vendors and they've been able to incorporate technology into their existing products and enhance them. To me there are five major data platforms: Snowflake and Databricks plus the three cloud vendors, Microsoft, Amazon and Google. I think that all of them are focused on AI and incorporating AI deeply into their platform. I think it's the correct way to do it, because everybody is moving to a common form of data lake that'll be able to store all kinds of data, structured, semi-structured, and unstructured or complex data. The tools and facilities are being built around these platforms, and I think that's the way the world is going to go. So I don't predict that there's going to be upheaval in the platform vendors. I do predict major changes in terms of what those platforms provide, particularly with new AI facilities.
Steve Jones (18:29)
All of these platform vendors today deal with post-transactional data. They all deal with what I call “dead data.” It's finished, it's there, we curate it, we provide the view. It's great for learning in those areas, but for decisioning for operational speed pieces, that's the remit of the application vendors.
Historically, the entire dynamic of the market has been that we all accept that applications produce rubbish data and then the data platform vendors spend a huge amount of time cleaning it up. We use an expression, “data is the new oil.” Part of that is the idea that data needs refinement before you can possibly trust it. Well, if I've got an AI that's now acting as my procurement manager with one of my suppliers, I can't wait 24 hours. I can't wait for all of that cleaning.
If you're looking at an area where there's going to be the most disruption and contention between players in the market — if I'm running an application world and I want to be data-driven and I want to use AI to make decisions, that means my application needs to be led by data. Right now that's not what happens with applications. Applications are led by transactions, led by process, and then output a bunch of poor-quality data.
If you're saying, “No, AI drives those decisions,” that creates a very interesting dynamic between the data platform vendors who've never gone there and the application vendors who've never gone there. That's where this new world lives: in front of the current applications. If AI makes decisions in operations, that's the shift. I agree, I don't think it's going to be a massive shift on the backend, but I do think when you start seeing AI go into operations and make operational decisions, we're going to change the way we build applications. That is a massive switch.
George Fraser (20:20)
That's interesting. That might be more of a change for the application vendors than the data platform vendors, because they're going to have to federate to gather context from all of these other places.
Steve Jones (20:33)
They'll have to get the context from other places and they'll have to concentrate on data accuracy. It won't be about data quality. Data quality is what we do because the data isn't accurate. If an application vendor wants to say, "I'm going to lead with AI and data," that means the data has to be accurate. Right now that's not a common thing within the application space.
George Fraser (20:57)
I agree that data sometimes needs to be refined, but really, that should not be your first choice.
Steve Jones (21:09)
Yes, exactly.
George Fraser (21:10)
We should fight against that. Go and try to fix your Salesforce configuration, go and try to simplify your SAP implementation so that the data is more accurate to begin with. There will be many benefits now and in the future, not just that your data engineers don't have to spend all their time regrouping opportunity channels.
It will be much easier to build all kinds of other workflows that you haven't even anticipated yet.
Steve Jones (21:31)
Yeah, exactly. Historically in data, we've accepted that applications produce poor data. I used to say that the biggest lie in data is, "We'll fix it at source" — because we never did; we didn't have the authority.
Now you're starting to see the business side, which is the other side of applications, saying, “No, I need to be able to rely on AI.” I think that mentality is changing. The need, therefore, to bring context into operations changes the dynamic for the data platforms, but also, more significantly, changes the data dynamic of applications.
George Fraser (22:10)
Historically, the data platforms have been used a lot for reporting. That was the number one use case, and probably still is today if we went and annotated everyone's SQL queries. More and more workloads are running on top of these systems, including the ones we've been talking about. What are the implications for the major data platforms in terms of how they need to change to do things other than reporting in order to do all these new workloads?
Bob Muglia (22:40)
I think one of the key things is going to be continuing to focus on the governance of these environments, because more and more intelligent applications are built on top of these platforms. They are part of the business process and are beginning to make business decisions directly, so it’s important to have the governance pieces in place so that only the data that should be accessible is accessible at the right time.
Those are some of the short-term changes I think you'll see — a lot of focus on ensuring that these things are coherent and secure, because you're going to be putting almost all of your data in a central place and using it to now make a much broader set of business decisions because AI will do it directly.
Steve Jones (23:24)
I think the other piece is that reporting is a human-speed, post-transactional, 24-48 hour process. People are comfortable with the delay. The biggest thing that people have to get away from when they think about reporting is that moving towards AI fundamentally changes the required speed of data availability.
It’s not, “Bob needs a report on his desk tomorrow morning.” AI is running 24/7. It's changing and making the recommendations. We are monitoring it 24/7. We are looking at those things and this information is streaming through. It’s that sort of evolution of change of pace.
The other piece between reporting and more large-scale data is the shift away from the mindset that there is a magic schema. Governance is incredibly important, but there is no magic schema that describes your business. I still see people today with data lakes, with these modern data platforms still trying to say that governance is having an enterprise data model and making everybody comply with an enterprise data model.
Number one, it never ever works and it never ever will. It’s also a massive drag on AI efforts, because I've never met a data scientist in my entire career who ever worried about one of those things. Number two, these new models (when you're talking about how they consume data in those pieces) don't care about your EDM either. The biggest piece is number one: you've got to change your mindset on speed. Number two is you've got to recognize that governance means representing the operational accuracy of the business, not representing arbitrary quality metrics of things like completeness.
Yes, great, the data is not complete — we've got to deal with that. Is it accurate within its context? It is accurate. "But I'd really like to have these extra four fields." Good for you; I don't care.
I think the mentality shift in moving from reporting to AI is that accuracy and the reflection of your reality should be the measure. A lot of the time in reporting it's been, "Does the field in the report at the end have data in it and look good?" That's not the same governance; what we need for AI and operations is different from what we needed for traditional reporting.
George Fraser (25:42)
From your lips to God's ears. I think I have seen so many companies spend way too much time trying to create that perfect schema that has one table for every concept, and it just cannot be done.
The right approach is to ask what problems we’re trying to solve this year and then work backwards from those. What is the schema that we need to support those? It’s going to evolve every year. There's never going to be some promised land — the perfect dimensional schema where everything has been de-duplicated and quality-ified.
Steve Jones (26:22)
As Bob said, it's not the same in different areas of the business. When you talk about knowledge graphs, part of their power is capturing how something like the sales team actually works in different places.
One of my favorite examples comes from working with a very large brewer who tried for two and a half years to create a unified schema for all of this stuff. The biggest area they had was customer. I was an MDM guy back then, and they tried to do it. The first question I asked was, "Okay, how do you sell in these different regions?"
The answer was, "Well in North America and Europe, they sell to big box retailers and wholesalers and people like that."
Great. "What's your ability to cross-sell and upsell?"
"Zero. They're on different continents. How on earth would we do that?"
"What do you sell in South America?"
"Oh, we do a lot of small-scale retailers and mom and pops. We actually have a direct-to-consumer business in these."
Why on earth did you spend two and a half years trying to create a standard model just for your business to go, “Yeah, we don't work like that”? The business needs to protect the business, and different areas of the business will have a different view of the data. That is not a mistake. Often, with a reporting mentality, it's viewed as a mistake to be fixed rather than recognizing that the business doesn't work like a perfect data model.
Bob Muglia (27:39)
That comes back to the digital description that you described earlier; the knowledge graph, the idea that different parts of the business, different divisions and different groups within them do things differently. That needs to be reflected.
The other thing that's really important is time, because things change over time. The way a business worked last year could be quite different than the way it works this year. That's reflected in the data.
Without that understanding of the change in the way the business process works, the data is almost garbage. You can't really analyze it correctly. Having all of that understood is critical. And more and more we'll need to have those things digitally described for the AI to make the correct decisions.
George Fraser (28:10)
We made a big investment years ago in making history mode ubiquitously available in Fivetran for this reason. It’s not enough just to know what is true today. You need to know what was true in the past. Sometimes you need to know, what did I think was true two years ago from the point of view of one year ago?
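A rough sketch of the point-in-time lookup George mentions is below. The table layout and column names are illustrative only, not Fivetran's actual history-mode schema; a full bitemporal answer ("what did I think was true, as of when") would add a second pair of timestamps recording when each version was loaded.

```python
from datetime import date

# Each row is one version of a record, with the span during which it was current.
history = [
    {"account": "Acme", "tier": "bronze", "valid_from": date(2022, 1, 1), "valid_to": date(2023, 6, 30)},
    {"account": "Acme", "tier": "gold",   "valid_from": date(2023, 7, 1), "valid_to": None},
]

def as_of(rows, account, when):
    # "What did we believe was true about this account on this date?"
    for r in rows:
        if r["account"] == account and r["valid_from"] <= when and (r["valid_to"] is None or when <= r["valid_to"]):
            return r["tier"]
    return None

print(as_of(history, "Acme", date(2023, 1, 15)))  # bronze
print(as_of(history, "Acme", date(2024, 1, 15)))  # gold
```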
Steve Jones (28:32)
It's incredibly important. The history of change of data is one of the most valuable things that a business has because that tells them how they got from A to B and how these things evolved and changed. It's what AI models will need when you do deep learning on your own data. It's the change. That's really what you're trying to do.
When you look at applying gen AI to do forecasting or simulation, one of the most brilliant things from last year was the weather simulation Google DeepMind did using gen AI. If you'd gone back 10 years and said, "We're going to do a simulation of weather for the planet," people would say, "That's insane."
We can do simulations down to a one-kilometer area for weather forecasting because we've got 40 years of history of weather data. With Fivetran, you've baked in this idea of history of change, and your competition hasn't. If you've got five, 10, 15 years of history and can say, "These are the ways that things impacted and changed," and run an AI that can automate that and therefore make a forecast 24 hours ahead of your competition, you're going to beat them. It will be as simple as that. History of change is probably the most valuable thing that companies should be constructing right now if they haven't already started.
Recognizing what the value of that is going to be — not just in terms of managing the business and BCP — is incredibly important. That's going to be the foundation of how you start changing the way you do management and decisioning in the future, because that's how you're actually going to forecast the future based on that history of change.
That's a huge shift over our traditional Excel and 3% forecast targets.
George Fraser (30:20)
A moment ago you mentioned a company having 40 years of history in a dataset. When you have 40 years of history, that means your dataset is going to outlive your tools. Many tools are going to come and go, and you are going to have that same dataset and this creates a problem. You can end up with a long series of very painful migrations. One thing people are talking about a lot right now as a potential solution to this problem and other problems is data lakes. Do you think data lakes have value here? And what does that term mean to you?
Steve Jones (30:54)
The basic principle of a data lake is a separation of compute and storage. Now, the compute side has multiple variations, but we're talking about having an archival standard for data availability.
To me, the foundation of a lake is the idea that I've stored everything and every change. I will compress it. If it's something like a stock ticker saying what the stock price is every half second, you don't store every one, you just store the changes. Regardless, it's that mentality of having an archival history of change, and then on top of that, provisioning it for purpose. The purpose will change, the usage will change, the technologies will change, but it's all going to be based on the idea that I've got a stored history of change in my organization.
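A small sketch of the "store only the changes" idea in Steve's ticker example: compress a raw stream to its transition points and keep that as the archival history, reconstructing the full series only when it is needed.

```python
def compress_changes(readings):
    # readings: iterable of (timestamp, value) pairs; keep only the points
    # where the value actually changed.
    kept, last = [], object()
    for ts, value in readings:
        if value != last:
            kept.append((ts, value))
            last = value
    return kept

ticks = [(0.0, 101.2), (0.5, 101.2), (1.0, 101.2), (1.5, 101.3), (2.0, 101.3)]
print(compress_changes(ticks))  # [(0.0, 101.2), (1.5, 101.3)]
```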
George Fraser (31:40)
At Fivetran, we’ve seen that pattern sometimes with large companies. In DIY data pipelines, they'll record the same data over and over and over. They'll say, "I have a petabyte of data," but really they have a terabyte of data a thousand times.
Bob Muglia (31:58)
If you don't do change data capture, that can be really problematic. I think the one key thing is that data lakes are moving to an open format that allows a choice of vendor tools around it. We're in a little bit of a weird situation right now because we have two formats that are competing with each other, Delta and Iceberg. It looks like —
George Fraser (32:20)
Six or seven catalogs.
Bob Muglia (32:21)
The catalogs are all different. It looks like there will be some convergence on this in the not-too-distant future. Iceberg seems to be gaining a fair amount of traction, and by having a common open format, which allows (regardless of who owns the catalog) different vendor solutions to be built on top, whether it's data quality, data warehousing, AI tools, machine learning — those things can be combined. I think that's a very, very major change that is now possible because the technology has evolved sufficiently, and we have some open formats that can be used by multiple vendors.
George Fraser (32:58)
Right. I said earlier that it's a folly to make predictions in a time of so many unknowns, but let's do it anyway. Why not? Let's all make some near-term predictions for what's going to happen, let's say, in the next year?
I'll start. My prediction is that these AI workloads are going to be more like business intelligence than application development. I think the platform vendors are creating some great building blocks, and companies are going to find a lot of success in assembling those building blocks together along with a very curated view of their own data in order to create great internal tools.
Bob Muglia (33:39)
I think in the next 12 months or so, we’ll see the idea of creating knowledge bases within companies, both for their customers and for their internal users, and having AI bots able to answer questions for people. I think that's going to become mainstream in the next year. That's a very big technology shift if you think about it, because a whole lot of what we do is look for information. Having systems able to do that for us is going to be a big change.
Steve Jones (34:08)
I'll make two predictions just to break the rules and be wrong twice.
The first one I'll say is I think that purpose-specific models for enterprises are going to beat out big general models. That includes taking general models and almost lobotomizing them for purpose, restricting them. There won't be one model that knows everything. They’re going to be purpose-specific, and the orchestration of that is going to be the way that enterprises really scale this up, because that's how you scale people to be honest.
The second thing is what you said in terms of the development, whether it's BI or whether it's like application development — I'll actually challenge it. We can have a competitive prediction. I think it's going to be something new, because I think it's like when people started building for the internet instead of building client-server applications. I think there's a shift that requires people to know how to build an application that is data-driven, and that has two sets of cultures that need to be combined.
Whether it looks slightly more like business intelligence or slightly more like an application, I think it's going to absolutely have that mentality. Some of the ones I've been working on in the last year or so have teams that are very blended between people who understand what operations mean and people who understand what data means. I think that's what we're going to see in that space. I think we're going to see the multi-model, collaborative-model-type piece and the knowledge graph and hypergraph description around that. We're going to see the creation of a new sort of skillset: people who understand data, as well as those who understand data within the scope of an operational decision. I think those two skillsets are relatively rare right now. If I was starting out, that's probably what I'd be focusing on as a career.
George Fraser (35:55)
Thank you both for joining me for this fascinating conversation. I think we can all agree that we're all excited about participating in the evolution of this whole system.
Bob Muglia (36:06)
A lot of change, a lot of change happening. A lot of positive things can be done with this.