For so many organizations today, data is the business. Data is a critical enabler in every part of the company, from the front line to the back office. Over the last decade, the need to store, process and analyze data in new and innovative ways has led to rapid innovation in cloud technology as well as exponential growth in new products, creating a new world based around the modern data stack.
In a recent fireside chat, Fivetran CEO George Fraser caught up with Ali Ghodsi, CEO of Databricks, and Bob Muglia, tech industry veteran and former CEO of Snowflake, to discuss what’s happening in the world of data.
This interview has been edited for length and clarity.
Bob Muglia: What's going on in data, George?
George Fraser: People's ability to work with data is catching up to their desire. That's a big trend.
Bob Muglia: We're at a point now where many people have access to the information they need, and that wasn't true a few years ago. It's really the modern data stack that has made that work. Sometimes people talk about different approaches, and if you're in Silicon Valley and smart enough, you can try to build these things yourself. But the modern data stack has democratized data so that companies of all sizes can really use it effectively.
Ali Ghodsi: The cloud also, right?
Bob Muglia: Yeah. The modern data stack couldn't be created without the cloud because the modern data stack is really about delivering analytics as a service, being able to achieve cloud scale and the cost efficiency of the cloud. The other thing is SQL – using SQL for data modeling. That's key to the modern data stack.
Ali Ghodsi: I would say SQL is awesome, but Python is also on fire. There are millions of students going out there learning Python. You're going to have all these people that come out that are data literate and they know how to do programming, whether it's Python or SQL.
If you look at those two trends, Python and SQL are on fire.
Bob Muglia: People will learn a procedural language like Python and they'll really get great at that, but they're also calling SQL to actually get their job done. Those two things together are really how the modern data stack is solving the predictive analytics problem.
George Fraser: SQL is interesting because you see a lot of late learners of SQL be very successful with it. They can learn SQL pretty readily, and they can use it to solve real problems. The human dimension of it is very important. It has fewer sharp edges than, say, Python and other procedural languages, and that creates an open door for a lot of people.
Ali Ghodsi: I'm going to predict that in the modern data stack of the future, the lingua francas will be Python and SQL, on equal footing. Both will be supported, and each has its pros and cons.
George Fraser: It's funny because they're both deeply flawed for historical reasons, and it's unclear whether the flaws will ever be fixed. But the flaws aren't big enough to really stop their momentum. It seems to be inevitable.
Ali Ghodsi: The perfect language is Esperanto, but here we are speaking English.
George Fraser: Yeah, exactly. The perfect ones never win.
Bob Muglia: SQL isn't perfect, but it really has solved an incredible number of problems. It has become, as we say, the lingua franca of how data is transformed and how it's accessed and queried. I don't think that's going to change anytime soon. I don't know that it can ever evolve to take on what people are doing with languages like Python, so we'll have to see. Eventually, something will come to replace SQL, and there will be new languages that will supplant Python.
George Fraser: I don't think it will even be supplanted. I think when we're flying starships to Alpha Centauri, there will be SQL databases in there. It's a chicken-and-egg problem; these things never change.
Ali Ghodsi: Lakehouses.
George Fraser: There'll be a SQL lakehouse in there?
Ali Ghodsi: With a Python and SQL lakehouse in Alpha Centauri.
George Fraser: Will starships have lakehouses? That is an interesting question.
Bob Muglia: There'll be a lot of data in starships, I can tell you that for sure. I think that SQL will continue to evolve, and it's interesting to see with tools like dbt how people have been able to use it in new and different ways. One of the challenges with SQL is composition though. dbt helps with that, but there are some fundamental issues that really make it difficult to use in broader senses, which is why we need to have additional languages like Python to work with data.
Ali Ghodsi: Yeah. Modularizing your code, composition as you said, testing your code, not having these massive CTEs (common table expressions) that run for many, many pages. I think Python helps you modularize it and do software engineering. But SQL is much more democratized and easier to pick up, so I think there's a place for both. The DataFrame API is the common pattern that you can see across Python and SQL. It's used by most machine learning frameworks, in R and so on.
George Fraser: I used to use R back in my scientist days. R is another funny one. It was not really designed by experts in programming languages. Somehow it got this momentum and to this day, a lot of the best, most well-curated, well-implemented statistics algorithms are in R, not Python. It's hard even to scale R to one machine, forget multiple machines.
Bob Muglia: I'm a big fan of a new one called Julia and what Julia can do. It's a modern language that really can make it easier for people to write these programs.
There's evolution happening in the language space, and there's evolution happening in SQL. It's interesting to see how languages can make programmers both more productive and be much more efficient in the execution of the system.
Remember, the languages we're dealing with right now are old. SQL's 40 years old, and Python's how old?
Ali Ghodsi: Python was a language that was discovered.
George Fraser: Thinking back to where we started, the emergence of the modern data stack, I often think that the real sea-change event was actually a change in cost. There was a group of vendors, data warehouses, data storage and data processing frameworks that were dramatically cheaper than the previous generation.
They almost broke ranks with the previous generation of vendors and dropped the price by 90 percent. That unlocked ELT, and it unlocked all these additional workloads. The cost of the fundamental components of these systems has continued to decline over the last 10 years. I wonder if there will be another kind of breaking ranks where we see the cost drop 10 times again.
Ali Ghodsi: I think hardware trends are important to pay attention to. As you said, networks got exponentially cheaper, so it enabled data lakes to become super cheap and easy to write to. That enabled separation of compute and storage. Before that, you couldn't separate compute and storage because writing to the data lake would overwhelm the network. So you would have to think about where the data is and where the compute is, and you'd have to co-locate them.
That's why data warehouses back in the day were in one appliance because you can't move it over the network. You'll crash the whole network. Now it has become cheap. I think these new Arm chips that are coming now are really, really interesting. They seem to be able to bend the curve.
Bob Muglia: I don't know if we'll see an order of magnitude shift like we saw with the cloud coming in because there were really three things that came together with that. It was the blob storage that was highly durable and very cost effective, networking as you described, fast networks, and compute on-demand. What we're seeing is the costs go down and that will continue to happen. The barrier now may not be cost, it's complexity.
George Fraser: So you think that what we need at this point is not a 10x reduction in cost, because it's cheap enough. We need a 10x improvement in usability.
Bob Muglia: I think that what we'll see is continued improvement in algorithms, continued improvement in simplicity, simplified deployment patterns, and clarity across the industry as to how this works. All of that happens with the modern data stack. The modern data stack is the thing that's bringing everything together.
The biggest challenge I see right now, and the biggest source of complexity people have, is governance of these systems, and that's a mess. There are governance products that have been around for a while that were built in the pre-modern data stack era. Then there are pieces and parts that are being put together by different vendors in the modern data stack space. I think we'll see that converge and become much more holistic, much better clarified and easier for customers to actually construct and use.
Governance needs a revolution equivalent to the cloud and the modern data stack.
George Fraser: What do you think a good governance solution looks like? Is it a product that connects to all the systems and gives you that bird's eye view? Sort of a Datadog for governance?
Bob Muglia: Well that's certainly part of it. One of the biggest parts is really enforcement and having clarity of how you do tokenization, how you establish rights access and having that consistent across the organization. It's really challenging to do rights access management across an entire data estate.
George Fraser: There's another aspect of governance, which is a little bit less obvious: change management. We had a customer here yesterday, Jesse Pedersen from Autodesk, talk about one of the challenges. When you start to use all this data and you start to do great things with it, you build products and applications that people all over the company are using.
Now, whoever manages the original source of that data needs to think about change management. If they change that custom field, they might break God knows what somewhere else. This is something engineering teams have been dealing with for a long time. But for Salesforce admins, Workday admins and analytics teams, it's a little bit new. We've seen this at Fivetran, and at Autodesk, they've seen a ton of this as well.
[Data governance] is partly a cultural shift and also a tool shift where you have to make people aware of these dependencies that have been introduced by all this cool stuff we're doing with data. It's not all roses.
Ali Ghodsi: Well, I think you need testing for this. Just like for software engineering, we have tests that run and tell us, "Okay, this is ready to ship." You'll need to have the testing infrastructure for your data and for your data pipelines.
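As a rough illustration of what "tests for your data" could look like, here is a minimal sketch in plain Python. It is not any particular vendor's tooling: the function names (`null_rows`, `duplicate_values`, `ready_to_ship`) and the sample columns (`account_id`, `plan`) are hypothetical, chosen only to show two common checks, not-null and uniqueness, gating a pipeline the way unit tests gate a software release.

```python
from collections import Counter
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]


def null_rows(rows: Sequence[Row], column: str) -> list[int]:
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def duplicate_values(rows: Sequence[Row], column: str) -> list[Any]:
    """Return the values of `column` that appear more than once.

    None values are ignored; values are assumed to be hashable.
    """
    counts = Counter(
        row.get(column) for row in rows if row.get(column) is not None
    )
    return [value for value, n in counts.items() if n > 1]


def ready_to_ship(rows: Sequence[Row], key: str, required: Sequence[str]) -> bool:
    """True only if `key` is unique and every required column is populated."""
    if duplicate_values(rows, key):
        return False
    return all(not null_rows(rows, column) for column in required)


# Hypothetical sample data: one missing plan and one repeated account_id.
sample_rows = [
    {"account_id": 1, "plan": "pro"},
    {"account_id": 2, "plan": None},
    {"account_id": 2, "plan": "free"},
]
```

Running `ready_to_ship(sample_rows, key="account_id", required=["plan"])` on the sample fails both checks, which is the signal to hold the pipeline rather than publish the table.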
Bob Muglia: When we have systems that track lineage and the history of data, at least there'll be an understanding of the potential impacts downstream. Today, very few customers have access to that information, so they're flying blind. That's one of the challenges that has to be addressed by the industry. Lineage is hard.
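To make the idea of "potential impacts downstream" concrete, here is a small sketch, assuming lineage is available as a simple mapping from each asset to the assets that read from it. The asset names and the `downstream_of` function are hypothetical; real lineage systems are far richer, but the core question (what breaks if this field changes?) is a graph traversal.

```python
from collections import deque


def downstream_of(lineage: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first walk of a lineage graph.

    `lineage` maps an asset to the assets that consume it directly.
    Returns every asset reachable from `changed`.
    """
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: a CRM custom field feeds a warehouse table,
# which feeds a model and two dashboards.
example_lineage = {
    "crm.account.custom_field": ["warehouse.accounts"],
    "warehouse.accounts": ["model.revenue_by_account", "dashboard.churn"],
    "model.revenue_by_account": ["dashboard.finance"],
}
```

Calling `downstream_of(example_lineage, "crm.account.custom_field")` returns the warehouse table, the model and both dashboards: everything an admin would want to know about before changing that field.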
George Fraser: You have to trace lineage all the way back to the application.
Bob Muglia: Right. To the operational system and then through all of the transformations. This becomes much more important as we start to see predictive data apps emerge. That's really what I think people want to build today.
They want to take all of the data they have in their data lake or data warehouse and be able to leverage that not just to teach their people and get insights for people, but to begin to build applications that allow organizations to be data smart.
And really, a data app is something that autonomously takes action based on data. More and more of those actions have to then be fed into the applications that people use, like Salesforce. So the whole thing is a circle really. The challenge is going to be that these things become more business critical, so the management and governance of it just becomes all the more important.
George Fraser: We have a project along those lines right now, predicting our revenue each month. We have a very simple system that just does linear extrapolation that predicts our revenue and it becomes accurate about halfway through the month. There’s this process that happens every month where everyone nervously watches from about day 7 to day 14 as the estimate dials in. Is this going to be a good month? And in principle, it's possible to do much better than that. We should be able to, with an hour of data at the beginning of the month, have a pretty confident estimate of what it will be at the end, but it's more complicated than a SQL query.
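The simple linear extrapolation described here can be sketched in a few lines of Python. This is only an illustration of the general technique, not Fivetran's actual system: the function name `extrapolate_month_revenue` and its inputs are assumptions, and it simply projects the month-to-date daily run rate across the whole month.

```python
from calendar import monthrange
from datetime import date


def extrapolate_month_revenue(revenue_to_date: float, as_of: date) -> float:
    """Project end-of-month revenue from the month-to-date daily run rate.

    Assumes `revenue_to_date` covers day 1 of the month through `as_of`
    inclusive, and that every remaining day earns the same average amount.
    """
    days_in_month = monthrange(as_of.year, as_of.month)[1]
    days_elapsed = as_of.day
    daily_run_rate = revenue_to_date / days_elapsed
    return daily_run_rate * days_in_month
```

For example, with 700 of revenue through April 7, the projection for the 30-day month is 3,000. Early in the month a single unusual day moves the run rate a lot, which is consistent with the estimate only settling down around the halfway point.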
Bob Muglia: And then to use that data to predictively take actions and have your organization take actions. That's the future of these data apps.
Ali Ghodsi: Yeah. All SaaS applications will have this kind of intelligence. They don't have that intelligence today. In some sense, all software today is, to some extent, stupid. Your HR software should tell you who's leaving, who should get a promotion. Your CRM software should tell you who you should upsell, who is going to churn, so on and so forth. Your security software should know all the attacks that are happening in zero time. But we're far away from that. That's going to happen in the next 10 years, and the modern data stack is going to be the foundation of all of that.
Bob Muglia: That's why the idea of putting it all together in one system, supported by the modern data stack, is the only way to solve these problems.
Ali Ghodsi: You mentioned governance earlier and making sure that you can actually get security into your systems. Well, security is really hard to do if it's a security system that's integrating with different apps, so I do think there are going to be data platforms that emerge where the security is built-in. They can take all the different data sources, and they have AI built-in so that you can start actually doing this. Probably, that's the simplification that's going to happen in this space.
George Fraser: I agree. I think this is the history of all software sectors. It's unbundling and rebundling. We're probably at the end of a chapter of unbundling, and rebundling is going to be happening over the next 10 years.
Ali Ghodsi: You just bundled.
George Fraser: We just did one small bundling, and it may not be the last one.
Ali Ghodsi: Any lessons from that?
George Fraser: Integrating an acquisition is easier when it's so important that everyone in the company is focused on it. If it becomes hard enough, it starts to actually become a little bit easier.
Bob Muglia: In this case for Fivetran, bringing HVR in allows us to really take on enterprise scenarios that were much more difficult for us to handle. Support for really high-volume data connectors allows us to really solve the toughest problems that customers have. If we do that correctly and can do that for our customers, that hopefully will allow us to solve all of the other problems they have.
Ali Ghodsi: We're very excited about it, so you can get your CDC (change data capture) from the HVR side, and also get all the SaaS applications into our lakehouse.
George Fraser: It's kind of the old world and the new world, and you need both in your central data repository. The old world's not going away. It needs to be brought along.
Bob Muglia: Those Oracle databases are pretty important.
George Fraser: They're not going anywhere, that's right. The real question is, when we're flying that starship to Alpha Centauri, will we have an Oracle database on it? Larry Ellison would say yes.
Ali Ghodsi: Oracle MySQL database?