The Autodesk Data Platform team, which delivers critical insights across the Autodesk suite to hundreds of users across the business, needed a more cohesive product and experience for its customers. But the team was struggling with extraction, storage and transformation, and was running into concurrency issues with its storage solution: too many database calls and parallel queries slowed performance, storage capacity was finite and scaling compute was going to require a plan upgrade.
As the team evaluated their options for modernizing their data stack, the build versus buy question was front and center — as it is for almost every data team looking to modernize. In the past, building data pipelines in-house was a common choice because it often involved writing only a few scripts to extract and ingest data. Today, however, with the ever-increasing volume of data and data sources, building and especially maintaining data pipelines is no longer so simple.
“Just because you can write custom code to do data integration or create your own data warehouse built on Hadoop doesn't mean you should, especially for business-critical analytics,” explains Mark Kidwell, Chief Data Architect, Data Platforms and Services, at Autodesk.
So how do you decide whether build or buy is right?
When his own team confronted the build versus buy challenge, Kidwell said, the decision usually came down to four key considerations.
1. Is the tool cost-effective with our cloud data warehouse?
Cloud data warehouses are becoming the standard for data transformation and analysis because these technologies allow for cost-effective and reliable data ingestion, processing and storage. As part of modernizing its data stack, Autodesk needed to increase its storage capacity and its ability to scale. Snowflake’s cloud data warehouse provided a cost-efficient option for meeting both of those needs.
And, because cost efficiency was a primary driver for moving to the cloud, it remained a primary driver when evaluating whether to implement other data tools as well.
“It comes down to whether it will go into Snowflake economically,” says Kidwell, speaking of Autodesk’s own build versus buy decision-making. “If it's something where the existing toolchain and the Snowflake ecosystem support it, then we prefer that as our buy option.”
The low and declining costs of cloud computing and storage continue to widen the cost savings of a modern data stack compared with legacy or DIY custom solutions. Meanwhile, tools that integrate and work well together allow organizations to combine best-of-breed cloud-native technologies. These tools can save considerable engineering time otherwise spent designing, building and maintaining data pipelines.
2. Does it provide a no-code or low-code route for data ingestion?
As an organization's data footprint grows into an expanding set of SaaS applications, events, databases and files, it's important to be able to quickly and easily add new sources and connectors. “We prefer going with a no-code or low-code route for data ingestion,” says Kidwell. “We shouldn't be building these kinds of things ourselves. We should be working on solving problems that add business value.”
Another benefit of low-code and no-code tools is that they make data movement more self-service. This helps keep IT from becoming a bottleneck and lets business users move data from one place to another on their own. It also frees up data engineers' time to work with stakeholders on modeling data or orchestrating more efficient processes.
3. Will our time be better spent solving problems that add business value, rather than building a new tool?
Buying a data tool rather than building one brings key benefits: quicker time to value, less maintenance and no need to keep up with data source APIs. Autodesk, for example, reduced pipeline maintenance from five percent of analyst time to less than one percent. Transformation run times also dropped by 68 percent, freeing up data engineers' time for more strategic projects.
“You can build great tools internally…It is just an enormous amount of effort,” says Kidwell. “You have to have a dedicated team to solve simple problems because you're not just building a one-off solution, you are building a platform for doing this repeatedly.”
Given the time investment, Kidwell leans towards buying. “You're expected to be able to plug in new sources, get high velocity and be agile,” he says. “Having the availability of Fivetran and a cloud data warehouse means that we get the benefits of the work that the Fivetran or Snowflake platform has already built.”
This has allowed Autodesk to focus on the next level of work instead. For Kidwell’s team, this has meant that they can focus more on product data and broaden the scope of the types of machine learning they can enable with their new platform. “It’s exciting,” says Kidwell. “This is where the team wants to be more involved and grow.”
4. Does the tool align with the existing toolchain and our cloud data ecosystem?
There are many data integration tools in the market, and their technical approaches and feature sets vary significantly. When evaluating if it makes sense to buy a specific tool, it’s important to consider integration factors such as:
- Support for sources and destinations. Does the tool support your sources and destinations? Does the provider offer a way for customers to suggest new sources and destinations or build their own? Do they routinely add new ones?
- Extensibility. Can the tool integrate with other tools in your data ecosystem like orchestration platforms, logging services and transformation tools?
- Standardized schemas and normalization. Does the tool normalize and standardize schemas automatically? This will save you from initial post-load transformation work so that you can focus on more complex transformations and analytics downstream.
- Zero-touch. Zero-touch, fully managed tools are extremely accessible, with connectors that are standardized, stress-tested and maintenance-free. Configurable tools require you to dedicate engineering time.
- Automation. Integration tools should remove as much manual intervention and effort as possible. Consider whether a tool offers features like automated schema migration, automatic adjustment to API changes and continuous sync scheduling.
- API integration. Can the tool be accessed programmatically? This is key for enterprise deployments that require tasks such as large-scale connector creation and user provisioning.
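The API integration point above — provisioning many connectors programmatically rather than one at a time in a UI — can be sketched as a short script. The endpoint path and payload fields below are hypothetical, not any vendor's actual API schema; a real deployment would follow the tool's own API reference.

```python
import json

# Hypothetical ELT-service API base URL; illustrative only.
API_BASE = "https://api.example-elt.com/v1"

def build_connector_request(group_id: str, service: str, schema: str) -> dict:
    """Return the URL and JSON body for creating one connector.

    Field names here (group_id, service, config, paused) are an
    assumption for illustration, not a real vendor's schema.
    """
    return {
        "url": f"{API_BASE}/connectors",
        "body": {
            "group_id": group_id,
            "service": service,           # e.g. "salesforce", "postgres"
            "config": {"schema": schema}, # destination schema name
            "paused": True,               # create paused; enable after review
        },
    }

# Large-scale provisioning: one request per source system.
sources = [("salesforce", "crm"), ("postgres", "app_db"), ("zendesk", "support")]
requests_to_send = [
    build_connector_request("warehouse_group_1", service, schema)
    for service, schema in sources
]

for req in requests_to_send:
    print(json.dumps(req["body"], indent=2))
```

In practice the loop would POST each body to the service; building the payloads separately from sending them makes the script easy to dry-run and review before provisioning dozens of connectors at once.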
Making the right decision
With Fivetran and Snowflake, Autodesk has seen massive savings in time and maintenance. With its architecture in place, Autodesk can now work on consolidating its BI dashboards and determining a machine learning infrastructure – the types of activities Kidwell says his team considers “the fun stuff.”
Join us at the 2023 Modern Data Stack Conference to hear more from Autodesk's Mark Kidwell and Condé Nast's Nana Yaw Essuman on how they're tackling next-gen tasks from ML operations to instant movement of massive amounts of data around the world.