The right data integration solution can save engineering resources, costs and time, but with more than 150 different tools on the market, finding the right one can be difficult. As a result, there’s an overwhelming number of considerations, from free open-source, self-hosted options to cloud-native commercial SaaS products — and everything in between.
Two popular data integration options are Fivetran and Airbyte. Both are strong products with plenty of capabilities. Choosing one over the other can come down to a few key differences. Let’s take a look at both to help you make an informed decision.
Airbyte vs. Fivetran: At a glance
Airbyte is an open-source data integration engine for building ELT data pipelines that sync data from applications, APIs, and databases to analytical data destinations like data warehouses and data lakes. Airbyte was founded in January 2020 and provides three offerings:
- Open Source: A self-hosted option designed for data-mature organizations with strong engineering capabilities.
- Cloud: A cloud-hosted solution designed for organizations seeking managed services with some extensibility.
- Enterprise: A cloud-hosted solution with more white-glove support and “bring your own cloud” options.
Fivetran is an automated data movement platform that offers fully managed data connectors to sync data from SaaS applications, APIs, databases and other structured data sources into target analytical destinations like data warehouses and data lakes. Fivetran was founded in 2012 and has a variety of offerings, all of which are fully-managed, automated and serve anyone from small startups to large, data-mature enterprises.
As of early January 2023:
Data sources and destination connectors
The ability to connect to data sources and destinations is one of the first and most important considerations for selecting a data integration tool. But there’s more to a connector than meets the eye.
Though both platforms offer hundreds of connectors, “equivalent” connectors are not always truly equivalent. Because Airbyte is an open source platform, many of its connectors are also open source and constrained by the availability and capabilities of community contributors. All of Fivetran’s connectors are developed in-house by 300+ engineers, fully supported and managed, whereas many of Airbyte’s connectors are built, maintained and supported by the open source community.
Airbyte offers support for most popular data sources and destinations (see a complete list of connectors). Airbyte uses a grading system to help users know what to expect from a connector. According to its documentation, the connectors are divided into three grades: Generally Available (GA), Beta and Alpha. The GA connectors are well-tested and robust, while the Beta and Alpha connectors refer to those in different stages of development.
As of early January 2023, 15 percent of Airbyte’s source connectors are listed as Generally Available, as are six percent of its destination connectors. Additionally, users can develop their own connectors for sources that are not yet supported using the Airbyte Connector Development Kit (CDK). Airbyte also offers a free connector program, meaning all Alpha and Beta connector pipelines are completely free to use.
Fivetran offers connectors for the most popular data sources and destinations (see a complete list of connectors) and an option for customers to implement their own connectors using Fivetran cloud functions. This enables customers to write scripts to fetch data with support for multiple programming languages (Go, Java, Node.js, Python, C# or F#).
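To make the idea of a script-based custom connector concrete, here is a minimal sketch of an incremental fetch function. It is an illustration only: the `handler` name, the `state`, `insert` and `hasMore` fields, and the in-memory `FAKE_SOURCE` are assumptions made for this example, not a confirmed contract, so check the vendor documentation for the actual request and response format.

```python
# Hypothetical in-memory stand-in for the SaaS API a real function would call.
FAKE_SOURCE = [
    {"id": 1, "subject": "Login fails"},
    {"id": 2, "subject": "Invoice missing"},
    {"id": 3, "subject": "Export is slow"},
]


def handler(request: dict) -> dict:
    """Return rows newer than the saved cursor, plus the updated cursor.

    The request/response field names are illustrative assumptions.
    """
    # The cursor records the highest id already synced (0 on the first run).
    cursor = request.get("state", {}).get("cursor", 0)
    new_rows = [row for row in FAKE_SOURCE if row["id"] > cursor]
    # Keep the old cursor when nothing new was found.
    next_cursor = max((row["id"] for row in new_rows), default=cursor)
    return {
        "state": {"cursor": next_cursor},
        "insert": {"tickets": new_rows},
        "hasMore": False,
    }


first = handler({"state": {}})                 # initial sync: every row
second = handler({"state": first["state"]})    # next sync: nothing new
print(len(first["insert"]["tickets"]), len(second["insert"]["tickets"]))
```

The point of the sketch is the cursor: the function itself stays stateless, and the platform that calls it is assumed to store the returned state and pass it back on the next run.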
In early February 2023, Fivetran announced a new offering that brings more SaaS connectors to market at an accelerated pace, along with a program to deliver them to enterprises by request. Leveraging the latest in generative AI and natural language processing technologies, Fivetran Lite connectors can be built in as little as 30 days, and Fivetran expects to introduce hundreds of new connectors in the next year, bringing the total number of fully-managed connectors to more than 500.
With the acquisition of HVR, Fivetran customers gain access to a broader range of high-performance data replication solutions, including log-based change data capture connectors for companies with high volume on-premises data processing databases such as Oracle, SQL Server, SAP HANA and DB2. Beyond log-based CDC, Fivetran provides necessary capture capabilities for SAP ERP, both ECC and S/4HANA.
In summary, though both platforms support hundreds of data sources and destinations, there’s a marked difference in completeness and support between the connectors on each platform. For data engineers capable of software development, Airbyte offers deep extensibility given its open source nature. If you want to change the behavior of a connector, you can modify the source code yourself. In cases where support and testing are needed, customers and users can submit a GitHub issue and await a response from Airbyte developers. Fivetran approaches extensibility through its REST API, which allows for plenty of customization and configuration, but not modifications to the platform or connectors.
For data engineers seeking robust and complete connectors that are fully supported, uniformly maintained and managed 24/7, Fivetran offers a more commercial experience with traditional support models and service level agreements. Fivetran’s community is more support- and solution-focused. Customers can expect connectors to be developed in a standardized, non-crowdsourced fashion that follows consistent practices. Furthermore, Fivetran customers don’t have to make decisions or concern themselves with resource provisioning, hosting and patching.
Because of the different development models for each platform, we recommend doing a connector-by-connector comparison for the most important sources and tables to ensure the connector(s) you need most will do the work you need. It’s also worth digging through Fivetran’s extensive entity relationship diagrams, or ERDs, (see this Sailthru example) to understand exactly what tables and columns you can use. In many cases, Fivetran’s connectors can extract from more tables and offer security features (e.g. column blocking and hashing) that Airbyte’s do not. Airbyte may also rely on open source dependencies with non-trivial known limitations, such as its use of Debezium for CDC.
Also consider that data sources may change their APIs or extraction methods with new product releases. Fivetran adjusts its connectors when schemas and APIs change. With Airbyte, you may have to crack open the source code if your connector does not work for whatever reason, or ask the community for help.
Setup and scalability
Ease of setup and scalability have direct implications on the speed and availability of data within your organization. The faster you’re set up, the faster you’re getting the data insights you need.
Airbyte has two experiences for getting started: self-hosted and cloud. For the self-hosted experience, you would start by:
- Installing Docker
- Spinning up a Docker container to run the Airbyte instance
- Navigating to the Airbyte UI through your host server
- Mapping source and destination (similar to Fivetran)
- Confirming the connections
- Syncing data
The self-hosted offering requires more than infrastructure — it requires domain expertise in DevOps, network administration and application security. Going this route also means a greater likelihood of managing, compiling and deploying source code, whether it’s from the community or internal resources who are extending the baseline functionality, like custom connectors. This is an upside for highly-technical organizations. But for organizations looking to save money with a free open-source product, there could be greater setup and scalability costs than using Airbyte Cloud.
Airbyte’s Cloud setup provides a web-based, cloud-hosted user experience that is clean, simple to use, and can be completed within five minutes:
- Connect to your source
- Connect to your destination
- Select your source data
- Sync data to your destination
Fivetran is entirely web-based and cloud-hosted with a clean user interface. The initial setup can also be completed within five minutes:
- Connect to your destination
- Choose your data source(s)
- Select your source data
- Sync data to your destination
In summary, both Fivetran and Airbyte Cloud provide easy-to-use, intuitive interfaces for setting up data integration. If you want a platform that is zero maintenance, Fivetran can deliver that functionality. Otherwise, if you’re comfortable with some level of manual intervention, such as when columns or data types change in your schemas, then Airbyte may be sufficient for what you need. Lastly, because Airbyte Open Source must be hosted on your own server(s), you’ll need to manually upgrade Airbyte as new versions come out, secure your server(s), plan for sizing and high availability, and implement access control in a multi-user environment.
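If you expect to handle schema changes yourself, it helps to detect them before a sync breaks. The following is a minimal sketch, assuming each schema is available as a plain mapping of column name to type name; the `diff_schema` function and the sample columns are hypothetical and not part of either product.

```python
def diff_schema(previous: dict, current: dict) -> dict:
    """Compare two {column: type} mappings and report what changed."""
    added = sorted(col for col in current if col not in previous)
    removed = sorted(col for col in previous if col not in current)
    retyped = sorted(
        col for col in current
        if col in previous and current[col] != previous[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots of one source table taken on two different days.
previous = {"id": "integer", "email": "string", "amount": "integer"}
current = {
    "id": "integer",
    "email": "string",
    "amount": "decimal",
    "currency": "string",
}

drift = diff_schema(previous, current)
print(drift)
```

A check like this could run before each sync and alert someone when any of the three lists is non-empty, which is the kind of manual intervention a fully managed platform is meant to remove.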
As organizations become more sophisticated with data and use it to make business decisions, the question of reliability will arise. The issues of support, uptime and service level agreements matter less for customers early in the data integration process. However, the more an organization comes to depend on its data, the more important its data integration pipelines become.
Whether self-hosting Airbyte Open Source or using cloud-hosted products from Airbyte or Fivetran, it’s never too early to think about what may be mission critical for your business months or even years into the future.
Airbyte Open Source is a self-hosted product, so its availability will depend on the sophistication and capabilities of your internal technical teams. A data integration platform is really a set of interconnected software and processes that move data from a data source (which may not be in your control) into a data destination (which likely is in your control). As such, the ability to support mission-critical data work comes down to a few factors in a self-hosted option:
- Ability to deploy and maintain a complex data integration product
- Ability to monitor, identify, troubleshoot and remediate issues (e.g. with patches)
- Keeping a watchful eye on software and platform updates for various data sources and destinations
- Infrastructure redundancy (potentially including disaster recovery options)
- Proper configuration for self-healing or self-rebooting services and infrastructure
- Potential 24/7 monitoring and incident response capabilities
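As one small example of what “self-healing” can mean in a self-hosted setup, the sketch below retries a failing sync job with exponential backoff. The `run_with_retries` helper and the `flaky_sync` job are hypothetical names for illustration, and a production setup would add logging, alerting and narrower exception handling.

```python
import time


def run_with_retries(job, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the error to monitoring
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical job that fails twice before succeeding.
attempts = {"count": 0}


def flaky_sync():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("source API unavailable")
    return "synced"


# Record the delays instead of actually sleeping, to keep the example fast.
delays = []
result = run_with_retries(flaky_sync, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is a design choice for testability: the same function can wait for real in production and run instantly in a test.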
Airbyte Cloud offers a 99 percent uptime SLA, but only for its Generally Available (GA) connectors (currently 41 of its 269 source connectors) and Beta connectors (23 of 269). There is no SLA for Alpha connectors (the other 205 connectors). More specifics about Airbyte’s support and SLA can be found in its product release stages documentation.
It’s also worth noting that Airbyte “strongly discourages using Alpha releases for production use cases,” which rules out those 205 connectors (some of which are for popular products, like OracleDB and Microsoft SQL Server).
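The connector counts quoted above can be checked with a few lines of arithmetic. This sketch uses only the figures stated in the text (269 source connectors, 41 GA, 23 Beta); the variable names are our own.

```python
TOTAL_SOURCES = 269  # total Airbyte source connectors cited above
GA = 41              # Generally Available
BETA = 23            # Beta

alpha = TOTAL_SOURCES - GA - BETA          # connectors with no SLA
ga_share = GA / TOTAL_SOURCES              # share that is GA
sla_share = (GA + BETA) / TOTAL_SOURCES    # share covered by the SLA

print(f"Alpha connectors: {alpha}")
print(f"GA share: {ga_share:.1%}")
print(f"Covered by SLA (GA + Beta): {sla_share:.1%}")
```

Under these figures, roughly three out of four source connectors fall outside the SLA.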
Fivetran is a cloud-hosted, enterprise-ready platform built to support mission-critical systems and flows for every type of customer. The level of support provided depends on the plan selected and all SLA details are transparently provided on its website. All core services (web applications, APIs and replication servers) have a 99.9 percent uptime guarantee with a one percent service credit percentage. Data delivery services also have a 99.9 percent uptime guarantee with a 0.25 percent service credit percentage. All platform-provided (non-custom) connectors are fully supported and maintained by Fivetran.
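The gap between a 99 percent and a 99.9 percent uptime commitment is easier to judge as minutes of permitted downtime. Here is a minimal sketch, assuming a 30-day month and that downtime is measured against total elapsed time; actual SLA terms may define the measurement window and exclusions differently.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime an uptime guarantee permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


for label, sla in [("99 percent", 99.0), ("99.9 percent", 99.9)]:
    minutes = allowed_downtime_minutes(sla)
    print(f"{label}: {minutes:.1f} minutes per 30 days")
```

Under these assumptions, 99 percent allows about 432 minutes (7.2 hours) of downtime in a 30-day month, while 99.9 percent allows about 43 minutes.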
In summary, Airbyte offers many great features at an attractive financial cost, but mission-critical work is only supported on 15 percent of its data source connectors with a 99 percent SLA on the cloud-hosted platform. Support is either available via the open source community (competing with thousands of unresolved issues and requests) or as part of a paid package. Fivetran provides support for every connector around the clock with support prioritized by the purchased service plan.
Security and privacy
Data integration tools are powerful and can ingest data from a wide variety of sources, including those with sensitive information (PII, classified, proprietary or competitive). Data access is an important necessity for modern companies, but it must be done responsibly so as to protect one of a business’s most important assets: its data.
According to an IBM study, the average data breach cost in 2022 has risen to $4.35 million. Data security must be a critical priority in your search for a data integration solution.
Airbyte offers thorough technical documentation for securing its Open Source and Cloud products, and provides a secure environment for customers following industry standard practices. Airbyte completed the SOC 2 Type 2 data compliance certification and plans to undergo an independent review annually. Most Airbyte connectors require keys, secrets or passwords. Airbyte Cloud encrypts all data in transit using HTTPS. Encryption features vary between platforms.
Fivetran adheres to industry-leading security standards and protocols following an extensive list of compliance and regulatory assurances, including SOC 2 Type 2, PCI-DSS, ISO27001, HIPAA and GDPR. Fivetran uses two-factor authentication for all user accounts, with strong password controls. In addition, Fivetran allows for Single Sign-On with SAML 2.0 and a list of industry-standard identity providers. Role-based access controls are available to manage user access to connectors and more. Data is encrypted at rest and in transit.
Fivetran offers advanced security features like column blocking and column hashing, which means that specific columns can be blocked or hashed so as not to be stored insecurely or inappropriately in a data destination. These and other features have implications for GDPR and other PII-related compliance requirements.
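To illustrate the general idea of column blocking and hashing (not Fivetran’s actual implementation), here is a minimal sketch that drops blocked columns and replaces hashed columns with a salted SHA-256 digest before a row is written. The `protect_row` function, the salt handling and the sample row are all assumptions made for this example.

```python
import hashlib


def protect_row(row: dict, blocked: set, hashed: set, salt: str = "") -> dict:
    """Drop blocked columns and replace hashed columns with a digest."""
    protected = {}
    for column, value in row.items():
        if column in blocked:
            continue  # blocked columns never reach the destination
        if column in hashed and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            protected[column] = digest.hexdigest()
        else:
            protected[column] = value
    return protected


# Hypothetical source row containing PII.
row = {"id": 7, "email": "ada@example.com", "ssn": "123-45-6789"}
protected = protect_row(row, blocked={"ssn"}, hashed={"email"}, salt="s3cr3t")
print(protected)
```

Because the same input always produces the same digest, a hashed column can still be used as a join key in the destination without exposing the original value.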
In summary, Fivetran offers more off-the-shelf security options, configurations and certificates than Airbyte. However, users who opt to self-host Airbyte can implement advanced security features on top of Airbyte.
Depending on the connectors and your company’s specific use case, you might benefit from advanced features in your data integration solution. In this section, we compare Fivetran and Airbyte’s advanced features using the Salesforce data connector as a reference.
Summary: Based on this Salesforce connector comparison, Fivetran has more advanced data integration features. In general, Fivetran’s connectors often have access to more tables and include built-in data security features well beyond what you’ll find in just the Salesforce connector.
Support and documentation
Ongoing vendor support and access to extensive and consistent documentation are essential areas to consider when choosing a data integration solution. These help teams set up faster and resolve issues quickly.
All Fivetran customers have 24/7 access to a team of technical specialists. They work with customers to troubleshoot technical issues as quickly as possible via emails, calls or through its live chat support feature. Fivetran also has extensive documentation to help users answer their questions.
Airbyte customers have access to in-app chat support and a Slack and Discourse community with access to Airbyte team members and other contributors. Enterprise customers have access to dedicated technical support, custom SLAs, training materials, and more.
In summary, Fivetran and Airbyte offer extensive support and documentation. They both also have community forums, but Fivetran doesn’t provide Slack or Discourse communities.
Pricing and costs
Airbyte and Fivetran pricing is fairly different, so it’s important to pay attention to how each pricing model works and anticipate how these models would work for you and your organization.
Airbyte Open Source is free to download, but requires spending money on infrastructure. For some organizations that have the sophistication and existing infrastructure, this could be an excellent option. These organizations will need to consider the volume of data and numbers of sources and destinations for a true cost comparison.
Airbyte Cloud and Enterprise plans are priced with “credits” based on the volume of rows or GBs of data (see pricing calculator). Airbyte Cloud credits start at $2.50 apiece and are also sold in bulk at a lower cost. Customers are not charged for data normalization, canceled jobs or writes to databases or destinations.
While we’re on the topic of pricing and costs, it’s worth noting that Airbyte is not entirely free open-source software. According to its License FAQ page, connectors are licensed under a standard MIT license and the Core product is licensed under an ELv2 license, which notably restricts Airbyte customers from providing Airbyte directly to others as a managed service. The premium Cloud and Enterprise features are closed source, but you can stay apprised of them through its public roadmap.
Fivetran uses a consumption-based pricing model based on the number of rows you sync. Each month’s consumption is calculated based on monthly active rows (MAR) across all connectors and destinations in your account (MAR is the number of distinct primary keys synced from the source system to your destination in a given calendar month). Unit costs decrease as usage increases. Transformations and platform connectors are free of charge, and each connector includes a 14-day trial, even after your free trial period has expired. You can use Fivetran’s pricing calculator to get a sense of cost scaling.
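The MAR definition is easier to reason about with a small worked example. The sketch below counts distinct primary keys synced per calendar month from a list of sync events; the event format and the choice to scope keys by table are assumptions for illustration, and Fivetran’s own accounting may differ in detail.

```python
from collections import defaultdict
from datetime import date


def monthly_active_rows(sync_events):
    """Count distinct (table, primary_key) pairs synced in each month.

    sync_events: iterable of (sync_date, table, primary_key) tuples.
    """
    keys_by_month = defaultdict(set)
    for sync_date, table, primary_key in sync_events:
        month = (sync_date.year, sync_date.month)
        keys_by_month[month].add((table, primary_key))
    return {month: len(keys) for month, keys in keys_by_month.items()}


# Hypothetical sync log.
events = [
    (date(2023, 1, 3), "orders", 1),
    (date(2023, 1, 3), "orders", 2),
    (date(2023, 1, 17), "orders", 1),    # same row updated again in January
    (date(2023, 1, 20), "customers", 1),
    (date(2023, 2, 2), "orders", 1),     # counts again in a new month
]

mar = monthly_active_rows(events)
print(mar)
```

In this example, January has three active rows even though four events were synced, because a row that changes several times in one month is counted once.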
There are five pricing options suited for different use cases, including a Free Plan that launched on February 1, 2023 for smaller-sized companies that can use Fivetran up to 500,000 MAR at no cost. With 1 million monthly active rows on the Enterprise Plan, you get an estimated cost of $1,000 per month.
Both Airbyte and Fivetran offer 14-day trials for their paid products and are fairly transparent in their pricing. We recommend signing up for the free trial periods and logging your usage to estimate your costs before committing to a data integration solution. Airbyte’s free connector program makes all Alpha- and Beta-level connector pipelines completely free to use.
The choice between Airbyte Open Source and Fivetran ultimately comes down to a “buy vs. build” situation. In the build situation, there are plenty of upsides:
- No vendor lock-in or walled gardens
- Large open-source community
- Flexibility in implementation and extensibility
There are some non-trivial downsides as well:
- Costs of hosting, scaling, deploying and maintaining
- Need for highly skilled staff across a variety of technical domains
When it comes to Airbyte Cloud and Fivetran, however, there are several key considerations. The costs of each platform are largely driven by data volume, but also include post-implementation work, like whether you need to manually resolve schema changes. You may have security requirements that include protecting sensitive data, which requires features like column blocking and hashing. You may require more formalized support with faster turnaround, which may also factor into your costs. You may need deep access to tables and columns. And you may need a highly automated platform that requires minimal setup and has self-healing and holistic monitoring capabilities.
In the end, you must choose a solution that best meets your organization’s needs and your use case. This article highlights a number of considerations to take into account.
If your organization is growing rapidly and leans more towards “set and forget” than customization and control, you’ll likely lean more toward a fully-managed platform. On the other hand, if your organization relies on common data sources and has tech-savvy staff to build and maintain custom connectors, then you might consider the option with open source and a more developer-focused community.