Location: San Francisco, California
Company size: 400
Data stack: dbt Cloud, Fivetran Enterprise, Google Cloud, BigQuery, Looker
By implementing Fivetran and dbt Cloud, Vida Health transformed their data infrastructure, enabling wider access to data ingestion and transformation while reducing the need for engineering involvement in pipeline creation and management.
- 148 active connectors ingest data from various databases and SaaS data sources
- 1 day to generate a KPI for executives, down from 3 months previously
- 80% of work to create new data products can be self-served by analytics team, up from 20%
Data-driven, digital healthcare
Vida Health is a digital health company that provides personalized virtual care, via employer-sponsored plans, to individuals with chronic conditions such as diabetes, obesity and depression.
To tailor support and resources to individual needs, Vida Health collects data on the customer's medical history, past insurance claims, lab test results and log data from health-tech devices such as fitness trackers and digital scales.
“With data, you get a really good (health) profile of the member before they even sign up,” said Trenton Huey, Senior Director of Data at Vida Health. “It helps us personalize their onboarding and identify the right programs for them”.
The data enables Vida Health to monitor user progress, adjust treatment plans as needed and give users visibility into results via the Vida Health mobile app.
“The data is not just used for reporting and analytics, but to directly guide the end user experience in our application,” said Trenton.
Moving away from an inefficient custom pipeline
Before moving to Fivetran and dbt Cloud, the Vida Health team relied on a custom-built solution, using Python scripts and cron jobs to load and transform data in BigQuery. This approach presented several problems:
- The solution was not scalable: The homegrown pipeline struggled and often failed when data volume spiked.
- The pipeline was opaque and fragile: Components were poorly documented and understood by just a few people on the data team. When issues arose it sometimes took weeks to find and fix the root cause.
These issues sometimes resulted in reporting downtime of 2-3 days. Data was not reliable accessible to the teams that needed it to best serve their customers.
“Non-engineering teams were struggling to get anything done,” shared Trenton. “They’d end up building their own individual pipelines locally. But we weren’t able to reuse that work or feed into any other systems”.
Starting afresh: The needs and targets for a new data stack
Increasing collaboration in a newly unified data team
Vida Health had recently consolidated its data engineering, data science and data analytics functions into one team —seeking to break down silos and improve collaboration across these roles.
“It was more than just the tech components. We were asking ourselves: 'how can all of these teams work better together?'” said Trenton.
With accessible user interfaces in Fivetran and dbt Cloud, Vida Health's data analysts could take greater ownership of the data loading and transformation process and move their work forward more autonomously. They could consult with data engineers for advice, instead of leaning on them to create and maintain every change to the pipeline.
“We have opened new lines of collaboration by using dbt and Fivetran,” said Trenton.”For example, research done by clinical researchers can now be [reflected] more quickly and more granularly in the product.”
And this new collaboration helped the team achieve their goal: they had less than six months to onboard more than ten new clients. This meant they needed a pipeline that could ingest, transform, and deploy eligibility, claims and lab data —at scale. They knew they could tackle this with Fivetran + dbt.
Achieving data movement at scale with Fivetran
The first step in onboarding these new clients was centralizing their data across SaaS and proprietary data sources into one source-of-truth warehouse. They needed a way to do this at scale, with built-in automation and security — especially when handling health data. That’s why they turned to Fivetran.
“Fivetran’s reputation spoke for itself. We knew it could move our large — and growing — volumes of data from all of our sources securely. We didn’t even evaluate other solutions.” But, for Trenton, “The fact that claims data stored as flat files in GCS could be moved was the clincher. Everyone from data engineers to analysts were bought into Fivetran.”
With fully-managed Fivetran connectors, Vida Health moves data from all their SaaS applications — like Salesforce, Stripe, Zendesk and Google Analytics. This SaaS data, covering sales, marketing and product, powers business reporting and helps drive the company’s growth.
They also move proprietary Protected Health Information (PHI) —including laboratory data, past medical claims, pharmacy orders, among others— from multiple sources, such as PostgreSQL, MySQL, and Google Cloud Storage. They can do so in different formats into the same BigQuery instance. With Fivetran’s Enterprise plan, they also get maximum security, ensuring this data is safe and HIPAA compliant. This secure, centralized data serves as a reliable foundation for Vida Health to provide personalized care to customers.
Today, Vida Health uses Fivetran to centralize data from 148 different connectors into one BigQuery.
“Fivetran helped modernize infrastructure and democratize who ingest data. Previously data centralization was a slow process because it required custom python jobs. But now many teams set up pipelines and it’s easier to maintain. When we start new projects with new systems or data sources, we often say ‘it all starts with Fivetran.’”
- Trenton Huey, Senior Director of Data at Vida Health
Accelerating product development with dbt Cloud
Once data was loaded into BigQuery with Fivetran, the team used dbt Cloud to refactor old, unwieldy pipeline code into modular SQL queries that could be more easily read, reused, and maintained.
“We could cut thousands of lines of code," said Trenton. “At my last company, we needed several Spark jobs to transform data. With dbt, we need far fewer resources to do the same work.”
Prior to their migration to dbt Cloud, Vida Health's research and product teams had used separate pipelines to process data, often relying on local notebooks. This kept team knowledge siloed, and sometimes hampered their ability to represent clinical data with appropriate nuance in their product.
With its user-friendly IDE and auto-updating data lineage, dbt Cloud made the data transformation process accessible to Vida Health employees across clinical research, product and data teams. It has enabled the teams to share knowledge, add more precision to their data insights and product — and move faster.
dbt has "enabled things that were just not happening at all before. It's like zero to one on certain things," said Trenton.
“Analysis on the treatment of a certain condition used to take months, even quarters. And after that was ready, we’d still have to wait months until those insights were fed into the product,” said Trenton. “Now we get access to the data faster with Fivetran and build the logic directly on dbt, so we’ve reduced that time from quarters to weeks or, sometimes, days”.
A more resilient and maintainable pipeline with Fivetran and dbt Cloud
Vida Health now uses built-in features from dbt Cloud and Fivetran to prevent issues that could create pipeline downtime and ensure a quick recovery when incidents (rarely) occur.
At the top of the pipeline, “having access to pipeline creation and monitoring (with alerting) all in one place with Fivetran is helpful. We can easily pinpoint issues with data ingestion as they come up,” says Trenton. They also, “appreciate having fully managed connectors with schema migration and automation built in, for all data sources that power our technology.” This ensures data is always available.
The team uses dbt Cloud's data lineage to trace bugs, snapshots to detect data changes over time and built-in CI support to standardize testing and improve data quality. "We have CI/CD testing on all pipelines — that didn't exist for many of our pipelines in the past," said Trenton.
“Now we have much more precision on what we’re creating,” said Trenton. “We can trace issues down. Before, we had issues where it took two weeks to find out what was wrong and now we have an answer within a day.”
Trenton spoke on how “This has also increased our velocity because we spend less time on maintenance.”
With a modern data stack, Vida Health has the foundation needed to provide best-in-class care to their clients — driven by data.
Download this IDC report to learn how Fivetran drives millions in financial impact and enables new business initiatives for enterprises.