“Fivetran acts as a managed data landing zone, allowing us to have a single interface for all our data sources. It’s simple and straightforward to use – we are very happy with it.”
— Yoyu Li, Senior Data Architect, Testing for All
- Highly detailed visibility into the performance of the end-to-end testing process
- Speed and quality of service maintained
- Automation and integration save time for a small team
- Anonymisation of personal data aids regulatory compliance
- ‘Out-of-the-box’ setup
- Part of a best-of-breed modern data stack
- Pipeline: Fivetran
- Sources: S3, Google Cloud Storage (pushing data from SSH Proxy MySQL, Google Cloud Firestore, Google Cloud SQL for PostgreSQL Private Instance, WooCommerce)
- Destination: Google BigQuery
- Business Intelligence: Google Data Studio
Launched to provide mass Covid-19 testing in the UK, Testing for All is a non-profit organisation that devised a low-cost, high-volume service for individuals and businesses. Every step is optimised in a process that delivers 5,000 high-quality Covid-19 tests a day for half the price of other services.
The Situation: Meeting tight schedules
Testing for All needed a privacy-centric technology stack to handle personal data, medical test results and biological samples, while simultaneously providing a prompt and simple-to-use service at scale.
A six-step process was devised that starts with registration and the dispatch of a test kit in the post, and ends with the customer receiving the lab results. Speed and efficiency are pillars of the service, both in the eCommerce part of the process (signing up and ordering the kits) and in the science part (seven labs providing a range of swabbing techniques). The Royal Mail postal service is a key partner throughout.
“We had to manage data at a micro-level to know exactly what was going on with every step,” explained co-founder James Monico. “Moving from manufacture to dispatch, we needed to understand the failure rate and at what point a failure to complete a test happens.”
The Solution: Intuitive and automated
To identify and solve service delivery issues, Testing for All had to be able to pinpoint problems at each step in the process, every one of which relied on its own applications and databases. Order data was in MySQL; fulfilment data was in WooCommerce, an open-source eCommerce solution. Test sample data was collected in Google Cloud SQL, while Google Cloud Firestore was used to track Royal Mail deliveries.
Google BigQuery was chosen as the data warehouse, with Google Data Studio as the data visualisation tool.
“We were already using Google Cloud to run the business, so a lot of the components were a no-brainer,” said Yoyu Li, Senior Data Architect. “They were easy choices with user-friendly interfaces.”
When it came to choosing a data ingestion tool at the base of a very modern data stack, Li was looking for something that was just as intuitive to use. For a small organisation with a handful of people, the solution also needed to be automated, fully managed, compatible with all its data sources and compliant with data regulations.
Fivetran was the perfect fit. Particularly important for Yoyu Li has been its role as a scheduler. Sync frequency features in Fivetran automate the movement of data from connected sources to BigQuery at pre-set intervals, from every five minutes to every 24 hours. After the initial sync, Fivetran incrementally pulls only new or changed data from the source, so rows that are already in BigQuery are not loaded again and duplication is avoided.
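The idea behind an incremental sync can be sketched in a few lines of Python. This is a simplified illustration, not Fivetran's implementation: the function name `incremental_sync`, the `id` and `updated_at` fields, and the in-memory dictionary standing in for a warehouse table are all assumptions made for the example.

```python
from datetime import datetime

def incremental_sync(source_rows, destination, cursor):
    """Copy rows modified after `cursor` into `destination`.

    `destination` is a dict keyed by primary key, so a changed row
    overwrites its earlier version instead of creating a duplicate.
    Returns the advanced cursor and the number of rows copied.
    """
    new_cursor = cursor
    copied = 0
    for row in source_rows:
        # Skip anything already seen in a previous sync.
        if cursor is not None and row["updated_at"] <= cursor:
            continue
        destination[row["id"]] = row  # upsert by primary key
        copied += 1
        if new_cursor is None or row["updated_at"] > new_cursor:
            new_cursor = row["updated_at"]
    return new_cursor, copied

# Initial sync: everything is new.
warehouse = {}
first_batch = [
    {"id": 1, "updated_at": datetime(2021, 1, 1), "status": "ordered"},
    {"id": 2, "updated_at": datetime(2021, 1, 2), "status": "ordered"},
]
cursor, first_copied = incremental_sync(first_batch, warehouse, None)

# Later sync: row 1 is unchanged, row 2 changed, row 3 is new.
second_batch = [
    {"id": 1, "updated_at": datetime(2021, 1, 1), "status": "ordered"},
    {"id": 2, "updated_at": datetime(2021, 1, 3), "status": "dispatched"},
    {"id": 3, "updated_at": datetime(2021, 1, 4), "status": "ordered"},
]
cursor, second_copied = incremental_sync(second_batch, warehouse, cursor)
```

In the second run only the changed and the new row are copied, and the warehouse still holds exactly one record per order.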
“First and foremost for us, Fivetran is a scheduler,” said Yoyu Li. “It triggers the cloud function that extracts data from various data sources at regular time points. I like that it’s automatically authenticated with our Google Cloud service account, which makes it very easy and more secure to set up.”
The Results: Ongoing process optimisation
As Testing for All looked to deliver a high-quality service in a tight timeframe, Fivetran’s combination of ‘out-of-the-box’ connectors to all the non-profit’s data sources and automated integration with Google BigQuery was pivotal.
“We are able to answer the key questions we were looking to answer,” said Yoyu Li. “Fivetran acts as a managed data landing zone, allowing us to maintain a single interface for all the data sources. It’s simple and straightforward to use – we are very happy with it.”
Data from eCommerce, fulfilment and laboratory partners is all joined up and made available to query and collate in reports. Compatibility with Python and Pandas for pre-processing certain data fields has also been important for regulatory compliance. Personal information such as date of birth is stored as an age bracket (40-59, for example), which protects personal privacy while still providing valuable data for analysis.
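A minimal sketch of this kind of pre-processing step is shown below. It is written in plain Python so that it runs without dependencies; in a Pandas pipeline the same binning could be done with `pandas.cut`. Only the 40-59 bracket comes from the article. The other bracket edges, the field names and the function names are hypothetical.

```python
from datetime import date

# Hypothetical bracket edges (inclusive); only "40-59" is named in the article.
BRACKETS = [
    (0, 17, "0-17"),
    (18, 39, "18-39"),
    (40, 59, "40-59"),
    (60, 150, "60+"),
]

def age_on(dob: date, today: date) -> int:
    """Age in whole years on `today`."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

def age_bracket(dob: date, today: date) -> str:
    age = age_on(dob, today)
    for low, high, label in BRACKETS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the configured brackets")

def anonymise(record: dict, today: date) -> dict:
    """Return a copy of `record` with the date of birth replaced by a bracket."""
    cleaned = {k: v for k, v in record.items() if k != "date_of_birth"}
    cleaned["age_bracket"] = age_bracket(record["date_of_birth"], today)
    return cleaned

raw = {"order_id": "A1", "date_of_birth": date(1975, 6, 15)}
safe = anonymise(raw, today=date(2021, 3, 1))
```

The exact date of birth never reaches the warehouse in this sketch; only the coarser bracket does, which is what keeps the data useful for analysis without identifying the individual.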
The whole testing journey can be visualised and reasons for incompletion analysed. Potential points of failure, like different turnaround times in the laboratories, can be optimised by aligning customer needs to the most appropriate facility. Faster labs are more expensive, for example, and best used when someone has a flight to catch.
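The drop-off analysis described above can be illustrated with a small funnel calculation. The sketch assumes each order is tagged with the last step it completed. The article names only the first and last steps of the six-step process, so the intermediate step names here are hypothetical, as is the function name.

```python
from collections import Counter

# Six steps; the names of the intermediate steps are assumptions.
STEPS = [
    "registered",
    "kit_dispatched",
    "kit_received",
    "sample_posted",
    "lab_received",
    "result_delivered",
]

def drop_off_by_step(last_step_per_order):
    """For each step after the first, return (step, orders lost, loss rate).

    An order counts as having reached every step up to and including
    the last one it completed.
    """
    reached = Counter()
    for last in last_step_per_order:
        for step in STEPS[: STEPS.index(last) + 1]:
            reached[step] += 1

    report = []
    for prev, step in zip(STEPS, STEPS[1:]):
        lost = reached[prev] - reached[step]
        rate = lost / reached[prev] if reached[prev] else 0.0
        report.append((step, lost, rate))
    return report

# Five illustrative orders: three completed, two stalled early.
orders = ["result_delivered"] * 3 + ["kit_dispatched", "registered"]
report = drop_off_by_step(orders)
```

A step with an unusually high loss rate is where the kind of root cause analysis described in the article would begin.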
“To find issues, you need to drill into very specific cases and the anomalies that occur. Fivetran pulls the data in and we are able to do this kind of root cause analysis in a very timely manner,” said James Monico. “It’s been very, very helpful.”