With its data-first approach, Welcome Tech is developing machine learning and security models to better serve the immigrant community. After building and maintaining a Postgres connector internally, Welcome Tech brought on Fivetran to scale its data architecture. With Fivetran, the team can do more with less engineering effort as it grows.
- Pipeline: Fivetran
- Sources: Google Analytics, Postgres RDS, Branch.io, Zendesk
- Destination: Redshift
- Business Intelligence: Looker
Welcome Tech has developed the world’s first platform dedicated to connecting the 250-million-strong global immigrant community with the information, products and services they need to thrive in a new country.
Bringing Together Disparate Databases
In the last few years, Welcome Tech has developed a robust digital platform and launched a banking service for immigrants. CTO Joe Munoz and the engineering leadership at Welcome Tech had an enormous amount of user data to work with and knew that in order to continue to scale the company’s efforts, they would need to leverage a robust data architecture.
The first step was organizing the data and building the warehouse that would store it. Munoz led the charge, building out a high-level data model, developing the schema for the data warehouse, and bringing on Redshift as the destination. At first, Munoz attempted to load the data manually:
“Initially, we were using AWS Glue and writing scripts. We needed to be faster. We started looking for other alternatives and that’s what led us to Fivetran. Fivetran allowed us to easily bring all the data into our Redshift schema so we could focus on building a data lake.”
Being able to bring the many disparate databases into a single location with the Postgres connector was critical for Welcome Tech, as Munoz explains:
“Maintaining the Postgres pipeline was basically a full-time job for a quarter for one of our engineers. We couldn’t continue to dedicate so much time building out all these connectors. With Fivetran, I look at our pipeline for maybe one to two hours a week.”
Welcome Tech has an enormous amount of historical data, which it archived every few months into flat files on S3. Running scripts day and night to complete the historical sync from Postgres to Redshift took too long; with Fivetran, the sync took only a few hours. Previously, if an issue occurred, Munoz had to go back to the flat files to re-sync the data. Now all historical data lives in the lake, so no one has to go back and process archived data.
Adding Application Data to the Mix
With Postgres working, Welcome Tech added customer support data with the Zendesk connector. The Google Analytics (GA) connector brought in website data, including which articles people read and how long they spend on a page. Unstructured data, such as call center data in CSVs, can be brought into the data lake via the Uploads function.
Welcome Tech’s banking service, PODERcard, launched a few months ago, and the team wanted a way for the app to access the data from Branch.io. Webhooks made it easy to capture the app installation data. “For connectors that don’t exist natively in Fivetran, the webhooks solution offers a great way to access the data,” Munoz explains.
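To make the webhook idea concrete, here is a minimal sketch of the general pattern: a webhook delivers a JSON body, and that body is flattened into a single row suitable for a warehouse table. This is an illustration under stated assumptions, not a description of how Fivetran or Branch.io work internally. The function names and every field in the sample payload are hypothetical.

```python
import json
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '_'."""
    row: dict[str, Any] = {}
    for key, value in obj.items():
        column = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that {"device": {"os": ...}} becomes "device_os".
            row.update(flatten(value, column))
        else:
            row[column] = value
    return row


def webhook_body_to_row(raw_body: bytes) -> dict[str, Any]:
    """Parse a JSON webhook body and return one flat, warehouse-style row."""
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return flatten(payload)


# Hypothetical install event; these field names are invented for illustration
# and are not taken from any real Branch.io payload.
sample_body = (
    b'{"event": "install", '
    b'"device": {"os": "android", "version": "12"}, '
    b'"user_id": 42}'
)
row = webhook_body_to_row(sample_body)
print(row)
```

Flattening nested keys into column names is one common way to make event data queryable from a BI tool; a real pipeline would also need to handle arrays, schema changes, and retries.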
Using Fivetran is significantly cheaper than hiring additional engineers to build and maintain data pipelines. While Welcome Tech is building out its engineering team, the goal is to dedicate the team to advanced, value-adding work around the data lake, recommendations, machine learning and security models.
Leveraging Data to Serve the Community
Welcome Tech is building out its offerings to better serve the immigrant community. Next up? Building fraud and risk models for the financial services industry. Many algorithms are based on assumptions that exclude the immigrant community. The company wants to develop a better model to grant access to things like credit, banking and financial instruments that immigrants may not traditionally have access to. Achieving these big goals will require data:
“We need to run the business with a data-first approach. Every great tech company I have been a part of tries to leverage data to make the most informed decisions. We know there are things we need to do to provide our users with a better experience and we’re taking the steps to get there quickly and efficiently.”
About Fivetran: Shaped by the real-world needs of data analysts, Fivetran technology is the smartest, fastest way to replicate your applications, databases, events and files into a high-performance cloud warehouse. Fivetran connectors deploy in minutes, require zero maintenance, and automatically adjust to source changes — so your data team can stop worrying about engineering and focus on driving insights.
About Amazon Redshift: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze all of your data using your existing business intelligence tools.
About Looker: Looker is the business intelligence (BI) and analytics platform that is part of the Google Cloud data and analytics suite. Transcending traditional BI, Looker powers data experiences that deliver actionable business insights at the point of decision and infuses data into products and workflows to allow organizations to extract value from data at web scale.