In the first post of this series, we laid down two frameworks for thinking about the value of data teams. Now, it’s time to look at individual data roles and how we can measure their success.
Different roles have different outputs, and the definition of their success depends on how their work is done and how robust it is. For example, data engineers focus mostly on operational goals and system design, while data analysts focus mostly on informational goals and insight generation. Both, however, should work in ways that are time efficient, inspectable, maintainable, and extensible.
Before going any further, it’s important to note that these roles may not map perfectly to your team. You may have some team members who are half data engineer and half analytics engineer, or you may even have a data team of one who is a jack of all trades. These roles help us group responsibilities together, but by no means is it necessary to follow these divisions exactly for this framework to apply.
Data engineer
- Almost entirely operational
- Centralize high quality source-centric data
The data engineer’s main job is to centralize high quality source data with little to no downtime. Data engineers add value in two ways: they eliminate the manual work of extracting and loading data, and they increase capacity and unlock new applications by making data fresher, more reliable and more complete.
You can measure the success of a data engineer with the following metrics:
- Downtime: for a given period of time, what fraction was hindered by downtime of some sort? Downtime should be measured as the time between a stoppage being reported and being resolved, and can be further classified by source or systems affected (e.g. client- or internal-facing).
- Testing coverage: what percentage of lines, functions, or files are covered by tests?
- Lead time: number of days between identification of bug or creation of request, and fulfillment of work for that bug or request; this time can then be grouped into ad hoc needs (i.e. bugs and minor requests) and planned needs (i.e. major requests).
- Failure rate: number of bugs identified divided by number of changes made to the code base (e.g. feature merges to the primary branch).
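As a rough illustration, three of the metrics above (downtime, lead time and failure rate) can be computed from simple records. The sketch below is a minimal example under assumptions of our own: the function names, the `(reported_at, resolved_at)` incident pairs and the `(category, created_at, fulfilled_at)` ticket tuples are hypothetical shapes, not the format of any particular tool. It also assumes that overlapping incidents should be counted once.

```python
from datetime import datetime, timedelta


def downtime_fraction(incidents, period_start, period_end):
    """Fraction of the period spent in downtime.

    incidents: list of (reported_at, resolved_at) datetime pairs
    (a hypothetical record shape). Overlapping incidents are merged
    so that shared time is only counted once.
    """
    # Clip each incident to the period and drop those entirely outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in incidents
        if end > period_start and start < period_end
    )
    total = timedelta(0)
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # A new, non-overlapping stretch begins: bank the previous one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current stretch: extend it if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total / (period_end - period_start)


def mean_lead_time_days(tickets):
    """Mean days from creation to fulfillment, grouped by category.

    tickets: list of (category, created_at, fulfilled_at) tuples, where
    category is e.g. "ad hoc" or "planned" (assumed labels).
    """
    days_by_category = {}
    for category, created_at, fulfilled_at in tickets:
        days = (fulfilled_at - created_at) / timedelta(days=1)
        days_by_category.setdefault(category, []).append(days)
    return {c: sum(d) / len(d) for c, d in days_by_category.items()}


def failure_rate(bugs_identified, changes_merged):
    """Bugs identified divided by changes made to the code base."""
    if changes_merged == 0:
        return 0.0  # assumed convention: no changes means no failures
    return bugs_identified / changes_merged
```

In practice the incident and ticket records would come from whatever alerting and ticketing systems your team already uses. The point of the sketch is only that each metric reduces to a small, repeatable calculation once those records are available.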
However, there are some things that are core to data engineering that are harder to measure:
- Tech debt: what code is in need of replacement or refactoring, and why? Tech debt consists of all the changes you should address today, but need to delay due to other priorities. As you push these items further and further out, their costs compound, and not always linearly. Tech debt is present in data just like in software and is very important to keep in mind as you grow your team.
- Robustness: High testing coverage and low failure rates are signs of robustness, but it’s difficult to have a larger measure of how easy it will be to fix your pipeline if it breaks. In an anecdotal sense it may be easy to spot fragile code, but it’s a much bigger challenge to track robustness through time.
- Adaptability: How easily can you add new logic or remove old logic, and how risky are these changes? Be it poor versioning, ambiguous naming conventions or dependency-heavy design, adaptability is hard to assess for a complex data pipeline.
A successful data engineer takes a systematic approach to creating and maintaining data pipelines. They constantly think about the longevity of the system and are careful about the compromises that they make when things are rushed or broken. This is often aided by the use of off-the-shelf automation tools, which enable engineers to bypass the grunt work of building data pipelines and instead focus on system-wide thinking.
An unsuccessful data engineer creates and updates pipelines in an ad hoc way, without consideration for future refactoring and redesigns. They are constantly patching and acting reactively to business needs and problems alike.
Analytics engineer
- Mostly operational with some informational delivery
- Translate source-centric data to business-centric data that’s ready to be used in application
The analytics engineer’s main job is to transform source-centric data into business-centric data in a way that is adaptable to change. Analytics engineers add value in two ways: they eliminate the manual work of transforming and cleaning data, and they increase capacity and unlock new applications by making business logic more consistent and analyses more repeatable.
Like data engineers, analytics engineers can be evaluated on the basis of:
- Failure rate
- Testing coverage
- Lead time
As well as additional metrics, including:
- Adoption rate: of all the resources being created and maintained by this team, how many are being used, by how many unique users and how often by those users; these three variations can be represented individually to compare between resources or they can be summarized into an overall adoption metric (i.e. how many weekly active data consumers are there?).
- Trustworthiness: internal surveys can be useful for assessing trustworthiness and usefulness of data across departments; this can be achieved by sending short surveys to your team on a semi-frequent basis (e.g. quarterly).
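To make the adoption rate concrete, here is a minimal sketch that derives its three variations, plus a weekly active consumer count, from a usage log. Everything named here is an assumption for illustration: the `(resource, user, day)` event tuples, the function names and the idea that such a log can be exported from your warehouse or BI tool.

```python
from collections import defaultdict
from datetime import date


def adoption_summary(usage_events, maintained_resources):
    """Summarize adoption of the resources a team maintains.

    usage_events: iterable of (resource, user, day) tuples, a
    hypothetical export of query or dashboard-view logs.
    maintained_resources: set of resource names owned by the team.
    """
    users = defaultdict(set)
    event_counts = defaultdict(int)
    for resource, user, _day in usage_events:
        if resource in maintained_resources:
            users[resource].add(user)
            event_counts[resource] += 1

    per_resource = {}
    for resource in maintained_resources:
        n_users = len(users[resource])
        per_resource[resource] = {
            "unique_users": n_users,
            # How often the resource is used by those who use it.
            "events_per_user": event_counts[resource] / n_users if n_users else 0.0,
        }
    used = sum(1 for stats in per_resource.values() if stats["unique_users"] > 0)
    share_used = used / len(maintained_resources) if maintained_resources else 0.0
    return {"share_of_resources_used": share_used, "per_resource": per_resource}


def weekly_active_consumers(usage_events):
    """Number of distinct users per ISO (year, week)."""
    users_by_week = defaultdict(set)
    for _resource, user, day in usage_events:
        iso = day.isocalendar()
        users_by_week[(iso[0], iso[1])].add(user)
    return {week: len(u) for week, u in users_by_week.items()}
```

Keeping the per-resource breakdown alongside the overall figure makes it possible to compare resources with each other while still reporting a single headline number.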
Also like data engineers, analytics engineers must contend with difficult considerations such as:
- Tech debt
- Adaptability
As well as some considerations that data engineers are typically not concerned with:
- Business model alignment: There are many ways to model data and a data model that doesn’t accurately represent the business’s underlying model and operations can cause misinterpretation and misuse of data. Your organization’s business model features many business processes that the data model should reflect; how close or far the data model is from the business model is very challenging to measure.
- Substantive expertise: Data alone is often insufficient for answering business questions. Industry experience and subject matter expertise are often the immeasurable bridge that separates effective teams from ineffective ones.
A successful analytics engineer creates a data model that faithfully captures the business model and facilitates varied applications of the data. They think about the technical design of their core logic but also about the user experience of the analysts who will use their data model.
An unsuccessful analytics engineer creates a data model out of ad hoc queries, and designs for the request, not the business model. They create many tables with little testing and little documentation, one for every variation of a use case that shows up, rather than proactively designing in flexibility and extensibility.
Data analyst
- Almost entirely informational
- Leverage business-centric data to generate insight and aid decision making
- Spend their time primarily in BI tools: building reports, digging into the data, delivering insights that would not normally be found by business users.
- Their success metrics concern the quality of reporting, the depth of their knowledge and the insights they deliver to a business team.
The data analyst’s main job is to translate business questions and needs into data questions and applications. The value-add of data analysts is that they make decision making more efficient and less risky, be it a tactical decision about setting a week’s schedule or a strategic decision about which geography to target next.
Like analytics engineers, data analysts can be evaluated on the basis of:
- Lead time
- Adoption rate
- Trustworthiness
In addition, data analysts often closely collaborate with specific functional units within your organization. This means they can also be evaluated on the basis of:
- Various departmental metrics: department-specific metrics (e.g. conversion rates, retention rates, efficiency metrics like those for ad spend or scheduling) can also be useful for measuring the success of a team; the data team will only play a partial role in moving these metrics, but putting them front and center will help the team understand how their efforts are enabling others across the larger organization.
Like analytics engineers, data analysts must also contend with:
- Business model alignment
- Substantive expertise
Since data analysts specialize in answering questions and providing actionable insights, they must also consider:
- Analytical rigor: Some work is harder to test than others, but it is nonetheless important to be rigorous and thorough in your work and validation. Are your analyses rigorously designed and maintained? How can your stakeholders have confidence in these resources?
A successful data analyst creates thorough and applicable analyses for their team and for their stakeholders. They use both their subject matter expertise and their analytical expertise to bridge the gap between a business need and the available data. They design their analyses to use consistent and meaningful language, so that reports and their role in the larger landscape of analysis are easy to understand.
An unsuccessful data analyst creates analyses that may answer some questions but ultimately lack depth and direction. They are not rigorous in their use of data (e.g. inconsistent sampling, inconsistent naming conventions) and they design for the language of the request, rather than adapting to the language of the business model. While each analysis may be useful on its own to an extent, the larger body of analysis is a morass of inconsistent and unsystematic naming conventions, approaches and assumptions.
Conclusion
In this post we dug into the value added by each role, helping bridge the gap between the ambitious goals of a data team and the day-to-day responsibilities of various data professionals.
While building and growing a team, it’s important to think about coverage over all points and recognize that each team member can fulfill many but not all needs. A successful team is able to meet all needs by carefully balancing the abilities of its members.
Stay tuned for more posts on how management and product ownership help round out the modern data team.
Montreal Analytics is a Modern Data Stack consulting firm of 45+ people based out of North America. We help our clients on the whole data journey: pipelines, warehousing, modeling, visualization and activation, using technologies like Fivetran, Snowflake, dbt, Sigma, Looker and Census. From strategic advisory to hands-on development and enablement, our agile team can deploy greenfield data platforms, tackle complex migrations and audit & refactor entangled data models.