Your organization uses production and operational systems that sit on top of databases, into which transactional data continuously streams. This data originates from many sources, including sensors embedded in production lines or vehicles, digital footprints from customer activity on websites or applications, and enterprise resource planning platforms (ERPs). The data in your databases can help you understand how well your operations run, how customers navigate your products, and how prospects learn about them.
Production and operational databases record every trackable event your organization experiences. This data is tremendously useful either alone or in combination with data from third-party applications for sales, marketing, project management software, ecommerce, payments and more. It is key to assembling a full, 360-degree view of your organization.
Metrics and key performance indicators (KPIs) you can track using data from production and operational databases tend to fall into the categories of 1) operations, 2) product, and 3) marketing.
Databases connected directly to your operational systems, project management software, human resources platforms, and enterprise resource planning (ERP) tools offer a wealth of insight into how well your organization functions. These KPIs include:
- Project velocity and productivity
- Quality assurance and quality control
- Inventory management
- Employee performance, turnover, and morale
- Financial health
Web-based or web-enabled products constantly stream product usage data to your databases. With product data, you can learn how your customers use your product and how satisfied they are. Product KPIs include:
- Daily and monthly active users (DAU/MAU)
- Traffic and usage
- Feature usage
- Customer journey
- User outcomes
- Time in application
- Net promoter score
With event-tracking data from your marketing website, you can learn how prospects learn about your product and ultimately decide that it meets (or doesn’t meet) their needs. Marketing KPIs include:
- Customer acquisition
- Page views
- Bounce rate
- Time on site, time on page
For a more comprehensive treatment of database analytics, check out our ebook on the subject.
Don’t Use Databases for Analytics
Production and operational databases are optimized to handle a high volume of simple queries and quickly update individual records. They typically follow a row-based, OLTP (online transactional processing) architecture. These simple queries include those typically involved in customer interactions, such as updating a shopping cart or checking out.
By contrast, mining data for insights and performing analytics typically involve a modest volume of complicated queries that connect the same entities across multiple tables and handle computations on many values in the same column. Databases that are optimized for analytical queries, i.e., data warehouses, follow a column-based, OLAP (online analytical processing) architecture, a fundamentally opposite approach to OLTP. An example of a complicated analytics query might be joining marketing data with sales data to determine which marketing collateral seems to influence purchasing decisions.
The basic difference between row- and column-based design is an important reason not to query production and operational databases for analytics. Your analytics queries will be sluggish because you are using the wrong type of machine to perform them. This problem will become more and more severe as the data sets become larger.
The other reason not to query your databases is that analytics queries will compete with production and operational queries for the same computation resources. You will bog your operations down by simultaneously using your databases to analyze and perform transactions on data. Slowed operations make for irate customers and missed opportunities.
In order to efficiently analyze your production and operational data without interfering with your operations, you must move the data from your databases to a data warehouse.
Extracting, Loading and Transforming From Databases
The process of moving data from sources such as databases to destinations such as data warehouses is called data integration. Modern data integration solutions involve extracting data from a source, loading it into a data warehouse, and transforming it within the data warehouse environment. The software used to extract and load data is a data pipeline, which is made up of data connectors.
Building and maintaining a data pipeline requires a significant outlay of time, labor and engineering talent. A well-functioning data pipeline also involves a number of non-trivial design considerations and features, including resilience to failed syncs and duplication, automatic schema updates, handling of data types, and scalability. These all require extensive experience and expertise to successfully execute.