Video Analytics at Scale: Challenges and Best Practices

Data integration enables machine learning. Learn how you can use video analytics to automatically detect patterns in video.
June 21, 2021

Data science rests on a solid foundation of data integration. Practical uses of data, from business intelligence to machine learning, depend on the ability to extract, load, and transform data from sources to a central destination.

As your organization continues to develop its data science capabilities, you will find more and more opportunities to productionize your data into automated, data-driven systems. This piece by Agile SEO is a high-level overview of video analytics, the automation of image recognition from video. Feel free to get in touch with the author with questions!

What is Video Analytics?

In recent years, there’s been growing academic and industrial interest in video analytics. Significant advances in deep learning have made it possible to automate video analytics tasks that were once carried out exclusively by humans.

Video analytics is an application of machine learning that automatically identifies spatial and temporal events in video content. A video analytics solution can recognize events such as sudden outbreaks of fire, suspicious human movements, and vehicles failing to obey traffic signals.

Video analytics systems are typically used to monitor surveilled environments in real time, identifying objects and object attributes, trajectories, and behavioral patterns. Forensic video analytics, by contrast, derives insights from historical, recorded footage.

Use Cases for Large Scale Video Analytics

The following are common real-life use cases for video analytics.

Security and Surveillance

Organizations use surveillance in order to monitor activities and behavior. The goal is to gain insight into how corporate assets are used and understand the typical behavior of people or other entities. This helps establish a baseline of normal behavior, and detect abnormalities that may indicate malicious behavior or unauthorized access and use.

Once a baseline of normal behavior exists, several technologies can help monitor and protect assets, including crowd monitoring, access control, and intrusion detection. Crowd monitoring uses deep learning methods to count and identify people in a large gathering. For access control, organizations typically apply face recognition to CCTV video streams. The system first detects faces in each frame and then matches them against enrolled identities, which lets it distinguish authorized personnel from intruders.
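The baseline idea can be illustrated with a simple statistical test. The following sketch flags a people count as abnormal when it lies more than a chosen number of standard deviations from the historical mean (a z-score test). It assumes that an upstream crowd-monitoring model already produces one count per frame or time interval. The function names and the 3.0 threshold are illustrative choices and do not belong to any particular product.

```python
from statistics import mean, stdev


def build_baseline(counts):
    """Return (mean, sample standard deviation) of historical people counts.

    `counts` is assumed to come from a hypothetical upstream people counter.
    """
    return mean(counts), stdev(counts)


def is_anomalous(count, baseline, threshold=3.0):
    """Flag a count whose z-score exceeds `threshold` (an illustrative default)."""
    mu, sigma = baseline
    if sigma == 0:
        return count != mu  # no variation in history: any change is unusual
    return abs(count - mu) / sigma > threshold


history = [12, 15, 14, 13, 16, 15, 14, 13]
baseline = build_baseline(history)
print(is_anomalous(14, baseline))  # False
print(is_anomalous(60, baseline))  # True
```

Real systems usually keep separate baselines per camera and per time of day, because "normal" crowd levels vary with the schedule.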

Transport Monitoring

Video analytics can improve the efficiency and accuracy of public transport monitoring systems, such as those for trains, taxis, and buses. The insights from these systems can give cities and citizens information about many aspects of traffic, including road conditions, congestion, routes, and peak hours.

Municipalities can use video analytics to monitor traffic flow and speed. For example, applying point detection and other tracking techniques to CCTV streams can reveal incidents such as poor road conditions and vehicle breakdowns. It is also possible to build pedestrian monitoring systems that measure pedestrian movement and density, helping to keep pedestrians safe.
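After a tracker has followed a vehicle across frames, its speed can be estimated from how far it moves per frame. This minimal sketch assumes a fixed camera, a known frame rate, and a single meters-per-pixel scale obtained from calibration. Real deployments usually correct for perspective with a ground-plane homography, which this example omits. The function name and the sample values are hypothetical.

```python
import math


def estimate_speed_kmh(track, fps, meters_per_pixel):
    """Average speed of one tracked object, in km/h.

    track: (x, y) centroid positions in pixels, one per consecutive frame.
    fps: frames per second of the stream.
    meters_per_pixel: assumed constant ground-plane scale (no perspective correction).
    """
    if len(track) < 2:
        return 0.0
    # Total pixel distance travelled between consecutive frames.
    pixels = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    seconds = (len(track) - 1) / fps
    meters_per_second = pixels * meters_per_pixel / seconds
    return meters_per_second * 3.6  # m/s -> km/h


# A car moving 20 pixels per frame at 25 fps, with 0.05 m per pixel:
track = [(100 + 20 * i, 240) for i in range(26)]
print(round(estimate_speed_kmh(track, fps=25, meters_per_pixel=0.05), 1))  # 90.0
```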


Healthcare

The healthcare industry can benefit greatly from video analytics. Common applications include:

  • Health status monitoring: for example, you can capture a video stream from a camera aimed at an infant, then analyze it with video magnification and an optical flow algorithm to detect the infant's respiratory rate.
  • Telemedicine: physicians can use video analytics when conducting virtual appointments with patients via computers, tablets, or mobile devices.
  • Surgical video analysis: computer vision and cameras are used to understand activities during surgery. The system cross-references a library of surgical guides while attempting to predict the next steps of the operation, giving surgical teams real-time analysis through audio and visual cues that they can control.
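The last step of the respiratory-rate idea can be sketched under one assumption: that video magnification and optical flow have already reduced each frame to a single breathing-motion value, such as the mean vertical flow over the chest. The sketch does not use a spectral method. It uses a simpler technique instead and counts rising zero crossings of the mean-centered signal, treating each crossing as the start of one breath. The function name and the synthetic test signal are hypothetical.

```python
import math


def breaths_per_minute(signal, fps):
    """Estimate respiratory rate from a per-frame breathing-motion signal.

    signal: one value per frame, assumed to be produced upstream
            (e.g. magnified optical-flow motion over the chest region).
    Counts rising zero crossings of the mean-centered signal.
    """
    avg = sum(signal) / len(signal)
    centered = [s - avg for s in signal]
    rising = sum(1 for a, b in zip(centered, centered[1:]) if a < 0 <= b)
    duration_min = len(signal) / fps / 60
    return rising / duration_min


# Synthetic 60-second signal at 30 fps, breathing at 40 breaths per minute.
fps = 30
freq_hz = 40 / 60
signal = [math.sin(2 * math.pi * freq_hz * (i / fps) + 0.3) for i in range(60 * fps)]
print(round(breaths_per_minute(signal, fps)))  # 40
```

Zero-crossing counting is sensitive to noise. A practical system would first band-pass filter the signal to plausible breathing frequencies.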

User Generated Content

Organizations are learning to use video content generated by their end users for marketing. By leveraging user generated content (UGC) on social networks, organizations can create viral awareness of products and services and build a memorable brand image. Leveraging video-based UGC, however, raises a few challenges that video analytics technology can help solve:

  • Moderating large quantities of video content to detect and block harmful or unsuitable content
  • Automatically converting and optimizing videos to make them suitable for web delivery
  • Adjusting videos to a size and format suitable for delivery and consistent with the brand image

AI-powered video analytics APIs can automate all of the above. They let organizations build a stream of UGC video content that is automatically moderated, processed, and shared consistently on social networks.
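Much of the work of adjusting videos to a consistent delivery size is computing target dimensions. The helper below is a hypothetical sketch. It fits a clip inside a maximum bounding box without distorting the aspect ratio or upscaling. It also rounds each side down to an even number, because encoders that use 4:2:0 chroma subsampling commonly require even dimensions. The actual resizing would be done by a transcoder such as FFmpeg, which is not shown here.

```python
from fractions import Fraction


def fit_within(src_w, src_h, max_w, max_h):
    """Scale (src_w, src_h) to fit inside max_w x max_h, keeping the aspect ratio.

    Never upscales, and rounds each side down to an even number of pixels.
    """
    # Exact rational arithmetic avoids floating-point off-by-one results.
    scale = min(Fraction(max_w, src_w), Fraction(max_h, src_h), Fraction(1))
    w = int(src_w * scale) // 2 * 2
    h = int(src_h * scale) // 2 * 2
    return max(w, 2), max(h, 2)


print(fit_within(1920, 1080, 1280, 1280))  # (1280, 720)
print(fit_within(1080, 1920, 1280, 1280))  # (720, 1280)
```

Because the scale is a `Fraction`, a 1920x1080 source scaled by exactly 2/3 lands on 1280x720. With floating-point arithmetic, rounding could shave a pixel off a side.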

The Technology Behind Video Analytics

To implement video analytics, organizations use a collection of technologies, including video processing, object detection, object recognition, and object tracking. Many of these technologies are based on deep learning algorithms trained on computer vision tasks.

Video Processing

Video processing technologies extract and analyze information from video, making it readily available to human users, autonomous systems, and robots. During the processing cycle, the video is read, often frame by frame, and features are extracted from it. Each frame is an image, represented as a matrix of numbers that encodes the color of every pixel at its position.
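The "frame as a matrix" idea can be made concrete with a small example. This sketch stores a tiny RGB frame as nested lists and converts it to a grayscale matrix using the ITU-R BT.601 luma weights, a common first step before feature extraction. The nested-list frame format is an assumption made so the example needs no third-party libraries. Libraries such as OpenCV instead represent frames as NumPy arrays and use BGR rather than RGB channel order by default.

```python
def to_grayscale(frame):
    """Convert an RGB frame to a grayscale matrix.

    frame: list of rows, each row a list of (r, g, b) tuples with values 0-255.
    Uses the ITU-R BT.601 luma weights 0.299, 0.587, 0.114.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


# A 2x2 frame: red, green / blue, white.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(to_grayscale(frame))  # [[76, 150], [29, 255]]
```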

You can build a manual video processing cycle, but this is very time-consuming. For efficiency, companies automate the process with mathematical functions and open source libraries, often using machine learning algorithms. Notable open source resources include OpenCV and TensorFlow.

Object Detection

The main function of object detection is to analyze an image and locate the objects within it. Object detection models take an image or frame as input and apply image classification and object localization techniques to detect the objects it contains. Typically, each detected object is enclosed in a bounding box and assigned a class label.
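Detectors usually propose several overlapping boxes for the same object, each with a confidence score. A post-processing step called non-maximum suppression (NMS) then keeps only the best box. The following is a plain-Python, per-class version of greedy NMS built on intersection over union (IoU). The (x1, y1, x2, y2) box format, the 0.5 threshold, and the sample detections are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy per-class NMS.

    detections: list of (box, class_label, score).
    Keeps the highest-scoring box and drops same-class boxes that overlap it too much.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, label, _score = det
        overlaps = any(k[1] == label and iou(box, k[0]) >= iou_threshold for k in kept)
        if not overlaps:
            kept.append(det)
    return kept


detections = [
    ((10, 10, 110, 110), "car", 0.92),
    ((12, 14, 112, 108), "car", 0.85),  # near-duplicate of the first box
    ((200, 50, 260, 170), "person", 0.77),
]
for box, label, score in non_max_suppression(detections):
    print(label, box, score)  # the 0.85 duplicate car box is suppressed
```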

Object Recognition

The main function of object recognition technologies is to train machines to accurately recognize objects, such as faces and cars. Object recognition is applied in video and image processing, as well as computer vision, and involves training machine learning models to identify every object in a frame.
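Recognition models typically end in a classifier that outputs one raw score (logit) per known class. This sketch converts those scores to probabilities with a numerically stable softmax. It reports "unknown" when the top probability falls below a threshold, a simple way to avoid forcing a label onto an object the model has not learned. The function name, labels, and the 0.6 threshold are hypothetical.

```python
import math


def recognize(logits, labels, min_confidence=0.6):
    """Turn raw classifier scores into a (label, confidence) pair.

    logits: raw model outputs, one per label (assumed to come from a trained model).
    Returns ("unknown", confidence) if the best probability is below min_confidence.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "unknown", probs[best]
    return labels[best], probs[best]


labels = ["car", "face", "bicycle"]
label, conf = recognize([4.0, 1.0, 0.5], labels)
print(label, round(conf, 2))  # car 0.93
label, conf = recognize([1.0, 1.1, 0.9], labels)
print(label, round(conf, 2))  # unknown 0.37
```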

Object Tracking

The main function of object tracking is to follow objects and their behavior over a period of time. Tracking uses object detection to locate and classify objects, then follows those objects across a recorded video file or a live stream. Object tracking is a highly complex process, because it requires identifying the same objects consistently over long durations.
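The detect-then-track loop can be sketched with a deliberately simple technique: greedy IoU matching between the previous boxes of existing tracks and the current frame's detections. Unmatched detections start new tracks. Tracks that go unmatched for too many frames are dropped. This is an illustrative stand-in, not a production method. Trackers such as SORT add Kalman-filter motion prediction and Hungarian-algorithm matching. The class name and thresholds here are hypothetical.

```python
import itertools


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Hypothetical greedy tracker that links each frame's boxes to tracks by overlap."""

    def __init__(self, iou_threshold=0.3, max_missed=5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed  # frames a track may go unmatched before removal
        self.tracks = {}  # track_id -> (last_box, frames_missed)
        self._next_id = itertools.count(1)

    def update(self, boxes):
        """Process one frame of detections and return {track_id: box}."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(last, box), tid, i)
             for tid, (last, _) in self.tracks.items()
             for i, box in enumerate(boxes)),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # all remaining pairs overlap even less
            if tid not in assigned and i not in used:
                assigned[tid] = boxes[i]
                used.add(i)
        # Refresh matched tracks; age out or drop unmatched ones.
        for tid, (last, missed) in list(self.tracks.items()):
            if tid in assigned:
                self.tracks[tid] = (assigned[tid], 0)
            elif missed + 1 > self.max_missed:
                del self.tracks[tid]
            else:
                self.tracks[tid] = (last, missed + 1)
        # Any detection left unmatched starts a new track.
        for i, box in enumerate(boxes):
            if i not in used:
                tid = next(self._next_id)
                self.tracks[tid] = (box, 0)
                assigned[tid] = box
        return assigned


tracker = IoUTracker()
print(tracker.update([(10, 10, 50, 50), (100, 100, 140, 140)]))
# {1: (10, 10, 50, 50), 2: (100, 100, 140, 140)}
print(tracker.update([(104, 102, 144, 142), (13, 12, 53, 52)]))
# {1: (13, 12, 53, 52), 2: (104, 102, 144, 142)}
```

In the second frame the detections arrive in a different order, but each box keeps the ID of the track it overlaps most.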

3 Video Analytics Challenges and Solutions

Here are three key challenges faced by organizations implementing video analytics, and directions for resolving them.

Data Drift

Data drift, or content drift, occurs when the data a model encounters in production gradually diverges from the data it was trained on, which ultimately lowers accuracy. The training data is fixed, but live input keeps changing, so the distribution of the data shifts over time and the model loses predictive power.

To avoid excessive drift, keep your analytics models up to date with ongoing training. Continuously monitor the system, and when you detect lower accuracy, retrain the model with new training data.
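The monitoring loop can be sketched as a rolling accuracy check. This sketch assumes that a small stream of human-labeled ground truth is available for recent predictions. It compares accuracy over a sliding window with the accuracy measured at deployment and signals when retraining looks necessary. The class name, window size, and tolerance are hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labeled predictions and flag drift."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy  # accuracy measured at deployment time
        self.tolerance = tolerance  # acceptable drop before retraining
        self.results = deque(maxlen=window)  # sliding window of hit/miss flags

    def record(self, prediction, ground_truth):
        self.results.append(prediction == ground_truth)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def needs_retraining(self):
        """True once the window is full and accuracy fell below baseline - tolerance."""
        if len(self.results) < self.results.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record("car", "car")
print(monitor.accuracy(), monitor.needs_retraining())  # 1.0 False
for _ in range(4):
    monitor.record("car", "truck")  # the model starts confusing classes
print(monitor.accuracy(), monitor.needs_retraining())  # 0.6 True
```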

Complexity of Video Processing Tools

Video content differs from text or static images in a number of ways, but the biggest difference is the scale of data involved. Videos contain large volumes of data of several kinds, such as image frames, audio tracks, and metadata. Capturing, processing, and analyzing video therefore requires complex software tools and often dedicated, specialized hardware.

Effective video analysis depends on the quality of your tools—from the cameras used to record the footage to the analytics software that extracts metadata from it. It can be difficult to maintain the right balance of effective tools, given that both hardware and software have limited lifecycles and may require frequent upgrades.

Furthermore, the technology required for processing video content may have a steep learning curve for inexperienced teams.

To address this problem, many organizations use managed video solutions offered as a cloud service. Most cloud providers offer platform as a service (PaaS) solutions for video processing, transcoding, and delivery, and many of them also have advanced AI capabilities. Instead of building their own stack, organizations adopt these solutions, reducing the learning curve and upfront investment needed to pursue video analytics.

Data Storage

Video analytics requires collecting massive amounts of data, and the volume keeps growing. Some studies show that the amount of video content processed by analytics solutions grew by 500% between 2015 and 2019. Storing all this data can be challenging in terms of both resources and management. Increasingly stringent data privacy requirements add a further challenge.

Again, cloud services offer a convenient and readily available solution. Cloud providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud provide a range of storage services. These include low-cost, virtually unlimited object storage, which is suitable for storing video content and delivering it to users. They also include high-performance options such as managed block storage volumes, which can be used to cache video data locally for fast processing.
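A back-of-the-envelope estimate shows why storage planning matters. Take a single camera recording continuously at an assumed 4 Mbps. Over one day it produces 4 × 86,400 s = 345,600 megabits, or 43,200 MB (about 43.2 GB). The helper below generalizes that arithmetic. The camera count, bitrate, and retention period are illustrative assumptions, and the result ignores container overhead, replication, and compression changes.

```python
def storage_tb(cameras, bitrate_mbps, retention_days, hours_per_day=24):
    """Estimate raw storage for recorded video, in terabytes (10**12 bytes).

    bitrate_mbps: average stream bitrate in megabits per second (assumed constant).
    """
    seconds = retention_days * hours_per_day * 3600
    bits = cameras * bitrate_mbps * 1_000_000 * seconds
    return bits / 8 / 10**12  # bits -> bytes -> terabytes


# 100 cameras at 4 Mbps, kept for 30 days:
print(storage_tb(cameras=100, bitrate_mbps=4, retention_days=30))  # 129.6
```

At this scale, keeping most footage in low-cost object storage and copying only the clips under analysis to faster storage is usually the cost-effective pattern.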


Conclusion

In this article I reviewed the exciting new field of video analytics. I covered its compelling use cases, such as surveillance, transport monitoring, and healthcare, and its foundational technologies: video processing, object detection, object recognition, and object tracking.

Finally, I described three key challenges raised by large-scale video analytics and their solutions:

  • Data drift - video content is complex, and live data can quickly “drift” away from the features originally present in the training data. This requires constant re-training and tuning of video analytics algorithms.
  • Complexity of tools - video analytics requires a complex stack of tools, including heavy computing infrastructure with hardware acceleration, software frameworks for video analysis, and management of the machine learning pipeline. Platform as a Service (PaaS) solutions can ease this complexity and provide an end-to-end solution.
  • Data storage - video content is massive, and for large datasets, storage costs can be prohibitive. A solution is to leverage low-cost elastic cloud storage services, moving videos to high performance storage only when analysis is actually performed.

I hope this will be of help as you explore the uses of video analytics in your organization.