The technology behind accurate redaction across CCTV, dashcam and bodycam footage
Video footage from different sources looks very different to an AI detection system. CCTV cameras mounted at height in a shopping centre produce footage that is nothing like the forward-facing perspective of a dashcam on a motorway, which is again nothing like the chest-mounted, constantly moving feed from a police body-worn camera. Each has different resolution characteristics, different motion profiles, different lighting conditions, and different distributions of where faces and licence plates appear in the frame.
A redaction system that works well on one type of footage and poorly on another is a system that will create compliance gaps the moment an organisation's footage mix doesn't match the system's training data. How the underlying technology handles this variation is therefore the right question to ask when evaluating any AI redaction platform.
Object detection: the foundation
The first task in any video redaction pipeline is detection - identifying within each frame the objects that need to be redacted. For most compliance use cases, the primary targets are faces, heads, and vehicle licence plates, with additional categories including screens, PDAs, and scene text depending on the context.
Detection is performed by deep learning models - typically convolutional neural networks or more recent transformer-based architectures - that have been trained on large datasets of annotated images. The model learns to recognise the visual patterns associated with faces and plates across a wide range of appearances, poses, sizes, and conditions.
Several factors determine how well a detection model performs across different footage types:
Training data diversity - a model trained predominantly on frontal faces in good lighting will underperform on overhead CCTV showing partial faces at oblique angles, or on body-worn camera footage where motion blur and low resolution are common. Models trained on genuinely diverse security footage generalise better across source types.
Scale handling - faces appear at very different scales across footage types. A face occupying 10% of a dashcam frame looks nothing like a face occupying 0.1% of a wide-angle CCTV frame covering a large public space. Detection models must handle this range reliably.
Partial occlusion - in real-world footage, faces are frequently partially visible - obscured by objects, other people, or the frame boundary. Robust detection must identify partially visible faces, not just frontal full-face presentations.
Secure Redact's detection models exceed 99% recall on identifiable PII across security video, a figure that reflects training and validation across the diversity of real-world security footage rather than controlled benchmark conditions.
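To make the detection step concrete, here is a minimal, self-contained sketch of a post-processing routine used by virtually all box-based detectors: intersection-over-union (IoU) scoring and greedy non-maximum suppression (NMS), which collapse a model's overlapping candidate boxes into one box per face or plate. The box format and thresholds are illustrative assumptions, not the specifics of any particular product's pipeline.

```python
# Detection post-processing sketch: IoU and greedy non-maximum suppression.
# Boxes are (x1, y1, x2, y2) in pixels; scores are detector confidences.

def iou(a, b):
    """Intersection-over-union of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop its overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping candidates for the same face, plus a distinct plate box:
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (200, 80, 260, 100)]
scores = [0.9, 0.7, 0.8]
print(nms(boxes, scores))  # highest-scoring face box and the plate box survive
```

The scale-handling and occlusion problems described above live upstream of this step, inside the trained model itself; NMS only tidies the model's raw output.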
Tracking: maintaining consistency across frames
Detection per frame is necessary but not sufficient. A video is not a collection of independent images - it's a time-ordered sequence where the same individuals appear across many frames, often moving, turning, and sometimes briefly occluded before reappearing.
Tracking algorithms link individual detections across frames into persistent identity tracks. A face detected in frame 100 that belongs to the same individual as a face detected in frame 97 must be linked to the same track, so that the blur applied to the identity is consistent throughout the footage rather than flickering in and out as detection confidence varies frame by frame.
This is technically harder than detection. Good tracking requires:
Re-identification across frames where detection confidence is lower - maintaining the track through partial occlusion, motion blur, or the individual briefly looking away from the camera
Handling identity switches - preventing two tracks from being incorrectly merged when individuals pass close to each other in the frame
Maintaining tracks over longer stretches of footage rather than losing them when a subject moves out of frame and returns
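The core association step that tracking builds on can be sketched in a few lines: detections in each new frame are greedily matched to existing tracks when their boxes overlap enough, otherwise a new track is started. This is a deliberately simplified illustration - production trackers add appearance features and motion models to handle the re-identification and identity-switch problems listed above - and the function names and threshold are assumptions for the sketch.

```python
# Minimal IoU-based track association sketch. tracks maps track_id -> last box.
# Boxes are (x1, y1, x2, y2).

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def update_tracks(tracks, detections, next_id, iou_threshold=0.3):
    """Assign each detection to its best-overlapping unmatched track,
    or start a new track. Returns the updated tracks and next free id."""
    assigned = {}
    unmatched = set(tracks)
    for det in detections:
        best_id, best_iou = None, iou_threshold
        for tid in unmatched:
            overlap = iou(tracks[tid], det)
            if overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:
            best_id = next_id
            next_id += 1
        else:
            unmatched.discard(best_id)
        assigned[best_id] = det
    return assigned, next_id

# Frame 1: two faces appear. Frame 2: both have moved slightly.
tracks, next_id = {}, 0
tracks, next_id = update_tracks(tracks, [(10, 10, 50, 50), (100, 10, 140, 50)], next_id)
tracks, next_id = update_tracks(tracks, [(14, 12, 54, 52), (104, 12, 144, 52)], next_id)
print(sorted(tracks))  # the same two track ids persist across both frames
```

Greedy IoU matching is where identity switches originate: when two boxes overlap both tracks, the assignment can flip, which is why robust systems supplement geometry with appearance cues.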
The quality of tracking directly affects the legal defensibility and compliance value of the redacted output. Inconsistent blurring - where an individual is protected in most frames but visible in others due to tracking failures - is worse than consistently imperfect blurring, because it creates identifiable frames within footage that was supposed to be anonymised.
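One common way to avoid the flickering described above is to fill short detection gaps within a track by interpolating the box between the last and next confirmed positions, so the blur stays applied through a brief occlusion or confidence dip. The sketch below assumes linear motion over a few frames; it illustrates the gap-filling idea generally, not any specific vendor's implementation.

```python
# Gap filling for consistent redaction: a track with detections at frames 100
# and 104 but nothing in between gets interpolated boxes for frames 101-103.

def interpolate_box(box_a, box_b, t):
    """Linear interpolation between two boxes, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(box_a, box_b))

def fill_track_gaps(track):
    """track: {frame_index: box} with possible gaps. Returns a dense track."""
    frames = sorted(track)
    dense = dict(track)
    for fa, fb in zip(frames, frames[1:]):
        for f in range(fa + 1, fb):
            t = (f - fa) / (fb - fa)
            dense[f] = interpolate_box(track[fa], track[fb], t)
    return dense

sparse = {100: (10, 10, 50, 50), 104: (30, 10, 70, 50)}
dense = fill_track_gaps(sparse)
print(dense[102])  # the midpoint box between the two observed positions
```

In practice gap filling is bounded: interpolating across a long absence risks blurring the wrong region, so it is applied only over short, plausibly continuous gaps.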
The challenge of each footage type
CCTV is the most variable category. Cameras range from low-resolution fixed installations to high-definition PTZ (pan-tilt-zoom) systems. Wide-angle lenses introduce distortion that changes how faces appear relative to the trained model. Overhead mounting angles mean faces are often seen from above rather than straight on. Large public spaces can contain dozens of individuals simultaneously, requiring the tracking system to maintain multiple concurrent identity tracks without confusion.
Dashcam footage presents different challenges. Forward-facing cameras in moving vehicles capture licence plates at varying distances, angles, and speeds. Motion blur at higher speeds affects detection quality. Wide dynamic range scenes - driving from a tunnel into bright sunlight, for example - create contrast conditions that challenge detection models trained on more consistent lighting.
Body-worn cameras are arguably the most challenging source. They're mounted on a moving person, which means the camera itself introduces significant motion and perspective change frame-by-frame. Officers are often close to subjects, producing large face detections that then rapidly change scale as distances vary. Low-light conditions in evening incidents reduce image quality substantially. Audio in body-worn footage also often contains sensitive PII - names, addresses, dates of birth - that requires audio redaction capabilities alongside video.
Secure Redact handles all three source types as well as drone footage, smartphone video, and other camera formats, with detection models validated across the range of conditions each produces.
Audio redaction: the often-missed layer
Video footage from body-worn cameras and interview recordings regularly contains spoken PII - names, dates of birth, addresses, reference numbers - that is as identifiable as a face but invisible to a system that only processes the visual channel.
Audio redaction requires a different technical pipeline. Named Entity Recognition (NER) models - trained to identify categories of personal information in text - are applied to automatic transcripts generated from the audio. Detected PII spans are then muted or beeped in the audio output, with the redaction points documented in the transcript for reviewer verification.
This combined visual and audio redaction capability is essential for body-worn camera footage in particular, where sensitive dialogue is frequently captured alongside sensitive visual content.
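The audio pipeline described above can be sketched end to end in miniature: a transcript with per-word timestamps is scanned for PII, the flagged words are converted to sample intervals, and those samples are silenced. The regex here is a toy stand-in for a trained NER model, and the sample rate, data shapes, and function names are assumptions for the illustration.

```python
# Audio redaction sketch: flag PII words in a timestamped transcript, convert
# their timings to sample intervals, and zero those samples out.

import re

SAMPLE_RATE = 16_000  # assumed sample rate for the sketch

def pii_word_indices(words):
    """Toy stand-in for NER: flag anything formatted like a date of birth."""
    return [i for i, w in enumerate(words)
            if re.fullmatch(r"\d{2}/\d{2}/\d{4}", w["text"])]

def mute_intervals(words, indices):
    """Convert flagged word timings into (start_sample, end_sample) pairs."""
    return [(int(words[i]["start"] * SAMPLE_RATE),
             int(words[i]["end"] * SAMPLE_RATE)) for i in indices]

def apply_mutes(samples, intervals):
    out = list(samples)
    for start, end in intervals:
        for s in range(start, min(end, len(out))):
            out[s] = 0
    return out

words = [
    {"text": "born", "start": 0.0, "end": 0.4},
    {"text": "12/03/1985", "start": 0.4, "end": 1.2},
]
audio = [1] * (2 * SAMPLE_RATE)  # 2 seconds of dummy non-silent samples
muted = apply_mutes(audio, mute_intervals(words, pii_word_indices(words)))
print(sum(muted))  # samples inside the flagged span are now zero
```

The same interval list doubles as the documentation trail: each mute maps back to a transcript span that a reviewer can verify.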
Screen and scene text redaction
Beyond faces and plates, screens visible in footage - monitor displays, PDAs, mobile devices, MDTs (mobile data terminals) in police vehicles - frequently display sensitive information. Automatic screen detection identifies display surfaces and applies redaction to their contents without requiring manual identification of each screen.
Scene text redaction - targeting signs, house numbers, and text appearing in the environment - adds a further layer of PII protection for footage captured in residential or sensitive settings.
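Once a region - face, plate, screen, or scene text - has been detected and tracked, the redaction itself is a simple pixel operation. A common choice is pixelation: each small cell inside the box is replaced by its average value, destroying the detail while keeping the region's rough shape. The sketch below runs on a tiny greyscale frame represented as a list of rows; the block size and data layout are assumptions for illustration.

```python
# Applying redaction: pixelate a detected region by collapsing each
# block x block cell to its average value.

def pixelate_region(frame, box, block=2):
    """frame: list of rows of ints; box: (x1, y1, x2, y2), exclusive bounds."""
    x1, y1, x2, y2 = box
    for by in range(y1, y2, block):
        for bx in range(x1, x2, block):
            ys = range(by, min(by + block, y2))
            xs = range(bx, min(bx + block, x2))
            cell = [frame[y][x] for y in ys for x in xs]
            avg = sum(cell) // len(cell)
            for y in ys:
                for x in xs:
                    frame[y][x] = avg
    return frame

frame = [[(x + y) % 9 for x in range(8)] for y in range(8)]
pixelate_region(frame, (2, 2, 6, 6))
print(frame[2][2] == frame[3][3])  # each 2x2 cell collapses to one value
```

Gaussian blur or a solid fill are the other common choices; whichever is used, it must be applied destructively to the output file, since a reversible overlay would not anonymise anything.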
FAQs
Why does redaction accuracy vary between camera types?
Accuracy varies because different camera types produce very different image characteristics - resolution, angle, lighting, motion profile - and detection models must be trained across this diversity to generalise well. A model trained on high-quality frontal imagery and then deployed on low-resolution overhead CCTV will perform worse than a model trained to handle both. Secure Redact's models are validated across CCTV, body-worn, and dashcam footage specifically to address this variation.
Can the system detect partially visible faces?
Partial face detection is a core capability for real-world footage, where faces are frequently partly obscured by objects, other people, or the frame boundary. Detection models trained on diverse security footage learn to identify partial face presentations. Tracking algorithms maintain identity tracks across brief occlusion events so that a face that disappears behind an object and then reappears is still protected consistently.
Can multiple people be tracked in the same scene?
Yes. The tracking system is designed to maintain multiple concurrent identity tracks within the same scene, which is essential for footage from busy public spaces, retail environments, or any setting where multiple individuals appear simultaneously.
How does audio redaction identify spoken PII?
Secure Redact's audio redaction pipeline uses Named Entity Recognition to identify spoken PII - names, addresses, dates, and other personal identifiers - within automatically generated transcripts. Identified PII is muted in the audio output, and the redaction points are documented in the transcript for reviewer verification.
Is licence plate detection harder in dashcam footage?
Yes, primarily due to the movement of both the camera and the vehicles being filmed. Dashcam plate detection must handle plates at varying distances, angles, and speeds, as well as motion blur effects. The detection models handle these conditions through training on diverse dashcam footage, with tracking maintaining consistent plate redaction as vehicles move relative to the camera.
