A metric used in object detection and image segmentation that measures how much a predicted region overlaps with the ground truth region, calculated as the area of overlap divided by the area of the union of the two regions. IoU is the standard quality metric for object detection models and is directly relevant for agencies using AI to detect product placement, brand logos, faces, and other visual elements in video and image content.
Also known as IoU, Jaccard index for detection, overlap ratio
In object detection, a model predicts both the class and the location of objects in an image or video frame, with location typically represented as a bounding box defined by its corner coordinates. The ground truth for each object is a human-annotated bounding box that precisely outlines the object’s extent. Intersection over Union measures how accurately the predicted bounding box aligns with the ground truth by computing the ratio of the area of their intersection, the region covered by both boxes, to the area of their union, the total region covered by at least one box. A perfect prediction, where the predicted box exactly matches the ground truth, yields an IoU of 1.0. Non-overlapping boxes yield an IoU of 0. Values in between reflect partial overlap.
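The bounding box calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` coordinate convention are assumptions made for the example, and real tools may use other box formats.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    The coordinate convention is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped to 0 when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = both areas added together, minus the overlap that was counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
print(box_iou(ground_truth, (0, 0, 10, 10)))    # identical boxes: 1.0
print(box_iou(ground_truth, (5, 0, 15, 10)))    # box shifted by half its width: about 0.333
print(box_iou(ground_truth, (20, 20, 30, 30)))  # no overlap: 0.0
```

Note that a box shifted sideways by half its width already falls to an IoU of about 0.33, which is why even a 0.5 threshold is more demanding than it may sound.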
IoU is used as a threshold for deciding whether a detection counts as a true positive or a false positive. A common threshold is IoU greater than or equal to 0.5, meaning the area shared by the predicted and ground truth boxes must be at least 50% of their combined area for the detection to count as correct. Stricter thresholds like 0.75 are used when precise localization matters; looser thresholds may be acceptable for coarse presence-or-absence detection tasks. Mean average precision (mAP), the standard summary metric for object detection benchmarks, averages the area under the precision-recall curve across object classes. Some benchmarks compute it at a single IoU threshold such as 0.5, while COCO-style evaluation also averages across multiple IoU thresholds from 0.5 to 0.95, producing a comprehensive evaluation that rewards both accurate classification and accurate localization.
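To make the thresholding step concrete, the sketch below classifies detections as true or false positives at a chosen IoU threshold. It assumes a simplified greedy matching rule (highest-confidence predictions first, each ground truth box matched at most once); benchmark protocols differ in their details, and the names `match_detections`, `predictions`, and `ground_truths` are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truths, iou_threshold=0.5):
    """Count true positives, false positives, and false negatives.

    predictions: list of (box, confidence_score) pairs.
    ground_truths: list of boxes.
    Simplified greedy matching, assumed for illustration only.
    """
    matched = set()
    tp = fp = 0
    # Process the most confident predictions first.
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue  # each ground truth box can be claimed only once
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truths) - len(matched)  # ground truth objects nobody found
    return tp, fp, fn


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [
    ((1, 1, 11, 11), 0.9),    # IoU about 0.68 with the first object
    ((20, 20, 28, 30), 0.8),  # IoU 0.8 with the second object
    ((50, 50, 60, 60), 0.7),  # overlaps nothing
]
print(match_detections(predictions, ground_truths, iou_threshold=0.5))   # (2, 1, 0)
print(match_detections(predictions, ground_truths, iou_threshold=0.75))  # (1, 2, 1)
```

The same three predictions produce two correct detections at 0.5 but only one at 0.75, which shows how the threshold alone changes the reported quality of an unchanged model.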
For segmentation tasks, where the model must predict a pixel-level mask for each object rather than just a bounding box, IoU is computed at the pixel level: the numerator is the number of pixels predicted as belonging to the object that actually belong to it, and the denominator is the total number of pixels predicted or actually belonging to the object. This pixel-level IoU, sometimes called mean IoU or mIoU when averaged over classes, is more stringent than bounding box IoU because it requires not just approximate localization but precise delineation of the object boundary.
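The pixel-level version can be sketched the same way. The example below uses plain nested lists as masks to stay dependency-free; the function names `mask_iou` and `mean_iou` are hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is one common convention, assumed here rather than taken from any particular benchmark.

```python
def mask_iou(pred_mask, true_mask):
    """Pixel-level IoU of two binary masks given as equally sized nested lists."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel predicted as object and truly object
            if p or t:
                union += 1         # pixel predicted as object or truly object
    return intersection / union if union else 0.0


def mean_iou(pred_labels, true_labels, class_ids):
    """Average per-class IoU over label maps (one integer class id per pixel)."""
    ious = []
    for c in class_ids:
        pred_mask = [[v == c for v in row] for row in pred_labels]
        true_mask = [[v == c for v in row] for row in true_labels]
        present = any(any(r) for r in pred_mask) or any(any(r) for r in true_mask)
        if not present:
            continue  # assumption: ignore classes absent from both maps
        ious.append(mask_iou(pred_mask, true_mask))
    return sum(ious) / len(ious) if ious else 0.0


# 0 = background, 1 = product (toy 3x4 image)
true_labels = [[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 0, 0]]
pred_labels = [[0, 1, 1, 1],
               [0, 0, 1, 0],
               [0, 0, 0, 0]]

print(mask_iou(pred_labels, true_labels))          # product class only: 3 / 5 = 0.6
print(mean_iou(pred_labels, true_labels, [0, 1]))  # average of 7/9 and 3/5, about 0.689
```

Two misplaced pixels out of a four-pixel object are enough to pull the product IoU down to 0.6, which illustrates why mask-level scores tend to be lower than box-level scores for the same model.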
Computer vision models for brand safety, logo detection, product placement measurement, and content analysis all produce detections that are evaluated using IoU-based metrics. An agency that understands IoU can evaluate vendor computer vision tools more rigorously, interpret detection quality reports, and set IoU thresholds for each detection task according to how closely the predicted location must match the true location for the application to function correctly.
Brand logo detection quality depends directly on IoU threshold choice. A brand safety tool that counts a logo as detected whenever the predicted bounding box has any overlap with the logo uses a low IoU threshold that will over-detect logos, flagging content where a logo is partially visible at the edge of frame. A tool that requires strict alignment will under-detect partially visible logos that still represent brand exposure. Agencies using logo detection for brand safety measurement or competitor intelligence should ask vendors what IoU threshold their detection system uses and validate that it matches the detection sensitivity appropriate for the use case.
Product placement measurement in video content uses frame-level IoU. Measuring the screen time, prominence, and clarity of product placements in video requires object detection in each frame, followed by two assessments: how much of the frame the product occupies, which comes from the size of the detected bounding box, and how closely the detected region matches the actual product boundaries, which is what IoU measures. Prominence scores derived from IoU and bounding box size are used to calculate product visibility metrics for brand partnership valuation. Understanding that these metrics derive from IoU-based detection quality helps agencies assess the reliability of measurement provider estimates and understand where measurement error is likely to be highest.
Model evaluation for visual AI tools requires IoU-aware quality assessment. When an agency evaluates a computer vision tool for content classification, product detection, or visual safety screening, requesting performance metrics that report precision and recall at multiple IoU thresholds provides a more complete picture of detection quality than a single accuracy number. A model with 90% accuracy at IoU 0.3 may have only 70% accuracy at IoU 0.5, indicating that it finds objects in the right general area but localizes them imprecisely. Whether this localization gap matters depends on the application: for presence-or-absence classification, loose localization may be acceptable; for exact measurement of product screen time, precise localization is required.
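A report of this kind could be produced along the following lines. The sketch assumes each prediction has already been paired one-to-one with at most one ground truth object and that the IoU of that pairing is known (0.0 for a prediction that matched nothing); the function name `precision_recall_at` and the sample numbers are invented purely for illustration.

```python
def precision_recall_at(matched_ious, num_ground_truth, thresholds=(0.3, 0.5, 0.75)):
    """Precision and recall at several IoU thresholds.

    matched_ious: one IoU per prediction, from a one-to-one matching that is
    assumed to have been done beforehand (0.0 means no ground truth match).
    num_ground_truth: total number of annotated objects.
    """
    report = {}
    for t in thresholds:
        # A prediction is a true positive only if its IoU clears the threshold.
        tp = sum(1 for iou in matched_ious if iou >= t)
        precision = tp / len(matched_ious) if matched_ious else 0.0
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        report[t] = (precision, recall)
    return report


# Illustrative values only: 8 predictions against 10 annotated objects.
sample_ious = [0.92, 0.81, 0.66, 0.58, 0.41, 0.35, 0.0, 0.78]
for threshold, (precision, recall) in precision_recall_at(sample_ious, 10).items():
    print(f"IoU >= {threshold}: precision={precision:.3f}, recall={recall:.3f}")
# IoU >= 0.3: precision=0.875, recall=0.700
# IoU >= 0.5: precision=0.625, recall=0.500
# IoU >= 0.75: precision=0.375, recall=0.300
```

Reading the rows together shows the pattern described above: figures that look strong at a loose threshold can fall sharply once tighter localization is required.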
An agency is evaluating two computer vision tools for a consumer packaged goods client that wants to measure the share of shelf space its products occupy across hundreds of retail store photos uploaded weekly by field merchandising teams. The client needs an accurate pixel-area estimate of product presence relative to competitors on the shelf. Tool A achieves 91% detection accuracy using an IoU 0.5 threshold; Tool B achieves 85% detection accuracy at the same threshold. When both tools are evaluated at IoU 0.75, the stricter localization requirement, Tool A drops to 74% and Tool B drops to 82%. For shelf space measurement, precise product boundary detection matters because the area estimate depends on accurate bounding box coordinates, not just correct object identification. At IoU 0.75, Tool B is more accurate despite its lower performance at the looser threshold, because it localizes product boundaries more precisely even though it misses some products that Tool A would detect with imprecise localization. The agency selects Tool B based on the IoU 0.75 evaluation, which is appropriate for the precise area measurement the client requires.
The generative AI foundations module covers how computer vision models are evaluated, including the detection quality metrics that determine whether a visual AI tool is precise enough for the specific measurement or safety screening task at hand.