AI Glossary · Letter L

Labeling.

The process of attaching correct output values to raw input examples so a supervised machine learning model can learn the relationship between inputs and outputs. Labeling is the foundational data preparation step for training classifiers, detectors, and other supervised models, and its quality directly determines the quality ceiling of the resulting model.

Also known as data annotation, data tagging, ground truth labeling

What it is

A working definition of labeling.

Labeling transforms raw data into training data by attaching the correct answer to each example. For an image classification model, labeling means assigning each image a category such as “cat” or “dog.” For a sentiment analysis model, it means assigning each piece of text a sentiment value such as “positive,” “negative,” or “neutral.” For an object detection model, it means drawing bounding boxes around objects in images and assigning each box a category. The labeled dataset is then used to train the model: the model learns to map inputs to outputs by adjusting its parameters until its predictions on training examples match the labels as closely as possible.
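The training step described above can be sketched in a few lines. This is a minimal illustration, not a production recipe: a one-feature logistic regression fitted by batch gradient descent on a hypothetical four-example labeled dataset. Every parameter update moves the model's predictions on the labeled examples closer to their labels. The dataset, learning rate, and epoch count are illustrative assumptions.

```python
import math

# Hypothetical toy dataset: (feature value, label) pairs with labels 0 or 1.
labeled_examples = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, learning_rate=0.5, epochs=1000):
    """Fit a one-feature logistic regression by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = sigmoid(w * x + b) - y  # predicted probability minus the label
            grad_w += error * x
            grad_b += error
        # Step against the gradient so predictions move toward the labels.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


w, b = train(labeled_examples)
predictions = [int(sigmoid(w * x + b) >= 0.5) for x, _ in labeled_examples]
print(predictions)  # prints [0, 0, 1, 1]
```

If some labels in `labeled_examples` were flipped, the same loop would faithfully fit those wrong answers, which is why label quality caps model quality.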

Labeling quality is one of the most direct determinants of model quality. A model trained on systematically mislabeled data learns the wrong relationships, and a more sophisticated architecture does not fix that. Labeling consistency matters as much as labeling accuracy: if annotators assign different labels to genuinely similar examples, or the same label to examples the guidelines place in different classes, the resulting inconsistency adds noise that limits how well any model can learn the task. Clear labeling guidelines, annotator training, and inter-annotator agreement measurement are standard practices for maintaining labeling quality at scale.
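Inter-annotator agreement is often reported as Cohen's kappa. It corrects raw agreement for the agreement two annotators would reach by chance given their own label frequencies: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The sketch below computes it in plain Python for two annotators. As one common multi-rater summary, it also averages kappa over every pair of annotators; Fleiss' kappa is an alternative for three or more. The function names and example labels are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both annotators used one identical label: kappa is undefined
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over every pair of annotators (one label list per annotator)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sentiment labels from two annotators on the same ten texts.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "neg"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # prints 0.677
```

Here the annotators agree on 8 of 10 items (p_o = 0.80), but their label frequencies alone would produce 38% agreement (p_e = 0.38), so kappa is noticeably lower than raw agreement.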

Labeling methods span a range. In manual human annotation, trained annotators label examples one at a time according to a detailed schema. In programmatic labeling, heuristics and weak supervision rules generate approximate labels at scale. In active learning, a model selects the most informative unlabeled examples for human annotation to maximize label efficiency. The right labeling strategy depends on the volume of training data required, the cost per label, the required label accuracy, and whether enough labeled seed data exists to bootstrap more automated approaches.
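Programmatic labeling can be sketched as a set of labeling functions, each encoding one heuristic that either votes for a label or abstains, combined by majority vote. This is a simplified assumption: frameworks such as Snorkel learn how much to trust each labeling function instead of counting votes equally. The brand-safety heuristics and keyword lists below are hypothetical. Items where no heuristic fires, or where the votes tie, are returned as `None` so they can go to human annotators.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

SAFE, UNSAFE = "safe", "unsafe"
ABSTAIN = None  # a labeling function returns None when its heuristic does not apply


# Hypothetical labeling functions for brand-safety screening of page text.
def lf_violence_keywords(text: str) -> Optional[str]:
    return UNSAFE if any(word in text.lower() for word in ("shooting", "attack", "riot")) else ABSTAIN


def lf_graphic_content(text: str) -> Optional[str]:
    return UNSAFE if "graphic" in text.lower() else ABSTAIN


def lf_recipe_content(text: str) -> Optional[str]:
    return SAFE if "recipe" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_violence_keywords, lf_graphic_content, lf_recipe_content]


def weak_label(
    text: str,
    lfs: Sequence[Callable[[str], Optional[str]]] = LABELING_FUNCTIONS,
) -> Optional[str]:
    """Majority vote over non-abstaining labeling functions; None means 'send to a human'."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return None  # no heuristic fired
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # heuristics disagree evenly
    return ranked[0][0]


for headline in [
    "Easy weeknight pasta recipe",
    "Graphic footage of the attack",
    "Recipe for disaster as riot erupts",
    "Quarterly earnings preview",
]:
    print(f"{headline!r}: {weak_label(headline)}")
```

The third headline shows why programmatic labels are approximate: a keyword heuristic cannot tell an idiom from a news event, so conflicting votes are routed to people rather than guessed.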

Why ad agencies care

Why labeling quality is the highest-leverage input in any applied AI project.

An ad agency building AI-powered creative scoring, audience classification, or brand safety tools lives or dies on the quality of its labeled training data. In most practical settings, model architecture choices matter less than label quality: a simple model trained on clean, consistent labels often outperforms a sophisticated model trained on noisy or inconsistently labeled data. Labeling strategy, annotator selection, and quality control processes are therefore high-leverage decisions, not just operational overhead.

Creative asset labeling enables training of brand safety and creative quality models. An agency that wants to automatically score creative assets for brand safety violations, visual quality, or message clarity needs a labeled dataset of creative examples in which human reviewers have assessed each asset against a defined rubric. Labeling schema design matters enormously: labels that are too coarse yield a model that cannot distinguish the nuanced cases that matter most, while labels that are too fine-grained tend to produce inconsistent annotations and a model that overfits to annotator idiosyncrasies. The labeling process itself surfaces edge cases that the schema designers did not anticipate, so iterating on the schema as annotation proceeds is standard practice.

Audience signal labeling underlies conversion prediction models. Training a model to predict which site visitors are likely to convert requires labeled examples of past visitors where each visitor is labeled with their eventual conversion outcome. This sounds simple but involves non-trivial decisions: what conversion events count as positive labels, how long to wait before assigning a negative label to a non-converting visitor, and how to handle cases where conversion happens across sessions or devices. Labeling conventions that seem arbitrary have downstream consequences for model behavior, so documenting and consistently applying labeling rules is as important as the annotation work itself.
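The waiting-period decision can be written down as an explicit labeling rule. The sketch below assumes a hypothetical 30-day conversion window. A visitor is labeled 1 if a conversion occurs within the window after the visit, and 0 only once the window has fully elapsed without one. Visitors whose window is still open are labeled `None` and left out of training, so they are not mistakenly counted as negatives. It also assumes that conversions from other sessions and devices have already been stitched to the same visitor upstream. The function and constant names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Hypothetical window; the right length depends on the client's sales cycle.
CONVERSION_WINDOW = timedelta(days=30)


def conversion_label(
    visit_time: datetime,
    conversion_times: Sequence[datetime],
    as_of: datetime,
    window: timedelta = CONVERSION_WINDOW,
) -> Optional[int]:
    """Return 1 (converted in window), 0 (window elapsed, no conversion), or None (still pending).

    Assumes conversion_times already includes conversions matched to this visitor
    across sessions and devices.
    """
    deadline = visit_time + window
    if any(visit_time <= t <= deadline for t in conversion_times):
        return 1
    if as_of >= deadline:
        return 0
    return None  # too early to call this visitor a non-converter


visit = datetime(2024, 1, 1)
print(conversion_label(visit, [datetime(2024, 1, 10)], as_of=datetime(2024, 3, 1)))  # prints 1
print(conversion_label(visit, [], as_of=datetime(2024, 1, 15)))  # prints None
```

Changing `CONVERSION_WINDOW` changes which visitors count as positives, which is exactly the kind of convention that needs to be documented and applied consistently.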

Active learning can reduce labeling cost with little or no loss in model quality. For agencies building custom models for clients, annotation budgets are a real constraint. Active learning addresses this by using the model’s current uncertainty to choose which unlabeled examples to label next. Rather than labeling a random sample of data, annotators focus on the examples where the model is least certain. These typically lie near the model’s decision boundary and are the cases most likely to change its behavior. Savings depend heavily on the task, the data, and the selection strategy, but in favorable cases active learning can reach the same model performance with 30 to 70 percent fewer labeled examples than random sampling, a meaningful reduction in annotation cost for custom model development projects.
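The selection step can be sketched with least-confidence sampling, one of several common uncertainty measures (margin and entropy are others). In a full loop, the agency would train on its current labels, score the unlabeled pool, send the top `k` items to annotators, and retrain. The probabilities below stand in for a real model's output and are hypothetical, as are the function names.

```python
from typing import Dict, List, Sequence


def least_confidence(probabilities: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability the model gives its top class."""
    return 1.0 - max(probabilities)


def select_for_labeling(pool: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the IDs of the k unlabeled examples the current model is least sure about."""
    return sorted(pool, key=lambda ex_id: least_confidence(pool[ex_id]), reverse=True)[:k]


# Hypothetical model output: [P(high performer), P(standard)] for unlabeled creatives.
unlabeled_pool = {
    "ad_001": [0.97, 0.03],
    "ad_002": [0.55, 0.45],
    "ad_003": [0.80, 0.20],
    "ad_004": [0.51, 0.49],
}
print(select_for_labeling(unlabeled_pool, k=2))  # prints ['ad_004', 'ad_002']
```

A random sample might spend budget on `ad_001`, which the model already classifies confidently. Uncertainty sampling spends it on the near coin-flips instead.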

In practice

What labeling looks like inside a working ad agency.

An agency is building a creative effectiveness scoring model for a retail client to predict which ad concepts are likely to drive high click-through rates before committing production budget. The first challenge is labeling: the agency has three years of historical creative performance data covering 2,400 individual ad creatives, with a measured click-through rate for each. Rather than using raw click-through rate as the label, the team defines a binary label: creatives that performed in the top quartile for their placement and format are labeled “high performer” and the rest are labeled “standard.” This framing controls for placement and format differences that are outside the creative’s control. A team of three annotators then reviews a sample of ambiguous cases near the threshold to ensure the labeling rule is applied consistently, resolving disagreements by majority vote and documenting edge-case decisions in a labeling guide. Before training, the team audits label quality by measuring inter-annotator agreement on a held-out sample and finds an average pairwise Cohen’s kappa of 0.71, a value conventionally read as substantial agreement. (Cohen’s kappa compares two annotators at a time; Fleiss’ kappa is a common single-number alternative for three or more.)

The resulting labeled dataset is used to train a multimodal classifier that takes visual and copy features as input and predicts the binary performance label. The final model achieves 74% accuracy on held-out test data. Because the top-quartile rule labels only about a quarter of creatives “high performer,” accuracy alone is a weak yardstick here: a model that always predicts “standard” would score roughly 75%. Before relying on the model to pre-screen concepts and focus production investment on predicted high performers, the team should confirm that precision and recall on the “high performer” class are strong enough to be useful.
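The labeling rule in this example, and the baseline that puts the accuracy figure in context, can be sketched as follows. Several details are assumptions rather than the agency's documented method. The record fields are hypothetical. “Top quartile” is taken to mean the highest-CTR quarter of each (placement, format) group, with the count rounded up. Ties at the cutoff break by input order. The eight creatives are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def label_top_quartile(creatives: List[dict]) -> Dict[str, str]:
    """Label the top quarter of creatives by CTR within each (placement, format) group."""
    groups = defaultdict(list)
    for creative in creatives:
        groups[(creative["placement"], creative["format"])].append(creative)
    labels = {}
    for members in groups.values():
        ranked = sorted(members, key=lambda c: c["ctr"], reverse=True)  # stable: ties keep input order
        cutoff = math.ceil(len(ranked) / 4)  # assumption: round the quarter up
        for position, creative in enumerate(ranked):
            labels[creative["id"]] = "high performer" if position < cutoff else "standard"
    return labels


def majority_baseline_accuracy(labels: Dict[str, str]) -> float:
    """Accuracy of a trivial model that always predicts the most common label."""
    counts = Counter(labels.values())
    return max(counts.values()) / len(labels)


# Hypothetical creatives in two placement/format groups.
creatives = [
    {"id": "c1", "placement": "feed", "format": "video", "ctr": 0.021},
    {"id": "c2", "placement": "feed", "format": "video", "ctr": 0.034},
    {"id": "c3", "placement": "feed", "format": "video", "ctr": 0.012},
    {"id": "c4", "placement": "feed", "format": "video", "ctr": 0.018},
    {"id": "c5", "placement": "display", "format": "banner", "ctr": 0.004},
    {"id": "c6", "placement": "display", "format": "banner", "ctr": 0.002},
    {"id": "c7", "placement": "display", "format": "banner", "ctr": 0.009},
    {"id": "c8", "placement": "display", "format": "banner", "ctr": 0.003},
]
labels = label_top_quartile(creatives)
print(labels["c7"], labels["c1"])  # prints high performer standard
print(majority_baseline_accuracy(labels))  # prints 0.75
```

Grouping is what lets `c7` count as a high performer despite a lower raw CTR than `c1`. The 0.75 baseline is the bar any accuracy figure on these labels has to be compared against.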

Understand the data foundations that determine AI model quality through The Creative Cadence Workshop.

The generative AI foundations module covers how training data quality shapes model behavior, including the labeling strategies and quality control practices that separate well-performing production models from unreliable prototypes.