AI Glossary · Letter L

Label.

The correct output assigned to a training example, telling a supervised model what to predict. Label quality sets the ceiling on what any supervised AI system can learn, which makes data labeling one of the most undervalued forms of AI infrastructure in most agencies.

Also known as: ground-truth label, annotation, target label

What it is

A working definition of label.

A label is the ground-truth output attached to a training example in supervised machine learning. When a model learns to classify ad creative as high-performing or low-performing, each creative in the training set must carry a label saying which category it actually belongs to. When a model learns to score leads by purchase intent, each historical lead must be labeled with whether it converted. The model learns by comparing its predictions against these labels and adjusting its weights to reduce the prediction error.
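The loop described above (predict, compare against the label, adjust the weights) can be sketched in a few lines. This is a minimal illustration, assuming a logistic model trained by plain gradient descent; the two numeric lead features and their values are hypothetical, and real systems rely on a machine learning library rather than hand-written updates.

```python
import math

# Hypothetical training set: (feature vector, label). Label 1 = the lead converted.
examples = [
    ([1.0, 0.0], 0),
    ([2.0, 1.0], 0),
    ([3.0, 1.0], 1),
    ([4.0, 3.0], 1),
]

def predict(weights, bias, features):
    """Logistic model: estimated probability that the label is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(weights, bias, data):
    """Average prediction error, measured against the labels."""
    total = 0.0
    for features, label in data:
        p = min(max(predict(weights, bias, features), 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(data)

def train(data, steps=500, learning_rate=0.1):
    """Repeatedly nudge the weights in the direction that reduces the error."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in data:
            error = predict(weights, bias, features) - label  # prediction minus label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias

initial_loss = mean_log_loss([0.0, 0.0], 0.0, examples)
weights, bias = train(examples)
final_loss = mean_log_loss(weights, bias, examples)
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The only signal the training loop ever sees is the label. If the labels were flipped or inconsistent, the same code would reduce the error against those wrong targets just as diligently.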

Labels can be categorical (this image contains a product), numerical (this customer generated $420 in lifetime value), or sequential (these words in this sentence are brand mentions). What they share is that they represent the task definition: a label encodes what “correct” means for the model. If the labels are inconsistent, ambiguous, or misaligned with the actual business question, no amount of model sophistication can compensate. A model trained on bad labels learns to predict bad labels accurately.

The process of creating labels is called annotation or data labeling. Annotation requires clear guidelines defining what each label category means, qualified annotators who can apply those guidelines consistently, and quality control mechanisms that detect and correct disagreements. Inter-annotator agreement, which measures how often two different annotators assign the same label to the same example, is the primary diagnostic for label quality. Low agreement usually signals ambiguous guidelines rather than incompetent annotators.
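As a rough illustration of measuring agreement, the sketch below computes the raw agreement rate between two annotators and also Cohen's kappa, a common statistic that discounts the agreement expected by chance. Kappa is brought in here as one standard option, not something the definition above prescribes, and the annotator data is hypothetical.

```python
from collections import Counter

def raw_agreement(labels_a, labels_b):
    """Share of examples on which both annotators chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(labels_a)
    observed = raw_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same category at random,
    # given how often each of them uses that category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators rating the same ten ad creatives.
annotator_1 = ["high", "high", "low", "low", "high", "low", "high", "low", "low", "high"]
annotator_2 = ["high", "low", "low", "low", "high", "low", "high", "high", "low", "high"]

print(raw_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

With these ten examples the annotators agree on eight, but half of that agreement would be expected by chance, so kappa comes out noticeably lower than the raw rate. Which threshold counts as acceptable depends on the task and is a judgment call.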

Why ad agencies care

Why label quality determines the ceiling of every AI model an agency trains or buys.

Most of the AI models an agency uses, whether for ad delivery optimization, lead scoring, creative performance prediction, or audience segmentation, were trained on labeled data. When an agency uses a platform that optimizes toward conversions, that platform’s model learned what “conversion” looks like from labeled training examples. The quality of those labels determines the quality of the optimization. An agency that understands this evaluates AI platforms not just on algorithmic claims but on how their training data was labeled and what ground-truth signal the labels capture.

Creative performance labels require careful design. Labeling an ad creative as a “high performer” requires deciding what metric to measure, what time window to use, what benchmark to compare against, and whether to control for audience and placement differences. An agency that trains a creative performance model on raw click-through rates is labeling on a metric that rewards clickbait. One that labels on revenue-per-impression is labeling on the metric that actually matters. The label design decision precedes every technical choice and determines what the model optimizes for.
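To make the label design decision concrete, the sketch below labels the same four creatives twice: once by click-through rate and once by revenue per impression. Everything here is an assumption for illustration: the figures are invented, the benchmark is simply the median of the set, and no control for audience or placement is attempted.

```python
from statistics import median

# Hypothetical creative performance data over one fixed time window.
creatives = [
    {"id": "A", "impressions": 10_000, "clicks": 400, "revenue": 150.0},
    {"id": "B", "impressions": 10_000, "clicks": 120, "revenue": 600.0},
    {"id": "C", "impressions": 10_000, "clicks": 300, "revenue": 500.0},
    {"id": "D", "impressions": 10_000, "clicks": 100, "revenue": 100.0},
]

def click_through_rate(row):
    return row["clicks"] / row["impressions"]

def revenue_per_impression(row):
    return row["revenue"] / row["impressions"]

def label_by(metric, rows):
    """Label a creative 'high' if its metric beats the median of the set."""
    values = {row["id"]: metric(row) for row in rows}
    benchmark = median(values.values())
    return {cid: ("high" if value > benchmark else "low") for cid, value in values.items()}

ctr_labels = label_by(click_through_rate, creatives)
rpi_labels = label_by(revenue_per_impression, creatives)

# Creatives whose label flips depending on which metric defines "high performer".
disagreements = [cid for cid in ctr_labels if ctr_labels[cid] != rpi_labels[cid]]
print(disagreements)
```

In this invented data, creative A draws the most clicks but little revenue, and creative B does the reverse, so the two labeling schemes give a model opposite targets for both.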

Label consistency across time is fragile. When a model is trained on historical data spanning multiple years, organizational changes often introduce label inconsistency. A “qualified lead” defined by one sales team, deprecated by a second, and redefined by a third produces training data where the label means three different things in three different time periods. A model trained on this data learns to predict a mixture of criteria and produces unreliable scores. Auditing label consistency before training prevents this class of failure.
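One simple form such an audit could take is sketched below: for each time period, compare how often the label was applied with how often labeled leads actually closed. The rows, the field layout, and the 0.15 tolerance are hypothetical; a real audit would run against the CRM export and use a threshold chosen for that data.

```python
from collections import defaultdict

# Hypothetical CRM rows: (period, tagged_qualified, closed_won).
leads = [
    ("2021", True, True), ("2021", True, False), ("2021", False, False), ("2021", False, False),
    ("2022", True, False), ("2022", True, False), ("2022", True, True), ("2022", False, False),
    ("2023", True, True), ("2023", True, False), ("2023", False, False), ("2023", False, False),
]

def audit_by_period(rows):
    """Per period: share of leads tagged qualified, and close rate among those tagged."""
    totals = defaultdict(lambda: {"leads": 0, "qualified": 0, "closed": 0})
    for period, qualified, closed in rows:
        t = totals[period]
        t["leads"] += 1
        t["qualified"] += qualified
        t["closed"] += qualified and closed
    report = {}
    for period, t in sorted(totals.items()):
        report[period] = {
            "qualification_rate": t["qualified"] / t["leads"],
            "close_rate_of_qualified": t["closed"] / t["qualified"] if t["qualified"] else None,
        }
    return report

def flag_shifts(report, tolerance=0.15):
    """Flag periods whose qualification rate sits far from the middle value."""
    rates = sorted(r["qualification_rate"] for r in report.values())
    baseline = rates[len(rates) // 2]  # middle value as a rough baseline
    return [p for p, r in report.items() if abs(r["qualification_rate"] - baseline) > tolerance]

report = audit_by_period(leads)
flagged = flag_shifts(report)
print(report)
print(flagged)
```

A period where the label was applied much more often while the close rate among labeled leads dropped is a hint that the definition changed, not proof. The flagged periods still need a person to check what the tag meant at the time.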

In practice

What label quality looks like inside a working ad agency.

A performance marketing agency inherits four years of CRM data from a client and begins building a lead scoring model. Before training, the data science lead audits the historical “qualified lead” labels and discovers that three sales managers applied the tag under three different definitions over the period. One required only a form fill with a business email; another required a discovery call booked; the third required explicit budget confirmation. The years with the loosest definition show a 60% higher qualification rate with no corresponding improvement in actual close rates. The agency drops the loosely labeled period, harmonizes the definition with the client’s current sales process, and re-labels a random sample to verify consistency. The resulting model, trained on fewer but correctly labeled examples, outperforms the full-data model on held-out scoring tests by a margin that exceeds any algorithmic improvement the agency could have achieved on the original data.
