A training regularization technique that replaces the hard 0/1 values in classification labels with softened values, preventing a model from becoming overconfident in its predictions and improving generalization to new data.
Also known as soft label, soft target, label relaxation
Label smoothing is a regularization technique applied during model training that replaces hard one-hot labels, where the correct class gets probability 1.0 and all other classes get 0.0, with softer distributions that assign a small probability to incorrect classes. Instead of training the model toward a target of 100% for the correct class, label smoothing trains it toward something like 95% for the correct class, with the remainder distributed across the alternatives. The smoothing parameter, often written ε, controls how much probability is redistributed away from the true label: in the common formulation with K classes, the true class receives a target of 1 − ε + ε/K and every other class receives ε/K.
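As a minimal sketch of the redistribution described above, the following Python function builds a smoothed target for a single example. It assumes the common formulation in which the smoothing mass is spread uniformly over all classes, including the true one; some implementations instead spread it over only the incorrect classes. The function name and its parameters are illustrative, not taken from any particular library.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Build a smoothed target distribution for one training example.

    Assumed formulation: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class must be a valid class index")

    # Every class starts with an equal share of the smoothing mass.
    target = [epsilon / num_classes] * num_classes
    # The true class also keeps the remaining (1 - epsilon) probability.
    target[true_class] += 1.0 - epsilon
    return target


if __name__ == "__main__":
    print(smooth_labels(true_class=1, num_classes=3, epsilon=0.1))
```

With three classes and a smoothing value of 0.1, the true class ends up with a target of roughly 0.933 and each other class with roughly 0.033. Setting epsilon to 0 recovers the original hard label.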
The motivation is preventing overconfidence. A model trained with hard labels has an incentive to push its output probabilities toward 1.0 for the correct class and 0.0 for everything else. This creates a model that is extremely confident even when it should be uncertain—a failure mode called overconfidence or poor calibration. Label smoothing acts as a regularizer that discourages the model from assigning extreme probabilities, producing models that are better calibrated and tend to generalize better to new data.
Label smoothing is commonly used in image classification models and natural language processing models. It was popularized by the Inception-v3 image classifier, was used with a smoothing value of 0.1 in training the original Transformer, and remains a standard technique in many production training pipelines. The computational cost is negligible, since it requires only a minor modification to the training loss function, which makes it one of the easiest regularization techniques to implement.
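To illustrate how small the loss-function change is, here is a sketch of cross-entropy with label smoothing for a single example, written in plain Python. It assumes uniform smoothing over all classes; the helper names are hypothetical. In practice a framework's built-in option would normally be used instead, for example the `label_smoothing` argument of PyTorch's `torch.nn.CrossEntropyLoss`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy against a smoothed target (uniform smoothing assumed).

    Equivalent to -sum(q_i * log p_i) with
    q_i = (1 - epsilon) * one_hot_i + epsilon / K.
    """
    log_probs = log_softmax(logits)
    k = len(logits)
    # Standard hard-label term: negative log-probability of the true class.
    hard_term = -log_probs[true_class]
    # Extra term: average negative log-probability over all classes.
    uniform_term = -sum(log_probs) / k
    return (1.0 - epsilon) * hard_term + epsilon * uniform_term


if __name__ == "__main__":
    very_confident = [10.0, 0.0, 0.0]
    moderately_confident = [3.3, 0.0, 0.0]
    for eps in (0.0, 0.1):
        print(eps,
              smoothed_cross_entropy(very_confident, 0, eps),
              smoothed_cross_entropy(moderately_confident, 0, eps))
```

With epsilon set to 0 the function reduces to ordinary cross-entropy, which always rewards the more extreme logits. With epsilon at 0.1 the extreme logits are penalized relative to the moderate ones, which is the mechanism that discourages overconfident outputs.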
Overconfident AI models create specific problems in agency workflows. A creative performance classifier that assigns 99.9% confidence to its “winner” prediction discourages exploration of alternatives. A sentiment classifier that is maximally confident in borderline cases produces misleading brand safety signals. Label smoothing is one of the techniques that help reduce this class of failure, and understanding it helps agencies ask the right questions about model calibration when evaluating AI tools.
Calibration matters for downstream decisions. When a model’s confidence scores are used to make decisions—routing creative to different audiences, prioritizing leads, triggering alerts—those decisions assume that 90% confidence means approximately 90% accuracy. Poorly calibrated models break this assumption. Label smoothing is one of the techniques vendors use to improve calibration, and agencies should ask about calibration alongside accuracy when evaluating predictive AI tools.
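One common way to check whether 90% confidence really corresponds to about 90% accuracy is expected calibration error (ECE), which groups predictions into confidence bins and compares average confidence with observed accuracy in each bin. The sketch below is a simplified illustration using equal-width bins; the function name, the bin count, and the sample numbers are assumptions made for the example, not measurements from any real tool.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between confidence and accuracy across bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans saying whether each prediction was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Made-up example: a model that claims 99% confidence but is right 6 times in 10.
    overconfident = expected_calibration_error([0.99] * 10, [True] * 6 + [False] * 4)
    print(round(overconfident, 2))
```

A value near 0 means confidence scores can be read roughly as accuracy rates; a large value means the scores should not be used directly for routing or alerting decisions. Asking a vendor for a metric of this kind on held-out data is one concrete way to raise the calibration question.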
Fine-tuned models often benefit from label smoothing when training data is small. When an agency fine-tunes a model on proprietary data, for example by training a creative classifier on its own historical performance data, the small dataset size makes overconfidence more likely. Label smoothing is a low-cost regularization technique that tends to help in low-data fine-tuning scenarios, and it is worth trying in most agency fine-tuning pipelines.
An agency data science team fine-tunes an image classification model to predict whether ad creative will perform above a client’s historical click-through rate benchmark. They train an initial version with standard hard labels and observe that the model assigns very high confidence (above 95%) to most predictions even on borderline examples that human reviewers would classify as uncertain. When they deploy this model to filter creative recommendations, the overconfidence causes the system to present its picks with misleading certainty, leading the creative team to stop second-guessing recommendations that deserve scrutiny. The team retrains with label smoothing at 0.1 and observes that the model’s confidence scores become more spread across the 0.5–0.9 range, better reflecting the actual difficulty of borderline cases. The creative team adopts a workflow that fast-tracks high-confidence predictions but flags low-confidence ones for human review, producing a system where model confidence has actionable meaning.
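The review workflow in this example can be sketched as a simple routing rule. This is an illustration only: the 0.8 threshold, the route names, and the function itself are hypothetical, and a real team would choose the threshold by checking calibration on held-out data.

```python
def route_prediction(prob_above_benchmark, threshold=0.8):
    """Route one creative based on a binary classifier's output.

    prob_above_benchmark: model probability that the creative beats the
    click-through benchmark. The threshold is a hypothetical value.
    """
    if not 0.0 <= prob_above_benchmark <= 1.0:
        raise ValueError("probability must be in [0, 1]")

    # For a binary classifier, confidence is the probability of the chosen side.
    if prob_above_benchmark >= 0.5:
        label, confidence = "above_benchmark", prob_above_benchmark
    else:
        label, confidence = "below_benchmark", 1.0 - prob_above_benchmark

    route = "fast_track" if confidence >= threshold else "human_review"
    return label, confidence, route


if __name__ == "__main__":
    for p in (0.93, 0.55, 0.10):
        print(p, route_prediction(p))
```

A rule like this is only meaningful when the confidence scores are reasonably calibrated, which is why the retraining step comes before the workflow change in the scenario above.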