AI Glossary · Letter U

Underfitting.

A model failure mode in which the model is too simple to capture the genuine patterns in the training data, producing poor performance on both training and validation data. Underfitting is the opposite of overfitting: where an overfit model has memorized training noise, an underfit model has failed to learn even the real signal. Recognizing underfitting in marketing AI models prevents premature deployment of systems that are too simple to deliver genuine predictive value.

Also known as high bias, undertrained model, insufficient complexity

What it is

A working definition of underfitting.

A model underfits when its hypothesis class, the family of functions it can represent, is not rich enough to capture the underlying relationship between input features and the target variable. A linear model fit to data that has a genuinely non-linear relationship will underfit: it can only represent straight-line relationships and cannot fit the curved true function, leaving systematic residual error that no adjustment of the linear model’s parameters can eliminate. Similarly, a shallow decision tree with too few splits will underfit complex data because it cannot represent the fine-grained conditional structure that distinguishes positive from negative examples.
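The linear-model case above can be illustrated with a minimal sketch. It assumes noise-free toy data where the target is exactly the square of the input; the helper names `fit_line` and `mse` are hypothetical and written only for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line a + b*x on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed toy data: inputs from -2.0 to 2.0, target is exactly x squared.
xs = [x / 10 for x in range(-20, 21)]
ys = [x ** 2 for x in xs]

# A straight line in x cannot follow the curve: the best slope is about zero
# and a large systematic residual remains no matter how a and b are tuned.
a_lin, b_lin = fit_line(xs, ys)
linear_mse = mse(xs, ys, a_lin, b_lin)

# Enriching the hypothesis class (a line in x squared) removes the residual.
zs = [x ** 2 for x in xs]
a_sq, b_sq = fit_line(zs, ys)
squared_mse = mse(zs, ys, a_sq, b_sq)

print(f"linear fit MSE:  {linear_mse:.3f}")
print(f"squared fit MSE: {squared_mse:.3f}")
```

The same fitting routine is used in both cases; only the input representation changes, which is the sense in which the hypothesis class, not the optimizer, is the limiting factor.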

Underfitting produces high training error and high validation error, with the two approximately equal. This bias-dominated pattern distinguishes underfitting from overfitting, which produces low training error but high validation error. In the bias-variance decomposition of model error, underfitting corresponds to high bias (the model’s average prediction systematically deviates from the true function) and low variance (the model makes similar errors across different training samples). Increasing model complexity generally reduces bias but increases variance; the optimal model complexity minimizes total expected error, which is the sum of squared bias, variance, and irreducible noise.
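The error pattern described above can be turned into a rough diagnostic. The sketch below is illustrative only: `diagnose_fit`, `target_error`, and `gap_tolerance` are hypothetical names, and the thresholds are assumptions that would need to be set per task rather than general rules.

```python
def diagnose_fit(train_error, val_error, target_error, gap_tolerance=0.05):
    """Label a fit from training and validation error (lower is better).

    target_error is the error level considered acceptable for the task and
    gap_tolerance is the largest train/validation gap treated as "similar".
    Both are assumed, task-specific thresholds.
    """
    gap = val_error - train_error
    high_train_error = train_error > target_error
    large_gap = gap > gap_tolerance

    if high_train_error and not large_gap:
        return "underfit"  # high bias: both errors high and close together
    if not high_train_error and large_gap:
        return "overfit"  # high variance: training error low, validation high
    if high_train_error and large_gap:
        return "high bias and high variance"
    return "well fit"


print(diagnose_fit(0.39, 0.41, target_error=0.25))  # both errors high, small gap
print(diagnose_fit(0.05, 0.30, target_error=0.25))  # low training error, large gap
print(diagnose_fit(0.16, 0.19, target_error=0.25))  # both acceptable
```

The check looks at training error first on purpose: a validation metric alone cannot separate an underfit model from an overfit one.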

Common causes of underfitting include using a model family that is too constrained for the problem (linear regression on non-linear data), over-regularizing the model (a regularization strength, lambda, set so high that it suppresses learned signal along with noise), training for too few iterations (the model has not converged to even its in-family optimum), using too few features (the input representation does not contain enough predictive information for the model to distinguish classes), and applying feature transforms that destroy signal (such as bucketing continuous variables at a granularity too coarse to preserve the relevant variation). Each cause has its own remedy: increase model complexity, reduce regularization, train longer, add features, or reconsider feature preprocessing, respectively.

Why ad agencies care

Why diagnosing underfitting prevents deployment of models too simple to deliver real predictive value in marketing applications.

An ad agency building propensity models, response models, or audience classifiers for clients needs to identify underfitting before deployment, because an underfit model that performs poorly on training data will perform no better in production, regardless of how carefully it is validated. Underfitting is easy to miss when practitioners check only whether validation metrics exceed some minimum threshold, rather than examining whether the training metrics themselves indicate that the model has learned genuine patterns. A model with a training AUC of 0.63 that has not been compared against baselines and the realistic ceiling for the task may be deployed without anyone recognizing that its predictions only modestly exceed chance (an AUC of 0.5) and carry limited decision-making value.
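To make the chance level for AUC concrete, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. The function name `auc` and the tiny label and score lists are assumptions made up for illustration, not data from any real campaign.

```python
def auc(labels, scores):
    """AUC from its pairwise definition; ties between scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = converted) and three hypothetical scorers.
labels = [1, 0, 1, 0, 1, 0, 0, 1]

# A scorer that gives everyone the same score carries no ranking information.
constant_auc = auc(labels, [0.5] * len(labels))

# A weak scorer that ranks only slightly better than chance.
weak_auc = auc(labels, [0.6, 0.5, 0.4, 0.55, 0.7, 0.3, 0.65, 0.45])

# A scorer that ranks every positive above every negative.
perfect_auc = auc(labels, [float(y) for y in labels])

print(constant_auc, weak_auc, perfect_auc)
```

Because a constant scorer sits at 0.5, the distance of a training AUC from 0.5, and from whatever a simple baseline achieves, is a more informative reading than the raw value.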

Comparing model performance against the base rate and a strong baseline establishes whether the model has learned genuine signal or is still underfitting. A propensity model that achieves 57% accuracy on a dataset where the positive rate is 52% is performing barely above chance; this is an underfit model even if its accuracy is technically above 50%. The relevant benchmark is not 50% but the performance of a naive classifier that always predicts the majority class (52% accuracy), or the performance of a simple rule-based baseline (perhaps 61% accuracy from a single behavioral threshold). Comparing model performance to these baselines rather than to a fixed absolute threshold correctly identifies whether the model has meaningfully learned from the data or is still underfitting.
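The baseline arithmetic in this paragraph can be written out as a short sketch. The function names are hypothetical, and the inputs are the illustrative figures from the text (57% model accuracy, a 52% positive rate, and a 61% rule-based baseline).

```python
def majority_class_accuracy(positive_rate):
    """Accuracy of always predicting the more common class."""
    return max(positive_rate, 1.0 - positive_rate)


def compare_to_baselines(model_accuracy, positive_rate, rule_accuracy):
    """Report the model's accuracy gain over the two baselines from the text."""
    majority = majority_class_accuracy(positive_rate)
    return {
        "majority_baseline": majority,
        "gain_over_majority": model_accuracy - majority,
        "gain_over_rule": model_accuracy - rule_accuracy,
        "beats_all_baselines": model_accuracy > max(majority, rule_accuracy),
    }


report = compare_to_baselines(
    model_accuracy=0.57, positive_rate=0.52, rule_accuracy=0.61
)
for name, value in report.items():
    print(name, value)
```

With these figures the model gains about five points over the majority-class baseline but trails the single-threshold rule by about four, so it fails the comparison even though it clears 50%.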

Feature selection errors cause underfitting by depriving the model of the signal it needs to learn the target relationship. A churn prediction model that uses only demographic features (age, geography, account tier) when behavioral features (engagement frequency, recency, product usage) are available will systematically underfit because demographics alone do not carry sufficient signal to discriminate churners from non-churners in most marketing contexts. Adding the behavioral features, even simple ones such as days since last login and number of sessions in the prior 30 days, typically provides the model with the genuine predictive signal it needs to move from underfitting to a well-fit state. Feature completeness, ensuring that the training data contains the features that are actually predictive of the outcome, is a prerequisite for avoiding underfitting independent of model architecture choices.

Over-regularization is a frequent cause of underfitting in marketing models trained on small datasets where practitioners apply strong regularization out of caution. A practitioner who applies heavy L2 regularization to prevent overfitting on a 2,000-example training set may overshoot and suppress all learned signal, producing a model where all coefficients are near zero and predictions cluster tightly around the mean regardless of input. Calibrating regularization strength through cross-validation rather than setting it based on general rules of thumb avoids this failure mode by finding the specific lambda value that balances bias and variance for the actual data volume and task complexity.
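A minimal sketch of this calibration follows, under simplifying assumptions: a one-feature ridge model with no intercept (so the solution has a closed form), synthetic data with a true slope of 2 and a fixed alternating noise pattern, and a hypothetical grid of four lambda values. None of the names or numbers come from a real marketing dataset.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for a one-feature, no-intercept model."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean validation MSE over k interleaved folds (index i goes to fold i % k)."""
    fold_errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        val = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        w = ridge_weight([x for x, _ in train], [y for _, y in train], lam)
        fold_errors.append(sum((y - w * x) ** 2 for x, y in val) / len(val))
    return sum(fold_errors) / k


# Assumed synthetic data: true slope 2, deterministic alternating noise of 0.3.
xs = [(i - 19.5) / 10 for i in range(40)]
ys = [2 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

# Hypothetical candidate grid; real grids are usually finer and log-spaced.
lambdas = [0.01, 1.0, 100.0, 10000.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in lambdas}
best_lambda = min(scores, key=scores.get)

# With very heavy regularization the learned weight collapses toward zero,
# so predictions barely respond to the input: the over-regularized underfit.
heavy_weight = ridge_weight(xs, ys, 10000.0)

print("cross-validated MSE by lambda:", scores)
print("selected lambda:", best_lambda)
print("weight at lambda=10000:", round(heavy_weight, 4))
```

The heaviest setting drives the weight close to zero and yields by far the worst cross-validated error, which is the pattern to look for when a cautious default turns out to be too strong for the data.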

In practice

What underfitting looks like inside a working ad agency.

An agency builds a look-alike audience model for a direct-to-consumer apparel client to identify high-propensity prospects from a first-party seed of 8,000 recent customers. The model will score 2.4 million addressable users in the client’s data partner ecosystem. The initial model is a logistic regression with L2 regularization trained on 12 features, including demographic signals, broad category interest flags, and device type. Training AUC is 0.61; validation AUC is 0.59. The agency diagnoses this as underfitting: the small gap between training and validation AUC points to a bias problem rather than overfitting, and the absolute level of 0.61 barely exceeds the 0.58 AUC of a simple baseline on this dataset (a true majority-class classifier, which assigns every user the same score, would have an AUC of 0.5). Three interventions are applied. First, 18 additional behavioral and purchase category features are added from the data partner’s available feature set, including apparel-adjacent category purchase history, fashion content engagement rate, and brand preference signals. Second, the model is upgraded from logistic regression to a gradient boosted tree ensemble, which can capture non-linear interactions among the new features. Third, the regularization lambda is reduced from the library default to the value selected by 5-fold cross-validation. After these changes, training AUC is 0.84 and validation AUC is 0.81, indicating that the model has moved from underfitting to a well-fit state. The look-alike model is deployed to score the 2.4 million addressable users. A prospecting campaign targeting the top-scored decile achieves 2.1 times the conversion rate of a prior campaign that used platform-native look-alike targeting on the same seed audience, a lift the agency attributes to the richer feature set and to the non-linear model, which captured patterns the initial linear model was too simple to represent.

Through The Creative Cadence Workshop, build the model diagnostics expertise that identifies and corrects underfitting before marketing models are deployed.

The generative AI foundations module covers the bias-variance tradeoff in depth, including underfitting diagnosis, feature completeness, model complexity selection, and regularization calibration for marketing prediction tasks.