Interpretability.

The degree to which a machine learning model’s predictions can be understood and explained by humans in terms of the inputs and reasoning process that produced them. Interpretability is not a luxury feature; for agency applications involving consequential decisions about audiences, content, credit, and spend, it is a practical requirement for client defensibility, bias detection, and regulatory compliance.

Also known as: model interpretability, explainable AI (XAI)

What it is

A working definition of interpretability.

Interpretability refers to the extent to which a model’s internal mechanisms and prediction logic can be understood at a level that enables meaningful human oversight. A linear regression model is inherently interpretable: the coefficient on each feature directly indicates how a unit change in that feature affects the predicted output, holding other features constant. A decision tree is interpretable in the sense that each prediction can be traced through a series of explicit if-then rules. A deep neural network with many layers and millions of parameters is much less interpretable: there is no simple mapping from any subset of its internal states to a natural language explanation of why it produced a specific prediction.
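The contrast between the two inherently interpretable model types can be sketched in a few lines. This is a minimal illustration under assumed names: the feature names, coefficients, and split thresholds below are hypothetical, not taken from any real model.

```python
# Hypothetical linear model: each coefficient is the change in the score
# for a one-unit change in that feature, holding the others constant.
COEFFICIENTS = {"emails_opened": 0.8, "days_since_visit": -0.3}
INTERCEPT = 1.5


def linear_score(row):
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


def tree_score(row):
    """Hypothetical decision tree that records the rules it followed."""
    path = []
    if row["emails_opened"] >= 3:
        path.append("emails_opened >= 3")
        if row["days_since_visit"] < 7:
            path.append("days_since_visit < 7")
            return "high", path
        path.append("days_since_visit >= 7")
        return "medium", path
    path.append("emails_opened < 3")
    return "low", path


row = {"emails_opened": 4, "days_since_visit": 10}
bumped = dict(row, emails_opened=5)

# Raising emails_opened by one unit moves the score by its coefficient.
delta = linear_score(bumped) - linear_score(row)

# The tree's prediction comes with the exact if-then rules that produced it.
label, path = tree_score(row)
```

Here `delta` equals the `emails_opened` coefficient, and `path` is the literal rule trace for the prediction. A deep neural network offers neither kind of readout directly.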

Post-hoc interpretability methods produce explanations for black-box model predictions without requiring that the model itself be interpretable. SHAP (SHapley Additive exPlanations) computes the contribution of each feature to a specific prediction by considering all possible subsets of features and averaging the marginal contribution of including each feature across those subsets; because exact enumeration grows exponentially with the number of features, practical implementations approximate it or exploit model structure. The resulting SHAP values sum to the difference between the model’s prediction and a baseline value (typically the average prediction over a reference dataset), which gives a principled attribution of each prediction to the input features. LIME (Local Interpretable Model-agnostic Explanations) fits a simple interpretable model locally around each prediction to approximate the black-box model’s behavior in the neighborhood of that specific input. Attention visualization for transformer models highlights which input tokens received the most attention when producing the output, but attention weights are not a direct measure of feature importance and are better read as a diagnostic than as an attribution.
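The subset-averaging idea behind SHAP can be shown with an exact Shapley computation on a toy model. This is a minimal sketch, not the SHAP library's implementation: `toy_model`, its feature names, and the all-zero baseline are assumptions for illustration, and "removing" a feature is approximated by substituting its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (exponential in feature count)."""
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Features in the coalition come from x; the rest come from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of adding feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(row):
    # Hypothetical scoring function with an interaction term.
    recency, frequency, spend = row
    return 2.0 * recency + 1.0 * frequency + 0.5 * frequency * spend


x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
gap = toy_model(x) - toy_model(baseline)
```

In this toy case the attributions add up to `gap`, the difference between the prediction and the baseline output, and the interaction between `frequency` and `spend` is split evenly between the two features. With three features the enumeration is trivial; with dozens it is infeasible, which is why practical tools rely on approximations.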

Global interpretability, understanding how a model behaves across its entire input space, is distinct from local interpretability, which explains individual predictions. Partial dependence plots show the marginal relationship between a single feature and the model’s predictions, averaged over the observed values of the other features. Feature importance rankings aggregate local interpretability measures, such as mean absolute SHAP values, across a dataset to identify which features the model relies on most overall. Both are useful for auditing model behavior for unexpected patterns, checking whether the model has learned spurious correlations, and communicating model logic to clients and stakeholders.
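A partial dependence curve can be computed by hand: force one feature to each value on a grid, leave every other feature as observed, and average the predictions. The sketch below assumes a hypothetical `toy_model` and a three-row dataset purely for illustration.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)            # copy so the dataset is untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Hypothetical model: linear in feature 0, quadratic in feature 1.
    return 3.0 * row[0] + row[1] ** 2


rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
```

Each step along `grid` raises the curve by 3.0, recovering the linear effect of feature 0. When features are strongly correlated, the forced combinations may be unrealistic, so the curve is best read alongside the data distribution.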

Why ad agencies care

Why interpretability might matter more in agency work than in most industries.

Agency clients make consequential decisions based on model outputs: which audiences to target, which content to amplify, which leads to prioritize, which customers to invest retention spend in. When those models are black boxes, clients cannot audit whether the model’s reasoning aligns with their business values and legal requirements. A working ad agency that delivers interpretable models, or rigorous post-hoc explanations for black-box model outputs, builds more durable client trust and catches more issues before they become production problems.

Bias detection requires interpretability. A lead scoring model that has learned to under-score leads from specific geographic areas or age brackets may be producing discriminatory outputs that the agency would reject if it could see them. Without interpretability tools, this bias stays invisible until it produces an outcome conspicuous enough to trigger an investigation. Running SHAP analysis on lead scoring models before deployment, to check whether potential protected-class proxies such as zip code, name structure, or device type are among the top-weighted features, is a basic audit practice that can catch potential discrimination before it reaches production.
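The pre-deployment check described above can be sketched as a ranking step over per-prediction attributions. The sketch assumes the SHAP values have already been computed by some explainer; the feature names, the attribution numbers, and the list of suspected proxies are all made up for illustration.

```python
def rank_by_mean_abs(attributions, feature_names):
    """Rank features by mean absolute attribution across predictions."""
    n = len(attributions)
    means = {
        name: sum(abs(row[j]) for row in attributions) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def flag_proxies(ranking, suspected_proxies, top_k):
    """Return suspected proxy features that appear in the top_k ranked features."""
    return [name for name, _ in ranking[:top_k] if name in suspected_proxies]


# Illustrative values only: one row per prediction, one column per feature.
feature_names = ["engagement_score", "zip_code", "company_size", "device_type"]
attributions = [
    [0.30, -0.25, 0.05, 0.02],
    [-0.20, 0.30, -0.10, 0.01],
    [0.40, -0.20, 0.05, -0.03],
]

ranking = rank_by_mean_abs(attributions, feature_names)
flagged = flag_proxies(ranking, suspected_proxies={"zip_code", "device_type"}, top_k=2)
```

With these numbers `zip_code` ranks second and is flagged, while `device_type` ranks last and is not. A flag is a prompt for human review, not proof of discrimination; which features count as proxies is a judgment the agency and client need to make for their own context.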

Client-facing model deliverables are more defensible with interpretation artifacts. When an agency delivers a custom propensity model or segmentation scheme to a client, providing the feature importance rankings, partial dependence plots for the most important features, and example SHAP waterfall charts alongside the model performance metrics transforms the deliverable from a black box score file into an interpretable analytical system that the client can interrogate and build on. This documentation practice improves client confidence, facilitates client-side validation, and provides a record of the model’s reasoning that can be referenced if outputs are ever questioned.

Regulatory compliance in some verticals requires model explanation. GDPR restricts decisions based solely on automated processing that have legal or similarly significant effects on individuals, and entitles those individuals to meaningful information about the logic involved; this is often summarized as a “right to explanation,” although its exact scope is debated. Financial services regulations in various jurisdictions require that credit, insurance, and lending decision models produce explanations that can be provided to individuals who receive adverse decisions. Agencies working with clients in regulated industries need to build interpretability infrastructure into their AI systems from the design stage, not as an afterthought when a compliance question arises.

In practice

What interpretability looks like inside a working ad agency.

An agency delivers a customer lifetime value prediction model to a retail bank client. The model uses a gradient boosted ensemble with 34 input features and achieves strong accuracy on the validation set. During the client acceptance review, the client’s risk and compliance team asks whether the model uses any features that could act as proxies for protected characteristics. The agency runs SHAP analysis on a representative sample of 5,000 predictions. The resulting feature importance report shows that one feature, transaction location cluster, is in the top 5 features by mean absolute SHAP value. Analysis of the transaction location cluster feature reveals that it is highly correlated with census tract income and racial composition data. The agency and client jointly decide to remove the transaction location cluster feature, replace it with individual-level transaction frequency and value features that are less geographically confounded, and retrain. The retrained model achieves accuracy within 2% of the original model and the SHAP analysis on the revised feature set no longer surfaces any potential proxy discrimination concerns. The interpretability audit caught a potential fair lending compliance issue before the model was deployed to production.
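The follow-up step in this scenario, checking how strongly a categorical feature tracks a sensitive variable, can be sketched with the correlation ratio (eta squared): the share of variance in a numeric variable explained by group membership. The scenario does not say which statistic the agency used, so this is one possible approach, and the cluster labels and income figures below are invented for illustration.

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Eta squared: share of variance in `values` explained by `categories`."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - overall_mean) ** 2 for g in groups.values()
    )
    # If the values do not vary at all, there is nothing to explain.
    return ss_between / ss_total if ss_total else 0.0


# Illustrative data: location cluster per customer, and the median income
# (in thousands) of that customer's census tract.
clusters = ["A", "A", "B", "B", "C", "C"]
tract_income = [30.0, 34.0, 52.0, 56.0, 88.0, 92.0]

eta_squared = correlation_ratio(clusters, tract_income)
```

A value near 1 means the cluster label almost fully determines tract income, which is the pattern that would mark the feature as a likely proxy. A value near 0 would suggest the feature carries little information about the sensitive variable, though it would not rule out proxy effects through other routes.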

Build the model transparency practices that catch bias and build client trust before models reach production through The Creative Cadence Workshop.

The generative AI foundations module covers how to evaluate and explain AI model behavior, including the interpretability methods that make black-box model outputs auditable, defensible, and compliant with the transparency requirements of regulated client industries.