Propensity modeling is the use of machine learning models to estimate the probability that a specific individual will take a desired action, such as making a purchase, churning from a subscription, or responding to a promotion. Propensity scores enable marketing decisions to be personalized at the individual level: targeting only the highest-propensity prospects, offering retention incentives only to the most at-risk customers, or personalizing content to each person’s predicted interest.
Also known as propensity scoring, purchase propensity, or conversion probability modeling.
Propensity modeling trains a classification model on historical examples of individuals who did or did not take the target action, using features derived from their prior behavior, demographics, and context. The trained model generates a propensity score, a value between 0 and 1 representing the estimated probability of the target action, for each individual in a current audience. Conversion propensity models score website visitors on their likelihood of making a purchase. Churn propensity models score subscribers on their likelihood of canceling. Email engagement propensity models score recipients on their likelihood of opening or clicking.
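As a rough illustration of the train-then-score loop described above, the sketch below fits a tiny logistic regression by gradient descent and scores two individuals. Everything in it is an assumption made for illustration: the two features (`visits`, `recency`), the synthetic data, and the choice of logistic regression as a stand-in for whatever classifier a production system would use.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function; output is always in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=1.0, epochs=1000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity(w, b, x):
    """Score one individual: estimated probability of the target action."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic history (hypothetical): both features are scaled to 0..1,
# and higher values make the target action more likely.
random.seed(0)
X, y = [], []
for _ in range(400):
    visits = random.random()    # scaled visit frequency
    recency = random.random()   # scaled recency, 1.0 = engaged very recently
    true_p = sigmoid(-3.0 + 3.0 * visits + 2.0 * recency)
    X.append([visits, recency])
    y.append(1 if random.random() < true_p else 0)

w, b = train_logistic(X, y)
engaged = propensity(w, b, [0.9, 0.9])   # frequent, recent visitor
dormant = propensity(w, b, [0.1, 0.1])   # infrequent, lapsed visitor
```

The same pattern applies whichever action is being predicted; only the label (purchased, canceled, clicked) and the feature set change.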
The features most predictive of propensity vary by use case but typically include behavioral recency (how recently the person engaged), behavioral frequency (how often they have engaged), and, for commerce applications, monetary value (how much they have spent): together, the classic RFM framework. Additional predictive features include session-level signals such as page depth and time on site, product category engagement history, channel preference signals, and demographic or firmographic attributes when available. The relative importance of these features is discovered during model training and varies across clients and industries: for a B2B SaaS client, job title seniority may be the strongest predictor; for a fashion retailer, category browse depth may dominate.
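To make the RFM features concrete, here is a minimal sketch that derives recency, frequency, and monetary value from an order log. The record layout `(customer_id, order_date, amount)` and the function name `rfm_features` are hypothetical; real pipelines would typically compute these in a warehouse or dataframe library.

```python
from datetime import date

def rfm_features(orders, as_of):
    """Aggregate an order log into per-customer RFM features.

    orders: list of (customer_id, order_date, amount) tuples (assumed layout).
    as_of:  the scoring date that recency is measured against.
    """
    acc = {}
    for cust, day, amount in orders:
        rec = acc.setdefault(cust, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)   # most recent order date
        rec["frequency"] += 1                 # number of orders
        rec["monetary"] += amount             # total spend
    return {
        cust: {
            "recency_days": (as_of - r["last"]).days,
            "frequency": r["frequency"],
            "monetary": round(r["monetary"], 2),
        }
        for cust, r in acc.items()
    }

orders = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c2", date(2023, 11, 20), 25.0),
]
features = rfm_features(orders, as_of=date(2024, 3, 31))
# features["c1"] -> {"recency_days": 30, "frequency": 2, "monetary": 100.0}
```

Measuring recency against an explicit `as_of` date, rather than today's date, keeps training features reproducible and avoids leaking information from after the observation window.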
A propensity score is a prediction, and it must be distinguished from the rate at which people with that score actually convert. A well-calibrated model produces scores where individuals with a score of 0.30 convert at a rate of approximately 30%, individuals with a score of 0.60 convert at a rate of approximately 60%, and so on. Calibration failure, where the model’s rank ordering is correct but the absolute scores do not correspond to actual probabilities, is common in models trained with oversampling or class weighting and must be corrected through post-training calibration before scores are used in bid multiplier or budget allocation calculations.
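One common post-training correction, sketched below, rescales the odds from the class balance used in training back to the class balance in the real population. It assumes the only distortion is the changed class proportion (for example, training on a 50/50 resampled set when the true positive rate is far lower). The function name and the example rates are illustrative; fitting Platt scaling or isotonic regression on held-out data at the natural class rate is an alternative when that assumption does not hold.

```python
def correct_for_resampling(score, train_pos_rate, true_pos_rate):
    """Map a score learned under a resampled class balance back to the
    population class balance by rescaling the odds (prior-shift correction)."""
    if not 0.0 < score < 1.0:
        return score  # 0 and 1 are fixed points of the correction
    ratio_pos = true_pos_rate / train_pos_rate
    ratio_neg = (1.0 - true_pos_rate) / (1.0 - train_pos_rate)
    numerator = score * ratio_pos
    return numerator / (numerator + (1.0 - score) * ratio_neg)

# A model trained on a balanced (50% positive) sample, deployed on a
# population where 8.2% of individuals convert (illustrative rates).
corrected_mid = correct_for_resampling(0.5, train_pos_rate=0.5, true_pos_rate=0.082)
corrected_high = correct_for_resampling(0.8, train_pos_rate=0.5, true_pos_rate=0.082)
```

In this example a raw score of 0.5, which under balanced training means "no more likely than average", maps back to the population base rate of 0.082. The correction is monotonic, so rank ordering is preserved and only the absolute values change.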
An agency that has built reliable propensity models for client audiences has created a reusable asset that enables better decisions across acquisition targeting, retargeting prioritization, lifecycle communication, and retention investment. The propensity score becomes the signal that routes each individual through the right marketing workflow: high-purchase-propensity visitors get conversion-focused content; high-churn-propensity subscribers get retention outreach; low-engagement-propensity email recipients get reduced contact frequency. Without propensity scores, these routing decisions are made by segment rules that treat everyone in a segment identically, missing the individual-level variation that propensity models capture.
Paid media bidding using purchase propensity scores can reduce acquisition CPA by 20 to 40% through better impression selection. A programmatic bidding system that incorporates a client-specific purchase propensity model as a bid signal concentrates spend on the impressions where the probability of conversion is highest, rather than spreading bids evenly across all audience-eligible impressions. The CPA reduction comes from the model’s ability to identify behavioral signals predictive of conversion that the DSP’s default optimization may not capture, particularly when the client has conversion volume too low for the DSP’s own optimization to reach signal saturation. First-party propensity models built on the client’s own conversion history typically outperform DSP-native audience signals for this reason.
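A simple way to turn a propensity score into a bid signal is a clamped multiplier relative to the audience average. The sketch below is one possible scheme, not a DSP API: the function names, the floor of 0.25, the cap of 4.0, and the example base bid are all assumptions chosen for illustration. It presumes the scores are already calibrated, since a proportional multiplier inherits any distortion in the absolute values.

```python
def bid_multiplier(score, mean_score, floor=0.25, cap=4.0):
    """Scale bids in proportion to propensity relative to the audience
    average, clamped so extreme scores cannot zero out or blow up the bid."""
    raw = score / mean_score
    return max(floor, min(cap, raw))

def bid(base_bid, score, mean_score):
    """Bid for one impression opportunity, rounded to 4 decimal places."""
    return round(base_bid * bid_multiplier(score, mean_score), 4)

mean_score = 0.082   # hypothetical audience-average propensity
base_bid = 2.00      # hypothetical base bid in currency units

average_user = bid(base_bid, 0.082, mean_score)   # multiplier 1.0
likely_buyer = bid(base_bid, 0.65, mean_score)    # raw ~7.9, capped at 4.0
unlikely_buyer = bid(base_bid, 0.01, mean_score)  # raw ~0.12, floored at 0.25
```

The floor keeps a small amount of spend on low-propensity impressions, which preserves some exploration data for future retraining; whether that trade-off is worthwhile depends on the account.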
Uplift modeling extends propensity scoring by estimating the incremental effect of marketing intervention. Standard propensity models estimate the probability of action regardless of whether the individual received marketing. A customer with a high purchase propensity may buy without any marketing intervention; spending on them produces little incremental lift. Uplift modeling estimates the causal effect of receiving marketing for each individual, distinguishing the persuadables (individuals with low baseline propensity who respond to intervention) from the sure things (individuals with high baseline propensity who convert regardless) and the do-not-disturbs (individuals who respond negatively to intervention). Uplift models require randomized experiment data to train and are the technically rigorous approach to intervention allocation when incremental lift, rather than conversion prediction, is the goal.
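The core quantity in uplift modeling is the difference between the conversion rate with and without the intervention. The sketch below computes that difference per segment from randomized experiment records and applies the labels used above. It is a deliberately simplified, segment-level estimate rather than a per-individual uplift model; the record layout, the thresholds (`high_baseline`, `min_effect`), the segment names, and the "unresponsive" label for low-baseline non-responders are all assumptions made for illustration.

```python
from collections import defaultdict

def segment_uplift(records):
    """records: iterable of (segment, treated, converted) from a randomized
    test. Returns treated rate, control rate, and uplift per segment."""
    counts = defaultdict(lambda: {"t_n": 0, "t_conv": 0, "c_n": 0, "c_conv": 0})
    for segment, treated, converted in records:
        prefix = "t" if treated else "c"
        counts[segment][prefix + "_n"] += 1
        counts[segment][prefix + "_conv"] += int(converted)
    result = {}
    for segment, c in counts.items():
        if c["t_n"] == 0 or c["c_n"] == 0:
            continue  # uplift is undefined without both arms
        treated_rate = c["t_conv"] / c["t_n"]
        control_rate = c["c_conv"] / c["c_n"]
        result[segment] = {
            "treated_rate": treated_rate,
            "control_rate": control_rate,
            "uplift": treated_rate - control_rate,
        }
    return result

def label(control_rate, uplift, high_baseline=0.5, min_effect=0.05):
    """Assign a response label; thresholds are illustrative only."""
    if uplift <= -min_effect:
        return "do-not-disturb"
    if uplift >= min_effect:
        return "persuadable"
    return "sure thing" if control_rate >= high_baseline else "unresponsive"

def make(segment, treated, n, conversions):
    # Helper that fabricates n synthetic records, the first `conversions`
    # of which converted. Purely for demonstration.
    return [(segment, treated, i < conversions) for i in range(n)]

records = (
    make("lapsed", True, 100, 30) + make("lapsed", False, 100, 10)
    + make("loyal", True, 100, 81) + make("loyal", False, 100, 80)
    + make("fatigued", True, 100, 5) + make("fatigued", False, 100, 15)
)
uplift = segment_uplift(records)
labels = {s: label(v["control_rate"], v["uplift"]) for s, v in uplift.items()}
```

Here the "loyal" segment converts at about 80% with or without marketing, so a plain propensity model would rank it highest even though spending on it buys almost nothing. In practice the differences should also be tested for statistical significance before acting on them.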
Propensity scores require regular recalibration as behavioral patterns and product offerings change. A purchase propensity model trained on last year’s behavioral data reflects last year’s purchase patterns, which may have shifted due to product catalog changes, seasonality, competitive dynamics, or macroeconomic conditions. Monitoring the calibration of deployed propensity models, by comparing predicted probabilities to actual conversion rates in rolling windows, detects when the model’s scores have drifted from reality. Agencies managing deployed propensity models should build this calibration monitoring into their standard model operations cadence, triggering retraining when calibration error exceeds a defined threshold.
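The monitoring check described above can be sketched as a binned comparison of predicted probabilities against observed outcomes for one rolling window. The metric used here is expected calibration error (the bin-size-weighted average gap between mean predicted score and observed conversion rate). The 0.05 retraining threshold, the ten equal-width bins, and the function names are assumptions; an appropriate threshold depends on how the scores are used downstream.

```python
def calibration_error(scores, outcomes, n_bins=10):
    """Expected calibration error over equal-width score bins."""
    bins = [[] for _ in range(n_bins)]
    for s, o in zip(scores, outcomes):
        idx = min(int(s * n_bins), n_bins - 1)  # score 1.0 goes in the last bin
        bins[idx].append((s, o))
    total = len(scores)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(s for s, _ in b) / len(b)
        observed = sum(o for _, o in b) / len(b)
        ece += len(b) / total * abs(mean_pred - observed)
    return ece

def needs_retraining(scores, outcomes, threshold=0.05):
    """Flag the model when calibration error exceeds the chosen threshold."""
    return calibration_error(scores, outcomes) > threshold

# Two synthetic windows with identical scores but different outcomes.
scores = [0.2] * 100 + [0.6] * 100
healthy = [1] * 20 + [0] * 80 + [1] * 60 + [0] * 40   # 20% and 60% convert
drifted = [1] * 10 + [0] * 90 + [1] * 40 + [0] * 60   # 10% and 40% convert

healthy_flag = needs_retraining(scores, healthy)
drifted_flag = needs_retraining(scores, drifted)
drifted_error = calibration_error(scores, drifted)
```

In the drifted window the model still ranks the two groups correctly, yet both groups convert well below their predicted rates, which is exactly the kind of shift that rank-based metrics such as AUC would not reveal.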
An agency builds a first-party purchase propensity model for a home goods retailer client to improve the efficiency of their email marketing and retargeting programs. The training dataset contains 14 months of browse and purchase data for 340,000 identified customers: 28,000 who made a purchase in a 90-day observation window (8.2% positive class) and 312,000 who did not. The agency engineers 22 behavioral features including days since last visit, total visits in prior 90 days, number of distinct categories browsed, average session depth, prior purchase count, category of most recent browse session, time since last purchase (for repeat purchasers), and email engagement history.
A gradient boosted model trained on these features achieves AUC 0.81 on a held-out test set and a Brier score indicating acceptable calibration. The agency splits the customer base into deciles by propensity score and profiles each decile by actual purchase rate in the hold-out period. The top decile (propensity scores above 0.65) shows a 31% actual purchase rate in the hold-out period, versus the population average of 8.2%. The bottom 5 deciles (propensity below 0.15) show purchase rates below 3%.
Email frequency optimization using the propensity scores reduces total email volume by 22% by eliminating weekly sends to the bottom 4 deciles and replacing them with monthly check-in emails, while increasing contact frequency for the top 2 deciles. Retargeting bids are set proportional to propensity score, with the top decile receiving 4x the base bid. Combined, these changes produce a 19% improvement in email revenue per send and a 26% reduction in retargeting CPA in the first quarter after implementation.
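The decile profiling step in this example can be sketched as follows: rank individuals by score, cut the ranking into ten equal groups, and compare each group's actual conversion rate with the overall rate. The function name and the synthetic data are illustrative assumptions and are not the retailer's data; only the final lift calculation uses figures quoted in the example (31% for the top decile against an 8.2% average).

```python
def decile_profile(scores, outcomes):
    """Rank by score (highest first), split into 10 equal groups, and
    report the observed conversion rate in each. Decile 1 is the top."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    profile = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        rate = sum(o for _, o in chunk) / len(chunk) if chunk else 0.0
        profile.append({"decile": d + 1, "n": len(chunk), "rate": rate})
    return profile

# Synthetic audience of 100: only the five highest scorers convert.
scores = [i / 100 for i in range(100)]
outcomes = [1 if i >= 95 else 0 for i in range(100)]
profile = decile_profile(scores, outcomes)

overall_rate = sum(outcomes) / len(outcomes)
top_decile_lift = profile[0]["rate"] / overall_rate

# Lift implied by the figures in the example above: 31% vs. 8.2%.
example_lift = round(0.31 / 0.082, 2)   # about 3.78x the average rate
```

A lift of roughly 3.8x in the top decile is what justifies concentrating sends and bids there, while the near-zero rates in the lower deciles support reducing contact frequency.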
The generative AI foundations module covers propensity modeling end-to-end: feature engineering from behavioral data, model training and calibration, uplift modeling, and the deployment patterns that connect propensity scores to email, paid media, and lifecycle program decisions.