A classification model outcome in which the model predicts the positive class and the actual label is also positive, meaning the model correctly identified a genuine positive example. True positives are one of the four outcomes in the confusion matrix, alongside false positives, true negatives, and false negatives. The relative distribution of these four outcomes determines precision, recall, and F1 score, the metrics typically used to judge model performance on imbalanced marketing prediction problems.
Also known as TP, correct positive prediction, hit
A classification model assigns each input example to one of two classes: the positive class (typically the outcome of interest, such as converted, churned, or clicked) and the negative class (the default outcome, such as not converted or not clicked). A true positive occurs when the model predicts positive and the ground truth label is also positive: the model has correctly identified a genuine positive example. A false positive occurs when the model predicts positive but the label is negative: a false alarm. A true negative occurs when the model predicts negative and the label is negative: a correct rejection. A false negative occurs when the model predicts negative but the label is positive: a missed case.
The four-cell matrix of true positives, false positives, true negatives, and false negatives is the confusion matrix. From the confusion matrix, precision is computed as true positives divided by (true positives plus false positives), measuring what fraction of positive predictions are correct. Recall (also called sensitivity or true positive rate) is computed as true positives divided by (true positives plus false negatives), measuring what fraction of actual positives are detected. F1 score is the harmonic mean of precision and recall. These three metrics capture different aspects of classification quality and respond differently to changing the model’s decision threshold.
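The definitions above translate directly into a few lines of code. The following is a minimal sketch in plain Python, assuming labels are encoded as 1 (positive) and 0 (negative). The function names and the ten-example dataset are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correct positive prediction
        elif predicted == 1 and actual == 0:
            fp += 1  # false alarm
        elif predicted == 0 and actual == 0:
            tn += 1  # correct rejection
        else:
            fn += 1  # missed case
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Illustrative data only: 4 actual positives among 10 examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, tn, fn)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy case the model finds two of the four actual positives (recall 0.5) and two of its three positive predictions are correct (precision about 0.667).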
The tradeoff between true positive rate and false positive rate as the decision threshold varies is captured by the ROC curve, and the area under the ROC curve (AUC) summarizes classification quality across all thresholds. A model that is good at producing true positives while minimizing false positives has a high AUC. The precision-recall curve captures the same tradeoff from a different angle, focusing specifically on the positive class performance, and is typically more informative than the ROC curve for highly imbalanced problems where positive examples are rare, which is common in marketing classification tasks such as conversion prediction.
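To make the threshold tradeoff concrete, the sketch below computes the true positive rate, false positive rate, and precision at two thresholds, along with AUC. It assumes the pairwise interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting half). The function names and scores are hypothetical, and the quadratic loop is only suitable for small illustrative datasets.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR, precision) when predicting positive if score > threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, fpr, precision


# Illustrative data only: 3 positives, 5 negatives.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

auc = roc_auc(y_true, scores)
low = rates_at_threshold(y_true, scores, 0.5)
high = rates_at_threshold(y_true, scores, 0.75)
print(round(auc, 3), low, high)
```

Raising the threshold from 0.5 to 0.75 in this example lowers the true positive rate from 1.0 to about 0.33 and precision from 0.75 to 0.5, while the false positive rate stays at 0.2. Each threshold is one point on the ROC and precision-recall curves.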
A working ad agency building or evaluating classification models for audience scoring, lead qualification, churn prediction, or fraud detection needs to understand the business implications of the tradeoff between true positives and false positives. A model that catches 90% of actual churners (high recall, high true positive rate) but also incorrectly flags 40% of loyal customers as churn risks (high false positive rate) may cost more in unnecessary retention spend than it saves by retaining the churners it catches. A model that flags only genuinely high-risk churners (high precision, low false positive rate) but misses 50% of actual churners (low recall, low true positive rate) fails to prevent much of the churn it was built to address. The right tradeoff depends on the relative cost of false positives versus false negatives in the specific business context.
Lead scoring models should be evaluated on true positive rate at specific false positive rate thresholds, not global AUC alone. A sales team that can follow up with the top 15% of scored leads needs a model that concentrates true positives in that top tier. A model with high global AUC but poor separation in the top decile, where many of the top-ranked leads are false positives, fails the business use case despite appearing strong in aggregate metrics. Precision at k (what fraction of the top k ranked leads are genuine positives) and recall at k (what fraction of all genuine positives are captured in the top k) are more business-relevant metrics for tiered action decisions than overall AUC.
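Precision at k and recall at k can be computed by sorting leads by score and inspecting only the top tier. The following is a minimal sketch, assuming binary labels (1 for a genuine positive) and higher scores meaning higher predicted likelihood. The function name and the ten-lead dataset are hypothetical.

```python
def precision_recall_at_k(y_true, scores, k):
    """Precision and recall restricted to the k highest-scored examples."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])  # true positives in the top k
    total_positives = sum(y_true)
    precision = hits / k
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall


# Illustrative data only: 10 leads, 4 of which are genuine positives.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.10]

p_at_3, r_at_3 = precision_recall_at_k(y_true, scores, k=3)
print(round(p_at_3, 3), round(r_at_3, 3))
```

Here a team that can follow up with only three leads reaches two genuine positives: precision at 3 is about 0.667 and recall at 3 is 0.5.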
Adjusting the classification threshold shifts the true positive rate versus false positive rate tradeoff to match the cost structure of the prediction task. The conventional default threshold for binary classifiers is 0.5: predict positive if the predicted probability exceeds 50%. This threshold is often suboptimal for imbalanced marketing classification tasks. A churn prediction model with an 8% positive rate, in a business context where false negatives (missed churners) are much more costly than false positives (unnecessary retention outreach), should operate with a lower threshold, accepting more false positives in exchange for a higher true positive rate. Conversely, a fraud detection model where false positives (declined legitimate transactions) carry a high customer experience cost should operate with a higher threshold, accepting more missed fraudulent transactions in order to reduce false alarms. The optimal threshold is the one that minimizes expected cost given the specific false positive and false negative costs, and it is rarely exactly 0.5.
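One way to pick a threshold is to score a set of candidate thresholds by total misclassification cost on held-out data and keep the cheapest. The sketch below assumes fixed per-error costs and a small grid of candidates. The function names, cost values, and data are hypothetical, and a real evaluation would use far more examples.

```python
def total_cost(y_true, probs, threshold, cost_fp, cost_fn):
    """Total cost of errors when predicting positive if prob > threshold."""
    fp = sum(1 for y, p in zip(y_true, probs) if p > threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p <= threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(y_true, probs, cost_fp, cost_fn, candidates):
    """Return the candidate threshold with the lowest total cost (first on ties)."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, probs, t, cost_fp, cost_fn),
    )


# Illustrative data only: 3 positives among 10 examples.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
probs = [0.82, 0.55, 0.35, 0.31, 0.25, 0.18, 0.15, 0.12, 0.08, 0.05]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

# Churn-style costs: a missed positive is 10x as costly as a false alarm.
churn_style = best_threshold(y_true, probs, cost_fp=1, cost_fn=10, candidates=candidates)
# Fraud-style costs: a false alarm is 10x as costly as a missed positive.
fraud_style = best_threshold(y_true, probs, cost_fp=10, cost_fn=1, candidates=candidates)
print(churn_style, fraud_style)
```

With these assumed costs the churn-style setting selects the lowest candidate (0.1) and the fraud-style setting selects a high one (0.6), matching the direction of adjustment described above.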
Comparing true positive rates across segments reveals model fairness issues that aggregate accuracy metrics conceal. A model with 85% overall accuracy may have a 90% true positive rate for one demographic segment and 65% for another, meaning it correctly identifies qualified leads or conversion risks at very different rates across groups. This disparity is a fairness concern with practical business implications: the segment with the lower true positive rate is systematically underserved by the model, receiving fewer of the relevant interventions and experiencing more missed opportunities. Auditing true positive rate by segment is a standard fairness evaluation practice, and it surfaces disparities that aggregate metrics hide.
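A per-segment audit only requires grouping the actual positives by segment before computing recall. The following is a minimal sketch, assuming each example carries a segment label. The function name, segment names, and data are hypothetical.

```python
from collections import defaultdict


def tpr_by_segment(y_true, y_pred, segments):
    """True positive rate (recall) computed separately for each segment."""
    true_positives = defaultdict(int)
    actual_positives = defaultdict(int)
    for actual, predicted, segment in zip(y_true, y_pred, segments):
        if actual == 1:
            actual_positives[segment] += 1
            if predicted == 1:
                true_positives[segment] += 1
    # Segments with no actual positives are omitted (TPR is undefined there).
    return {
        segment: true_positives[segment] / count
        for segment, count in actual_positives.items()
    }


# Illustrative data only: two segments of five examples each.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
segments = ["A"] * 5 + ["B"] * 5

rates = tpr_by_segment(y_true, y_pred, segments)
gap = max(rates.values()) - min(rates.values())
print(rates, round(gap, 2))
```

In this toy data segment A has a true positive rate of 0.75 and segment B of 0.5, a gap of 0.25 that an overall accuracy figure would not show.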
An agency builds a churn prediction model for a subscription software client to identify at-risk accounts for proactive outreach by the customer success team. The training dataset contains 24,000 subscription accounts with a 14% positive rate (churn within 90 days), or about 3,360 churners. The trained gradient boosted model achieves AUC of 0.83 on the test set. The client’s customer success team can proactively contact 600 accounts per month (2.5% of the subscription base of 24,000). The agency evaluates the model’s performance at the decision threshold corresponding to the top 2.5% of predicted churn probability. The precision at this threshold is 0.174, meaning 17.4% of the 600 accounts flagged (about 104) are genuine churners. The true positive rate (recall at the 2.5% selection rate) is therefore about 3.1%: roughly 104 of the 3,360 churners fall in the top 2.5% of accounts. With only 600 contacts available against 3,360 churners, recall at this selection rate cannot exceed about 18% for any model. The agency also calculates the baseline performance of the random selection approach the client was using previously: at a 2.5% selection rate with no model, expected precision is 14% (equal to the base rate) and expected true positive rate is 2.5% (equal to the selection rate). The model improves precision from 14% to 17.4% and true positive rate within the selection budget from 2.5% to about 3.1%; because the number of contacts is fixed, both represent the same 24% lift over random selection. The customer success team needs 5.7 contacts per genuine churner reached versus 7.1 under random selection, a 20% efficiency improvement. The agency presents these metrics rather than overall AUC to the client stakeholders, as the business question is not model accuracy in the abstract but efficiency of churn prevention within the operational constraint of the customer success team’s outreach capacity.
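The arithmetic in this example can be checked from four inputs: the number of accounts, the base churn rate, the monthly contact capacity, and the precision within the contacted tier. The sketch below is a hedged recomputation under those stated figures. The variable names are hypothetical, and recall is derived here as flagged churners divided by all churners.

```python
accounts = 24_000        # subscription base
base_rate = 0.14         # share of accounts that churn within 90 days
contacts = 600           # monthly outreach capacity (2.5% of accounts)
precision_at_k = 0.174   # share of contacted accounts that are genuine churners

churners = accounts * base_rate                   # about 3,360 actual positives
selection_rate = contacts / accounts              # 0.025

# Model-driven selection.
tp_model = contacts * precision_at_k              # about 104 true positives
recall_model = tp_model / churners                # about 0.031

# Random selection: precision equals the base rate, recall equals the selection rate.
tp_random = contacts * base_rate                  # about 84 true positives
recall_random = tp_random / churners              # 0.025

# With a fixed number of contacts, precision lift and recall lift are identical.
precision_lift = precision_at_k / base_rate - 1   # about 0.24
recall_lift = recall_model / recall_random - 1    # about 0.24

# Contacts needed to reach one genuine churner.
contacts_per_churner_model = 1 / precision_at_k   # about 5.7
contacts_per_churner_random = 1 / base_rate       # about 7.1
efficiency_gain = 1 - contacts_per_churner_model / contacts_per_churner_random

max_recall = contacts / churners                  # ceiling for any model, about 0.18

print(round(recall_model, 4), round(recall_random, 4))
print(round(precision_lift, 3), round(recall_lift, 3))
print(round(contacts_per_churner_model, 1), round(contacts_per_churner_random, 1))
print(round(efficiency_gain, 3), round(max_recall, 3))
```

Because the contact budget is fixed, the number of true positives is simply precision times 600, so any improvement in precision carries over one-for-one to recall within the budget.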
The generative AI foundations module covers the full confusion matrix including true positives, false positives, and the precision-recall tradeoffs that govern classification model value in imbalanced marketing prediction tasks such as churn prediction and lead scoring.