A predictive modeling task where the output is one of exactly two categories, such as convert or not convert, brand-safe or not brand-safe. It is the structural foundation of many AI tools agencies rely on for lead scoring, content filtering, and audience qualification.
Also known as: two-class classification, yes-no prediction model, binary prediction.
Binary classification assigns inputs to one of two output classes. A spam filter decides: spam or not spam. A lead scoring model decides: high-value or not high-value. A content safety tool decides: flagged or not flagged. In each case, the model learns a boundary in the input space that separates the two classes, and applies that boundary to new inputs it has not seen before.
The two outputs are typically labeled positive and negative (or 1 and 0), and the model produces either a hard prediction (one class or the other) or a probability score (how confident the model is that an input belongs to the positive class). The threshold at which a probability score gets converted to a hard prediction is a configurable parameter that changes the tradeoff between false positives and false negatives.
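The conversion from score to hard prediction can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function names, scores, and labels are all made up for the example, and it assumes a score at or above the threshold counts as positive.

```python
def predict(scores, threshold=0.5):
    """Convert probability scores into hard 0/1 predictions."""
    return [1 if s >= threshold else 0 for s in scores]


def count_errors(labels, predictions):
    """Return (false_positives, false_negatives) for 0/1 labels."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return fp, fn


# Hypothetical scores and true labels (1 = positive), for illustration only.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(labels, predict(scores, threshold))
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.7 takes false positives from 2 down to 0 and false negatives from 0 up to 2. The model's scores never change; only the cut-off does.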
That threshold tradeoff has direct consequences in agency work. A brand safety classifier set to flag aggressively will block some legitimate placements. Set to flag conservatively, it will miss some risky ones. Neither setting is objectively correct: the right threshold depends on which type of error the client considers more costly.
The most consequential AI tools in agency workflows are binary classifiers at their core: the tool that decides whether a lead is worth pursuing, whether a placement is brand-safe, whether a piece of content passes moderation. Understanding how they work changes how you configure, evaluate, and defend them to clients.
The threshold is a policy decision, not a technical one. The probability score is an output of the model; the threshold at which that score translates into an action is a choice. Agencies setting up AI-assisted content review or lead qualification should own that threshold decision explicitly, not accept whatever the vendor sets as default.
Class imbalance affects reliability. Most real classification problems are imbalanced: for example, 99% of impressions might be brand-safe and only 1% not. A model that classifies everything as safe then gets 99% accuracy but catches none of the unsafe impressions, so it is useless. Agencies evaluating classification tools should ask for precision and recall metrics, not just accuracy, especially when positive cases are rare.
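The arithmetic behind that warning is easy to check. The sketch below assumes a hypothetical set of 1,000 impressions, 10 of them unsafe, and treats "unsafe" as the positive class; the `evaluate` helper is an illustrative name, and it reports precision and recall as 0.0 when their denominators are empty.

```python
def evaluate(labels, predictions):
    """Return (accuracy, precision, recall) for 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Convention assumed here: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 990 brand-safe impressions (0) and 10 unsafe ones (1).
labels = [0] * 990 + [1] * 10
always_safe = [0] * 1000  # a "model" that never flags anything

accuracy, precision, recall = evaluate(labels, always_safe)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

The never-flag model scores 0.99 on accuracy and 0.00 on recall, which is why accuracy alone says little when the positive class is rare.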
False positives and false negatives have asymmetric costs. In lead scoring, a false positive (scoring a bad lead as high-value) wastes sales time. A false negative (missing a high-value lead) loses revenue. The relative cost of these errors should drive threshold selection. Vendors who do not help clients think through this tradeoff are not giving complete advice.
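One way to make that tradeoff concrete is to put a cost on each error type and compare candidate thresholds on labeled historical data. The sketch below is a simplified illustration under assumed numbers: the scores, labels, cost units, and function names are all hypothetical, and real costs would come from the client.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of every misclassification at a given threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive, e.g. wasted sales time
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative, e.g. lost revenue
    return cost


def best_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Hypothetical scored leads with known outcomes (1 = high-value).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
candidates = (0.3, 0.5, 0.7)

# Missed leads assumed five times as costly as wasted calls, then the reverse.
print(best_threshold(scores, labels, candidates, cost_fp=1, cost_fn=5))
print(best_threshold(scores, labels, candidates, cost_fp=5, cost_fn=1))
```

With the same model and the same data, the preferred threshold moves from 0.3 to 0.7 when the relative costs are swapped, which is the sense in which threshold selection follows from the client's costs and not from the model.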
An agency deploys a binary classification model to pre-qualify inbound leads before routing them to the client’s sales team. At the default threshold of 0.5, the model routes 60% of leads as high-value, which the sales team finds overwhelming. The agency reviews the false positive rate and adjusts the threshold to 0.7, which reduces routed leads to 30% while retaining 85% of the leads that actually converted in the historical validation set. The threshold change is treated as a business decision, reviewed quarterly as the client’s definition of a qualified lead evolves.
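A review like this one comes down to two numbers per threshold: the share of leads routed to sales and the share of actual converters kept. The sketch below computes both on a small, made-up validation set; it does not reproduce the figures in the example above, and the variable and function names are illustrative.

```python
def routing_summary(scores, converted, threshold):
    """Return (share of leads routed, share of converters retained)."""
    routed = [s >= threshold for s in scores]
    routed_share = sum(routed) / len(scores)
    converters = sum(converted)
    retained = sum(1 for r, c in zip(routed, converted) if r and c == 1)
    retained_share = retained / converters if converters else 0.0
    return routed_share, retained_share


# Hypothetical historical leads: model score and whether the lead converted.
scores = [0.92, 0.85, 0.78, 0.71, 0.66, 0.58, 0.52, 0.41, 0.33, 0.15]
converted = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.7):
    routed_share, retained_share = routing_summary(scores, converted, threshold)
    print(
        f"threshold={threshold}: routed {routed_share:.0%} of leads, "
        f"retained {retained_share:.0%} of converters"
    )
```

On this toy set, moving from 0.5 to 0.7 cuts routed leads from 70% to 40% and keeps 75% of converters. Rerunning the same summary on fresh data each quarter gives the agency and client a shared basis for revisiting the threshold.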