An ensemble machine learning method that builds a large number of decision trees, each trained on a random subset of the training data and a random subset of features, and combines their predictions by majority vote or averaging. Random forests are among the most reliably accurate and robust algorithms for structured tabular data, and are widely used in audience scoring, churn prediction, and creative performance modeling.
Also known as random forest classifier, ensemble tree model, bagged decision trees
A random forest trains many decision trees, typically hundreds to thousands, each on a bootstrapped sample of the training data: a random sample drawn with replacement from the original dataset. At each split point, a tree also considers only a random subset of the features and picks the best split among those, rather than searching all available features. These two sources of randomness, random data sampling and random feature selection, keep the trees in the forest diverse so that they do not all make the same mistakes. When predictions are needed, every tree votes and the forest returns the majority class for classification or the average prediction for regression.
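The mechanics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: each "tree" is a one-split stump, so the random feature subset is drawn once per tree, and the function names (fit_stump, fit_forest, predict) and the toy dataset are assumptions made for this example.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Find the lowest-error (feature, threshold) split among the candidate features only."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        maj = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_candidate_features=2, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feats = rng.sample(range(n_features), n_candidate_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Every stump votes; the forest returns the majority class."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data: features 0 and 1 carry the signal, feature 2 is noise.
rows = [(i, 2 * i, (i * 7) % 10) for i in range(20)]
labels = [int(i >= 10) for i in range(20)]

forest = fit_forest(rows, labels)
pred_low = predict(forest, (0, 0, 0))
pred_high = predict(forest, (19, 38, 3))
print(pred_low, pred_high)
```

Real implementations grow deep trees and redraw the feature subset at every split, but the two sources of randomness and the final vote work the same way.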
The core insight behind random forests is that averaging many imperfect but diverse predictors produces a more accurate predictor than any single tree. A single decision tree is a high-variance model: small changes in the training data produce large changes in the tree structure and predictions. Averaging many such trees, each trained on slightly different data and features, smooths out this variance. The result is a model that is much more stable and accurate than any single tree, while retaining the ability to capture non-linear patterns and interactions that simpler models such as logistic regression cannot represent.
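A small simulation makes the variance argument concrete. It is a sketch under a simplifying assumption: each predictor is modeled as the true value plus independent Gaussian noise, and the constants below are hypothetical. Trees in a real forest are correlated, so the reduction in practice is smaller than shown here, which is why random feature selection is used to lower that correlation.

```python
import random
import statistics

rng = random.Random(42)
TRUE_VALUE = 10.0    # hypothetical quantity every predictor tries to estimate
NOISE_SD = 2.0       # spread of a single predictor's error
N_PREDICTORS = 100   # size of the ensemble
N_TRIALS = 2000      # repeated experiments used to measure variance

def noisy_prediction():
    """One imperfect predictor: the true value plus independent noise."""
    return TRUE_VALUE + rng.gauss(0.0, NOISE_SD)

# Predictions from a single predictor, repeated many times.
single = [noisy_prediction() for _ in range(N_TRIALS)]

# Predictions from an average of N_PREDICTORS predictors, repeated many times.
averaged = [
    statistics.fmean(noisy_prediction() for _ in range(N_PREDICTORS))
    for _ in range(N_TRIALS)
]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
print(f"variance of one predictor:  {var_single:.3f}")
print(f"variance of the average:    {var_averaged:.3f}")
```

With independent errors, the variance of the average shrinks roughly in proportion to the number of predictors, while the mean prediction stays centered on the true value.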
Random forests provide several practically valuable features beyond accuracy. Feature importance scores measure how much each input feature contributes to reducing prediction error across all the trees, providing a ranked list of the most predictive variables that is useful for feature selection and business interpretation. The commonly used impurity-based scores can overstate features with many distinct values, so they are best read as a ranking to investigate rather than an exact measurement. Out-of-bag error estimation uses the examples not included in each tree’s bootstrap sample as a built-in validation set, providing a realistic, if slightly conservative, error estimate without requiring a separate held-out validation set. These properties make random forests a practical first-choice algorithm for new structured prediction problems where model behavior needs to be understood, not just optimized.
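To see why out-of-bag estimation works, it helps to count how many rows a single bootstrap sample leaves out. The sketch below assumes a hypothetical dataset of 10,000 rows and uses only the standard library.

```python
import random

rng = random.Random(0)
n_rows = 10_000  # hypothetical training set size

# One tree's bootstrap sample: n_rows draws with replacement.
bootstrap_idx = [rng.randrange(n_rows) for _ in range(n_rows)]

# Rows never drawn are "out of bag" for this tree and can be used to test it.
in_bag = set(bootstrap_idx)
oob_idx = [i for i in range(n_rows) if i not in in_bag]

oob_fraction = len(oob_idx) / n_rows
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

The fraction comes out close to 1/e, about 37%, so each tree has a sizable set of unseen rows, and every row is unseen by roughly a third of the trees in a large forest.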
A working ad agency building propensity models, churn predictors, or lifetime value estimators for clients should default to random forest as the first algorithm to try on any structured tabular dataset. Random forests require minimal preprocessing, handle mixed feature types and, in implementations that support it, missing values gracefully, are resistant to overfitting relative to single trees, and provide feature importance scores that are useful for both model improvement and client explanation. On structured marketing data problems, random forests generally outperform simpler models like logistic regression when the underlying relationships are non-linear, and typically match or approach gradient boosted tree methods with less hyperparameter tuning effort.
Feature importance from random forests identifies which audience signals actually drive conversion and churn predictions. A random forest trained to predict 90-day churn for a subscription client returns feature importances that rank every input feature by its predictive contribution. The top features in a typical churn model are behavioral signals such as days since last login, number of sessions in the prior 30 days, and customer support contact frequency. These importances guide both model improvement, by focusing feature engineering effort on the most predictive signal categories, and business decisions, by identifying which behavioral interventions are most likely to prevent churn.
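One way to check which signals a model actually relies on is permutation importance: shuffle one column, re-score, and record how much accuracy drops. This is a model-agnostic alternative to the impurity-based importances that tree libraries typically report, not necessarily the method used in the scenario above. The sketch uses a hypothetical stand-in for a trained churn model and synthetic data; all names are assumptions for illustration.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop per column when that column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with the shuffled value in place of the original.
            permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained churn model: it flags churn when
# days since last login (column 0) exceeds 30 and ignores column 1.
def toy_model(row):
    return int(row[0] > 30)

# Synthetic rows: (days_since_last_login, support_contacts).
data_rng = random.Random(1)
rows = [(data_rng.randint(0, 60), data_rng.randint(0, 20)) for _ in range(500)]
labels = [int(r[0] > 30) for r in rows]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

The column the model depends on shows a large accuracy drop, while the ignored column shows none, which is the pattern an analyst would look for when deciding where to focus feature engineering.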
Random forests handle the messy, heterogeneous data that characterizes real marketing datasets without extensive preprocessing. Marketing datasets combine numeric features such as spend and impressions, categorical features such as device type and channel, and binary flags, all potentially with missing values. Because trees split on thresholds rather than distances, random forests need no manual scaling or normalization, and categoricals usually need nothing beyond simple integer encoding rather than one-hot encoding. Missing values may still need imputing, depending on the implementation. This preprocessing flexibility reduces the time to first useful model and allows the agency to iterate rapidly on feature engineering rather than spending disproportionate effort on data transformation.
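Simple integer encoding can be as small as a lookup table. The sketch below is one possible approach, with hypothetical helper names and a made-up device column; reserving -1 for categories unseen at training time is an assumption of this example, not a library convention.

```python
def fit_integer_codes(values):
    """Map each distinct category to a stable integer, in sorted order."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}

def encode(values, codes, unknown=-1):
    """Encode values; categories unseen at fit time get the `unknown` code."""
    return [codes.get(v, unknown) for v in values]

device_train = ["mobile", "desktop", "tablet", "mobile"]
device_codes = fit_integer_codes(device_train)

encoded_train = encode(device_train, device_codes)
encoded_new = encode(["desktop", "smart_tv"], device_codes)

print(device_codes)
print(encoded_train)
print(encoded_new)
```

The integers impose an arbitrary order on the categories, so a tree may need several threshold splits to isolate one category, but no additional columns are created.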
The spread of individual tree votes provides an informal uncertainty estimate around each prediction. By collecting the individual tree predictions for a given input, the agency can see how strongly the forest agrees: inputs where 95% of trees agree on the same class are predicted with high confidence, while inputs where the vote is close to 50/50 are predicted with high uncertainty. These vote shares are not formal confidence intervals or calibrated probabilities, but they are valuable for tiered decision rules that take different actions for high-confidence versus uncertain predictions, routing uncertain cases for human review while automating decisions on high-confidence ones.
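A tiered decision rule of this kind can be sketched as follows. The function names and the 0.9 agreement threshold are hypothetical; in practice the threshold would be chosen from validation data and review capacity.

```python
from collections import Counter

def vote_summary(tree_votes):
    """Return (majority_class, share_of_trees_that_agree)."""
    label, count = Counter(tree_votes).most_common(1)[0]
    return label, count / len(tree_votes)

def route(tree_votes, auto_threshold=0.9):
    """Automate when agreement is high; otherwise send to human review."""
    label, share = vote_summary(tree_votes)
    action = "automate" if share >= auto_threshold else "human_review"
    return label, share, action

# 100 hypothetical tree votes for two different inputs.
confident = route([1] * 95 + [0] * 5)
uncertain = route([1] * 52 + [0] * 48)

print(confident)
print(uncertain)
```

Both inputs receive the same predicted class, but only the first clears the agreement threshold, so the second is routed to a person.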
An agency is building a lead quality scoring model for a B2B software client whose sales team receives 800 to 1,200 inbound leads per month and can only prioritize follow-up for the top 200. The training dataset contains 14 months of lead records with 38 features including firmographic data (company size, industry, geography), behavioral data (pages visited, content downloaded, time on site), lead source, and form fill content, with a binary label indicating whether each lead converted to a sales-qualified opportunity (SQL) within 60 days. The positive rate is 18%.
As a baseline, the agency trains a random forest with 500 trees, leaving the other scikit-learn hyperparameters at their defaults. Out-of-bag AUC is 0.83. Feature importance analysis reveals that three features dominate: number of product feature pages visited (most important), company size band (second), and whether the lead downloaded a pricing guide (third). The agency uses these importances to engineer two additional features: a recency-weighted page visit score and a product intent signal combining pricing guide, case study, and demo page visits. Retraining with the engineered features improves OOB AUC to 0.87.
The model is deployed to score new leads daily. Over the subsequent quarter, the sales team’s contact-to-SQL rate improves from 22% to 38% when working from the model-ranked lead list versus the prior volume-based prioritization. The feature importance report doubles as the agency’s analysis deliverable explaining which behavioral signals most predict sales readiness.
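The baseline step in this scenario could look roughly like the following in scikit-learn. This is a sketch under stated assumptions: the data is synthetic (generated with make_classification to mimic 38 features and an 18% positive rate), so the printed AUC and feature ranking will not match the figures in the example, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the lead table: 38 features, about 18% positives.
X, y = make_classification(
    n_samples=2000, n_features=38, n_informative=8,
    weights=[0.82, 0.18], random_state=0,
)

forest = RandomForestClassifier(
    n_estimators=500,   # 500 trees, other hyperparameters left at defaults
    oob_score=True,     # keep out-of-bag predictions for every training row
    n_jobs=-1,
    random_state=0,
)
forest.fit(X, y)

# Out-of-bag AUC: each row is scored only by trees that never saw it.
oob_scores = forest.oob_decision_function_[:, 1]
oob_auc = roc_auc_score(y, oob_scores)

# Rank features by impurity-based importance, highest first.
ranked = sorted(enumerate(forest.feature_importances_), key=lambda p: p[1], reverse=True)
top_features = [idx for idx, _ in ranked[:3]]

# Select the 200 highest-scoring rows, mirroring the sales team's capacity.
top_200 = sorted(range(len(y)), key=lambda i: oob_scores[i], reverse=True)[:200]

print(f"OOB AUC: {oob_auc:.3f}")
print("top 3 feature indices:", top_features)
```

Note that oob_score_ in scikit-learn reports accuracy by default for classifiers, which is why the AUC here is computed separately from oob_decision_function_.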
The generative AI foundations module covers decision tree methods including random forests, feature importance analysis, and the ensemble learning principles that make structured marketing prediction models reliable and interpretable.