A metric used in decision tree learning to measure how mixed the class labels are in a set of training examples, guiding which feature and threshold to split on at each node. Lower Gini impurity means a more homogeneous set; a value of zero means all examples in the set belong to the same class. Gini impurity is the default classification splitting criterion in CART, the algorithm behind most decision tree and random forest implementations; gradient boosted ensembles grow similar trees but usually score splits with a loss-based gain instead.
Also known as: Gini index, Gini coefficient for trees (distinct from the Gini coefficient used in economics to measure inequality), impurity measure
Gini impurity measures the probability that a randomly chosen element from a set would be incorrectly labeled if it were assigned a label randomly drawn from the distribution of labels in that set. Formally, it is one minus the sum of the squared class proportions in the set. For a perfectly pure set where all elements belong to the same class, Gini impurity is zero. For a maximally impure set with two equally represented classes, Gini impurity is 0.5. At each node in a decision tree, the algorithm considers every candidate feature-threshold combination, computes the weighted average Gini impurity of the two resulting subsets, and selects the split that minimizes this weighted impurity. The result is a tree that partitions the data into progressively purer subsets with each level of splits.
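The calculation and the split search described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single numeric feature, not the implementation used by any particular library; the function names and the small churn-style dataset are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Size-weighted average impurity of the two subsets produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values of one feature.

    Returns (threshold, weighted impurity) for the best split, or
    (None, parent impurity) if no split improves on the unsplit set.
    """
    distinct = sorted(set(values))
    best = (None, gini_impurity(labels))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: days since last purchase, and whether the customer churned.
days_since_purchase = [10, 20, 30, 95, 120, 150]
churned = [0, 0, 0, 1, 1, 1]

print(gini_impurity(churned))                        # impurity before splitting
print(best_threshold(days_since_purchase, churned))  # best split and its impurity
```

A full tree learner repeats this search over every feature at every node; the sketch covers only one feature at one node.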
Gini impurity and entropy are the two standard splitting criteria for classification trees, and they typically produce similar tree structures in practice. Gini impurity is slightly cheaper to compute because it does not require logarithms. The CART (Classification and Regression Trees) algorithm uses Gini impurity as its classification splitting criterion, and scikit-learn, whose decision trees implement an optimized version of CART, uses it as the default. Gradient boosting libraries grow CART-style trees as well, but they typically choose splits by the reduction in a loss-based gain rather than by Gini impurity. Understanding Gini impurity is relevant for interpreting impurity-based feature importance, which is computed from the total impurity reduction attributable to each feature across all splits in the tree or ensemble.
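To see why the two criteria tend to agree, it can help to evaluate both on the same class proportions. The sketch below does this for a two-class set; the helper names `gini` and `entropy` are illustrative, and entropy is measured in bits as an assumption.

```python
import math

def gini(p):
    """Gini impurity of a two-class set where p is the share of the positive class."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def entropy(p):
    """Shannon entropy in bits for the same set; terms with zero probability are skipped."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

# Both measures are zero for a pure set and peak when the classes are balanced.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

The two curves differ in scale but rise and fall together, which is consistent with the observation that they usually rank candidate splits similarly.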
In ensemble methods like random forests and gradient boosted trees, feature importance is often reported as “Gini importance” or “mean decrease in impurity,” which aggregates the Gini impurity reduction from each split involving a feature across all trees in the ensemble. This importance measure can be biased toward features with many distinct values because they have more possible split points, which increases their probability of producing high-impurity-reduction splits by chance. Permutation importance, which measures actual performance degradation when a feature is shuffled, is a less biased alternative for comparing feature importances across features with different cardinalities.
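The cardinality bias can be reproduced with a small simulation. The sketch below generates labels and features that are pure noise, then compares the best impurity reduction a single split can find on a two-value feature against a feature with up to 60 distinct values. All names, sizes, and the seed are assumptions chosen for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gain(values, labels):
    """Largest Gini reduction achievable by any single threshold on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best

def average_best_gain(n_distinct, n_rows=60, n_trials=200, seed=0):
    """Average best split gain for a noise feature with at most n_distinct values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randint(0, 1) for _ in range(n_rows)]
        # The feature is drawn independently of the labels, so any gain is chance.
        noise = [rng.randrange(n_distinct) for _ in range(n_rows)]
        total += best_gain(noise, labels)
    return total / n_trials

low = average_best_gain(n_distinct=2)
high = average_best_gain(n_distinct=60)
print(f"binary noise feature:   {low:.4f}")
print(f"60-value noise feature: {high:.4f}")
```

Neither feature carries any signal, yet the high-cardinality one offers many more thresholds to try, so its best split looks better on average. Impurity-based importance accumulates exactly this kind of chance gain.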
Decision trees and gradient boosted ensembles are the most commonly used model families for structured tabular data in advertising technology, CRM analytics, and propensity modeling. A working ad agency using these models, or interpreting feature importance reports from vendors whose tools are built on them, encounters Gini impurity at the foundation of those systems. Understanding what it measures and where it can mislead is a practical literacy requirement for using tree-based models responsibly.
Feature importance rankings based on Gini impurity require careful interpretation. When a gradient boosted churn model reports that a high-cardinality feature like customer ID has high feature importance, that is almost certainly a Gini impurity bias artifact, not a genuine signal: the model has found spurious splits on the high-cardinality feature that happen to reduce impurity in the training set. Agencies interpreting vendor-provided feature importance reports should check whether reported high-importance features have many distinct values and validate their importance using permutation methods before drawing conclusions.
Gini impurity explains why decision tree depth controls model complexity. A decision tree that is allowed to split until every leaf is perfectly pure (Gini impurity of zero) will memorize the training data and overfit severely. Limiting tree depth constrains how far the Gini-driven splitting process can go, which controls the bias-variance tradeoff. Understanding that tree depth is a regularization parameter governing how many successive Gini-minimizing splits the model can make helps agencies configure tree-based models appropriately, rather than relying on default depth settings that may produce trees that are too simple or too complex for the available data.
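A toy tree learner makes the effect of depth visible. The sketch below grows a tree on a single numeric feature with Gini-minimizing splits and an optional depth limit; it is a simplified illustration, not CART as implemented in any library, and the noisy dataset (a true boundary at 20 with roughly 20% of labels flipped) is hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build(points, max_depth, depth=0):
    """points: list of (x, label). Returns a leaf label or (threshold, left, right)."""
    labels = [y for _, y in points]
    majority = Counter(labels).most_common(1)[0][0]
    distinct = sorted({x for x, _ in points})
    depth_reached = max_depth is not None and depth >= max_depth
    if len(set(labels)) == 1 or len(distinct) < 2 or depth_reached:
        return majority

    def split_score(t):
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(points)

    thresholds = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    t = min(thresholds, key=split_score)  # Gini-minimizing split at this node
    left = [p for p in points if p[0] <= t]
    right = [p for p in points if p[0] > t]
    return (t, build(left, max_depth, depth + 1), build(right, max_depth, depth + 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

# Hypothetical noisy data: the true rule is x >= 20, with some labels flipped.
rng = random.Random(0)
points = [(x, int(x >= 20) ^ int(rng.random() < 0.2)) for x in range(40)]

shallow = build(points, max_depth=1)
deep = build(points, max_depth=None)
print("depth 1:  ", count_leaves(shallow), "leaves, train accuracy", accuracy(shallow, points))
print("unlimited:", count_leaves(deep), "leaves, train accuracy", accuracy(deep, points))
```

With no depth limit the tree keeps splitting until every leaf is pure, so it reproduces the flipped labels as well as the real pattern. The depth-limited tree cannot do that and is left with the single most useful boundary.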
Gini impurity underlies the interpretability advantage of tree-based models. The splitting logic of a decision tree is directly interpretable as business rules: “if days since last purchase is greater than 90 and lifetime value is below $200, classify as high churn risk.” Each split is driven by Gini impurity minimization, which means each rule in the tree represents the most discriminative boundary the model found in the data at that point. This interpretability is why decision trees remain useful for generating hypotheses about what drives client outcomes, even when gradient boosted ensembles are used for production prediction.
An agency builds a lead qualification model for a B2B software client using a gradient boosted ensemble. After training, the model’s feature importance report, computed using mean decrease in impurity, ranks “company name” as the third most important feature. The agency recognizes this as an impurity-based importance artifact: company name is a high-cardinality categorical feature with thousands of unique values, and the model has found spurious splits on it that reduce impurity in the training set but will not generalize to new leads. The agency recomputes feature importance using permutation importance on the validation set and finds that company name drops to the bottom of the ranking. The three genuinely important features that permutation importance surfaces (technology stack category, headcount growth rate, and number of previous trial starts) are all directly interpretable as sales signals. The agency removes company name from the feature set, retrains, and validates that the revised model maintains the same accuracy while producing importance rankings that make sense to the client’s sales team.
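The permutation check in this scenario can be sketched without any modeling library. The example below assumes a hypothetical, already-trained model (`predict_qualified`) that relies on trial starts and has also memorized two company names from training; the validation rows, column names, and memorized names are all invented for illustration.

```python
import random

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one column is shuffled and the model is held fixed."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for col in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, col: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[col] = sum(drops) / n_repeats
    return importances

# Hypothetical model: a genuine rule on trial starts plus names memorized in training.
MEMORIZED_COMPANIES = {"Acme", "Globex"}

def predict_qualified(row):
    return int(row["trial_starts"] >= 2 or row["company_name"] in MEMORIZED_COMPANIES)

# Hypothetical validation set: new leads, so none of the memorized names appear.
trial_counts = [0, 3, 1, 2, 0, 4, 1, 2, 0, 3, 1, 5]
validation_rows = [
    {"trial_starts": t, "company_name": f"NewCo {i}"} for i, t in enumerate(trial_counts)
]
validation_labels = [int(t >= 2) for t in trial_counts]

importances = permutation_importance(predict_qualified, validation_rows, validation_labels)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Shuffling company name among new leads changes no predictions, so its measured importance is zero, while shuffling trial starts degrades accuracy. Running the check on held-out data rather than training data is what exposes the memorized feature.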
The generative AI foundations module covers how tree-based models work and how to interpret their outputs correctly, including the feature importance pitfalls that produce misleading conclusions when not understood.