The preprocessing step of rescaling numeric feature values to a common range or distribution before model training, ensuring that features with different natural scales contribute on comparable terms rather than in proportion to the size of their numbers. For agencies, normalization is one of the invisible preprocessing choices that determine whether a model learns the right patterns or is overwhelmed by the loudest numbers in the dataset.
Also known as feature scaling, data standardization, min-max scaling
Machine learning models that compute distances between data points or update weights based on gradients are sensitive to the scale of their input features. A model trained on customer attributes that include age (ranging from 18 to 90) and annual spend (ranging from 100 to 500,000) will have the spend variable dominate its learning simply because its numerical range is far larger, regardless of whether spend is actually more predictive than age.
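The effect on a distance-based model can be seen in a few lines. This is a minimal sketch, assuming three made-up customers described by age and annual spend, with the feature ranges taken from the example above. It finds each customer's nearest neighbor by Euclidean distance, first on raw values and then after min-max scaling. The helper names (`min_max`, `scaled`, `nearest`) are illustrative only.

```python
import math

# Illustrative customers as (age, annual spend); the values are made up for this sketch.
customers = {
    "A": (25, 1_000),
    "B": (70, 1_500),
    "C": (26, 20_000),
}

# Feature ranges taken from the example in the text.
AGE_RANGE = (18, 90)
SPEND_RANGE = (100, 500_000)


def min_max(value, lo, hi):
    """Rescale a value to [0, 1] given the feature's minimum and maximum."""
    return (value - lo) / (hi - lo)


def scaled(point):
    """Apply min-max scaling to each feature of an (age, spend) pair."""
    age, spend = point
    return (min_max(age, *AGE_RANGE), min_max(spend, *SPEND_RANGE))


def nearest(target, others, transform=lambda p: p):
    """Return the name of the customer closest to `target` by Euclidean distance."""
    t = transform(customers[target])
    return min(others, key=lambda name: math.dist(t, transform(customers[name])))


raw_neighbor = nearest("A", ["B", "C"])
scaled_neighbor = nearest("A", ["B", "C"], transform=scaled)
print(raw_neighbor, scaled_neighbor)  # B C
```

On raw values, a 45-year age gap counts for almost nothing next to a 19,000 difference in spend, so customer B looks closest to A. After scaling, both features are measured as fractions of their own ranges, and customer C, who is nearly the same age as A, becomes the nearest neighbor.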
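Normalization addresses this by rescaling features to a comparable range. Min-max scaling maps each feature's values to the 0-to-1 range using that feature's observed minimum and maximum. Standardization subtracts the mean and divides by the standard deviation, producing values centered at zero with unit variance. Log transformation compresses right-skewed distributions by replacing each value with its logarithm, which pulls in large values far more than small ones; it is defined only for positive values, so zeros are usually handled with an offset such as log(1 + x).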
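The three transformations can be written directly. This is a minimal sketch using only the Python standard library, assuming a single list of non-negative numeric values per feature. The standard deviation here is the population form (ddof=0); other tools may use the sample form, which gives slightly different numbers on small datasets.

```python
import math
import statistics


def min_max_scale(values):
    """Map values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Apply log(1 + x); assumes values are non-negative so zeros are allowed."""
    return [math.log1p(v) for v in values]


# Illustrative right-skewed annual spend values (made up).
spend = [100, 250, 400, 1_200, 500_000]
print(min_max_scale(spend))
print(standardize(spend))
print(log_transform(spend))
```

Note that min-max scaling and standardization leave the skew intact: the one large customer still sits far from everyone else. Only the log transform changes the shape of the distribution.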
Not all algorithms require normalization. Tree-based models like decision trees and random forests are scale-invariant: they choose split thresholds based only on the ordering of a feature's values rather than on distances, so any order-preserving rescaling (min-max scaling, standardization, or a log transform of positive values) leaves the partitions they learn unchanged. Distance-based models like k-nearest neighbors and support vector machines, and gradient-trained models like neural networks, are highly sensitive to scale and generally need normalization to perform reliably.
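The scale-invariance of trees can be checked on a single split. This sketch assumes a simplified split search: one feature, binary labels, and Gini impurity, which is a common criterion but not the only one. The function `best_split_partition` is a hypothetical helper that returns which rows go to the left branch. The numeric threshold changes with the scale; the set of rows it separates does not.

```python
import math


def gini(labels):
    """Gini impurity for binary 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive labels
    return 2 * p * (1 - p)


def best_split_partition(values, labels):
    """Return the set of row indices sent left by the lowest-impurity threshold."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A threshold between tied values cannot separate them, so skip it.
        if values[order[k - 1]] == values[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(values)
        if score < best_score:
            best_score, best_left = score, set(left)
    return best_left


# Illustrative data (made up): annual spend and whether the customer churned.
spend = [100, 250, 400, 1_200, 8_000, 500_000]
churned = [1, 1, 1, 0, 0, 0]

lo, hi = min(spend), max(spend)
scaled = [(v - lo) / (hi - lo) for v in spend]
logged = [math.log(v) for v in spend]

raw_split = best_split_partition(spend, churned)
print(raw_split)  # {0, 1, 2}
print(raw_split == best_split_partition(scaled, churned) == best_split_partition(logged, churned))  # True
```

Because every transform here preserves the order of the values, the candidate splits are visited in the same order and produce the same impurity scores, so the chosen partition is identical.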
Agencies evaluating or building models on client data encounter normalization decisions constantly, often implicitly in the preprocessing steps of tools they use. Understanding when normalization is required and what it does helps prevent misapplication and misinterpretation of model outputs.
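It affects which features appear important. In distance-based and gradient-trained models without normalization, features with large numerical ranges can dominate distance calculations and weight updates, so the model leans on them and importance measures rank them highly even if they are not actually predictive. Importance read from raw coefficient magnitudes is distorted too, because each coefficient is expressed per unit of its feature and its size changes whenever the feature's units change. A feature importance report on un-normalized data may rank a large-scale variable as the primary driver of predictions when its apparent importance is a function of its scale, not its predictive content.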
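The dependence of raw coefficients on units can be shown with a one-feature least-squares fit. This is a minimal sketch assuming made-up spend and outcome values; `ols_slope` is a hypothetical helper that computes the slope as covariance divided by variance. Expressing the same spend in thousands instead of dollars multiplies the coefficient by 1,000 without changing the fit, while the coefficient on the standardized feature is the same either way.

```python
import statistics


def ols_slope(x, y):
    """Slope of a one-feature least-squares fit: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean, std = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Illustrative data (made up): annual spend and an outcome score.
spend_dollars = [1_000, 5_000, 12_000, 40_000, 90_000]
score = [2.0, 2.5, 3.1, 5.0, 8.9]
spend_thousands = [v / 1_000 for v in spend_dollars]

slope_dollars = ols_slope(spend_dollars, score)
slope_thousands = ols_slope(spend_thousands, score)
slope_std_a = ols_slope(standardize(spend_dollars), score)
slope_std_b = ols_slope(standardize(spend_thousands), score)

print(slope_thousands / slope_dollars)  # ≈ 1000: same relationship, different units
print(slope_std_a, slope_std_b)         # equal up to floating-point error
```

Comparing raw coefficients across features measured in different units therefore compares the units as much as the features; standardizing first puts every coefficient on a per-standard-deviation footing.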
Preprocessing choices are modeling decisions. The sequence of normalization, imputation, and transformation applied before training is part of the model, not a neutral technical setup step. Agencies that treat preprocessing as configuration rather than modeling work may apply inappropriate transformations without understanding the downstream effects on model behavior.
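Treating preprocessing as part of the model means its fitted parameters are saved and reused, not recomputed. This is a minimal sketch assuming standardization of a single feature; `FittedStandardizer` is a hypothetical class, not a library API. Libraries such as scikit-learn package the same idea as a fitted `StandardScaler` inside a `Pipeline`. The sketch also shows what goes wrong when the scaler is refitted on a new batch.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class FittedStandardizer:
    """Hypothetical minimal scaler whose fitted parameters travel with the model."""
    mean: float
    std: float

    @classmethod
    def fit(cls, values):
        """Learn the mean and population standard deviation from training data."""
        return cls(mean=statistics.fmean(values), std=statistics.pstdev(values))

    def transform(self, values):
        """Standardize values using the stored training parameters."""
        return [(v - self.mean) / self.std for v in values]


# Illustrative annual spend values (made up).
train_spend = [200, 800, 1_500, 3_000, 9_500]
new_spend = [1_000, 20_000]

# Fit once on the training data, then reuse the same parameters for new data.
scaler = FittedStandardizer.fit(train_spend)
train_z = scaler.transform(train_spend)
new_z = scaler.transform(new_spend)

# Anti-pattern: refitting on the new batch silently changes what a value means.
refit_z = FittedStandardizer.fit(new_spend).transform(new_spend)
print(new_z)    # 20,000 lands roughly five standard deviations above the training mean
print(refit_z)  # ≈ [-1.0, 1.0]: the batch is rescaled relative to itself
```

With the stored parameters, the 20,000 customer is correctly flagged as far outside anything seen in training; after refitting, that information disappears and the model receives inputs that look ordinary.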
The inverse transformation matters for output interpretation. If input features are normalized before training, coefficients and feature effects are expressed per normalized unit (for standardized inputs, per standard deviation) rather than per original unit; if the target is normalized as well, the model's predictions are also in the normalized scale. Presenting results to clients requires knowing exactly what normalization was applied and converting back to the original scale correctly, which is easy to get wrong under deadline pressure.
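Converting back is a matter of undoing each step in reverse order. This is a minimal sketch assuming both the input (tenure in months) and the target (annual customer value in dollars) were standardized with population statistics; the data, the model's prediction of 0.5, and the coefficient of 0.3 are all hypothetical. If z_y = beta * z_x, then a one-unit change in the original input changes the original target by beta * std_y / std_x.

```python
import statistics

# Illustrative training data (made up): tenure in months and annual value in dollars.
tenure_months = [3, 12, 18, 30, 60]
value_dollars = [1_200.0, 3_400.0, 5_100.0, 8_800.0, 15_500.0]

mean_y, std_y = statistics.fmean(value_dollars), statistics.pstdev(value_dollars)
std_x = statistics.pstdev(tenure_months)


def to_standardized(y):
    """Forward transform applied to the target before training."""
    return (y - mean_y) / std_y


def from_standardized(z):
    """Inverse transform: multiply by the standard deviation, then add the mean back."""
    return z * std_y + mean_y


# Hypothetical prediction from a model trained on the standardized target.
pred_dollars = from_standardized(0.5)  # half a standard deviation above the mean value

# Hypothetical coefficient linking standardized tenure to standardized value.
beta_std = 0.3
# Convert to dollars per month of tenure: scale by the target's std, divide by the input's std.
dollars_per_month = beta_std * std_y / std_x
print(round(pred_dollars, 2), round(dollars_per_month, 2))
```

Reporting the raw 0.5 or 0.3 to a client would be meaningless; the converted figures are in dollars and dollars per month, which is what the client can act on.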
An agency trains a customer value prediction model and the feature importance report shows “customer ID number” as the second most important predictor. Investigation reveals that no normalization was applied: customer IDs, assigned sequentially since launch and therefore ranging from 1 to 500,000, are dominating the model because of their scale. Because the IDs are sequential, the model is also learning customer recency from the ID number, which introduces data leakage. The fix requires both removing the ID from the feature set and normalizing the remaining features. Both problems were invisible until someone asked why a database artifact was appearing in the importance report.
The generative AI foundations module of the workshop covers how today’s models work, what they require from data, and how to choose between them for the real-world data realities agencies face with clients.