AI Glossary · Letter O

Online Learning.

A machine learning paradigm in which a model updates its parameters continuously as new data arrives, rather than waiting for a complete dataset to be assembled before training. Online learning enables models to adapt to changing data distributions in real time, making it appropriate for applications where behavior patterns shift continuously, such as real-time bidding, recommendation, and fraud detection.

Also known as incremental learning, streaming learning, or real-time learning.

What it is

A working definition of online learning.

In online learning, the model receives one example or a small batch of examples at a time, updates its parameters based on those examples, and immediately uses the updated model for the next prediction. This contrasts with offline (batch) learning, which trains on a fixed, complete dataset. Online learning algorithms must balance stability (retaining what has been learned from prior data) with plasticity (adapting quickly to new patterns). Algorithms that update too aggressively from each new example will forget previously learned patterns, a failure known as catastrophic forgetting; algorithms that update too conservatively will adapt too slowly to genuine distribution changes.
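The predict-then-update loop described above can be sketched as follows. This is a minimal illustration using logistic regression with plain stochastic gradient descent, not a production design. The class name, the synthetic stream, and the learning rate of 0.1 are all assumptions made for the example.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # illustrative value, not tuned

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # For log loss, the gradient with respect to the logit is (p - y).
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def run_stream(model, stream):
    """Predict on each example first, then learn from its label."""
    predictions = []
    for x, y in stream:
        predictions.append(model.predict_proba(x))
        model.update(x, y)
    return predictions


def make_stream(n, seed=0):
    """Synthetic stream (an assumption): label is 1 when the first feature is positive."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        stream.append((x, 1 if x[0] > 0 else 0))
    return stream


model = OnlineLogisticRegression(n_features=2)
stream = make_stream(2000)
preds = run_stream(model, stream)
```

Each prediction is made before the model sees that example's label, so the list of predictions doubles as an honest running evaluation of the model on data it has not yet trained on.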

The learning rate controls the stability-plasticity tradeoff in online learning. A high learning rate produces fast adaptation to new patterns but high sensitivity to noise in individual training examples. A low learning rate produces stable, slowly adapting parameters. Adaptive learning rate methods such as AdaGrad and Adam maintain per-parameter step sizes instead of a single global rate. AdaGrad divides each parameter’s step by the square root of its accumulated squared gradients, so the effective learning rate shrinks over time for frequently updated parameters. Adam instead scales each step by exponential moving averages of recent gradients, so its effective step size does not shrink monotonically and can recover when the data distribution shifts. These adaptive methods are widely used in online deep learning because they reduce the need to hand-tune one learning rate for every parameter.
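A minimal sketch of the AdaGrad rule shows how a per-parameter learning rate shrinks as a parameter accumulates updates. Only AdaGrad is shown; Adam uses moving averages and behaves differently. The base learning rate of 0.5 and the gradient pattern (one parameter updated every step, the other every tenth step) are assumptions chosen to make the effect visible.

```python
import math


class AdaGrad:
    """Per-parameter step sizes based on accumulated squared gradients."""

    def __init__(self, n_params, base_lr=0.5, eps=1e-8):
        self.base_lr = base_lr
        self.eps = eps
        self.sum_sq_grads = [0.0] * n_params

    def effective_lr(self, i):
        # Step size for parameter i given everything it has seen so far.
        return self.base_lr / (math.sqrt(self.sum_sq_grads[i]) + self.eps)

    def step(self, params, grads):
        for i, g in enumerate(grads):
            if g == 0.0:
                continue  # untouched parameters keep their current step size
            self.sum_sq_grads[i] += g * g
            params[i] -= self.effective_lr(i) * g
        return params


optimizer = AdaGrad(n_params=2)
params = [0.0, 0.0]
for t in range(100):
    # Parameter 0 receives a gradient every step; parameter 1 every tenth step.
    grads = [1.0, 1.0 if t % 10 == 0 else 0.0]
    optimizer.step(params, grads)

frequent_lr = optimizer.effective_lr(0)  # 0.5 / sqrt(100), about 0.05
rare_lr = optimizer.effective_lr(1)      # 0.5 / sqrt(10), about 0.158
```

The frequently updated parameter ends with a smaller step size than the rarely updated one, which is the behavior the paragraph above attributes to accumulated updates.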

Online learning is particularly valuable when the data distribution changes faster than the offline retraining cadence, when examples are too numerous to store and retrain on from scratch, or when predictions must be updated immediately after each new observation. A canonical application is real-time ad click-through rate prediction: user behavior shifts continuously with context, and the model must incorporate each new click observation to improve future predictions. Bayesian online learning methods maintain a probability distribution over model parameters and update it with each new observation, so the model tracks uncertainty alongside its point estimates.
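One of the simplest instances of Bayesian online learning is a Beta-Bernoulli model of a single click-through rate. The sketch below is illustrative only: the uniform Beta(1, 1) prior and the figures of 100 impressions with 5 clicks are assumptions, and a real click-through rate model would condition on many features rather than track one rate.

```python
import math


class BetaBernoulliCTR:
    """Posterior over a click-through rate, updated one impression at a time."""

    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        # Beta(1, 1) is a uniform prior; an assumption for this sketch.
        self.alpha = prior_alpha
        self.beta = prior_beta

    def update(self, clicked):
        if clicked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self):
        total = self.alpha + self.beta
        variance = (self.alpha * self.beta) / (total ** 2 * (total + 1.0))
        return math.sqrt(variance)


ctr = BetaBernoulliCTR()
prior_std = ctr.std

# Hypothetical stream: 100 impressions, of which the first 5 are clicks.
for impression in range(100):
    ctr.update(clicked=impression < 5)

posterior_mean = ctr.mean  # (1 + 5) / (2 + 100)
posterior_std = ctr.std    # narrower than prior_std after 100 observations
```

The posterior mean is the point estimate and the posterior standard deviation is the uncertainty around it. Both are available after every single observation without revisiting earlier data.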

Why ad agencies care

Why online learning is the right paradigm for production systems where stale models degrade campaign performance.

An ad agency whose AI optimization layer must adapt to real-time changes, such as breaking news events, sudden competitive price changes, and flash promotions, benefits from understanding the difference between online and offline learning systems and what each implies for responsiveness. An offline bidding model that was last retrained a week ago will not adapt to a sudden change in today’s auction dynamics; an online learning system that updates continuously from live auction feedback will. Knowing which paradigm each agency tool uses enables better decisions about when manual overrides are needed to compensate for slow-adapting offline models.

Real-time bidding models use online learning to adapt to intraday auction dynamics. The clearing prices and win rates in programmatic auctions change throughout the day as competitive pressure, user composition, and inventory supply shift. A bid optimization model that uses online learning updates its price predictions with each new auction observation, adapting to the current supply-demand dynamics without waiting for an offline retraining cycle. This continuous adaptation is the mechanism that makes automated bidding systems responsive to real-time market changes rather than lagging behind them by a full retraining cycle.

Recommendation systems use online learning to incorporate real-time engagement feedback. A content recommendation model that updates with each new user click or engagement event learns which content is performing well in real time. It can boost newly trending content and suppress content that is no longer engaging faster than a weekly-retrained offline model can. This recency sensitivity is particularly valuable for news, entertainment, and trend-driven commerce categories, where content relevance has a short half-life and stale recommendations rapidly degrade the user experience.

Fraud detection models require online learning to stay ahead of evolving attack patterns. Ad fraud techniques evolve rapidly as fraudsters adapt to detection methods. An offline fraud detection model that is retrained weekly will be blind to attack patterns that appear in the days between retraining cycles. Online learning models that incorporate feedback from each detected fraudulent impression or click update their detection patterns continuously, closing the window of vulnerability between the appearance of a new attack pattern and the model’s ability to detect it.

In practice

What online learning looks like inside a working ad agency.

An agency manages a dynamic pricing and promotion model for a flash-sale e-commerce client that runs 3 to 5 flash sales per week with unpredictable timing and discount depths. The existing conversion prediction model is an offline gradient boosting model retrained nightly on the previous 7 days of transaction data. During flash sales, conversion rates spike 4x to 8x above baseline within minutes of a sale announcement and return to near-baseline levels within 2 to 4 hours after the sale ends. The nightly retrained offline model does not capture these intraday dynamics and consistently underestimates conversion probability during active sales, causing the automated bidding system to under-bid during the highest-conversion windows.

The agency implements an online learning layer: a lightweight logistic regression model with an adaptive learning rate that updates after every 500 new conversion observations, running alongside the offline model. During normal periods, the online model’s predictions are close to the offline model’s. When a flash sale begins and conversion rates spike, the online model detects the change within 15 to 20 minutes and updates its parameters to reflect the elevated conversion probability. The bidding system uses a weighted blend of the offline and online models, with the online model receiving higher weight during periods when its predictions diverge significantly from the offline baseline.

Post-implementation analysis shows a 23% increase in conversions during flash sale windows attributable to improved bid calibration, with the online learning model correctly identifying the elevated conversion environment within an average of 18 minutes of sale launch.
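The divergence-weighted blend in this scenario could be sketched as below. The scenario does not specify how the weights are computed, so the linear ramp, the weight bounds of 0.2 and 0.9, the divergence threshold of 0.05, and the function names are all hypothetical choices for illustration.

```python
def blend_weight(p_offline, p_online, base_weight=0.2, max_weight=0.9,
                 full_divergence=0.05):
    """Weight given to the online model; grows as the two models disagree.

    All three keyword defaults are hypothetical, not taken from the scenario.
    """
    divergence = abs(p_online - p_offline)
    ramp = min(1.0, divergence / full_divergence)  # 0 when equal, 1 at the cap
    return base_weight + (max_weight - base_weight) * ramp


def blended_prediction(p_offline, p_online, **kwargs):
    """Convex combination of the offline and online conversion probabilities."""
    w = blend_weight(p_offline, p_online, **kwargs)
    return w * p_online + (1.0 - w) * p_offline


# Normal period: the models nearly agree, so the offline model dominates.
normal = blended_prediction(p_offline=0.02, p_online=0.021)

# Flash sale: the online model has picked up the spike, so it dominates.
flash_sale = blended_prediction(p_offline=0.02, p_online=0.10)
```

Keeping a nonzero base weight on the online model and a cap below 1.0 means neither model is ever ignored entirely, which limits the damage if the online model reacts to noise rather than to a real sale.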
