What is Linear Regression?

What it is

A working definition of linear regression.

Linear regression models the relationship between inputs and a continuous numerical output by estimating a coefficient for each input variable that represents its marginal effect on the output, holding all other inputs constant. A simple linear regression with one input has the form output = intercept + coefficient times input; a multiple linear regression with many inputs has one coefficient per input. The model is “linear” because the output is a linear function of the coefficients, even if the inputs themselves are transformed nonlinearly before being used in the model.

The coefficients are estimated by minimizing the sum of squared differences between the model’s predictions and the actual observed values in the training data, the ordinary least squares criterion. This minimization problem has a closed-form solution, meaning coefficients can be computed directly without iterative optimization, which makes linear regression computationally efficient and mathematically transparent. The estimated coefficients have a direct causal interpretation under the assumptions of the model: if the coefficient for ad spend is 0.3, the model estimates that each additional dollar of ad spend is associated with a 0.3-unit increase in the output, holding all other inputs constant.

Linear regression’s assumptions include that the relationship between inputs and output is actually linear, that the residual errors are normally distributed with constant variance, and that the input variables are not perfectly correlated with each other. Violations of these assumptions do not necessarily render the model useless, but they do affect the reliability of coefficient estimates and the validity of associated statistical tests. Diagnostic plots that examine residuals against fitted values and against each input are the standard tool for detecting assumption violations, and transformations such as log-transforming skewed variables can often resolve common issues.

Why ad agencies care

Why linear regression remains the most useful model for understanding the quantitative relationship between marketing inputs and outputs.

A working ad agency building media mix models, pricing analyses, or budget optimization frameworks will use linear regression more than any other single modeling technique, not because it is the most sophisticated tool available but because it produces interpretable coefficient estimates that directly answer the business questions clients actually ask. A client who wants to know how much incremental revenue they can expect from an additional $1 million in search spend needs a coefficient estimate with a confidence interval, not a black-box prediction from a neural network. Linear regression provides that estimate in a form that is auditable, defensible, and actionable.

Media mix modeling is applied linear regression with careful variable construction. The core of a media mix model is a regression of weekly or monthly sales on weekly or monthly media spend across channels, plus control variables for price, distribution, competitive activity, and seasonality. The regression coefficients estimate how much of the observed sales variation is attributable to each media channel after controlling for the other factors. The marginal return on investment for each channel is derived from its coefficient: how much sales increases per dollar of additional spend in that channel. Agencies that understand regression mechanics can audit vendor-supplied media mix model outputs, identify when the model specification is producing implausible coefficient estimates, and make informed decisions about model improvements.

Advertising response curves use log-transformed regression to capture diminishing returns. Raw spending variables in media mix models often violate the linearity assumption because advertising effectiveness typically exhibits diminishing returns: the first dollar of spend in a channel produces more incremental sales than the thousandth dollar. Log-transforming the spend variable converts the diminishing-returns relationship to a linear one in log-spend, making linear regression appropriate for the transformed data. The resulting coefficient on log-spend represents the elasticity of sales with respect to spend: a coefficient of 0.15 means a 10% increase in spend is associated with a 1.5% increase in sales.

Pricing analysis and demand modeling rely on regression to estimate price elasticity. Understanding how changes in price affect sales volume requires estimating the price coefficient in a demand regression that controls for advertising, distribution, competitive pricing, and seasonality. The estimated price elasticity from this regression is the key input to pricing optimization: it quantifies the revenue tradeoff between higher margins at higher prices and higher volume at lower prices. Agencies advising clients on promotional pricing strategy need defensible elasticity estimates, and regression provides the standard method for producing them from historical sales and pricing data.

In practice

What linear regression looks like inside a working ad agency.

An agency builds a media mix model for a national restaurant chain client to evaluate the contribution of each media channel to weekly customer visits during a 2-year measurement period. The model regresses weekly same-store customer visits on weekly spend across six channels: television, digital video, paid search, social, radio, and out-of-home; plus control variables for average weekly temperature, a school-holiday indicator, a COVID period indicator, and a linear time trend. The team log-transforms all spend variables to capture diminishing returns and applies an adstock transformation to each channel to model the carryover effect of advertising from prior weeks. Ordinary least squares estimation produces coefficient estimates with 90% confidence intervals for each channel’s contribution. The regression explains 83% of the variance in weekly visits, and residual diagnostics show no severe violations of model assumptions. The coefficients reveal that paid search has the highest marginal ROI at $4.20 in incremental revenue per ad dollar, followed by digital video at $2.80, while television has the lowest at $1.40. The client uses these estimates to reallocate 15% of the television budget to digital video and paid search, projected to increase total incremental revenue by 8% at the same total media budget.

Linear Regression.

A working definition of linear regression.

Why linear regression remains the most useful model for understanding the quantitative relationship between marketing inputs and outputs.

What linear regression looks like inside a working ad agency.

Build the quantitative modeling foundations for media mix and attribution work through The Creative Cadence Workshop.

Linear Regression.

A working definition of linear regression.

Why linear regression remains the most useful model for understanding the quantitative relationship between marketing inputs and outputs.

What linear regression looks like inside a working ad agency.

Build the quantitative modeling foundations for media mix and attribution work through The Creative Cadence Workshop.

Concepts in linear regression’s territory.