A mathematical function that maps any real-valued input to an output between 0 and 1 following an S-shaped curve, enabling neural networks to model probability-like outputs and making it the foundational activation function in binary classification and the gating mechanisms of recurrent neural networks. The sigmoid function is the mathematical basis for logistic regression, a workhorse of propensity modeling in marketing.
Also known as logistic function, S-curve, sigma function
The sigmoid function is defined as sigma(x) = 1 / (1 + e^(-x)), where e is Euler’s number, approximately 2.718. As x approaches positive infinity, the output approaches 1. As x approaches negative infinity, the output approaches 0. At x = 0, the output is exactly 0.5. The S-shape comes from a smooth transition from near 0 to near 1 that is steepest around x = 0, with increasingly flat tails at extreme positive and negative values. Because the curve is bounded and smooth, it is well-suited for outputs that represent probabilities or proportions, which must lie between 0 and 1.
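The definition above can be checked numerically with a few lines of Python. This is a minimal sketch rather than a library implementation; the function name `sigmoid` and the branch on the sign of x (a common way to avoid floating-point overflow for large negative inputs) are choices made here for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent e^x / (1 + e^x)
    # so that math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)

# Outputs move from near 0 to near 1, passing through 0.5 at x = 0.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
```

The values printed for x and -x always sum to 1, which reflects the symmetry of the curve around the point (0, 0.5).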
In logistic regression, the sigmoid function converts the linear combination of input features and learned weights into a predicted probability. The model computes w1*x1 + w2*x2 + … + b (the log-odds of the positive class) and passes this through the sigmoid to produce a probability between 0 and 1. The decision boundary, where the predicted probability equals 0.5, corresponds to the linear combination equaling 0, making logistic regression a linear classifier in feature space despite the nonlinear sigmoid output transform. Training minimizes binary cross-entropy loss, which is the negative log-likelihood of the observed binary labels under the model’s sigmoid-transformed predictions.
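The prediction and loss steps described above might look like the following sketch. The weights, bias, feature rows, and labels are hypothetical values invented for illustration, and the function names `predict_proba` and `binary_cross_entropy` are assumptions of this example, not the API of any particular library.

```python
import math

def sigmoid(z: float) -> float:
    # Plain form of the sigmoid; adequate for the moderate inputs used here.
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Linear combination (the log-odds) passed through the sigmoid."""
    log_odds = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(log_odds)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical two-feature model and three hypothetical customers.
weights, bias = [0.8, -0.5], -0.2
rows = [[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]]
labels = [1, 0, 1]

probs = [predict_proba(row, weights, bias) for row in rows]
print("probabilities:", [round(p, 3) for p in probs])
print("loss:", round(binary_cross_entropy(labels, probs), 4))
```

A row whose linear combination is exactly 0 receives a probability of 0.5, which is the decision boundary described above.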
In neural networks, the sigmoid function was historically used as a hidden layer activation function but has been largely replaced by ReLU and its variants for most deep learning applications. The primary issue is vanishing gradients: the sigmoid’s derivative never exceeds 0.25 (its value at x = 0), and in the tails (large positive or large negative inputs) the function is very flat and its gradient is near zero, providing almost no learning signal to earlier layers. When many sigmoid-activated layers are stacked, backpropagation multiplies these small derivatives together, so gradients can shrink exponentially with depth, making early layers very hard to train. ReLU, which is linear for positive inputs and zero for negative inputs, mitigates this problem because its gradient is exactly 1 for every positive input, so active units pass gradients back without shrinking them. Sigmoid remains in use for binary classification output layers, where its probability interpretation is useful, and in LSTM gating mechanisms, where the saturating property serves a deliberate architectural function.
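The vanishing-gradient effect can be illustrated with the sigmoid’s derivative, sigma(x) * (1 - sigma(x)), which peaks at 0.25 when x = 0. The sketch below is a deliberate simplification: it multiplies activation derivatives only and ignores the weight matrices that also appear in real backpropagation, and the depth of 10 is an arbitrary choice for illustration.

```python
import math

def sigmoid(x: float) -> float:
    # Plain form of the sigmoid; adequate for the moderate inputs used here.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is largest at 0 and collapses toward 0 in the tails.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  gradient = {sigmoid_grad(x):.6f}")

# Best case for a chain of sigmoid activations: every unit sits at x = 0.
depth = 10
best_case = sigmoid_grad(0.0) ** depth
print(f"product of {depth} activation derivatives (best case): {best_case:.2e}")
```

Even in this best case the product is below one in a million after ten layers, which is why the learning signal reaching early layers can become negligible.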
An ad agency building propensity models for churn prediction, purchase likelihood scoring, or lead quality classification is typically working with logistic regression or neural network classifiers that use sigmoid outputs to produce predicted probabilities. Understanding the sigmoid function explains why logistic regression outputs are probabilities (values between 0 and 1), why the default decision threshold of 0.5 marks the boundary where the model assigns equal probability to the positive and negative classes, and why adjusting the threshold changes the precision-recall tradeoff for downstream audience segmentation decisions.
Logistic regression with sigmoid output is frequently the right model choice for marketing propensity tasks because its outputs tend to be well-calibrated probabilities and its coefficients are directly interpretable. A logistic regression model trained to predict 90-day purchase probability outputs scores between 0 and 1 that can usually be read directly as probabilities, whereas the raw scores of tree-based models often need post-hoc calibration before they are meaningfully probabilistic. Each coefficient represents the change in the log-odds of conversion associated with a one-unit increase in that feature, holding the other features fixed, so exponentiating a coefficient gives the corresponding odds ratio. This interpretability and calibration are particularly valuable for client reporting and for audit purposes in regulated industries where model inputs and their effects must be documented and justified.
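Because a coefficient is a change in log-odds, exponentiating it gives the multiplicative change in the odds of conversion. The sketch below shows that conversion and how the same coefficient moves a baseline probability; the coefficient of 0.5 and the 20% baseline are hypothetical numbers chosen for illustration, and both function names are assumptions of this example.

```python
import math

def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in odds for a one-unit increase in the feature."""
    return math.exp(coefficient)

def shift_probability(baseline_prob: float, coefficient: float, delta: float = 1.0) -> float:
    """Probability after the feature rises by `delta` units, other features fixed."""
    log_odds = math.log(baseline_prob / (1.0 - baseline_prob))
    new_log_odds = log_odds + coefficient * delta
    return 1.0 / (1.0 + math.exp(-new_log_odds))

hypothetical_coef = 0.5   # illustrative value, not taken from a fitted model
baseline = 0.20           # illustrative baseline conversion probability

print(f"odds ratio: {odds_ratio(hypothetical_coef):.3f}")
print(f"probability after a one-unit increase: "
      f"{shift_probability(baseline, hypothetical_coef):.3f}")
```

The odds always change by the same factor, but the change in probability depends on the starting point, which is worth stating explicitly when reporting coefficients to clients.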
Threshold tuning on sigmoid-output propensity models allows agencies to adjust the precision-recall tradeoff to match client capacity constraints. A purchase propensity model with sigmoid output produces a ranked list of customers by predicted probability. The fraction of customers targeted (and thus the precision-recall tradeoff) is determined by the threshold chosen for positive classification. A client with capacity to contact 5,000 customers per month from a scored population of 200,000 should target the top 2.5% of scored customers by predicted probability, a high threshold that favors precision by concentrating contacts on the highest-probability customers. A client with capacity for 25,000 contacts (the top 12.5%) should use a lower threshold that increases recall at the cost of precision. Because the ranking of customers does not change, the same probability scores support either scenario without retraining or recalibrating the model.
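One way to turn a contact capacity into a probability threshold is to take the score of the last customer who fits within capacity. The sketch below assumes this approach; the scores are synthetic draws from a beta distribution (an arbitrary stand-in for real model output), and the function name `capacity_threshold` is hypothetical.

```python
import random

def capacity_threshold(scores, capacity):
    """Return the lowest score among the `capacity` highest-scoring customers."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ranked = sorted(scores, reverse=True)
    k = min(capacity, len(ranked))
    return ranked[k - 1]

# Synthetic scores for a population of 200,000 (illustrative only).
rng = random.Random(0)
scores = [rng.betavariate(2, 8) for _ in range(200_000)]

for capacity in (5_000, 25_000):
    threshold = capacity_threshold(scores, capacity)
    # Tied scores at the threshold could push this slightly above capacity.
    selected = sum(1 for s in scores if s >= threshold)
    share = selected / len(scores)
    print(f"capacity {capacity:>6}: threshold {threshold:.3f}, "
          f"selected {selected} ({share:.1%} of population)")
```

The larger capacity yields the lower threshold, matching the precision-for-recall trade described above.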
The marketing response function is frequently S-shaped, making the sigmoid a natural model for diminishing returns and saturation in spend-response relationships. The relationship between advertising spend and brand awareness, or between contact frequency and conversion rate, often follows an S-shaped pattern: initial spend produces increasing returns as awareness builds, then diminishing returns as the audience saturates, then near-zero incremental response above saturation. Fitting a sigmoid function to spend-response data in marketing mix models captures this S-shaped response curve, supporting budget optimization that accounts for both the rising efficiency of additional spend below the inflection point and the inefficiency of spending beyond the response ceiling. This differs from a log-log regression, which models diminishing returns but has neither the flat lower region nor the inflection point of a full S-curve.
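The shape described above can be sketched with a scaled and shifted sigmoid. The parameterization below (a response ceiling, a midpoint spend, and a steepness) is one common way to write such a curve and is an assumption of this example, as are all of the numeric values; it does not fit real data.

```python
import math

def response(spend, ceiling, midpoint, steepness):
    """S-shaped response: ceiling * sigmoid(steepness * (spend - midpoint))."""
    return ceiling / (1.0 + math.exp(-steepness * (spend - midpoint)))

def marginal_response(spend, ceiling, midpoint, steepness):
    """Derivative of the response curve: extra response per extra unit of spend."""
    s = 1.0 / (1.0 + math.exp(-steepness * (spend - midpoint)))
    return ceiling * steepness * s * (1.0 - s)

# Hypothetical parameters: spend in thousands of dollars, response in conversions.
params = dict(ceiling=1000.0, midpoint=50.0, steepness=0.1)

for spend in (10.0, 50.0, 90.0, 150.0):
    total = response(spend, **params)
    extra = marginal_response(spend, **params)
    print(f"spend {spend:6.1f}: response {total:7.1f}, marginal {extra:6.2f}")
```

Marginal response rises up to the midpoint and falls after it, so under these assumptions the most efficient incremental spend sits near the inflection point and spend far above it buys almost nothing.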
An agency builds a free trial conversion propensity model for a project management software client. The model predicts which free trial users will convert to paid within 30 days of trial expiration. The training set contains 22,400 trial users (18.6% conversion rate) with 24 features including days active in trial, features used, team invitations sent, integrations connected, file import volume, and support ticket count. The agency trains a logistic regression model with L2 regularization (ridge penalty), intentionally choosing logistic regression over gradient boosted trees to prioritize interpretable coefficients and well-calibrated probability outputs for the client’s sales team. The model achieves a validation AUC of 0.79 and a Brier score of 0.116. For reference, a model that always predicted the 18.6% base rate would score about 0.151 (0.186 * 0.814), so 0.116 is a clear improvement over that baseline; the Brier score reflects both calibration and discrimination, so it supports but does not by itself prove good calibration. Coefficient analysis reveals the three strongest positive predictors: team invitations sent in trial (coefficient: +0.82), integrations connected (coefficient: +0.61), and days active in trial exceeding 14 (coefficient: +0.55). The three strongest negative predictors are: support ticket opened in first 3 days (coefficient: -0.47), zero file imports (coefficient: -0.39), and single-user trial with no team features used (coefficient: -0.33). The sales team uses the sigmoid-output probability scores to prioritize outreach, contacting trial users in the top 25% of predicted conversion probability (probability above 0.35, which is 1.88 times the 18.6% base rate). Within 45 days of deployment, the trial-to-paid conversion rate from sales-contacted users improves from 31% (prior rule-based contact list) to 48% (model-scored list), while total outreach volume decreases by 18%, suggesting that the logistic regression propensity model concentrates sales effort on the trials with the highest conversion potential.
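The Brier score and lift figures in this example can be put in context with a short calculation. The sketch below uses only the numbers reported above (18.6% base rate, Brier score of 0.116, outreach threshold of 0.35); the Brier skill score it computes is a standard derived measure and is not something the example itself reports.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

base_rate = 0.186     # conversion rate reported in the example
model_brier = 0.116   # validation Brier score reported in the example

# Brier score of always predicting the base rate: p * (1 - p).
baseline_brier = base_rate * (1.0 - base_rate)

# Brier skill score: fraction of the baseline error removed by the model.
skill = 1.0 - model_brier / baseline_brier

# Ratio of the outreach threshold to the base rate.
lift_at_threshold = 0.35 / base_rate

print(f"baseline Brier score: {baseline_brier:.3f}")
print(f"Brier skill score:    {skill:.3f}")
print(f"threshold / base rate: {lift_at_threshold:.2f}")
```

The `brier_score` helper is included so the same comparison can be run against any list of outcomes and predicted probabilities.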
The generative AI foundations module covers the sigmoid function, logistic regression, and propensity modeling, including coefficient interpretation, threshold tuning, and how sigmoid-based models enable the probability-calibrated audience scoring that drives direct response campaign efficiency.