A numerical value inside a machine learning model that is learned from training data and determines how the model transforms inputs into outputs. Parameters are the internal quantities that the training algorithm adjusts to minimize prediction error; the full set of a model’s parameters encodes everything the model has learned from its training data and determines its behavior on new inputs.
Also known as: model parameter, weight, learnable parameter
Every machine learning model is a mathematical function parameterized by a set of numbers. A linear regression model with two input features has three parameters: a coefficient for each feature and a bias term. A deep neural network may have hundreds of millions or billions of parameters, each a floating-point number that is adjusted during training to reduce the model’s error on the training data. The training algorithm, typically gradient descent, iteratively adjusts each parameter by a small amount in the direction that reduces the loss function. After training, the parameters are fixed and the model is deployed to make predictions on new inputs.
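As a minimal sketch of the training loop described above, the following fits the three parameters of a two-feature linear regression by gradient descent. The toy data, the learning rate of 0.05, and the 5,000 iterations are illustrative assumptions, not values from any real project.

```python
# Toy data generated from y = 2*x1 - 3*x2 + 1 (illustrative values only).
data = [
    ((1.0, 2.0), -3.0),
    ((2.0, 0.0), 5.0),
    ((0.0, 1.0), -2.0),
    ((3.0, 1.0), 4.0),
    ((1.0, 1.0), 0.0),
]

def predict(params, x):
    """Linear model: one coefficient per feature plus a bias term."""
    w1, w2, b = params
    return w1 * x[0] + w2 * x[1] + b

def mse(params, data):
    """Mean squared error, the loss being minimized."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def gradient_step(params, data, learning_rate):
    """Move each parameter a small amount against its loss gradient."""
    w1, w2, b = params
    n = len(data)
    g1 = g2 = gb = 0.0
    for x, y in data:
        err = predict(params, x) - y
        g1 += 2 * err * x[0] / n
        g2 += 2 * err * x[1] / n
        gb += 2 * err / n
    return (w1 - learning_rate * g1,
            w2 - learning_rate * g2,
            b - learning_rate * gb)

params = (0.0, 0.0, 0.0)          # three parameters, all starting at zero
initial_loss = mse(params, data)
for _ in range(5000):
    params = gradient_step(params, data, learning_rate=0.05)
final_loss = mse(params, data)

print("learned parameters:", params)
print("loss before and after:", initial_loss, final_loss)
```

The learning rate and iteration count are set by hand before the loop starts, while the three values in `params` are what the loop itself adjusts.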
Parameters are distinct from hyperparameters. Parameters are learned from data during training; hyperparameters are set before training and control how the training process operates. The number of layers in a neural network, the learning rate for gradient descent, and the regularization strength are all hyperparameters that must be chosen before training begins. The weights within each layer are parameters that are adjusted by the training algorithm. This distinction matters for understanding model capacity and for separating what the training algorithm learns on its own from what must be decided in advance, whether by a practitioner or by an outer search procedure that tries several settings.
Parameter count is a primary indicator of model capacity, which determines the complexity of patterns a model can learn. A model with more parameters can represent more complex input-output relationships. GPT-3 has 175 billion parameters, enabling it to represent extraordinarily complex patterns in language. A logistic regression model for conversion prediction might have 50 parameters. Both are parameterized functions; the scale difference reflects the difference in representational capacity needed for generating coherent paragraphs versus classifying purchase intent. Larger parameter counts come with greater computational cost and greater risk of overfitting when training data is limited.
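To make parameter counts concrete, here is a rough sketch that counts weights and biases for fully connected layers. The helper name `count_dense_params` and the layer widths are hypothetical; in particular, 49 input features is an assumption chosen so that the logistic regression case comes out to 50 parameters.

```python
def count_dense_params(layer_sizes):
    """Count weights plus biases for fully connected layers.

    layer_sizes lists the width of each layer from input to output,
    e.g. [49, 1] is 49 inputs feeding a single output unit.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per input-output connection
        total += n_out          # one bias per output unit
    return total

# Logistic regression on 49 features (assumed): 49 weights + 1 bias.
logistic_params = count_dense_params([49, 1])

# A small network with two hidden layers of 64 units on the same inputs.
small_mlp_params = count_dense_params([49, 64, 64, 1])

print(logistic_params)    # 50
print(small_mlp_params)   # 7425
```

Even two modest hidden layers raise the count from 50 to several thousand, which illustrates how quickly parameter counts grow with width and depth.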
An ad agency evaluating AI tools, commissioning custom model development, or managing fine-tuning projects needs a working understanding of parameters to navigate conversations about model capacity, compute cost, and fine-tuning feasibility. A model’s parameter count strongly influences its inference cost, its memory requirements, its training cost, and its sensitivity to overfitting on small datasets. These factors directly affect the operational economics of AI deployment and the practical feasibility of client-specific customization.
Fine-tuning a large language model on client-specific data is usually feasible only if a small subset of parameters is adjusted. Fully fine-tuning a language model with 7 billion parameters on a typical client’s dataset of several hundred examples risks severe overfitting and incurs substantial GPU compute costs. Parameter-efficient fine-tuning methods such as LoRA (Low-Rank Adaptation) address this by training only a small number of additional adapter parameters while freezing the original model parameters. LoRA fine-tuning of a 7B parameter model may require updating only 4 to 8 million additional parameters, roughly a thousandth of the model’s size. This sharply reduces the memory needed for gradients and optimizer state and lowers training cost, while often achieving adaptation quality comparable to full fine-tuning.
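The 4 to 8 million figure can be reproduced with a back-of-the-envelope count. The sketch below assumes an illustrative 7B-scale architecture with 32 layers, a hidden size of 4,096, and adapters attached to two square projection matrices per layer; these shapes and the helper name `lora_trainable_params` are assumptions for illustration, not a description of any specific model or library.

```python
def lora_trainable_params(num_layers, hidden_size, rank, adapted_matrices_per_layer):
    """Count adapter parameters when each adapted d x d weight matrix
    receives two low-rank factors of shape (d x r) and (r x d)."""
    per_matrix = 2 * hidden_size * rank
    return num_layers * adapted_matrices_per_layer * per_matrix

BASE_PARAMS = 7_000_000_000  # frozen base model size (assumed)

rank_8 = lora_trainable_params(num_layers=32, hidden_size=4096,
                               rank=8, adapted_matrices_per_layer=2)
rank_16 = lora_trainable_params(num_layers=32, hidden_size=4096,
                                rank=16, adapted_matrices_per_layer=2)

print(rank_8)                  # 4194304
print(rank_16)                 # 8388608
print(rank_8 / BASE_PARAMS)    # fraction of the base model that is trained
```

Under these assumptions, rank 8 gives about 4.2 million trainable parameters and rank 16 about 8.4 million, well under a tenth of a percent of the frozen base model at rank 8.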
Model parameter count is a major driver of inference latency and hosting cost for deployed AI tools. A language model with 70 billion parameters requires significantly more GPU memory and compute per inference request than a 7 billion parameter model. For high-volume applications such as real-time creative scoring, bid-level prediction, or dynamic copy generation, the per-inference cost and latency of a large model may be prohibitive. Understanding the parameter count of AI tools under evaluation enables cost-benefit analysis: is the quality improvement of the larger model worth an inference cost that may be 5x to 10x higher? This tradeoff analysis requires knowing both the parameter counts of the models being compared and the volume of inference requests in the deployment context.
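A rough way to see the memory gap is to multiply the parameter count by the bytes used to store each parameter. The sketch below assumes 16-bit weights (2 bytes each) and counts weights only; activations, caches, and serving overhead are deliberately left out, so real requirements will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory needed just to hold the weights, in gigabytes.

    bytes_per_param=2 assumes 16-bit storage; 32-bit storage would be 4.
    """
    return num_params * bytes_per_param / 1e9

small_model_gb = weight_memory_gb(7e9)    # 7B parameters
large_model_gb = weight_memory_gb(70e9)   # 70B parameters

print(small_model_gb)   # about 14 GB
print(large_model_gb)   # about 140 GB
```

At this precision the 70B model needs roughly ten times the weight memory of the 7B model, which typically means spreading it across several accelerators rather than one.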
Parameter sharing and distillation enable smaller models to approximate larger ones for cost-sensitive deployments. Parameter sharing reuses the same weights in several parts of a model, reducing the number of distinct parameters that must be stored. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model: the student learns to reproduce the teacher’s outputs rather than copying its parameters, so the teacher’s learned behavior is transferred into a model with far fewer parameters. The distilled model can achieve much of the teacher’s performance at a fraction of the inference cost. Agencies building cost-sensitive production deployments can use distillation to compress a high-quality large model into a deployable smaller model, preserving most of the quality gains from the larger model without paying the full inference cost at scale.
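One common way to express "mimic the teacher" as a training signal is to compare the teacher’s and student’s output distributions after softening both with a temperature. The sketch below computes that divergence for a single example; the logits, the temperature of 2.0, and the function names are illustrative assumptions, and practical recipes usually combine this term with a standard loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher scores for three classes
close_student = [3.5, 1.2, 0.4]  # student that roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # student that prefers a different class

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close, loss_far)
```

Training the student to reduce this loss pulls its predictions toward the teacher’s, which is why the student that agrees with the teacher scores lower than the one that does not.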
An agency is evaluating AI copywriting tools for a retail client that generates 40,000 product descriptions per month. Three candidate tools use models with different parameter scales: Tool A uses a 70B parameter model with high output quality, Tool B uses a 7B parameter model with slightly lower quality, and Tool C uses a fine-tuned 1.3B parameter model trained specifically on the client’s product category. The agency runs a quality evaluation in which the client’s merchandising team rates 100 descriptions from each tool on accuracy, brand voice, and conversion potential. Tool A achieves a mean rating of 4.2/5.0, Tool B 3.9/5.0, and Tool C 4.0/5.0. The quality differences are modest, but the economics differ substantially. Tool A charges per token at a rate that works out to an estimated $0.018 per description, or $720 per month and $8,640 per year at 40,000 descriptions per month. Tool B costs $0.004 per description ($160 per month, $1,920 per year). Tool C, as a fine-tuned smaller model hosted on the agency’s own infrastructure, costs $0.0008 per description ($32 per month, $384 per year). The agency recommends Tool C: the category-specific fine-tuning compensates for the smaller parameter count, achieving quality comparable to the 7B model, and the agency judges that annual savings of $1,536 versus Tool B and $8,256 versus Tool A justify the infrastructure investment in the fine-tuning project. The parameter-count conversation frames the tool evaluation correctly as a cost-quality tradeoff rather than a purely technical comparison.
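The cost side of this comparison can be recomputed directly from the per-description rates and the monthly volume given in the example. The sketch below reports both monthly and twelve-month totals; the variable names are hypothetical, and the figures are the ones stated in the example rather than real vendor pricing.

```python
MONTHLY_VOLUME = 40_000  # product descriptions per month

# Estimated cost per description for each tool, in dollars.
cost_per_description = {
    "Tool A": 0.018,   # 70B parameter model
    "Tool B": 0.004,   # 7B parameter model
    "Tool C": 0.0008,  # fine-tuned 1.3B parameter model, self-hosted
}

monthly_cost = {tool: rate * MONTHLY_VOLUME
                for tool, rate in cost_per_description.items()}
annual_cost = {tool: cost * 12 for tool, cost in monthly_cost.items()}

for tool in cost_per_description:
    print(f"{tool}: ${monthly_cost[tool]:,.0f} per month, "
          f"${annual_cost[tool]:,.0f} per year")

# Twelve-month savings from choosing Tool C over the alternatives.
savings_vs_b = annual_cost["Tool B"] - annual_cost["Tool C"]
savings_vs_a = annual_cost["Tool A"] - annual_cost["Tool C"]
print(f"Savings vs Tool B: ${savings_vs_b:,.0f}")
print(f"Savings vs Tool A: ${savings_vs_a:,.0f}")
```

Self-hosting and fine-tuning costs for Tool C are not included here, so a fuller comparison would add them to the Tool C side before judging the savings.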
The generative AI foundations module covers model parameters, capacity, and the parameter-efficient fine-tuning techniques that make large-model quality accessible at small-model cost for agency client applications.