The process of fitting a machine learning model to data by iteratively adjusting its internal parameters until it can generalize patterns and produce useful predictions or outputs. For agencies, understanding training helps explain why models have biases, why they have knowledge cutoffs, and why the data you put into a custom model shapes everything it produces afterward.
Also known as: model training, training a model
Machine learning models learn from examples. Training is the process of exposing a model to a large dataset, having it make predictions on that data, measuring how wrong those predictions are, and adjusting the model’s parameters to reduce the error. That cycle repeats many times until the model’s predictions are accurate enough to be useful.
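For readers who want to see that cycle in concrete form, here is a minimal sketch in plain Python. It assumes a toy model with only two parameters (a slope `w` and an offset `b`) and a made-up dataset that follows y = 2x + 1. The function name `train`, the learning rate, and the number of passes are illustrative choices, not anything a particular vendor or library prescribes. Real models adjust billions of parameters, but the loop has the same shape: predict, measure the error, adjust, repeat.

```python
# Toy dataset (an assumption for illustration): outputs follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0          # parameters start out uninformed
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # 1. Make predictions with the current parameters.
        preds = [w * x + b for x in xs]
        # 2. Measure how wrong those predictions are.
        errors = [p - y for p, y in zip(preds, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # 3. Work out which direction reduces the error.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # 4. Nudge the parameters a small step in that direction.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


w, b, losses = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"error at start={losses[0]:.3f}, at end={losses[-1]:.6f}")
```

The learned values end up close to the slope and offset used to generate the data, and the recorded error shrinks from the first pass to the last. Nothing in the loop knows the "right answer" in advance; it only sees the examples it is given.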
For a large language model, training involves processing billions of text tokens and learning the statistical relationships between them. For an image classifier, it means learning which visual features correspond to which categories. The training data determines both what the model can do and where its blind spots are. Models trained on narrow data generalize poorly. Models trained on biased data reproduce those biases.
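A heavily simplified sketch of "learning statistical relationships between tokens" is a model that just counts which word tends to follow which. The example below is an illustration only: real language models use neural networks rather than lookup tables, and the function names `train_bigram` and `predict_next` and the four-sentence corpus are hypothetical. It does show, at small scale, how a narrow corpus produces blind spots.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None  # the word never appeared in training: a blind spot
    return followers.most_common(1)[0][0]


# Hypothetical, deliberately narrow training corpus.
corpus = [
    "the brand launched a campaign",
    "the brand launched a product",
    "the brand launched a campaign",
    "the agency pitched a campaign",
]

model = train_bigram(corpus)
print(predict_next(model, "brand"))    # a word the corpus covers well
print(predict_next(model, "a"))        # skewed toward the most common follower
print(predict_next(model, "podcast"))  # a word the corpus never mentions
```

The model predicts "launched" after "brand" and "campaign" after "a" because those pairings dominate its training text, and it returns nothing at all for "podcast" because the word never appeared. Whatever the corpus over-represents, the model favors; whatever it omits, the model cannot produce.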
Most agencies will never train a model from scratch; that requires compute and data resources far beyond typical agency scale. But training is conceptually upstream of everything else: fine-tuning, prompt engineering, and custom model adaptation all build on top of a base model that was trained by someone else on someone else’s data. Understanding that starting point is essential for understanding what the model can and cannot do for your use case.
Agencies evaluate and adopt AI tools constantly. Understanding training at a conceptual level helps creative directors, strategists, and account teams ask better questions when a vendor presents a model: what was it trained on, when, and with what quality controls? The answers have real implications for how reliably the model will serve agency use cases.
Knowledge cutoffs and currency. Every trained model has a date beyond which it has no knowledge of events, campaigns, cultural moments, or platform changes. For agencies working in fast-moving categories, that cutoff can make a model effectively useless for current trend analysis without supplementary context. Knowing this saves teams from relying on AI outputs that are factually stale.
Bias in training data. A model trained predominantly on English-language text from mainstream Western sources will reflect the cultural assumptions of that corpus. For agencies producing campaigns that require cultural fluency beyond those defaults, model outputs need more rigorous review and supplementation than teams often realize.
Custom model conversations. Some enterprise clients are beginning to explore training or adapting their own models on proprietary data. Agencies positioned as AI advisors need enough fluency with training concepts to participate meaningfully in those conversations, even if the engineering happens elsewhere.
A strategy team at an agency is evaluating a new AI writing tool for a client in a technical B2B category. Before piloting the tool, they ask the vendor three questions: what corpus was the model trained on, what is the knowledge cutoff date, and has the model been evaluated for accuracy in this specific category. The vendor confirms the model’s training data does not include specialized industry publications relevant to the client’s field. Based on that, the team decides to use the tool only for structural drafting tasks and routes all technical content claims through a human subject-matter expert before delivery. Understanding the model’s training limitations shapes how they deploy the tool. It also heads off quality failures that would otherwise have been blamed on AI in general rather than traced to a specific deployment decision.
The generative AI foundations module of the workshop covers how today’s models work, what they can and can’t do, and how to choose between them.