AI Glossary · Letter F

Foundation Model.

A large AI model trained on broad, general-purpose data at massive scale that can be adapted to a wide range of downstream tasks through prompting, fine-tuning, or adding task-specific components. Foundation models are the infrastructure layer of modern AI: GPT-4, Claude, Gemini, Llama, Stable Diffusion, and DALL-E are all foundation models, and they underlie nearly every AI tool agencies use today.

Also known as: base model, pre-trained model, large pre-trained model

What it is

A working definition of the foundation model.

Foundation models are distinguished from earlier task-specific models by their scale and generality. A traditional machine learning model is trained on a specific dataset for a specific task: a spam classifier, a sentiment analyzer, a conversion predictor. A foundation model is trained on enormous quantities of general-purpose data, such as a large fraction of the internet’s text or billions of images, using self-supervised objectives that require no manual labeling at scale. The result is a model that develops broad, general representations of language, images, or both that can be applied to tasks the model was not explicitly trained for.

The term “foundation model” was coined in 2021 by researchers at Stanford’s Center for Research on Foundation Models to describe a new paradigm in which a single large pre-trained model serves as the foundation for many specialized systems. Before foundation models, building a capable AI system for a new task required collecting task-specific labeled data, training a specialized model, and accepting performance limited by the available training data. With foundation models, many tasks can be handled by specifying the task in a prompt or by supplying a small number of examples for few-shot adaptation, with the foundation model providing the broad capabilities needed.
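As a rough illustration of few-shot adaptation, the sketch below assembles a prompt from a task description, a couple of worked examples, and a new input. The function name, the example copy, and the labels are all hypothetical and chosen only for illustration; the resulting string would be sent to whichever foundation model the agency uses.

```python
def build_few_shot_prompt(task, examples, new_input):
    """Assemble a few-shot prompt: task description, worked examples, new case."""
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # The prompt ends with an open "Output:" for the model to complete.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples; the copy and labels are invented for illustration.
examples = [
    ("Save 20% this weekend only.", "promotional"),
    ("Our founder started the company in a garage.", "brand story"),
]
prompt = build_few_shot_prompt(
    "Classify each line of ad copy as 'promotional' or 'brand story'.",
    examples,
    "Free shipping on every order this month.",
)
print(prompt)
```

No model training happens here: the examples travel inside the prompt, which is what separates few-shot prompting from fine-tuning.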

Foundation models exist for different modalities. Language foundation models like GPT, Claude, and Llama process and generate text. Vision foundation models like CLIP and DINOv2 produce rich visual representations. Multimodal foundation models like GPT-4V and Gemini process both text and images. Audio foundation models like Whisper handle speech recognition and transcription. Each modality has its own leading models and its own scaling properties. Multimodal models are an increasingly important category for agency use cases that span content types.

Why ad agencies care

Why foundation models matter more in agency work than in most industries.

Every significant AI tool a working ad agency uses today is either a foundation model or built on top of one. Understanding what foundation models are, how they differ from one another, and what their capabilities and limitations are is the prerequisite for making informed decisions about which AI tools to use, when to rely on prompting versus fine-tuning, and how to evaluate whether a vendor’s AI claims are grounded in actual model capability.

Model selection is a consequential decision, and making it well requires understanding what distinguishes one foundation model from another. Foundation models differ in training data composition, context window length, reasoning capability, instruction-following quality, and how well they handle specific domains. A model whose training data covers a client’s industry poorly is more likely to hallucinate industry-specific details and to state them with high confidence. A model with a short context window cannot process a long document in a single pass. Choosing a foundation model, or accepting a vendor’s choice, without understanding these properties means deciding with insufficient information.
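The context window constraint can be checked before a document is ever sent to a model. The sketch below is a minimal estimate, assuming roughly four characters per token for English text. That ratio is a rule of thumb, not a property of any particular model, and the window sizes and the output reserve are illustrative values. A real check would use the tokenizer published for the model in question.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; real tokenizers differ by model and language."""
    return int(len(text) / chars_per_token) + 1


def fits_context(document, context_window, reserved_for_output=1000):
    """Return True if the document plus room for a reply fits in the window."""
    return estimate_tokens(document) + reserved_for_output <= context_window


# Stand-in for a long document: 200,000 characters of filler text.
brief = "word " * 40_000

# Illustrative window sizes, not tied to any specific model.
for window in (8_000, 128_000):
    print(window, fits_context(brief, window))
```

When a document does not fit, the usual options are a model with a longer window or splitting the document into sections that are processed separately.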

The cost and capability gap between foundation model tiers is significant. Frontier foundation models are the most capable but the most expensive to run and the most restricted in how they can be deployed. Open-weight foundation models can be self-hosted and fine-tuned but require infrastructure investment and technical expertise to operate. Small, efficient models can run on edge hardware but have lower capability ceilings. Matching model tier to task requirement is a cost and architecture decision that agencies making procurement and build choices encounter constantly.
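One way to make the tier decision explicit is to write down what a task needs and pick the lowest-cost tier that satisfies it. The sketch below assumes a three-tier scheme with relative scores from 1 to 3. The scores are illustrative placeholders that only mirror the qualitative ordering described above, not measured costs or benchmarks, and the names `Tier` and `cheapest_adequate_tier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capability: int      # relative score, higher is more capable (illustrative)
    relative_cost: int   # relative score, higher is more expensive (illustrative)
    self_hostable: bool


# Illustrative values only; they are assumptions, not measurements.
TIERS = [
    Tier("small efficient", capability=1, relative_cost=1, self_hostable=True),
    Tier("open-weight", capability=2, relative_cost=2, self_hostable=True),
    Tier("frontier", capability=3, relative_cost=3, self_hostable=False),
]


def cheapest_adequate_tier(min_capability, must_self_host=False):
    """Return the lowest-cost tier meeting the requirements, or None."""
    candidates = [
        t for t in TIERS
        if t.capability >= min_capability
        and (t.self_hostable or not must_self_host)
    ]
    return min(candidates, key=lambda t: t.relative_cost, default=None)


choice = cheapest_adequate_tier(min_capability=2, must_self_host=True)
print(choice.name if choice else "no tier meets the requirements")
```

A `None` result is informative in itself: it signals that the requirements conflict, for example a task that needs frontier-level capability but must run on self-hosted infrastructure.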

Foundation model capability advances rapidly and invalidates previous assumptions. A task that required custom model training 18 months ago may be solvable by prompting a current foundation model. A performance ceiling that defined what was possible last year may have been removed by a new model release. Agencies that maintain a working understanding of foundation model capability across modalities are better positioned to spot when a new release creates opportunities for client programs. Agencies that treat AI tools as a fixed menu of capabilities are more likely to miss them.

In practice

What a foundation model looks like inside a working ad agency.

An agency is evaluating AI tools to support creative brief analysis for a consumer goods client. The use case requires reading a creative brief, identifying the primary audience insight, flagging potential messaging risks, and summarizing the brief in the standard format the creative team uses. Two years earlier, this would have required a custom model trained on hundreds of labeled briefs. The agency evaluates three current foundation models by prompting each with 20 real briefs and comparing the outputs against the creative director’s own assessments. One model consistently misidentifies the primary audience insight on briefs that use indirect or inferential language, suggesting weaker reasoning on complex analytical tasks. Another produces the correct analysis but formats its output inconsistently across brief types, requiring post-processing. The third produces correctly structured, accurate output on 18 of 20 briefs with no additional prompting. The agency deploys the third model with a structured output schema in the prompt and integrates it into the client’s brief intake workflow, automating a task that previously took 20 to 30 minutes of a senior planner’s time per brief.
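The comparison in this scenario can be tallied with a few lines of code. The sketch below assumes each brief is judged on two checks, whether the insight matches the creative director’s assessment and whether the output follows the required format. Only the 18-of-20 figure comes from the scenario. How the two failing briefs failed is not stated, so the failure type used here is a placeholder, and the names `BriefResult` and `pass_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BriefResult:
    insight_correct: bool  # matches the creative director's assessment
    format_ok: bool        # follows the team's standard summary format


def pass_rate(results):
    """Share of briefs where the output was both accurate and well formatted."""
    passed = sum(1 for r in results if r.insight_correct and r.format_ok)
    return passed / len(results)


# Third model from the scenario: 18 of 20 briefs fully correct.
# The failure type for the remaining two is an assumption for illustration.
third_model = (
    [BriefResult(insight_correct=True, format_ok=True)] * 18
    + [BriefResult(insight_correct=False, format_ok=True)] * 2
)
print(f"{pass_rate(third_model):.0%}")
```

Scoring accuracy and formatting separately keeps the two failure modes in the scenario distinct: a model that reasons poorly and a model that reasons well but needs post-processing call for different remedies.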

Build the foundation model fluency that makes every AI tool evaluation and deployment decision better informed through The Creative Cadence Workshop.

The generative AI foundations module of the workshop covers how foundation models work, what distinguishes them from each other, and how to evaluate which model tier and type are appropriate for specific agency use cases.