One complete pass through the full training dataset during model training. Training a model for the right number of epochs is a fundamental configuration decision: too few produces an underfit model that has not learned the patterns in the data; too many produces an overfit model that has memorized the training examples rather than generalizing from them.
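Also known as training epoch, training pass. Sometimes loosely called a training iteration, although an iteration more commonly refers to a single weight update on one batch rather than a full pass through the data.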
During training, a model updates its weights based on the errors it makes on individual examples or small groups of examples called batches. One epoch is complete when the model has processed every example in the training dataset once. Training then continues with another epoch: the model sees all the same examples again, but because its weights have changed, it makes different errors and updates its weights differently. This process repeats for a specified number of epochs or until a stopping criterion is met.
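To make the relationship between examples, batches, and epochs concrete, here is a minimal sketch in plain Python. The function names and the dataset and batch sizes are illustrative assumptions, not values taken from any particular framework or from this entry.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches (weight updates) needed to see every example once."""
    return math.ceil(num_examples / batch_size)

def iterate_epochs(num_examples: int, batch_size: int, num_epochs: int):
    """Yield (epoch, step, example_indices) for every batch in every epoch."""
    for epoch in range(1, num_epochs + 1):
        starts = range(0, num_examples, batch_size)
        for step, start in enumerate(starts, start=1):
            end = min(start + batch_size, num_examples)
            yield epoch, step, list(range(start, end))

# Hypothetical numbers: 1,000 examples processed in batches of 32.
print(steps_per_epoch(1000, 32))  # 32 (31 full batches plus one batch of 8)

# Three epochs means three full passes, so three times as many updates.
total_updates = sum(1 for _ in iterate_epochs(1000, 32, num_epochs=3))
print(total_updates)  # 96
```

In practice, most training frameworks also reshuffle the examples at the start of each epoch so that batches differ from pass to pass; the sketch omits that step to keep the counting easy to follow.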
Early in training, each epoch produces substantial improvement because the model is still learning the basic patterns in the data. As training progresses, the improvement per epoch decreases. Eventually, the model stops improving on the validation set, the held-out data used to assess generalization, and may begin to degrade as it starts overfitting to the specific details of the training examples. Monitoring performance on the validation set at the end of each epoch, stopping once it has stopped improving, and keeping the model from the best epoch is called early stopping. It is the standard approach to avoiding the overfitting that comes from training for too many epochs.
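The early stopping rule described above can be sketched as a small function. This is one common formulation under stated assumptions, not a definitive implementation: the `patience` and `min_delta` parameters, the function name, and the loss values are all hypothetical choices made for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses.

    Epochs are 1-indexed. Training stops once the loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_losses)

# Made-up validation losses, for illustration only.
losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.65]
best, stopped = early_stopping_epoch(losses, patience=2)
print(best, stopped)  # 4 6
```

Note that the stop is only detected a couple of epochs after the best one, which is why the model saved at the best epoch, rather than the one in memory when training halts, is the one to keep.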
For fine-tuning large pre-trained models, the number of epochs is typically much lower than for training from scratch. Fine-tuning a foundation model on a small client dataset typically requires one to five epochs. Training for more epochs risks overfitting to the narrow fine-tuning data and eroding the broad capabilities the model learned during pre-training. That loss of earlier capabilities is called catastrophic forgetting.
Epoch count is a direct lever on model quality, and it is one of the first configuration decisions that produces observable problems when set incorrectly. A working ad agency deploying custom models needs to monitor training behavior across epochs and use that monitoring to set appropriate stopping points, rather than training for an arbitrary fixed number of epochs and hoping the result is acceptable.
Fine-tuning language models for brand voice is an epoch-sensitive task. Training for too many epochs on a small brand content dataset causes the model to reproduce specific phrases from the training examples rather than developing a generalizable brand voice. One to three epochs on a curated dataset of 500 to 1,000 documents typically produces the best balance between brand voice adaptation and retention of the model’s general language capabilities.
Epoch monitoring surfaces data quality problems. If validation accuracy does not improve after the first two or three epochs, the training data likely has quality issues: inconsistent labeling, insufficient examples, or a poor match between the training distribution and the intended use case. Epoch-by-epoch monitoring converts what would otherwise be a silent failure into an observable signal that something needs to be fixed before training continues.
Epoch count informs vendor conversations about training cost and quality. Vendors who claim to train custom models quickly on small datasets are often training for very few epochs on small data, which produces models that appear functional in demo conditions but degrade quickly in production. Asking how many epochs a vendor trains on client data, and what their early stopping criterion is, surfaces the rigor of their training practice without requiring deep technical expertise.
An agency is fine-tuning a language model to generate product descriptions in the brand voice of a luxury accessories client. Initial training runs use a fixed 20-epoch schedule recommended in a tutorial. The resulting model produces fluent text but shows signs of overfitting: it repeatedly uses specific phrases lifted directly from the 300-document training set rather than generalizing the voice. The agency implements validation monitoring by holding out 60 documents from fine-tuning and measuring perplexity on the held-out set after each epoch. Validation perplexity bottoms out at epoch 4 and begins rising from epoch 6, indicating overfitting onset. Retraining with early stopping at epoch 4 produces a model that generalizes the brand voice across new product categories that were not represented in the training set.
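The validation setup in this example can be sketched as follows. The split sizes (300 documents, 60 held out) come from the scenario above; everything else, including the helper names, the placeholder document ids, and the per-token loss values, is a hypothetical assumption. The sketch relies on the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood.

```python
import math
import random

def split_holdout(documents, holdout_size, seed=0):
    """Shuffle a copy of the documents and split off a validation set."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    return shuffled[holdout_size:], shuffled[:holdout_size]

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Placeholder ids standing in for the 300 brand documents.
docs = [f"doc_{i}" for i in range(300)]
train_docs, val_docs = split_holdout(docs, holdout_size=60)
print(len(train_docs), len(val_docs))  # 240 60

# Hypothetical per-token losses from one pass over the held-out set.
print(round(perplexity([2.0, 2.0, 2.0]), 3))  # 7.389
```

Recording this perplexity on the held-out documents after each epoch gives the curve whose minimum marks the stopping point. Lower perplexity means the model assigns higher probability to the unseen brand text.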
The generative AI foundations module of the workshop covers how model training works, what the key configuration decisions determine, and how to evaluate whether a custom or fine-tuned model is ready for client deployment.