AI Glossary · Letter D

Data Quality.

The degree to which data is accurate, complete, consistent, timely, and fit for its intended use. For agencies, data quality is the single variable with the most leverage over AI tool performance, and the one most often treated as the client’s problem rather than the agency’s concern.

Also known as data accuracy, data reliability, data integrity

What it is

A working definition of data quality.

Data quality is commonly measured across five dimensions. Accuracy: does the data reflect the real world correctly? Completeness: are the required fields present? Consistency: do values for the same entity agree across systems? Timeliness: is the data current enough for the use case it supports? Validity: do values conform to the expected format and range? Each dimension can fail independently, and each type of failure has different consequences for models trained on the data.
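As a rough illustration, three of these dimensions can be scored directly from the records themselves. The sketch below is a minimal example, not a standard tool: the field names (email, age, updated), the validity rules, and the 365-day freshness window are all assumptions made for the example. Accuracy and consistency are left out because they need something to compare against: a ground-truth source or a second system.

```python
from datetime import date

# Hypothetical CRM records; field names and values are invented for illustration.
records = [
    {"id": 1, "email": "ana@example.com", "age": 34, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "age": 41, "updated": date(2021, 1, 15)},
    {"id": 3, "email": "not-an-email", "age": 230, "updated": date(2024, 4, 20)},
]

def completeness(rows, field):
    """Share of rows where the field is present (not None or empty)."""
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)

def validity(rows, field, is_valid):
    """Share of non-missing values that pass a format or range check."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)

def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    return sum(1 for r in rows if (as_of - r[field]).days <= max_age_days) / len(rows)

report = {
    "email_completeness": completeness(records, "email"),
    # Deliberately crude format check; a real pipeline would use a stricter rule.
    "email_validity": validity(records, "email", lambda v: "@" in v),
    "age_validity": validity(records, "age", lambda v: 0 < v < 120),
    "timeliness": timeliness(records, "updated", date(2024, 6, 1), 365),
}

for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Scoring each dimension separately matters because the dimensions fail independently: in this toy data the email field is mostly present but only half of the present values are valid.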

Quality problems are usually systematic, not random. A CRM field that was optional during data entry will have a non-random pattern of missing values: certain sales reps, certain time periods, or certain customer segments will be underrepresented. A model trained on that data will learn patterns that partly reflect data collection behavior rather than the underlying reality, and no one will know unless someone investigates the missing-value pattern.
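Investigating the missing-value pattern can start with something as simple as comparing the missing rate across groups. The sketch below assumes a hypothetical CRM export with a rep field and an optional industry field; both names and all values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical CRM export; "rep" and "industry" are assumed field names.
rows = [
    {"rep": "A", "industry": "retail"},
    {"rep": "A", "industry": "finance"},
    {"rep": "B", "industry": None},
    {"rep": "B", "industry": None},
    {"rep": "B", "industry": "retail"},
    {"rep": "C", "industry": "health"},
]

def missing_rate_by(rows, group_field, target_field):
    """Missing rate of target_field within each value of group_field."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for r in rows:
        key = r[group_field]
        totals[key] += 1
        if r.get(target_field) in (None, ""):
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}

rates = missing_rate_by(rows, "rep", "industry")
overall = sum(1 for r in rows if r["industry"] in (None, "")) / len(rows)

print(f"overall missing rate: {overall:.2f}")
for rep, rate in sorted(rates.items()):
    print(f"rep {rep}: {rate:.2f}")
```

Here the overall missing rate looks moderate, but every missing value comes from a single rep. A gap like that between the overall rate and the per-group rates is the signal that missingness reflects data entry behavior rather than chance.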

Data quality is not a one-time state. Data that was high quality at collection degrades over time as the world changes: customers move, phone numbers are reassigned, businesses close, and product SKUs are discontinued. Quality monitoring is an ongoing practice, not a pre-project check.
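One lightweight way to treat quality as an ongoing practice is to recompute a metric on a schedule and flag any period where it falls too far below a baseline. The sketch below is a minimal version of that idea; the metric, the monthly values, and the 0.05 tolerance are all assumptions chosen for the example.

```python
def flag_degradation(snapshots, baseline, tolerance=0.05):
    """Return labels of snapshots whose metric fell more than `tolerance` below baseline."""
    return [label for label, value in snapshots if baseline - value > tolerance]

# Hypothetical monthly share of contact records with a reachable phone number.
snapshots = [
    ("2024-01", 0.96),
    ("2024-02", 0.95),
    ("2024-03", 0.93),
    ("2024-04", 0.88),
    ("2024-05", 0.85),
]

flagged = flag_degradation(snapshots, baseline=0.96)
print(flagged)
```

A fixed baseline is the simplest choice. Comparing each period against a rolling average is a reasonable alternative when the metric is expected to drift slowly.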

Why ad agencies care

Why data quality might matter more in agency work than in most industries.

AI models learn exactly what the data says, including its errors. A high-quality model trained on low-quality data will produce low-quality predictions confidently. Agencies that allow poor data quality to pass unchallenged into model training pipelines are building problems into the foundation of their AI work, and those problems compound with every subsequent use of the model.

Quality is a shared responsibility, not solely a client obligation. Clients often do not know the quality of their own data. They have not run the checks, they do not track the metrics, and they assume the systems they use produce reliable data because the systems are expensive. Agencies that proactively assess data quality before scoping AI work protect themselves from building on a bad foundation.

Quality defines the ceiling on model performance. No model architecture, no amount of fine-tuning, and no increase in training data volume will overcome fundamentally poor data quality; the ceiling is set before the model is trained. Agencies need to communicate this to clients rather than accepting poor data as a constraint to work around.

Quality metrics belong in campaign reporting. Agencies that report on model performance without also reporting on the quality of the data feeding those models are presenting an incomplete picture. A personalization model performing below expectation may be doing its job perfectly well on data that is no longer representative of the audience it was designed to serve.

In practice

What data quality looks like inside a working ad agency.

An agency inherits a client’s historical email engagement data to train a re-engagement model. Before training, a quality assessment reveals that 31% of the “open” events came from an email security scanner, not from people; the client’s previous ESP had recorded these bot opens as human opens. The agency removes the bot-attributed opens from the training set. The resulting model, trained on the remaining 69% of open events, performs substantially better on human re-engagement than a model trained on the full contaminated dataset would have.
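The filtering step in this scenario might look something like the sketch below. The detection rule is an assumption: the example does not say how the bot opens were identified, so the code uses two common heuristics (a known scanner user agent, or an open recorded within seconds of delivery) with a hypothetical signature and threshold. Real rules depend on the ESP and the security product involved.

```python
from datetime import datetime, timedelta

# Hypothetical scanner signature and timing threshold, for illustration only.
SCANNER_AGENTS = ("examplescanner",)
MIN_HUMAN_DELAY = timedelta(seconds=2)

def is_probable_bot(event):
    """Flag an open as bot-driven if it is implausibly fast or matches a scanner agent."""
    agent = event["user_agent"].lower()
    too_fast = event["opened_at"] - event["delivered_at"] < MIN_HUMAN_DELAY
    return too_fast or any(sig in agent for sig in SCANNER_AGENTS)

t0 = datetime(2024, 3, 1, 9, 0, 0)
events = [
    {"user_agent": "Mozilla/5.0", "delivered_at": t0, "opened_at": t0 + timedelta(minutes=42)},
    {"user_agent": "ExampleScanner/1.2", "delivered_at": t0, "opened_at": t0 + timedelta(minutes=5)},
    {"user_agent": "Mozilla/5.0", "delivered_at": t0, "opened_at": t0 + timedelta(seconds=1)},
    {"user_agent": "AppleMail", "delivered_at": t0, "opened_at": t0 + timedelta(hours=3)},
]

# Keep only opens that look human; report how much of the raw data was removed.
human_opens = [e for e in events if not is_probable_bot(e)]
bot_share = 1 - len(human_opens) / len(events)

print(f"kept {len(human_opens)} of {len(events)} opens; bot share {bot_share:.0%}")
```

Reporting the removed share alongside the cleaned dataset keeps the decision visible to the client, who can then see how much of the original engagement signal was never human.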

Build data practices that make your AI work reliable through The Creative Cadence Workshop.

The governance and disclosure module of the workshop covers the internal standards your agency needs to use AI honestly and evaluate what your tools are actually learning from.