AI Glossary · Letter D

Data Transformation.

The process of converting data from one format, structure, or representation to another so it is usable by downstream systems, models, or analyses. For agencies, data transformation is the largely invisible work that determines whether two systems can actually talk to each other and whether historical data can be used for AI training.

Also known as data conversion, ETL transformation, data reshaping

What it is

A working definition of data transformation.

Data transformation covers any operation that changes the form of data: aggregating individual events into daily or weekly metrics, joining records from two separate systems on a shared key, pivoting rows into columns, extracting a date component from a timestamp, normalizing text to lowercase, or encoding a categorical variable as a numeric representation. The inputs are raw or intermediate data; the outputs are data shaped for a specific downstream purpose.
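Here is a minimal sketch of several of these operations in plain Python, using only the standard library. The event records, the field names (`ts`, `channel`, `clicks`) and the fixed `CHANNELS` list are hypothetical. A production pipeline would more likely do this in SQL or with a dataframe library.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw click events, as an ad platform export might deliver them.
raw_events = [
    {"ts": "2024-03-01T09:15:00", "channel": "Search", "clicks": 3},
    {"ts": "2024-03-01T17:40:00", "channel": "SOCIAL", "clicks": 5},
    {"ts": "2024-03-02T08:05:00", "channel": "search", "clicks": 2},
]

CHANNELS = ["search", "social"]  # assumed fixed category list used for encoding


def clean(event):
    """Row-level transformations: extract the date, normalize text, encode a category."""
    channel = event["channel"].lower()  # normalize text to lowercase
    return {
        "date": datetime.fromisoformat(event["ts"]).date().isoformat(),  # date component
        "channel": channel,
        # one-hot encode the categorical channel as numeric 0/1 columns
        **{f"is_{c}": int(channel == c) for c in CHANNELS},
        "clicks": event["clicks"],
    }


def daily_clicks_by_channel(events):
    """Aggregate events into daily metrics, pivoted to one row per date."""
    table = defaultdict(lambda: {c: 0 for c in CHANNELS})
    for e in map(clean, events):
        table[e["date"]][e["channel"]] += e["clicks"]
    return dict(table)


print(daily_clicks_by_channel(raw_events))
# {'2024-03-01': {'search': 3, 'social': 5}, '2024-03-02': {'search': 2, 'social': 0}}
```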

In analytics and machine learning workflows, transformation is one stage of a data pipeline. In ETL (Extract, Transform, Load), data is transformed after extraction and before it is loaded into the destination system. In ELT (Extract, Load, Transform), raw data is loaded first and then transformed inside the destination, typically a cloud data warehouse. Modern data tooling like dbt, which is built for the ELT pattern, allows transformation logic to be written in SQL, version-controlled like software, and tested for correctness automatically.
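The ELT ordering can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. Raw rows are loaded untouched, and then a `SELECT`-based transformation builds a cleaned table inside the database. This sketch does not use dbt. The table names, the columns and the single NULL check are illustrative assumptions, loosely modeled on the idea of a SQL model paired with a data test.

```python
import sqlite3

# Extract: hypothetical raw rows pulled from an ad platform API.
extracted = [
    ("2024-03-01T09:15:00", "Search", 120, 3),
    ("2024-03-01T17:40:00", "Social", 300, 5),
    ("2024-03-02T08:05:00", "search", 80, 2),
]

conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse

# Load: land the raw data as-is (in ELT, loading happens before transforming).
conn.execute(
    "CREATE TABLE raw_ad_events (ts TEXT, channel TEXT, impressions INTEGER, clicks INTEGER)"
)
conn.executemany("INSERT INTO raw_ad_events VALUES (?, ?, ?, ?)", extracted)

# Transform: a SELECT-based model that runs inside the database.
conn.execute("""
    CREATE TABLE daily_channel_performance AS
    SELECT
        date(ts)         AS event_date,
        lower(channel)   AS channel,
        SUM(impressions) AS impressions,
        SUM(clicks)      AS clicks
    FROM raw_ad_events
    GROUP BY date(ts), lower(channel)
""")

# A simple automated check in the spirit of a data test: no NULL dates.
null_dates = conn.execute(
    "SELECT COUNT(*) FROM daily_channel_performance WHERE event_date IS NULL"
).fetchone()[0]
assert null_dates == 0, "transformation produced rows without a date"

rows = conn.execute(
    "SELECT * FROM daily_channel_performance ORDER BY event_date, channel"
).fetchall()
print(rows)
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data again. That is one of the practical arguments for ELT.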

Transformation choices are not neutral. Aggregating click events by day, by session, or by user produces different features, and a model trained on each will learn different patterns. These choices encode assumptions about which unit of aggregation, and which time window, is meaningful for the prediction task. Making the right choice requires both domain knowledge and an understanding of how the model uses temporal information.
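The sketch below counts the same five hypothetical clicks per day, per session and per user, to show how the aggregation unit changes the resulting features. The 30-minute inactivity timeout used to split sessions is an assumption. It is a common default in web analytics, not a universal definition.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical click events: (user_id, ISO timestamp)
clicks = [
    ("u1", "2024-03-01T09:00:00"),
    ("u1", "2024-03-01T09:10:00"),
    ("u1", "2024-03-01T23:50:00"),
    ("u1", "2024-03-02T00:05:00"),
    ("u2", "2024-03-01T12:00:00"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def parsed(rows):
    """Parse timestamps and sort by user, then by time."""
    return sorted((user, datetime.fromisoformat(ts)) for user, ts in rows)


def clicks_per_day(rows):
    return Counter(ts.date().isoformat() for _, ts in parsed(rows))


def clicks_per_session(rows):
    """Start a new session when a user has been inactive longer than SESSION_GAP."""
    counts = Counter()
    last_seen, session_no = {}, {}
    for user, ts in parsed(rows):
        if user not in last_seen or ts - last_seen[user] > SESSION_GAP:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        counts[(user, session_no[user])] += 1
    return counts


def clicks_per_user(rows):
    return Counter(user for user, _ in parsed(rows))


print(dict(clicks_per_day(clicks)))      # {'2024-03-01': 4, '2024-03-02': 1}
print(dict(clicks_per_session(clicks)))  # {('u1', 1): 2, ('u1', 2): 2, ('u2', 1): 1}
print(dict(clicks_per_user(clicks)))     # {'u1': 4, 'u2': 1}
```

User `u1`'s late-night visit crosses midnight. The session view counts it as one visit with two clicks, while the daily view splits it across two dates.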

Why ad agencies care

Why data transformation might matter more in agency work than in most industries.

Agency campaigns draw on data from multiple platforms with incompatible formats: ad platforms with their own event taxonomies, CRMs with their own field structures, analytics platforms with their own session definitions. Making these sources work together requires transformation, and the transformation logic determines the quality of the analysis built on top of it.
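A common piece of this work is mapping each platform's event names and field conventions onto a shared schema. In this sketch, `platform_a` and `platform_b`, their field names and the mapping table are all hypothetical. The point is that event names, timestamps and units are normalized before any cross-platform analysis.

```python
from datetime import datetime, timezone

# Hypothetical mapping from (platform, native event name) to a shared taxonomy.
EVENT_MAP = {
    ("platform_a", "Purchase"): "purchase",
    ("platform_a", "AddToCart"): "add_to_cart",
    ("platform_b", "checkout_complete"): "purchase",
    ("platform_b", "cart_add"): "add_to_cart",
}


def from_platform_a(rec):
    """Assumed format: epoch seconds in `event_time`, currency units in `value`."""
    return {
        "source": "platform_a",
        "event": EVENT_MAP[("platform_a", rec["event_name"])],
        "occurred_at": datetime.fromtimestamp(rec["event_time"], tz=timezone.utc),
        "value": float(rec.get("value", 0.0)),  # floats for brevity only
    }


def from_platform_b(rec):
    """Assumed format: ISO-8601 string with a UTC offset, revenue in cents."""
    return {
        "source": "platform_b",
        "event": EVENT_MAP[("platform_b", rec["action"])],
        "occurred_at": datetime.fromisoformat(rec["timestamp"]).astimezone(timezone.utc),
        "value": rec.get("revenue_cents", 0) / 100,  # convert cents to currency units
    }


a = from_platform_a({"event_name": "Purchase", "event_time": 1709283600, "value": 49.99})
b = from_platform_b({"action": "checkout_complete",
                     "timestamp": "2024-03-01T10:00:00+01:00",
                     "revenue_cents": 4999})
print(a["event"] == b["event"], a["occurred_at"] == b["occurred_at"], a["value"] == b["value"])
# True True True
```

Unmapped event names raise a `KeyError` on purpose. Failing loudly is usually safer than silently dropping or mislabeling events that would later skew attribution. Real pipelines also often keep money in integer minor units rather than floats.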

Transformation bugs are campaign analysis bugs. A transformation that incorrectly joins records from two systems will produce attribution numbers that are internally consistent but wrong. These bugs are often invisible because the numbers look plausible and no single data point is obviously impossible. They often surface only when a client cross-references results against a third source.
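A common way a join goes wrong is fan-out. A lookup table that should hold one row per key actually holds several, so each matching record is duplicated and totals inflate. The sketch below uses a hypothetical `acquisition_map` that is meant to hold one acquisition campaign per user. The naive join reports 250 in revenue where only 150 exists. A key-uniqueness check catches the problem before any numbers are produced.

```python
from collections import Counter, defaultdict

# Hypothetical CRM conversions, one row per converting user.
conversions = [
    {"user_id": "u1", "revenue": 100.0},
    {"user_id": "u2", "revenue": 50.0},
]

# Hypothetical lookup that SHOULD hold one acquisition campaign per user,
# but an upstream bug has given u1 two rows.
acquisition_map = [
    {"user_id": "u1", "campaign": "spring_search"},
    {"user_id": "u1", "campaign": "spring_social"},
    {"user_id": "u2", "campaign": "spring_search"},
]


def inner_join(left, right, key):
    """Plain hash join: each left row is paired with every matching right row."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def checked_join(left, right, key):
    """Same join, but refuse to run if the right-hand table has duplicate keys."""
    counts = Counter(rrow[key] for rrow in right)
    dupes = sorted(k for k, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate join keys in right-hand table: {dupes}")
    return inner_join(left, right, key)


naive = inner_join(conversions, acquisition_map, "user_id")
print(sum(row["revenue"] for row in naive))  # 250.0, although true revenue is 150.0
```

Calling `checked_join(conversions, acquisition_map, "user_id")` raises a `ValueError` that names `u1`, which turns a silent miscount into a visible failure.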

The same raw data can produce very different features. Whether to use session duration, page depth, or recency of last visit as a model feature is a transformation decision. Which of these best represents purchase intent for a specific client is a strategic decision. Agencies that treat transformation as purely technical lose a significant lever on model performance.
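The sketch below derives all three candidate features (session duration, page depth and days since last visit) from one hypothetical pageview log, using each user's most recent session. Measuring duration from the first to the last pageview is an assumption, because pageview timestamps alone do not show how long the user spent on the final page.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical pageview log: (user_id, session_id, ISO timestamp)
pageviews = [
    ("u1", "s1", "2024-03-01T10:00:00"),
    ("u1", "s1", "2024-03-01T10:04:00"),
    ("u1", "s1", "2024-03-01T10:12:00"),
    ("u2", "s2", "2024-03-05T18:00:00"),
    ("u2", "s2", "2024-03-05T18:01:00"),
]


def session_features(log):
    """Per-session duration (first to last pageview) and page depth."""
    by_session = defaultdict(list)
    for user, session, ts in log:
        by_session[(user, session)].append(datetime.fromisoformat(ts))
    out = {}
    for key, times in by_session.items():
        times.sort()
        out[key] = {
            "session_minutes": (times[-1] - times[0]).total_seconds() / 60,
            "page_depth": len(times),
            "last_seen": times[-1],
        }
    return out


def user_features(log, as_of):
    """Features from each user's most recent session, plus recency in whole days."""
    latest = {}
    for (user, _), f in session_features(log).items():
        if user not in latest or f["last_seen"] > latest[user]["last_seen"]:
            latest[user] = f
    return {
        user: {
            "session_minutes": f["session_minutes"],
            "page_depth": f["page_depth"],
            "days_since_last_visit": (as_of - f["last_seen"]).days,
        }
        for user, f in latest.items()
    }


print(user_features(pageviews, as_of=datetime(2024, 3, 8)))
# {'u1': {'session_minutes': 12.0, 'page_depth': 3, 'days_since_last_visit': 6},
#  'u2': {'session_minutes': 1.0, 'page_depth': 2, 'days_since_last_visit': 2}}
```

Each output column is a different hypothesis about purchase intent. Which one to keep, or how to combine them, is the strategic decision the paragraph above describes.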

Documentation is the difference between a one-time project and a repeatable capability. Transformation logic that is embedded in an undocumented script cannot be maintained, reproduced, or audited. Documented transformation logic, version-controlled and tested, is an asset that persists across team changes and client questions.
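Here is a small illustration of documented, tested transformation logic. The hypothetical `normalize_email` helper below states its contract in a docstring and comes with a test function that a runner such as pytest would collect. The normalization rules themselves are assumptions chosen for the example.

```python
def normalize_email(raw: str) -> str:
    """Normalize an email address for use as a join key.

    Contract (hypothetical, for illustration):
      - surrounding whitespace is removed
      - the whole address is lowercased; RFC 5321 allows case-sensitive
        local parts, so this is a deliberate business assumption
      - values without exactly one "@" are rejected, not passed through
    """
    email = raw.strip().lower()
    if email.count("@") != 1:
        raise ValueError(f"not an email address: {raw!r}")
    return email


def test_normalize_email():
    """Pins down the contract so later edits cannot silently change join keys."""
    assert normalize_email("  Jane.Doe@Example.COM ") == "jane.doe@example.com"
    try:
        normalize_email("not-an-email")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a value without '@'")
```

Stored in version control next to the pipeline, the docstring answers the "why does this key look like that?" question that comes up after team changes, and the test keeps the answer true.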

In practice

What data transformation looks like inside a working ad agency.

An agency is building a multi-touch attribution model for a client and needs to join ad impression data with CRM conversion data. The ad platform identifies users by device cookie, and the CRM identifies them by email address. The two systems share no common identifier. The transformation work involves building a probabilistic identity linkage step that connects the two identifier spaces using a combination of timestamps, device signals, and email-to-device matches supplied by the client’s email service provider (ESP). That identity resolution step is the most consequential transformation in the pipeline: getting it wrong means the attribution model joins conversions to the wrong impressions.
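Here is a heavily simplified sketch of the linkage step. It assumes the ESP supplies known (email, cookie) pairs. Each CRM conversion is scored against each ad event using the three signal types named above, and only the best candidate that clears a threshold is linked. The weights, the threshold and the 30-minute window are hypothetical. This hand-weighted score also stands in for a proper probabilistic record-linkage model (for example, Fellegi-Sunter), which would estimate match probabilities from data instead.

```python
from datetime import datetime

# Hypothetical weights and threshold; a real model would estimate these from data.
W_ESP = 0.60      # ESP reports this email was seen on this cookie
W_DEVICE = 0.25   # same coarse device signal on both sides
W_TIME = 0.15     # maximum credit for timestamp proximity
WINDOW_S = 30 * 60
THRESHOLD = 0.5


def score(ad, crm, esp_pairs):
    """Hand-weighted match score between one ad event and one CRM event."""
    s = 0.0
    if (crm["email"], ad["cookie_id"]) in esp_pairs:
        s += W_ESP
    if ad["device"] == crm["device"]:
        s += W_DEVICE
    gap = abs((crm["ts"] - ad["ts"]).total_seconds())
    if gap <= WINDOW_S:
        s += W_TIME * (1 - gap / WINDOW_S)  # closer in time earns more credit
    return s


def link(ad_events, crm_events, esp_pairs):
    """Map each CRM email to its best-scoring cookie if the score clears THRESHOLD."""
    links = {}
    for crm in crm_events:
        scored = [(score(ad, crm, esp_pairs), ad["cookie_id"]) for ad in ad_events]
        if not scored:
            continue
        best_score, best_cookie = max(scored)
        if best_score >= THRESHOLD:
            links[crm["email"]] = best_cookie
    return links


ad_events = [
    {"cookie_id": "ck_1", "device": "ios", "ts": datetime(2024, 3, 1, 9, 0)},
    {"cookie_id": "ck_2", "device": "android", "ts": datetime(2024, 3, 1, 9, 20)},
]
crm_events = [
    {"email": "a@example.com", "device": "ios", "ts": datetime(2024, 3, 1, 9, 10)},
    {"email": "b@example.com", "device": "windows", "ts": datetime(2024, 3, 2, 15, 0)},
]
esp_pairs = {("a@example.com", "ck_1")}

print(link(ad_events, crm_events, esp_pairs))  # {'a@example.com': 'ck_1'}
```

The sketch does not enforce one-to-one matching, so a single cookie could end up linked to several emails. A real pipeline would need to resolve such conflicts and audit matches whose scores sit close to the threshold, because those are where attribution errors concentrate.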

Build campaign data infrastructure that produces reliable analysis through The Creative Cadence Workshop.

The automations and agents module of the workshop teaches you how to build AI workflows that connect your clients’ data to campaign execution without the transformation errors that corrupt the analysis.