An unsupervised machine learning technique that groups data points by similarity without using predefined labels, allowing patterns and segments to emerge from the data itself. For agencies, clustering is how AI-powered audience segmentation discovers groups that demographic targeting would never find.
Also known as unsupervised clustering, cluster analysis, data grouping
Clustering algorithms find structure in unlabeled data by placing similar observations into groups. The definition of similarity varies by algorithm and task, but typically involves measuring distance in a feature space. Points close together tend to be assigned to the same cluster; points far apart tend to land in different ones. The clusters are not defined in advance: they emerge from the data’s structure.
Common clustering methods include k-means (which partitions data into a specified number of groups), hierarchical clustering (which builds a tree of nested clusters), and density-based methods like DBSCAN (which identify clusters of arbitrary shape and handle outliers as noise). Each has different strengths and is appropriate for different data structures and business questions.
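As a rough illustration of distance-based grouping, here is a minimal k-means sketch in plain Python. It is not a production implementation. It assumes numeric features already on comparable scales, and it uses a simple deterministic farthest-point initialization in place of randomized seeding such as k-means++. The six two-feature "customers" are made-up toy data.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points in feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's k-means. Returns (labels, centroids)."""
    # Farthest-point initialization (deterministic stand-in for k-means++):
    # start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical toy data: (orders per month, average basket in $100s) per customer.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

In practice a library implementation, such as scikit-learn's `KMeans`, would normally be used; the sketch only shows the assign-then-update loop that gives k-means its behavior.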
Clustering is foundational to AI-powered customer segmentation workflows. Rather than assigning customers to predefined demographic buckets, clustering identifies the behavioral and attitudinal groups that actually exist in the data, and these groups often make more predictive segments for campaign targeting.
Audience segmentation is one of the most consequential inputs to campaign strategy. Clustering enables a more evidence-based approach to segmentation by letting the data define the groups rather than projecting assumed categories onto the audience. The result is often segments that are more predictively useful and more surprising to the client than any demographic persona framework would produce.
It surfaces segments the client did not know existed. A clustering analysis of a client’s CRM data might reveal that a group of customers who look demographically identical actually split into two distinct behavioral clusters with very different purchase patterns and content preferences. Neither segment would have been identified by a standard demographic framework. Each warrants a different message and a different retention strategy.
The number of clusters is a business decision. Most clustering algorithms require specifying how many groups to produce, or tuning a parameter that controls cluster density. Statistical diagnostics can help narrow the options, but there is no purely statistical right answer to how many audience segments should exist. The decision should be driven by whether the resulting segments are actionable and distinct enough to warrant separate strategies.
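Statistical diagnostics can inform, but not settle, the choice of how many clusters to keep. One common diagnostic is the silhouette coefficient, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. The plain-Python sketch below computes it alongside the share of customers in the smallest segment, used here as a hypothetical stand-in for an actionability check. The candidate labelings and data are toy values, not output from a real analysis.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def silhouette(points, labels):
    """Mean silhouette coefficient (-1 to 1; higher means better separated)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = mean(own)  # cohesion: mean distance to the rest of its own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean(euclidean(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in clusters if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def smallest_share(labels):
    """Fraction of records in the smallest cluster (hypothetical actionability check)."""
    return min(labels.count(c) for c in set(labels)) / len(labels)


customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
# Hypothetical candidate segmentations of the same customers (k=2 and k=3).
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
scores = {k: silhouette(customers, labs) for k, labs in candidates.items()}
for k, labs in candidates.items():
    print(f"k={k}: silhouette={scores[k]:.2f}, smallest segment={smallest_share(labs):.0%}")
```

A high silhouette score says the groups are well separated in feature space. It does not say whether a small segment is worth its own campaign; that remains the business judgment.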
Clusters require interpretation, not just generation. Running a clustering algorithm produces groups labeled 0, 1, 2, and 3. Making those groups useful requires human interpretation: examining the characteristic features of each cluster and translating them into a business-meaningful description. This is where strategists add value to the output of an unsupervised learning tool.
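One way to start that interpretation is to profile each cluster by comparing its average on each feature with the average across all customers. The plain-Python sketch below computes per-cluster means and a ratio to the overall mean (often called lift or an index). The feature names and values are hypothetical, and the labels stand in for the output of any clustering algorithm.

```python
from collections import defaultdict

FEATURES = ["orders_per_year", "avg_basket"]  # hypothetical feature names


def profile_clusters(rows, labels, features):
    """Per-cluster size, feature means, and lift versus the overall mean."""
    overall = {f: sum(r[f] for r in rows) / len(rows) for f in features}
    grouped = defaultdict(list)
    for row, lab in zip(rows, labels):
        grouped[lab].append(row)
    profile = {}
    for lab in sorted(grouped):
        members = grouped[lab]
        means = {f: sum(r[f] for r in members) / len(members) for f in features}
        profile[lab] = {
            "size": len(members),
            "means": means,
            # lift > 1: the cluster over-indexes on this feature; < 1: under-indexes
            "lift": {f: (means[f] / overall[f] if overall[f] else float("nan"))
                     for f in features},
        }
    return profile


# Toy customer records and cluster labels (illustrative only).
rows = [
    {"orders_per_year": 2, "avg_basket": 120.0},
    {"orders_per_year": 3, "avg_basket": 100.0},
    {"orders_per_year": 12, "avg_basket": 30.0},
    {"orders_per_year": 14, "avg_basket": 35.0},
]
labels = [0, 0, 1, 1]

profile = profile_clusters(rows, labels, FEATURES)
for lab, info in profile.items():
    lifts = {f: round(v, 2) for f, v in info["lift"].items()}
    print(f"cluster {lab}: n={info['size']}, lift={lifts}")
```

A strategist reading this output might describe cluster 0 as infrequent, high-basket buyers and cluster 1 as frequent, low-basket buyers. Turning the numbers into that kind of label is the human step the algorithm does not perform.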
An agency runs a clustering analysis on a retail client’s two-year purchase and browsing history. The agency expects to reproduce the client’s existing four customer personas; instead, the clustering algorithm identifies six distinct groups. Two of the existing personas split into behavioral subgroups with meaningfully different purchase triggers. One new cluster turns out to be a high-value segment the client had been inadvertently under-serving. The agency presents the revised segmentation with cluster profiles built from behavioral and transactional features rather than demographics. The client revises three campaign briefs based on the new segment definitions.