AI Glossary · Letter H

Hierarchical Clustering.

An unsupervised machine learning method that groups data points into a hierarchy of nested clusters, from individual points up to a single cluster containing everything, producing a dendrogram that lets you cut the clustering at different levels of granularity without rerunning the algorithm. Hierarchical clustering is used in audience segmentation, content taxonomy development, and competitive landscape analysis where the hierarchical structure of the clusters is itself informative.

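Also known as hierarchical cluster analysis (HCA). Often used interchangeably with agglomerative clustering, its most common variant; sometimes called dendrogram clustering or hierarchical agglomeration.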

What it is

A working definition of hierarchical clustering.

Hierarchical clustering builds a tree-structured set of nested partitions of the data in one of two ways. The agglomerative approach starts with each point in its own cluster and successively merges the two most similar clusters; the divisive approach starts with all points in one cluster and successively splits the most heterogeneous cluster. Either approach yields a dendrogram, a branching tree diagram that visualizes the full hierarchy of cluster relationships. Agglomerative clustering is much more commonly used in practice because it is cheaper to compute, since finding the best way to split a cluster is a much harder search than finding the best pair of clusters to merge, and because it is widely implemented in standard libraries. The choice of linkage criterion, the rule for measuring the distance between clusters, determines the shape of the resulting hierarchy: single linkage uses the distance between the closest pair of points across clusters and tends to produce elongated, chain-like clusters; complete linkage uses the distance between the farthest pair and tends to produce compact clusters of similar diameter; average linkage uses the mean distance between all cross-cluster pairs and produces clusters with intermediate properties.
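The merge loop and the three linkage rules described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names cluster_distance and agglomerate and the four toy points are assumptions made for the example, and the loop deliberately uses the naive approach of recomputing every cluster-to-cluster distance at each step.

```python
from itertools import combinations
from math import dist


def cluster_distance(a, b, linkage):
    """Distance between two clusters (lists of points) under a linkage rule."""
    pair_distances = [dist(p, q) for p in a for q in b]
    if linkage == "single":      # closest pair across the two clusters
        return min(pair_distances)
    if linkage == "complete":    # farthest pair across the two clusters
        return max(pair_distances)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, merge_distance) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [[p] for p in points]   # start: every point is its own cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters that is closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        d = cluster_distance(clusters[i], clusters[j], linkage)
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Hypothetical toy data: three nearby points and one distant point on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
final_merge_height = {
    name: agglomerate(points, name)[-1][2]
    for name in ("single", "complete", "average")
}
```

On this toy data every linkage merges the three nearby points first, but the height of the final merge differs by linkage, which is why the same data can produce differently shaped dendrograms.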

A key advantage of hierarchical clustering over flat clustering methods like k-means is that it does not require specifying the number of clusters in advance. The dendrogram can be cut at any level, producing any number of clusters from 1 to n, and a good cut can be chosen by visual inspection of the dendrogram, by looking for large jumps in merge distance between successive merges, by comparing a cluster validity measure such as the silhouette score across candidate cuts, or by subject matter judgment about the right level of granularity for the application. (The cophenetic correlation, often mentioned in this context, measures how faithfully the dendrogram as a whole preserves the original pairwise distances; it helps compare linkage choices rather than pick a cut.) This is particularly useful when the number of natural clusters in the data is not known and the hierarchical structure of the solution provides additional insight beyond a flat partition.
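As a rough sketch of cutting one hierarchy at several levels, the example below builds the tree once with SciPy and then extracts 2, 3, and 6 clusters from it. The synthetic three-blob data, the choice of average linkage, and the particular values of k are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Synthetic data: three well-separated blobs of 20 points each (illustrative only).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = linkage(X, method="average")   # build the hierarchy once

# Cut the same tree at several granularities; the clustering is not rerun.
labels_by_k = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3, 6)}

# The cophenetic correlation is computed once, for the tree as a whole.
coph_corr, _ = cophenet(Z, pdist(X))
```

Because every cut comes from the same tree, each of the finer clusters sits entirely inside one of the coarser clusters, which is what makes the broad and fine views consistent with each other.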

Hierarchical clustering is computationally expensive for large datasets. The naive agglomerative algorithm stores every pairwise distance, so it scales as O(n^2) in memory and O(n^3) in computation, which makes it impractical for datasets with more than a few tens of thousands of points. Two workarounds extend the approach to larger scales: approximation methods such as BIRCH, and two-stage methods that apply hierarchical clustering to a subset of representative points and then assign the rest of the dataset to the resulting clusters. For audience datasets with millions of rows, hierarchical clustering is typically applied to summary statistics or representative samples rather than to the full dataset.
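To make the scaling concrete, the sketch below first estimates the memory needed for the pairwise distances, then shows one possible version of the sample-then-assign approach: cluster a random sample, compute a centroid per cluster, and give every remaining point the label of its nearest centroid. The helper condensed_matrix_gib, the synthetic data, the sample size of 300, and the nearest-centroid assignment rule are all assumptions for illustration; other assignment rules, such as nearest sampled neighbor, are equally plausible.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist


def condensed_matrix_gib(n, bytes_per_value=8):
    """GiB needed to hold the n*(n-1)/2 pairwise distances as 64-bit floats."""
    return n * (n - 1) // 2 * bytes_per_value / 2**30


# Synthetic stand-in for a large dataset: 15,000 points in three blobs.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
full = np.vstack([c + rng.normal(scale=0.6, size=(5000, 2)) for c in centers])

# Stage 1: hierarchical clustering on a random sample only.
sample_idx = rng.choice(len(full), size=300, replace=False)
sample = full[sample_idx]
Z = linkage(sample, method="average")
sample_labels = fcluster(Z, t=3, criterion="maxclust")

# Stage 2: assign every point in the full dataset to the nearest sample-cluster centroid.
cluster_ids = np.unique(sample_labels)
centroids = np.array([sample[sample_labels == c].mean(axis=0) for c in cluster_ids])
full_labels = cluster_ids[cdist(full, centroids).argmin(axis=1)]
```

With these assumptions, condensed_matrix_gib(50_000) comes to a little over 9 GiB for the distances alone, while the second stage only ever compares each point with a handful of centroids.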

Why ad agencies care

Why hierarchical clustering might matter more in agency work than in most industries.

Audience segmentation, content taxonomy development, and competitive landscape mapping often have natural hierarchical structure that flat clustering methods do not capture. An agency that can apply hierarchical clustering gets segmentation outputs that show the structure of the audience at several levels of granularity at once, enabling both broad targeting at the category level and precise targeting at the sub-segment level without running separate analyses.

Audience hierarchies enable flexible targeting at multiple granularities. A hierarchical clustering of a client’s customer base might reveal a broad cluster of value-oriented shoppers that subdivides into deal-hunters, quality-seekers-who-prioritize-price, and brand-loyal-discount-waiters at a finer level. Having the hierarchy rather than just the flat segmentation enables media planning that uses the broad category for awareness campaigns and the fine-grained sub-segments for personalized retention programs, all from the same analytical output. The hierarchy also enables the agency to recommend segment consolidation or expansion as campaign objectives change without rerunning the full analysis.

Content taxonomy development benefits from hierarchical cluster structure. When organizing a large content library, product catalog, or keyword set into a structured taxonomy, hierarchical clustering provides a data-driven starting point that reflects the actual similarity structure of the content. The resulting taxonomy can be cut at different depths: a broad taxonomy for editorial organization and navigation, a fine-grained taxonomy for content recommendation and targeting. Having a single hierarchical analysis that supports both use cases is more efficient than developing separate flat categorizations for each purpose.

Dendrogram inspection reveals cluster quality before committing to a solution. Unlike k-means, which returns a single flat partition and shows nothing about how the clusters relate to one another, hierarchical clustering makes the cluster formation process visible. Large jumps in merge height, where two distant clusters are joined well above the merges beneath them, indicate natural cluster boundaries. Many small merges at similar heights indicate a region of the data with fine-grained structure. Reading the dendrogram before cutting tells the analyst whether the data has clear natural cluster boundaries or a continuous gradation, which informs how much confidence to place in the resulting segmentation.
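The same reading can be approximated numerically by looking at the merge heights stored in the linkage output. The sketch below finds the largest jump between consecutive merges and cuts the tree just below it. The synthetic four-blob data and the largest-gap rule are assumptions for illustration; the rule is one heuristic among several and can mislead when the data has no clear boundaries.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# Synthetic data: four well-separated blobs of 25 points each (illustrative only).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])

Z = linkage(X, method="average")
heights = Z[:, 2]             # merge distances, one per merge, in merge order

gaps = np.diff(heights)       # jump in height between consecutive merges
biggest = int(np.argmax(gaps))

# Cutting between merge `biggest` and merge `biggest + 1` leaves this many clusters.
suggested_k = len(X) - (biggest + 1)
labels = fcluster(Z, t=suggested_k, criterion="maxclust")
```

A single dominant gap suggests clear boundaries; if the gaps are all of similar size, the data is closer to a continuous gradation and any cut is more of a judgment call.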

In practice

What hierarchical clustering looks like inside a working ad agency.

An agency is developing an audience segmentation strategy for a home improvement retailer with 1.2 million active loyalty members. Rather than a flat k-means segmentation, the agency applies agglomerative hierarchical clustering to a sample of 50,000 members using behavioral features including purchase category mix, transaction frequency, average basket size, seasonal purchase patterns, and online vs. in-store channel preference. The dendrogram reveals three major cluster branches at the top level: project-driven buyers who make large infrequent purchases across multiple categories, maintenance buyers who make frequent small purchases in a narrow category range, and DIY enthusiasts who show high engagement with instructional content and buy across both categories. Each major cluster subdivides at a finer level. For example, project-driven buyers split into new homeowners and established homeowners with different category needs, and DIY enthusiasts split into beginners who primarily buy tools and materials and advanced DIYers who also buy finishing products and specialty supplies. The agency presents the three-cluster view to the client for campaign planning and the six-cluster view for personalized loyalty program communications, using the same hierarchical analysis for both applications.

Build the audience analytics capability that reveals segmentation structure at multiple levels of granularity through The Creative Cadence Workshop.

The generative AI foundations module covers how machine learning methods discover structure in data, including the clustering approaches that produce audience and content taxonomies that reflect actual similarity relationships rather than arbitrary category assignments.