What is Metric Learning?

What it is

A working definition of metric learning.

Metric learning trains a neural network or other function to map inputs to a vector representation, called an embedding, such that the distance between embeddings reflects the desired notion of similarity. Two products that are functionally similar should have embeddings that are close together; two unrelated products should have embeddings that are far apart. The model learns the embedding function from labeled examples: pairs marked as similar or dissimilar, or triplets that combine an anchor item with one similar item and one dissimilar item. The resulting embedding space can then be used to find similar items through nearest-neighbor search. Catalog embeddings are computed once and stored, so at query time only the query itself has to be embedded.
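To make training on triplets concrete, here is a minimal sketch of a triplet margin loss, one common objective for this setup. The 2-D vectors, the margin of 1.0, and the function names are illustrative assumptions, not taken from any particular tool.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the
    anchor than the positive is; positive otherwise."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Hypothetical 2-D embeddings (real embeddings have hundreds of dimensions).
anchor = [0.0, 0.0]
similar_item = [0.3, 0.4]    # distance 0.5 from the anchor
unrelated_item = [3.0, 4.0]  # distance 5.0 from the anchor

# Well-arranged triplet: the similar item is much closer, so there is no penalty.
satisfied = triplet_loss(anchor, similar_item, unrelated_item)

# Badly arranged triplet (roles swapped): the loss is large, and training
# would move the embeddings to reduce it.
violated = triplet_loss(anchor, unrelated_item, similar_item)
```

During training, the gradient of this loss with respect to the embedding function's parameters is what moves similar items together and dissimilar items apart.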

Contrastive learning, the dominant modern approach to metric learning, trains by pulling the embeddings of similar pairs toward each other while pushing the embeddings of dissimilar pairs apart in a shared representation space. SimCLR, MoCo, and CLIP are prominent examples. SimCLR and MoCo form their similar pairs from two augmented views of the same image, so they need no human labels. CLIP trains a shared embedding space for images and text in which an image’s embedding is close to the embedding of its matching text description and far from non-matching descriptions. This alignment enables zero-shot cross-modal retrieval: finding images that match a text query without any labeled image-text pairs for that specific query type.
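The pull-together, push-apart objective can be sketched as a symmetric contrastive loss over a small batch, in the style CLIP uses. This is a simplified illustration in plain Python, not CLIP's actual code; the 2-D embeddings and the temperature value of 0.07 are assumptions chosen for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Image i should match text i and none of the other texts, and vice versa."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    logits = [[dot(images[i], texts[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Row i: which text matches image i. Column j: which image matches text j.
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2

# Hypothetical batch of two image embeddings and two caption embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]  # caption i sits near image i
swapped_texts = [[0.1, 0.9], [0.9, 0.1]]  # caption i sits near the wrong image

aligned_loss = clip_style_loss(images, aligned_texts)
swapped_loss = clip_style_loss(images, swapped_texts)
```

The aligned batch yields a loss near zero and the swapped batch a large one, which is the signal that drives matching image and text embeddings together.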

The practical advantage of metric learning over classification is generalization to novel categories. A classification model for product categories produces one output score per category and cannot generalize to categories not seen during training. A metric learning model produces embeddings that can be compared for any items, including items from categories not seen during training, as long as the relevant features are captured in the embedding. This open-world generalization is essential for product recommendation and visual search systems that must handle constantly expanding product catalogs with new items that have no interaction history.

Why ad agencies care

Why metric learning is the technology underlying the similarity-based AI tools agencies increasingly depend on.

An ad agency that uses AI tools for creative asset search, audience lookalike construction, or product recommendation is relying on metric learning systems even when the vendor documentation does not use that term. Any tool that retrieves items based on semantic similarity, finds audiences that resemble a seed set, or recommends products based on behavioral similarity is built on a learned embedding space, and metric learning principles determine whether the similarity scores it produces are meaningful. Understanding metric learning helps agencies evaluate the quality of these similarity measures and diagnose why they sometimes produce unexpected results.

Creative asset retrieval systems based on semantic embeddings use metric learning to enable visual and text search. An agency creative library with thousands of approved assets needs a search system that finds assets by semantic content rather than keyword tags alone, such as finding “energetic outdoor images with warm color palette” without requiring that exact phrase to appear in metadata. CLIP-based retrieval systems embed both the query text and all creative assets into a shared metric space, then return the assets whose embeddings are closest to the query embedding. The quality of retrieval depends entirely on the quality of the underlying metric space: a model trained on general web data may not produce a metric space that captures the brand-specific notions of similarity that matter for the agency’s search queries.
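The retrieval step described above can be sketched in a few lines. This assumes the query text and the assets have already been embedded into the same space by some model; the file names and 3-D vectors below are made up for illustration, and a production system would use an approximate nearest-neighbor index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, asset_embeddings, k=2):
    """Return the ids of the k assets most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), asset_id)
        for asset_id, emb in asset_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [asset_id for _, asset_id in scored[:k]]

# Hypothetical precomputed asset embeddings in a shared text-image space.
asset_embeddings = {
    "beach_run.jpg": [0.9, 0.8, 0.1],
    "trail_hike.jpg": [0.8, 0.9, 0.2],
    "boardroom.jpg": [0.1, 0.0, 0.9],
}

# Hypothetical embedding of the query "energetic outdoor images with warm color palette".
query_embedding = [1.0, 0.9, 0.0]

results = search(query_embedding, asset_embeddings, k=2)
```

Nothing in this step knows about brands or campaigns. Whether the top results are useful depends entirely on how the embeddings were trained.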

Lookalike audience construction is nearest-neighbor search in a user embedding space. The core operation in building a lookalike audience is finding users in the broader population who are closest in some embedding space to the seed audience members. Platform-native lookalike tools use proprietary embedding spaces; first-party lookalike models built by agencies define their own embedding spaces from behavioral features. The quality of the lookalike depends on whether the embedding space captures the behavioral dimensions that are predictive of the target conversion outcome. An embedding space that is excellent for one product category may not capture the relevant similarity dimensions for a different category with different purchase drivers.
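As a rough sketch of that core operation, the example below ranks a population by distance to the centroid of the seed audience. Using the centroid is a simplifying assumption made here for brevity; real systems may score against individual seed members or use a trained model. The user ids and 2-D behavioral embeddings are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lookalikes(seed_embeddings, population, size):
    """Return the `size` user ids whose embeddings lie closest to the seed centroid."""
    center = centroid(seed_embeddings)
    ranked = sorted(population, key=lambda uid: euclidean(population[uid], center))
    return ranked[:size]

# Hypothetical embeddings of three seed customers.
seed_embeddings = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]

# Hypothetical embeddings of the broader population.
population = {
    "user_a": [0.85, 0.15],
    "user_b": [0.1, 0.9],
    "user_c": [0.7, 0.3],
    "user_d": [0.0, 1.0],
}

audience = lookalikes(seed_embeddings, population, size=2)
```

If the two embedding dimensions here do not correspond to behavior that predicts conversion, the users returned will be close to the seed in a way that does not matter for the campaign.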

Product recommendation quality depends on the validity of the similarity metric. A recommendation system that suggests “similar products” is performing nearest-neighbor search in a learned product embedding space. If the embedding space conflates visual similarity with functional similarity, the system will recommend visually similar products that serve different functions, which is unhelpful. If the embedding space is learned from purchase co-occurrence patterns, it will recommend products that are frequently purchased together, which may reflect promotional bundling rather than genuine preference similarity. Agencies evaluating product recommendation vendors should ask what the embedding space was trained on and what notion of similarity it encodes.

In practice

What metric learning looks like inside a working ad agency.

An agency is building a visual content discovery tool for a fashion retailer client that enables buyers to upload a reference image and find similar items from the client’s catalog of 85,000 SKUs. The initial implementation uses a general-purpose image embedding model pre-trained on ImageNet, which produces embeddings based on visual features including color, texture, and shape. Testing reveals that the system often returns visually similar items that are not functionally relevant: searching for a white silk blouse frequently returns white tablecloths and white bedsheets because the color and texture features dominate the general-purpose embedding.

The team retrains the embedding model using metric learning with fashion-specific similarity labels. Pairs of garments that stylists have rated as appropriate substitutes are used as positive pairs, which should be close in embedding space, and pairs rated as inappropriate substitutes are used as negative pairs, which should be far apart. The retraining uses a contrastive loss that pulls positive pair embeddings closer and pushes negative pairs farther apart.

After fine-tuning on 12,000 labeled pairs collected from the client’s styling team over three weeks, nearest-neighbor searches in the new embedding space return garments that are both visually similar and functionally appropriate substitutes in over 84% of test queries, compared to 41% for the general-purpose embedding. The fine-tuned metric space has learned to weight the fashion-relevant similarity dimensions that the general-purpose model did not emphasize.
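The pairwise training signal and the evaluation in this scenario could look roughly like the sketch below. It shows one common form of the margin-based pairwise contrastive loss and a simple per-query hit rate. The margin, the SKU ids, the 2-D embeddings, and the exact hit-rate definition are assumptions for illustration; the scenario does not specify them.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, is_substitute, margin=1.0):
    """Positive pairs are penalized by squared distance. Negative pairs are
    penalized only while they sit closer together than `margin`."""
    d = euclidean(emb_a, emb_b)
    if is_substitute:
        return d ** 2
    return max(0.0, margin - d) ** 2

def hit_rate(results_per_query, approved_per_query):
    """Fraction of queries with at least one stylist-approved item in the results."""
    hits = sum(
        1
        for query, results in results_per_query.items()
        if any(item in approved_per_query[query] for item in results)
    )
    return hits / len(results_per_query)

# Hypothetical embeddings before fine-tuning.
silk_blouse = [0.2, 0.1]
satin_blouse = [0.2, 0.4]  # rated an appropriate substitute
tablecloth = [0.5, 0.5]    # rated inappropriate, but visually close

positive_loss = contrastive_loss(silk_blouse, satin_blouse, is_substitute=True)
negative_loss = contrastive_loss(silk_blouse, tablecloth, is_substitute=False)

# Hypothetical test set: retrieved SKUs per query versus stylist-approved SKUs.
retrieved = {"query_1": ["sku_1", "sku_2"], "query_2": ["sku_9"]}
approved = {"query_1": {"sku_2"}, "query_2": {"sku_3"}}
rate = hit_rate(retrieved, approved)
```

The nonzero negative loss for the tablecloth is the signal that pushes it away from the blouse during fine-tuning, and the hit rate is the kind of number that would be compared before and after retraining.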
