What is a Tensor? - Flux+Form

What it is

A working definition of tensors.

A tensor is a multi-dimensional array of numbers, characterized by its shape: the tuple giving its size along each dimension. A scalar has shape () (zero dimensions). A vector has shape (n,) (one dimension of size n). A matrix has shape (m, n) (two dimensions). A 3D tensor has shape (a, b, c). In deep learning practice, tensor shapes usually encode semantic meaning: a batch of 32 grayscale images of 64 by 64 pixels has shape (32, 64, 64), where the first dimension is the batch size, the second is height, and the third is width. A batch of color images stored channels-last would have shape (32, 64, 64, 3), with the fourth dimension holding the RGB channels. A batch of 32 text sequences, each padded to 128 tokens with 512-dimensional embeddings, has shape (32, 128, 512).
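The shapes listed above can be checked directly. The following is a minimal sketch using NumPy arrays as stand-ins for tensors; the variable names are illustrative and the arrays are filled with zeros rather than real data.

```python
import numpy as np

# Each array mirrors one of the shapes described in the text.
scalar = np.array(3.0)                    # shape ()
vector = np.zeros(5)                      # shape (5,)
matrix = np.zeros((2, 3))                 # shape (2, 3)
gray_batch = np.zeros((32, 64, 64))       # (batch, height, width)
color_batch = np.zeros((32, 64, 64, 3))   # (batch, height, width, channels)
text_batch = np.zeros((32, 128, 512))     # (batch, tokens, embedding size)

tensors = {
    "scalar": scalar,
    "vector": vector,
    "matrix": matrix,
    "gray_batch": gray_batch,
    "color_batch": color_batch,
    "text_batch": text_batch,
}

for name, tensor in tensors.items():
    # ndim is the number of dimensions; shape is the size along each one.
    print(f"{name:<12} ndim={tensor.ndim} shape={tensor.shape}")
```

Printing the shape of an input before sending it to a model is usually the quickest way to catch a formatting mistake.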

Tensor operations form the mathematical substrate of neural network computations. Matrix multiplication between weight matrices and activation tensors produces the linear transformation at each layer. Element-wise activation functions (ReLU, sigmoid) are applied to activation tensors without changing their shape. Softmax also preserves shape, but it normalizes values along one dimension rather than acting on each element independently. Convolutions apply weight-sharing operations across the spatial dimensions of image tensors. Attention mechanisms compute query-key dot products and value-weighted sums across the sequence dimension of transformer input tensors. All of these are expressed as tensor operations and executed by libraries built on hardware acceleration platforms such as CUDA on NVIDIA GPUs and Metal on Apple silicon, which run thousands of arithmetic operations in parallel.
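As a rough illustration of how these operations affect shape, the sketch below runs a linear layer, ReLU, softmax, and a single attention step in NumPy. The sizes (32, 512, 10, 128, 64) and the random weights are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # Subtract the row maximum first for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Linear layer: (32, 512) activations times (512, 10) weights -> (32, 10).
x = rng.normal(size=(32, 512))
W = rng.normal(size=(512, 10))
b = np.zeros(10)
logits = x @ W + b

# ReLU acts on each element independently; the shape is unchanged.
relu_out = np.maximum(logits, 0.0)

# Softmax keeps the shape but normalizes each row to sum to 1.
probs = softmax(logits, axis=-1)

# Single-head attention over a sequence of 128 tokens with 64-dim vectors.
q = rng.normal(size=(128, 64))
k = rng.normal(size=(128, 64))
v = rng.normal(size=(128, 64))
scores = q @ k.T / np.sqrt(64)              # (128, 128) query-key dot products
attended = softmax(scores, axis=-1) @ v     # (128, 64) value-weighted sums

print(logits.shape, relu_out.shape, probs.shape, scores.shape, attended.shape)
```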

Deep learning frameworks including PyTorch and TensorFlow provide tensor computation APIs that support automatic differentiation: the framework tracks the computational graph of tensor operations and automatically computes gradients of a scalar loss with respect to any tensor in the graph. This removes the need to derive and implement gradient formulas by hand for each model architecture. Practitioners define a new architecture as a sequence of tensor operations, specify the forward pass and the loss function, and can train the model via backpropagation straight away.
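To make the idea of tracking a computational graph concrete, here is a toy reverse-mode differentiation sketch in plain Python. It works on scalars only and is not how PyTorch or TensorFlow are implemented; the `Value` class and its attributes are hypothetical names invented for this illustration.

```python
class Value:
    """A scalar that remembers which values it was computed from."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # inputs to the operation
        self._local_grads = local_grads  # d(output)/d(input) for each input

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Value) else Value(other)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = Value._wrap(other)
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the graph so each node is processed after everything
        # that depends on it, then apply the chain rule node by node.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# Forward pass and loss for one example: loss = (w * x + b - y) ** 2
w, b = Value(2.0), Value(1.0)
x, y = 3.0, 10.0
error = w * x + b - y
loss = error * error

loss.backward()
print(loss.data, w.grad, b.grad)
```

Only the forward pass and the loss are written out; the gradients for `w` and `b` come from the recorded graph, which matches the hand-derived values 2 * (w * x + b - y) * x and 2 * (w * x + b - y).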

Why ad agencies care

Why tensors are the universal data structure for AI system inputs and why understanding tensor shapes prevents common configuration errors.

A working ad agency that uses AI APIs for image processing, text generation, or multimodal content analysis is working with tensors at the API boundary: image inputs are encoded as image tensors, text inputs are tokenized and embedded as sequence tensors, and model outputs are tensors of logit scores or probability distributions over possible outputs. Understanding tensor shapes helps agencies correctly format API inputs, interpret API output structures, and diagnose configuration errors that produce cryptic shape mismatch errors when incorrect input formats are passed to models or inference pipelines.

Image tensor shape conventions differ across AI frameworks, and understanding them prevents the most common image processing API errors. Some frameworks expect (batch, channels, height, width) (NCHW format, the PyTorch default) while others expect (batch, height, width, channels) (NHWC format, the TensorFlow default and the layout many APIs use). Passing an NCHW tensor to a model expecting NHWC often raises a shape mismatch error, but not always. The failure is silent when the data is forced into the expected shape with a reshape instead of a transpose (the element count is identical, so the reshape succeeds while scrambling the pixels), or when the model accepts variable dimension sizes. In those cases the outputs are numerically nonsensical even though nothing fails. Knowing the expected tensor format for each API call prevents this silent corruption of image inputs.
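The difference between converting a layout correctly and merely forcing the shape can be seen on a tiny array. This sketch assumes NumPy and a made-up batch of 2 images with 3 channels, 4 rows, and 5 columns.

```python
import numpy as np

# A small NCHW batch: 2 images, 3 channels, height 4, width 5.
nchw = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Correct conversion: move the channel axis to the end.
nhwc_correct = nchw.transpose(0, 2, 3, 1)

# Incorrect conversion: same target shape, but the values are just
# re-read in memory order, so pixels and channels get mixed together.
nhwc_wrong = nchw.reshape(2, 4, 5, 3)

print(nhwc_correct.shape, nhwc_wrong.shape)  # both (2, 4, 5, 3)

# The three channel values of the pixel at row 1, column 2 of image 0.
expected = nchw[0, :, 1, 2]
print(np.array_equal(nhwc_correct[0, 1, 2, :], expected))  # True
print(np.array_equal(nhwc_wrong[0, 1, 2, :], expected))    # False
```

Both results have the shape a channels-last model expects, so a shape check alone would not reveal the second one as corrupted.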

Batch size, the first tensor dimension, controls the throughput-latency tradeoff for AI inference in production marketing applications. Running inference with batch size 1 processes one example at a time with minimal latency, which suits real-time personalization responses that must be returned in under 100 ms. Running inference with batch size 128 processes 128 examples simultaneously on the same GPU hardware. Throughput can be 10 to 30 times higher, depending on the model and hardware, at the cost of higher per-example latency, because the system must wait for the full batch before returning any results. Batch inference suits offline scoring tasks such as a weekly propensity model refresh, where throughput matters more than individual request latency. Understanding the batch dimension lets agencies configure AI inference systems with the right throughput-latency tradeoff for each use case.
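The batching pattern itself is simple to sketch. In the example below, `iter_batches` and `fake_model` are hypothetical helpers, and the "model" is a single matrix multiply standing in for a real network; the point is that batch size changes how many calls are made, not the results.

```python
import numpy as np

def iter_batches(examples, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

def fake_model(batch, weights):
    # Stand-in for a real model: one linear layer applied to every row.
    return batch @ weights

rng = np.random.default_rng(0)
examples = rng.normal(size=(300, 16))   # 300 examples, 16 features each
weights = rng.normal(size=(16, 4))

# Batch size 1: 300 separate model calls.
single_batches = list(iter_batches(examples, 1))
single_out = np.concatenate([fake_model(b, weights) for b in single_batches])

# Batch size 128: 3 model calls (128 + 128 + 44 examples).
large_batches = list(iter_batches(examples, 128))
large_out = np.concatenate([fake_model(b, weights) for b in large_batches])

print(len(single_batches), len(large_batches))
print([len(b) for b in large_batches])
print(np.allclose(single_out, large_out))
```

On a GPU, the smaller number of larger calls is where the throughput gain comes from; the actual speedup depends on the model, the hardware, and available memory.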

Embedding tensors from language models encode semantic content in their values, and similarity between rows corresponds to semantic similarity between the corresponding inputs. A sentence transformer that encodes each text passage as a 768-dimensional embedding vector produces a 2D tensor of shape (n_passages, 768) for a batch of n passages. The rows of this tensor are embedding vectors, and the cosine similarity between two rows measures the semantic similarity between the corresponding passages. The embedding tensor can be stored in a vector database, searched efficiently using approximate nearest neighbor algorithms, and used as input features for downstream classification or regression models. Because the semantic structure of the embedding space lives in the tensor values, the tensor is not just a data container but a structured representation of semantic relationships.
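Cosine similarity between rows can be computed in a few lines. This sketch uses random vectors in place of real sentence embeddings, so the similarity values carry no meaning; `cosine_similarity_matrix` is an illustrative helper, not a library function.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Return the (n, n) matrix of cosine similarities between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 768))   # stand-in for (n_passages, 768)

# Make row 1 point in the same direction as row 0, at twice the length.
embeddings[1] = 2.0 * embeddings[0]

sims = cosine_similarity_matrix(embeddings)
print(sims.shape)
print(np.round(sims, 3))
```

Rows 0 and 1 score as identical even though their lengths differ, because cosine similarity compares direction only.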

In practice

What tensors look like inside a working ad agency.

An agency is building a product image similarity search tool for a fashion e-commerce client, so that shoppers can upload an image and find visually similar products in the catalog. The tool uses a pre-trained vision model to encode each product image as a 2,048-dimensional embedding vector, producing a catalog embedding tensor of shape (48,000, 2048) for the 48,000 active product images. This tensor is stored in a vector database (Pinecone) that supports approximate nearest neighbor search. At inference time, when a shopper uploads an image, the model encodes it as a (1, 2048) tensor (a single query embedding), and the vector database retrieves the 20 rows of the catalog tensor with the highest cosine similarity to the query embedding.

The agency encounters two tensor-related configuration issues during development. First, the pre-trained model expects image inputs as (batch, 3, 224, 224) tensors in NCHW format, but the image preprocessing pipeline produces (batch, 224, 224, 3) NHWC tensors. Feeding NHWC images to the NCHW model yields valid-looking embeddings, with no dimension errors, that are effectively noise, so similarity search returns irrelevant products. Identifying and fixing the channel-dimension ordering resolves the issue. Second, processing the 48,000 catalog images for initial indexing at batch size 1 takes 6.8 hours. Increasing the batch size to 64, which fits within GPU memory for this model, reduces catalog indexing time to 42 minutes, a 9.7x throughput improvement from the batch dimension change alone.

The deployed tool achieves sub-200 ms query latency including embedding computation and vector search, and in evaluation a human review panel rates 78% of the top-5 results as visually similar to the query.
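The retrieval step in this scenario can be sketched with exact brute-force search in NumPy. This is not Pinecone's API and not an approximate nearest neighbor algorithm; `top_k_cosine` is a hypothetical helper, and the catalog is 1,000 random rows standing in for the (48,000, 2048) tensor.

```python
import numpy as np

def top_k_cosine(query, catalog, k):
    """Return indices and scores of the k catalog rows most similar to query.

    query has shape (1, d); catalog has shape (n, d).
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = (q @ c.T)[0]               # shape (n,): one score per product
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top, scores[top]

rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 2048))   # small stand-in catalog

# Simulate a shopper photo that is a slightly noisy copy of product 123.
query = catalog[123:124] + 0.01 * rng.normal(size=(1, 2048))

indices, scores = top_k_cosine(query, catalog, k=20)
print(query.shape, indices.shape)
print(indices[0], round(float(scores[0]), 4))
```

Brute-force search compares the query with every row, which is fine for a sketch; a vector database trades a little accuracy for much faster lookups on large catalogs.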

Build the deep learning data structure literacy that enables correct AI API integration and production deployment through The Creative Cadence Workshop.
