Voxel.

A three-dimensional pixel: the smallest unit of a volumetric representation of three-dimensional space, analogous to a pixel in a 2D image but extended with a depth dimension. Voxel grids are the native output of 3D medical imaging, a common intermediate representation in LiDAR point cloud processing, and one of several representations used by 3D generative AI systems that produce volumetric content for virtual and augmented reality applications. Understanding voxels clarifies how 3D generative AI models create and manipulate three-dimensional space rather than flat images.

Also known as volume element, 3D pixel, volumetric unit

What it is

A working definition of a voxel.

A voxel (volume element) is a value on a regular three-dimensional grid, representing the properties of the space at a specific 3D location. Where a pixel in a 2D image is addressed by x and y coordinates and holds a color value, a voxel is addressed by x, y, and z coordinates and holds one or more values describing that point in 3D space. In a regular grid the coordinates are usually implied by the voxel's position in the array rather than stored with it. The stored values might represent occupancy (is this location inside a solid object or empty?), density (how dense is the material at this location?), color, or any other property of interest. A volumetric representation is a 3D array of voxels, which together describe the properties of a 3D volume at the resolution determined by the voxel grid spacing.
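As a minimal sketch of the definition above, the following Python builds a small occupancy grid for a sphere and maps a world-space point to the voxel that contains it. The grid size, voxel spacing, sphere radius, and function names are illustrative assumptions, and nested lists stand in for the array library that production code would more typically use.

```python
import math

def make_sphere_occupancy(n, spacing, radius):
    """Return an n x n x n grid where grid[z][y][x] is 1 if the voxel
    centre lies inside a sphere centred in the grid, else 0."""
    centre = (n * spacing) / 2.0
    grid = []
    for z in range(n):
        plane = []
        for y in range(n):
            row = []
            for x in range(n):
                # World-space position of this voxel's centre.
                position = ((x + 0.5) * spacing,
                            (y + 0.5) * spacing,
                            (z + 0.5) * spacing)
                inside = math.dist(position, (centre, centre, centre)) <= radius
                row.append(1 if inside else 0)
            plane.append(row)
        grid.append(plane)
    return grid

def world_to_voxel(point, spacing):
    """Map a world-space (x, y, z) point to integer voxel indices."""
    return tuple(math.floor(c / spacing) for c in point)

# Assumed example values: an 8 x 8 x 8 grid with 0.5-unit voxels.
grid = make_sphere_occupancy(n=8, spacing=0.5, radius=1.5)
occupied = sum(v for plane in grid for row in plane for v in row)
print(f"{occupied} of {8 ** 3} voxels are occupied")
print(world_to_voxel((2.1, 2.1, 2.1), spacing=0.5))
```

The coordinates are never stored: a voxel's location is recovered from its indices and the grid spacing, which is what makes the grid "regular".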

Voxel representations are used in applications where 3D spatial structure matters. Medical imaging systems such as CT and MRI scanners produce volumetric scans naturally represented as voxel grids, where each voxel’s value represents tissue density or magnetic resonance signal intensity. LiDAR sensors used in autonomous vehicles and 3D mapping produce point clouds that are often converted to voxel grids for processing by 3D convolutional neural networks. 3D content creation tools and game engines use voxel-based world representations for certain types of procedurally generated environments. Neural scene representations that are often grouped with 3D generative AI, such as NeRF (Neural Radiance Fields) and 3D Gaussian Splatting, model the appearance of a scene from multiple viewpoints using an implicit representation (NeRF) or an explicit one (Gaussian Splatting) rather than a plain voxel grid, and are frequently compared against voxel-based approaches.
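The conversion from a point cloud to a voxel grid mentioned above can be sketched as follows. This is a simplified illustration, not the pipeline of any particular LiDAR toolkit: the sample points, the 0.5-unit voxel size, and the function names are assumptions, and only occupied voxels are stored.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the voxel containing them.
    Returns a dict mapping (i, j, k) -> list of points in that voxel."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)

def voxel_centroids(voxels):
    """Reduce each occupied voxel to the mean of the points it contains."""
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in voxels.items()
    }

# Hypothetical sample points standing in for sensor returns.
points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.1), (1.2, 0.1, 0.0), (-0.4, 0.0, 0.2)]
voxels = voxelize(points, voxel_size=0.5)
centroids = voxel_centroids(voxels)
print(sorted(voxels))
```

Storing only the occupied voxels in a dictionary is one common way to avoid allocating the full dense grid when most of the space is empty.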

The tradeoff between voxel grid resolution and memory requirement limits the practical resolution of voxel-based 3D representations. A voxel grid at resolution N x N x N requires N-cubed voxel values: doubling the resolution in each dimension multiplies the storage and compute requirement by 8. A 256 x 256 x 256 voxel grid has over 16 million voxels. This cubic scaling makes high-resolution voxel representations computationally expensive, motivating alternative 3D representations such as surface meshes (which only represent the boundary of objects rather than their full volume), point clouds (which sample 3D points sparsely), and implicit neural representations that learn a continuous function over 3D space rather than storing discrete voxel values.
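The cubic scaling described above can be checked with a few lines of arithmetic. The 4 bytes per voxel used here is an assumption (one 32-bit value per voxel); grids that store color or several channels would need proportionally more.

```python
def voxel_count(n):
    """Number of voxels in a dense n x n x n grid."""
    return n ** 3

def dense_grid_bytes(n, bytes_per_voxel):
    """Storage for a dense grid with a fixed number of bytes per voxel."""
    return voxel_count(n) * bytes_per_voxel

for n in (128, 256, 512):
    mib = dense_grid_bytes(n, bytes_per_voxel=4) / 2 ** 20
    print(f"{n}^3: {voxel_count(n):,} voxels, {mib:,.0f} MiB at 4 bytes per voxel")
```

Under that assumption the 256-cubed grid holds 16,777,216 voxels in 64 MiB, and each doubling of resolution multiplies both figures by 8.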

Why ad agencies care

Why voxels and volumetric AI representations underlie 3D generative tools and spatial computing platforms relevant to forward-looking agency creative capabilities.

A working ad agency developing capabilities in 3D product visualization, virtual reality brand experiences, or augmented reality advertising formats encounters voxel and volumetric representation concepts in the AI tools and rendering pipelines that power these formats. Understanding what voxels are, why high-resolution 3D content is computationally expensive to generate and render, and how neural radiance fields and other implicit 3D representations differ from voxel grids in efficiency and quality provides the conceptual foundation for making informed tool and format choices as 3D and spatial computing formats mature in the advertising ecosystem.

3D product scan pipelines that produce voxel or mesh representations enable photorealistic product visualization for virtual try-on and AR e-commerce. Consumer brands with physical products can create accurate 3D digital twins of their products through photogrammetry (reconstructing 3D shape from multiple 2D photographs) or structured light scanning. These 3D representations, whether stored as voxel grids, surface meshes, or neural radiance fields, enable photorealistic rendering of the product from any viewpoint and in any environment, powering AR try-on (place this sofa in your living room) and 3D product pages (rotate this product in 3D in the browser). The quality of the 3D scan, measured in terms of geometric accuracy and texture fidelity, determines whether the virtual product representation is convincing enough to support purchase confidence in the absence of physical handling.

Neural radiance fields typically outperform voxel grids for photorealistic 3D scene reconstruction from photograph sets, with implications for virtual brand environment production. NeRF models learn an implicit representation of a 3D scene from a set of 2D photographs taken from different viewpoints, enabling photorealistic rendering of the scene from novel viewpoints not present in the training photographs. Because NeRF uses a continuous neural function rather than a discrete voxel grid, it can achieve higher visual quality at lower memory cost for complex scenes with fine detail, reflective surfaces, and smooth geometry, at the price of model training time and more expensive rendering. For brand and product photography workflows, NeRF-based 3D reconstruction enables the creation of 3D interactive product experiences from a standard photography shoot rather than requiring specialized 3D scanning equipment, which can reduce the production cost of 3D product content.

Understanding voxel resolution constraints clarifies why real-time 3D advertising formats require highly optimized mesh representations rather than volumetric storage. Delivering a 3D product visualization in an AR advertisement that renders in real time on a consumer mobile device requires a geometric representation efficient enough for the device’s GPU to render at 60 frames per second. A high-resolution voxel representation of a complex product would typically require memory and compute that exceed mobile device capacity. The production workflow therefore converts 3D scan data to optimized low-polygon surface meshes and bakes lighting and material properties into texture maps, producing a real-time renderable asset from the high-resolution scan. Understanding this conversion pipeline helps agencies brief and evaluate 3D asset production for AR advertising formats.

In practice

What voxels look like inside a working ad agency.

An agency develops a 3D product visualization capability for a premium footwear client launching a direct-to-consumer e-commerce channel. The client has 140 active SKUs and wants to offer 3D interactive product views and an augmented reality “see it on your floor” feature for its 8 hero products. The agency evaluates three 3D content production approaches.

Option 1 (traditional photogrammetry to voxel grid): captures 180 photographs of each shoe from standardized angles, processes them with photogrammetry software to produce a dense voxel reconstruction, then converts the result to a mesh. Estimated production time: 6 hours per SKU; estimated cost: $180 per SKU for 140 SKUs = $25,200 total. Resulting mesh quality: high geometric accuracy, good texture for mid-range and premium SKUs.

Option 2 (NeRF-based reconstruction): captures the same photograph set, trains a NeRF model per product, and uses the NeRF rendering directly for the interactive viewer. Estimated production time: 8 hours per SKU including NeRF training. Visual quality is higher for complex materials (patent leather, metallic details), but NeRF rendering is too computationally intensive for mobile AR use without conversion to a mesh.

Option 3 (manual 3D modeling): contracts a 3D artist to produce an optimized low-polygon mesh for each hero SKU. Estimated cost: $2,200 per SKU for 8 hero SKUs = $17,600; too expensive to scale to all 140 SKUs.

The agency recommends a hybrid: manual 3D modeling for the 8 hero products (highest quality, real-time AR capable), photogrammetry-to-mesh for the remaining 132 SKUs (cost-effective, adequate quality), and NeRF rendering for the 3D interactive product viewer on the website, where real-time mobile AR performance is not required.

The AR feature for the 8 hero products achieves an average 3D interaction rate of 34% of product page visitors and is associated with a 22% higher add-to-cart rate for those products compared to the standard photography product pages, supporting expansion of the AR feature to additional SKUs in the next production cycle.
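The cost figures in this scenario can be reproduced with a short calculation. The per-SKU rates and SKU counts come from the scenario; the hybrid total is an illustrative extension that assumes the $180 photogrammetry rate also holds for the 132 non-hero SKUs, and it leaves out the NeRF viewer because the scenario gives no cost for it.

```python
# Figures taken from the scenario above.
PHOTOGRAMMETRY_COST_PER_SKU = 180
MANUAL_MODELING_COST_PER_SKU = 2_200
TOTAL_SKUS = 140
HERO_SKUS = 8

def production_cost(sku_count, cost_per_sku):
    """Total cost of producing sku_count assets at a flat per-SKU rate."""
    return sku_count * cost_per_sku

photogrammetry_all = production_cost(TOTAL_SKUS, PHOTOGRAMMETRY_COST_PER_SKU)
manual_hero = production_cost(HERO_SKUS, MANUAL_MODELING_COST_PER_SKU)
manual_all = production_cost(TOTAL_SKUS, MANUAL_MODELING_COST_PER_SKU)

# Assumption: the remaining SKUs are costed at the same photogrammetry rate.
remaining_skus = TOTAL_SKUS - HERO_SKUS
photogrammetry_remaining = production_cost(remaining_skus,
                                           PHOTOGRAMMETRY_COST_PER_SKU)
hybrid_total = manual_hero + photogrammetry_remaining

print(f"Photogrammetry, all 140 SKUs: ${photogrammetry_all:,}")
print(f"Manual modeling, 8 hero SKUs: ${manual_hero:,}")
print(f"Manual modeling, all 140 SKUs: ${manual_all:,}")
print(f"Hybrid (manual hero + photogrammetry rest): ${hybrid_total:,}")
```

Under these assumptions the hybrid comes to $41,360 before any NeRF viewer costs, against $308,000 for manual modeling across the full catalog, which is the gap behind the "too expensive to scale" judgment.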
