AI Glossary · Letter D

Defensive AI.

Techniques and frameworks designed to make AI systems resistant to adversarial attacks, manipulation attempts, and unintended failure modes that emerge from inputs specifically crafted to cause errors. For agencies, defensive AI is increasingly relevant as AI-powered content and targeting tools become surfaces that bad actors actively probe for weaknesses.

Also known as adversarial defense, robust AI, AI security

What it is

A working definition of defensive AI.

Defensive AI addresses the fact that machine learning models are vulnerable to adversarial inputs: specially constructed examples designed to cause errors or extract sensitive information. An image classifier can be fooled by imperceptible pixel-level perturbations that make it misclassify a stop sign as a speed limit sign. A content moderation model can be evaded by text substitutions that preserve meaning for humans but change the model’s classification. A language model can be manipulated through prompt injection to ignore its instructions.

Defensive techniques include adversarial training (exposing the model to adversarial examples during training), input validation (detecting and rejecting inputs that show signs of adversarial manipulation), ensemble methods (using multiple models so an attack on one is less likely to fool all), and output monitoring (detecting when a model’s outputs show patterns inconsistent with normal operation).

The field evolves continuously in an adversarial dynamic: as new defenses are developed, new attacks are designed to evade them. AI governance frameworks increasingly treat adversarial robustness as a baseline requirement for high-stakes deployments rather than an optional enhancement.

Why ad agencies care

Why defensive AI might matter more in agency work than in most industries.

Agencies build and recommend AI systems that face adversarial inputs in production: content moderation tools that spammers try to evade, brand safety classifiers that low-quality publishers try to fool, chatbots that users attempt to manipulate, and targeting algorithms that fraudsters probe for exploitable patterns. Understanding the adversarial threat landscape is part of responsible AI deployment.

Prompt injection is a real attack on agency workflows. Any system that takes user-provided text and feeds it to a language model is potentially vulnerable to prompt injection: input crafted to override the model’s instructions. Agency chatbots, AI-powered content analysis tools, and automated brief processing systems all have this exposure if not designed with injection defense in mind.

Brand safety tools can be evaded. Publishers who want to capture brand-safe advertising revenue while running brand-unsafe content have incentives to understand how classifiers work and optimize their content to evade them. Agencies responsible for brand safety monitoring need to test their tools against adversarial examples, not just benign test sets.

It intersects with data governance and cybersecurity. Responsible AI frameworks that address adversarial robustness overlap with cybersecurity practices around input validation, access controls, and monitoring. Agencies building AI systems for clients in regulated industries need to treat adversarial robustness as a security requirement, not just a model quality metric.

In practice

What defensive AI looks like inside a working ad agency.

An agency deploys an AI-powered customer service chatbot for a retail client. Within the first week, a monitoring review reveals that several users have discovered a prompt injection pattern that causes the chatbot to reveal information about its underlying instructions and to offer discounts that are not in the approved response library. The agency implements input sanitization that detects and neutralizes known injection patterns, adds a response filter that flags outputs containing pricing information for immediate review, and tightens the system prompt to reduce the attack surface. Ongoing adversarial testing becomes part of the monthly maintenance protocol.

Build AI systems your clients can trust to behave as intended through The Creative Cadence Workshop.

The governance and disclosure module of the workshop covers the internal standards your agency needs to deploy AI without creating security and trust risks for your clients.