A discipline that studies how to attack AI systems with crafted inputs and how to build defenses that make models more resilient. For agencies deploying AI tools in production, adversarial ML is the reason you pressure-test outputs before you trust them.
Also known as: adversarial ML, ML security, adversarial robustness
Adversarial machine learning sits at the intersection of security and AI. Researchers in the field study two things: how to construct inputs that cause AI systems to fail in predictable ways, and how to build models that resist those failures. The attacks range from pixel-level changes, too subtle for a person to notice, that flip an image classifier’s output, to more practical manipulations of text or structured data that cause a model to behave unexpectedly.
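To make the idea of a small change flipping an output concrete, here is a minimal sketch using a toy linear classifier. The weights, bias, feature values, and the "safe"/"unsafe" labels are all made up for illustration and do not come from any real tool. The perturbation follows the idea behind the fast gradient sign method: nudge every input feature by the same small amount in whichever direction hurts the score most.

```python
# Toy linear classifier: score > 0 means "safe", otherwise "unsafe".
# WEIGHTS and BIAS are made-up illustrative values, not from any real model.
WEIGHTS = [0.9, -0.6, 0.4, -0.8]
BIAS = 0.05


def score(features):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def classify(features):
    return "safe" if score(features) > 0 else "unsafe"


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(features, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the input
    is the weight vector itself, so stepping against sign(w) is the
    worst-case small change.
    """
    return [x - epsilon * sign(w) for x, w in zip(features, WEIGHTS)]


original = [0.5, 0.4, 0.3, 0.2]
adversarial = perturb(original, epsilon=0.1)

print(classify(original), round(score(original), 2))
print(classify(adversarial), round(score(adversarial), 2))
```

No feature moves by more than 0.1, yet the label changes. Real attacks on image or text models are more involved, but the principle is the same: many tiny changes, each aimed in the worst direction, add up.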
On the defense side, the field drives techniques like adversarial training (exposing models to attack examples during training to build robustness) and systematic evaluation protocols. The work matters most in high-stakes applications where a bad output is costly: medical imaging, fraud detection, content moderation, and increasingly, creative evaluation and brand safety tools used in agency work.
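Adversarial training can be sketched in a few dozen lines. The example below is a rough illustration under stated assumptions, not a production recipe: the data is synthetic, the model is a two-feature logistic regression, and the function names (`attack`, `train`, `accuracy`) and the values chosen for `epsilon`, `epochs`, and `lr` are hypothetical. When `epsilon` is greater than zero, each training step also learns from an attacked copy of the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(value):
    return (value > 0) - (value < 0)


def attack(x, y, w, b, epsilon):
    """Return x nudged by epsilon per feature in the direction that raises the loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Gradient of the log loss with respect to the input is (p - y) * w.
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, epsilon=0.0, epochs=50, lr=0.1):
    """Logistic regression by SGD; with epsilon > 0, also train on attacked copies."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            batch = [x]
            if epsilon > 0:
                batch.append(attack(x, y, w, b, epsilon))
            for xs in batch:
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)) + b)
                w = [wi - lr * (p - y) * xi for wi, xi in zip(w, xs)]
                b -= lr * (p - y)
    return w, b


def accuracy(data, w, b, epsilon=0.0):
    """Share of examples classified correctly, optionally after attacking each one."""
    correct = 0
    for x, y in data:
        xs = attack(x, y, w, b, epsilon) if epsilon > 0 else x
        pred = 1 if sum(wi * xi for wi, xi in zip(w, xs)) + b > 0 else 0
        correct += pred == y
    return correct / len(data)


# Synthetic two-class data: label 1 clusters near (1, 1), label 0 near (-1, -1).
rng = random.Random(0)
data = [([rng.gauss(c, 0.7), rng.gauss(c, 0.7)], label)
        for c, label in [(1.0, 1), (-1.0, 0)] for _ in range(100)]

plain_w, plain_b = train(data)
robust_w, robust_b = train(data, epsilon=0.3)
print("plain model, clean inputs:   ", accuracy(data, plain_w, plain_b))
print("plain model, attacked inputs:", accuracy(data, plain_w, plain_b, epsilon=0.3))
print("robust model, attacked inputs:", accuracy(data, robust_w, robust_b, epsilon=0.3))
```

The number to watch is accuracy on attacked inputs, which can never exceed accuracy on clean inputs for the same model. On a toy problem this simple, the difference between the plain and adversarially trained models may be small; the point is the shape of the training loop, not the size of the gain.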
Agencies may not build models, but they deploy them. Understanding how AI tools can be gamed is part of responsible deployment, especially when those tools are making decisions about brand placement, content quality, or audience eligibility.
Brand safety tools are not immune. AI-driven brand safety systems that block ads from running next to unsafe content can be fooled by adversarial inputs. Knowing this, an agency can plan the backstops it needs for when the automated layer misses something.
Content moderation is a cat-and-mouse game. Client platforms that use AI to moderate user content are in continuous adversarial competition with bad actors who probe the system’s limits. Agencies advising on content strategy need to understand why moderation is not a solved problem.
Trust in AI outputs requires knowing their failure modes. When an agency tells a client their AI-powered creative evaluation or synthetic testing tool is reliable, that claim should be grounded in knowing what kinds of inputs the model handles poorly and what the review process catches.
An agency rolling out an AI-based creative quality scoring tool runs it against a sample of deliberately tricky inputs before presenting results to clients: ads with unusual layouts, copy that mixes languages, imagery that is borderline rather than clearly on-brand. What breaks the score? What produces confident wrong answers? That exercise is applied adversarial ML, even if no one calls it that. The goal is the same: find the failure modes before the client does.
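The exercise described above could be organized as a small test harness. This is a sketch under assumptions: `score_creative` is a hypothetical stand-in for whatever scoring tool is being evaluated (its toy rule only looks at copy length so the example runs), and the `Case` fields, the sample creatives, and the 0.8 confidence floor are invented for illustration. The harness collects the cases where the tool disagreed with a human reviewer while reporting high confidence.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    creative: dict       # whatever the tool takes as input
    expected_pass: bool  # what a human reviewer decided


def score_creative(creative):
    """Hypothetical stand-in for the tool under test: returns (passes, confidence)."""
    # Toy rule so the sketch runs: long copy fails, and confidence is always high.
    passes = len(creative["copy"]) < 60
    return passes, 0.95


def stress_test(cases, scorer, confidence_floor=0.8):
    """Return (name, confidence) for cases the tool got wrong while being confident."""
    confident_misses = []
    for case in cases:
        passes, confidence = scorer(case.creative)
        if passes != case.expected_pass and confidence >= confidence_floor:
            confident_misses.append((case.name, confidence))
    return confident_misses


cases = [
    Case("mixed-language copy",
         {"copy": "Compra ahora, free shipping today"}, expected_pass=False),
    Case("unusual layout",
         {"copy": "Summer sale"}, expected_pass=True),
    Case("borderline imagery",
         {"copy": "A long headline that runs well past the length the toy rule tolerates."},
         expected_pass=True),
]

for name, confidence in stress_test(cases, score_creative):
    print(f"confident miss: {name} (confidence {confidence})")
```

In practice the human labels come from your own reviewers, and the list of tricky cases grows each time the tool surprises someone. The output is a short list of confident wrong answers to raise with the vendor or to route to manual review.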
The governance and disclosure module of the workshop covers the internal standards your agency needs to use AI without losing client trust or the integrity of the work.