Traditional application security is mostly about who gets in. Agent security is about what happens once the AI is operating. An agent with write access to your database, the ability to send emails, and a vulnerability to prompt injection is a serious risk, regardless of how well your auth layer is hardened.
The 5 critical agent threats
1. Prompt Injection
Severity: Critical
Malicious instructions embedded in data the agent processes (web pages, uploaded files, tool outputs) that override the agent's original task. Example: a webpage that tells your research agent to exfiltrate API keys to an attacker-controlled endpoint.
2. Excessive Privilege
Severity: Critical
Agents granted broader permissions than needed for their task. A research agent with write access to production databases, or a customer support agent with admin privileges. When exploited (or when any error occurs), the blast radius is total.
3. Data Exfiltration via Tool Use
Severity: High
An agent prompted to summarize internal documents is then instructed (via injection) to POST their contents to an external URL. Most LLMs will comply with tool calls they believe are part of the task.
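One common way to limit this kind of exfiltration is to check every outbound destination against a domain allowlist before the tool call runs. The sketch below is a minimal illustration, not a complete defense: the host names in `ALLOWED_HOSTS`, the `guarded_post` wrapper, and the `send` callable are all hypothetical names introduced for the example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_HOSTS = frozenset({"api.internal.example.com", "docs.example.com"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts


def guarded_post(url: str, body: str, send):
    """Call send(url, body) only if the destination passes the allowlist check.

    `send` stands in for whatever HTTP client the agent's tool would use.
    """
    if not is_allowed_url(url):
        raise PermissionError(f"outbound request blocked: {url!r}")
    return send(url, body)
```

Matching on the parsed hostname rather than on a substring of the URL matters here: a string check for "docs.example.com" would also accept a lookalike host such as docs.example.com.evil.net.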
4. Runaway Tool Execution
Severity: High
An agent stuck in a loop (or manipulated into one) that repeatedly calls expensive or destructive tools: deleting files, spamming emails, or exhausting API quotas.
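A simple guard against runaway execution is a per-run budget that caps both the number of tool calls and the elapsed time. The following is a minimal sketch under assumed names (`ToolBudget`, `ToolBudgetExceeded`, and `call_tool` are hypothetical, and the limits are placeholders), not a description of any particular framework's API.

```python
import time


class ToolBudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its tool-call or time budget."""


class ToolBudget:
    """Tracks tool calls for one agent run and enforces hard limits."""

    def __init__(self, max_calls: int, max_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.max_seconds = max_seconds
        self._clock = clock
        self._started = clock()
        self.calls = 0

    def charge(self, tool_name: str) -> None:
        """Record one call, or raise if the run is already over budget."""
        if self._clock() - self._started > self.max_seconds:
            raise ToolBudgetExceeded(f"time budget exhausted before {tool_name!r}")
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"call budget exhausted before {tool_name!r}")
        self.calls += 1


def call_tool(budget: ToolBudget, name: str, fn, *args, **kwargs):
    """Charge the budget first, then run the tool function."""
    budget.charge(name)
    return fn(*args, **kwargs)
```

Charging the budget before the tool runs means a destructive call is refused rather than merely logged after the fact. When the limit is hit, the run would typically stop and hand off to a human reviewer.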
5. Model Jailbreaks
Severity: Medium
Adversarial prompts that override safety guidelines baked into the base model. These are less of a concern with modern frontier models (Claude Opus 4, GPT-5) but remain relevant for fine-tuned and open-weight models.
Hardening checklist
The security principle that matters most
Every agent should operate with exactly the permissions it needs for its task and nothing more. A research agent needs read access to documents and web search, not email send, not database write, not code execution. Scope permissions to task at deploy time, not at request time. This single principle prevents most real-world agent security incidents.
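As a rough illustration of scoping at deploy time, the sketch below builds a toolbox whose granted tools are fixed when the agent is constructed, so nothing in a later request (including injected text) can widen them. The `ScopedToolbox` class and the tool names are hypothetical, chosen only to mirror the research-agent example above.

```python
from types import MappingProxyType
from typing import Callable, Mapping


class ScopedToolbox:
    """Exposes only the tools granted when the agent was deployed."""

    def __init__(self, all_tools: Mapping[str, Callable], granted: frozenset):
        unknown = granted - set(all_tools)
        if unknown:
            raise ValueError(f"unknown tools in grant: {sorted(unknown)}")
        # Read-only view: the grant cannot be extended after construction.
        self._tools = MappingProxyType({name: all_tools[name] for name in granted})

    def call(self, name: str, *args, **kwargs):
        """Run a granted tool, or refuse anything outside the grant."""
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not granted to this agent")
        return self._tools[name](*args, **kwargs)


# Hypothetical tool registry for illustration only.
ALL_TOOLS = {
    "read_document": lambda doc_id: f"contents of {doc_id}",
    "web_search": lambda query: [f"result for {query}"],
    "send_email": lambda to, body: f"sent to {to}",
}

# The research agent gets read and search access, and nothing else.
research_tools = ScopedToolbox(ALL_TOOLS, frozenset({"read_document", "web_search"}))
```

With this shape, a call such as `research_tools.call("send_email", ...)` raises `PermissionError` regardless of what the model was told to do, because the email tool was never part of the grant.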
Production-ready agent security on MoltBot
Granular permissions, domain allowlists, human-in-the-loop controls, and full audit logs. 14-day free trial.