AI Agent Security: How to Harden AI Agents for Production

Traditional application security is mostly about who gets in. Agent security is about what happens once the AI is operating. An agent with write access to your database, the ability to send emails, and a vulnerability to prompt injection is a serious risk — regardless of how well your auth layer is hardened.

The 5 critical agent threats

1. Prompt Injection

Critical

Malicious instructions embedded in data the agent processes — web pages, uploaded files, tool outputs — that override the agent's original task. Example: a webpage that tells your research agent to exfiltrate API keys to an attacker-controlled endpoint.

✓ Mitigation: Input sanitization, tool output sandboxing, task-scoped permissions, human-in-the-loop for sensitive actions

2. Excessive Privilege

Critical

Agents granted broader permissions than needed for their task. A research agent with write access to production databases, or a customer support agent with admin privileges. When exploited (or when any error occurs), the blast radius is total.

✓ Mitigation: Principle of least privilege per task, scoped API credentials, read-only by default

3. Data Exfiltration via Tool Use

High

An agent prompted to summarize internal documents then instructed (via injection) to POST their contents to an external URL. Most LLMs will comply with tool calls they believe are part of the task.

✓ Mitigation: Allowlist external domains for HTTP tools, log all outbound calls, block large data exfiltration

4. Runaway Tool Execution

High

An agent stuck in a loop — or manipulated into one — that repeatedly calls expensive or destructive tools: deleting files, spamming emails, or exhausting API quotas.

✓ Mitigation: Hard caps on tool call counts per run, cost alerts, idempotency checks before write operations

5. Model Jailbreaks

Medium

Adversarial prompts that override safety guidelines baked into the base model. Less of a concern with modern frontier models (Claude Opus 4, GPT-5) but relevant for fine-tuned and open-weight models.

✓ Mitigation: Use frontier models with strong safety training, add a safety classifier layer on output, filter sensitive action categories

Hardening checklist

# MoltBot security configuration
agent = Agent(
    model="claude-opus-4",
    permissions=Permissions(
        http=HTTPPermissions(
            allowed_domains=["api.company.com", "trusted-data-source.com"],
            max_response_size_kb=512
        ),
        filesystem=None,     # No filesystem access
        code_exec=False,        # No arbitrary code execution
        db_write=False           # Read-only DB access
    ),
    limits=Limits(
        max_tool_calls=20,
        max_duration_seconds=120,
        max_cost_usd=0.50
    ),
    human_in_loop=["send_email", "delete_record"]  # Require approval
)
      

The security principle that matters most

Every agent should operate with exactly the permissions it needs for its task and nothing more. A research agent needs read access to documents and web search — not email send, not database write, not code execution. Scope permissions to task at deploy time, not at request time. This single principle prevents most real-world agent security incidents.

Production-ready agent security on MoltBot

Granular permissions, domain allowlists, human-in-the-loop controls, and full audit logs. 14-day free trial.

Start Free Trial →

AI Agent Security: How to Harden Agents for Production