API Security for AI Agents: Beyond Rate Limiting

November 20, 2025 • Marcus Chen, Head of Security Research

Rate limiting was built for bots that hammer endpoints at high volume. It is a poor fit for AI agents, which can cause significant damage with a small number of well-targeted calls. An agent that reads one sensitive record per second for ten minutes stays comfortably within a 60-requests-per-minute rate limit and exfiltrates 600 records. The rate limiter never fires. The right controls are different.

The Problem with Session-Level Rate Limiting

Most API gateways rate-limit at the API key or IP level. For an AI agent, the relevant unit is the agent session, not the API key. A single API key might be shared across multiple agent instances (bad security practice, but common). Rate limiting at the key level means you are measuring total volume across all instances using that key, which tells you very little about whether any individual instance is behaving anomalously.

Per-session rate limiting, where the session is the agent instance's cryptographic identity, is a completely different control. It says "this specific agent may make at most N calls to endpoint E per session lifetime." The limit applies regardless of how many other agent instances are running simultaneously, and it is tight enough to catch within-session abuse that session-blind rate limiting misses entirely.
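A minimal sketch of the idea, keyed by session identity rather than API key. The class and limit values are illustrative assumptions, not a specific gateway's API; note that the budget is per session lifetime, so there is no sliding window to reset.

```python
from collections import defaultdict

class PerSessionLimiter:
    """Hypothetical per-session limiter: "this specific agent may make
    at most N calls to endpoint E per session lifetime"."""

    def __init__(self, limits):
        # limits: {endpoint: max calls allowed per session lifetime}
        self.limits = limits
        # (session_id, endpoint) -> calls made so far in this session
        self.counts = defaultdict(int)

    def allow(self, session_id, endpoint):
        key = (session_id, endpoint)
        # Deny by default for endpoints with no configured limit.
        if self.counts[key] >= self.limits.get(endpoint, 0):
            return False
        self.counts[key] += 1
        return True

limiter = PerSessionLimiter({"/records": 100})
# Two agent instances sharing one API key still get independent budgets:
assert all(limiter.allow("agent-a", "/records") for _ in range(100))
assert not limiter.allow("agent-a", "/records")  # agent-a exhausted
assert limiter.allow("agent-b", "/records")      # agent-b unaffected
```

Because the counter is keyed on the session, exhausting one instance's budget says nothing about any other instance, which is exactly what key-level limiting cannot express.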

Context-Aware Access Control

The most important API security control for AI agents is not rate limiting — it is context-aware access control. This means the API validates not just that the caller has a valid key, but that the specific request makes sense given the caller's current context.

Concretely: a support agent handling ticket T-4891 for customer C-771 should only be permitted to access records where customer_id = C-771. An API gateway enforcing context-aware access rejects requests for customer C-332's records even though the caller has a valid key, because the task context says this session is scoped to C-771.

Implementing this requires that the task context be included in the agent's identity token and that the API gateway is configured to validate the token's scope against the requested resource. Kong, AWS API Gateway, and Apigee all support custom authorization policies that can extract claims from JWT tokens and use them in access decisions. The agent identity certificate functions as the JWT, with the task scope as a custom claim.
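The scope check itself is small once the claims are available. A hedged sketch of the gateway-side decision, using the ticket example above; the claim names are assumptions, and a real deployment would verify the token's signature first (for example with a JWT library) before trusting any claims.

```python
def authorize(token_claims, requested_customer_id):
    """Allow the request only if it targets the customer this
    session's task scope is bound to. Claim layout is illustrative."""
    scoped = token_claims.get("task_scope", {}).get("customer_id")
    return scoped is not None and scoped == requested_customer_id

claims = {
    "sub": "support-agent-7",
    "task_scope": {"ticket": "T-4891", "customer_id": "C-771"},
}
assert authorize(claims, "C-771")      # in scope: permitted
assert not authorize(claims, "C-332")  # valid key, wrong customer: rejected
```

The key property is that possession of a valid credential is necessary but not sufficient; the request must also be consistent with the task context embedded in that credential.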

Write vs Read Asymmetry

Read operations and write operations carry different risk profiles. A compromised agent that reads data causes a confidentiality breach. A compromised agent that writes data can cause a confidentiality breach, an integrity breach, and potentially a denial-of-service if it floods writable resources.

Apply stricter controls to write operations. Rate limits on writes should be tighter than on reads. The contextual scope for writes should be narrower: an agent may read any customer record it was assigned, but it should only write to the specific ticket it was assigned to handle. Write operations should generate audit events at higher priority than reads, and write anomalies should trigger immediate alerts rather than queued analysis.
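The asymmetry can be captured in a small policy table. The numbers, scope names, and alerting labels below are illustrative assumptions, not recommended values; the point is only that writes get a tighter limit, a narrower scope, and immediate alerting.

```python
# Hypothetical policy: writes are constrained more tightly than reads.
POLICY = {
    "read":  {"session_limit": 500, "scope": "assigned_customer", "alerting": "queued"},
    "write": {"session_limit": 20,  "scope": "assigned_ticket",   "alerting": "immediate"},
}

def policy_for(method):
    # Treat any state-mutating HTTP method as a write.
    writes = {"POST", "PUT", "PATCH", "DELETE"}
    return POLICY["write"] if method in writes else POLICY["read"]

assert policy_for("GET")["scope"] == "assigned_customer"
assert policy_for("POST")["alerting"] == "immediate"
```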

Protecting Your Third-Party API Surface

AI agents frequently call external APIs: OpenAI for LLM inference, Twilio for messaging, Stripe for payment processing. These APIs present a different security challenge: you cannot control how the external API validates your agent's identity, and a compromised agent with your Stripe API key can initiate fraudulent charges.

The control here is a credential proxy: agents never hold the real third-party API keys. They hold proxy tokens issued by your identity infrastructure, which is the only system that knows the real keys. Every agent call to a third-party API goes through the proxy, which validates the agent's identity, checks the requested operation against the agent's policy, and only then presents the real credential to the external API. If the proxy is auditing correctly, you have a complete log of every external API call made by every agent instance, with the agent identity attached.
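A minimal sketch of that flow, with in-memory stands-ins for the key store, token policy, and audit log (all names and token values are hypothetical). A real proxy would also verify the agent's identity certificate and forward the request over the network; here the "call" is just a returned dict so the credential-swap logic is visible.

```python
REAL_KEYS = {"stripe": "sk_live_REDACTED"}   # only the proxy knows these

TOKEN_POLICY = {                             # proxy token -> permitted operations
    "ptok-abc": {
        "agent": "billing-agent-3",
        "allowed": {("stripe", "create_invoice")},
    },
}

AUDIT_LOG = []                               # every call, allowed or denied

def proxy_call(proxy_token, service, operation, payload):
    """Validate the agent's proxy token, check policy, and only then
    attach the real third-party credential to the outbound request."""
    policy = TOKEN_POLICY.get(proxy_token)
    if policy is None or (service, operation) not in policy["allowed"]:
        AUDIT_LOG.append(("DENY", proxy_token, service, operation))
        raise PermissionError(f"{service}.{operation} not permitted for this token")
    AUDIT_LOG.append(("ALLOW", policy["agent"], service, operation))
    real_key = REAL_KEYS[service]            # the real key appears only here
    return {"service": service, "op": operation, "auth": real_key, "payload": payload}

resp = proxy_call("ptok-abc", "stripe", "create_invoice", {"amount": 100})
assert resp["auth"] == "sk_live_REDACTED"    # agent never held this key
```

Because every call funnels through `proxy_call`, the audit log records each external API operation with the agent identity attached, and a stolen proxy token is only as powerful as the policy behind it.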

Rate limiting is part of this picture: set external API rate limits at the proxy level, per agent identity, to prevent a single compromised agent from exhausting your API quota or running up costs on usage-priced APIs. But it is one control in a stack, not the primary defense.


Marcus Chen leads security research at Riptides. Questions: hello@riptidesio.com

← Back to Blog