February 22, 2026 • Marcus Chen, Head of Security Research
Most security teams have a mental model of what a compromised server or endpoint does. The attacker escalates privileges, moves laterally, and exfiltrates data. A compromised AI agent runs the same playbook, but the steps happen through API calls rather than command-line tools, and they unfold dramatically faster because the agent executes at machine speed, with no human decision latency.
Here is a concrete scenario. You are running a customer support agent that has access to your CRM API, your ticketing system, and a summarization tool that reads internal documentation. A user submits a support ticket containing an indirect prompt injection payload embedded in a code snippet. The agent processes the ticket and the injection succeeds.
Seconds 0-10: The agent enumerates available tools. It already knows them from its system prompt, but the injection may include instructions to test which tools are actually accessible. It makes test calls to each tool endpoint, confirming what it can reach.
Seconds 10-25: CRM enumeration. The agent calls the CRM list-customers endpoint, iterating through pages of results. Depending on the API's pagination limits, it can collect hundreds of customer records per request. If the CRM API includes contact details, account values, or payment information, those are now in the agent's context window.
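To make the enumeration step concrete, here is a minimal sketch of what the agent's tool-call loop amounts to. The `list_customers` endpoint and its pagination parameters are hypothetical, chosen to mirror a typical CRM API; the point is that a few lines of looping collect every reachable record.

```python
def enumerate_customers(crm_client, page_size=200):
    """Pull every customer record the credential can reach, one page at a time."""
    records, page = [], 1
    while True:
        batch = crm_client.list_customers(page=page, per_page=page_size)
        if not batch:  # empty page: enumeration is complete
            break
        records.extend(batch)
        page += 1
    return records
```

Nothing here is sophisticated; it is the same paging loop a legitimate integration would use, which is why intent cannot be inferred from the call shape alone.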
Seconds 25-40: Exfiltration attempt. The agent attempts to write the collected data to an external destination. Common vectors include: creating a support ticket with the data in the description field (which might be visible to the original user), calling the documentation summarization tool with the data as input (to test if the tool's output is logged externally), or making a direct HTTP call to an attacker-controlled URL if the agent has any HTTP tool.
Seconds 40-60: Persistence or secondary compromise. The agent may attempt to modify tickets, create new service accounts through available APIs, or inject content into internal documentation that will be read by future agents.
The scenario above assumes the agent has unconstrained tool access. With a runtime identity layer, each step either fails or triggers a detection event:
Tool enumeration: The agent's identity certificate lists the tools it is permitted to call. Calls to unlisted tools are rejected before they reach the tool endpoint. The rejection is logged with the agent's identity and the rejected target, creating an immediate detection signal.
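The allowlist check itself is simple. The sketch below is illustrative, not a specific product API: the permitted-tools mapping stands in for whatever the identity certificate carries, and the log line stands in for the detection signal.

```python
import logging

# Illustrative stand-in for the tool list carried by the agent's identity certificate.
PERMITTED_TOOLS = {
    "support-agent": {"crm.read", "tickets.write", "docs.summarize"},
}

def authorize_tool_call(agent_id: str, tool: str) -> bool:
    """Reject calls to tools outside the agent's permitted list, logging the attempt."""
    allowed = tool in PERMITTED_TOOLS.get(agent_id, set())
    if not allowed:
        # The rejection itself is the detection signal: identity plus rejected target.
        logging.warning("blocked tool call: agent=%s tool=%s", agent_id, tool)
    return allowed
```

Because the check runs before the call reaches the tool endpoint, a blocked probe costs the attacker information: every unlisted tool they test generates an alert tied to their agent's identity.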
CRM bulk enumeration: The policy can include rate limits per identity. A support agent processing one ticket should make at most a handful of CRM API calls per session, so a limit of 20 CRM calls per session catches bulk enumeration without affecting legitimate use. The agent hits the limit, all further CRM calls are blocked, and the rate-limit event is logged.
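A per-identity session limit can be as simple as a counter keyed on agent and tool. This is a minimal in-memory sketch (no TTLs, persistence, or thread safety); the limit of 20 matches the example above.

```python
from collections import Counter

class SessionRateLimiter:
    """Hard per-session call cap per (agent identity, tool) pair."""

    def __init__(self, limit: int = 20):
        self.limit = limit
        self.counts = Counter()

    def allow(self, agent_id: str, tool: str) -> bool:
        key = (agent_id, tool)
        if self.counts[key] >= self.limit:
            return False  # cap reached: block the call and emit a rate-limit event
        self.counts[key] += 1
        return True
```

The point of a hard cap rather than a sliding window is that bulk enumeration is a total-volume anomaly: a support session never legitimately needs the twenty-first CRM call, regardless of pacing.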
Exfiltration attempt: The agent's policy permits writing to the ticketing system and the documentation tool, but not outbound HTTP to arbitrary URLs, so the direct exfiltration call is rejected at the network layer. The ticketing-system write is permitted, but writing a large volume of structured data to a ticket description trips the anomaly-detection threshold (normal support agents write short text, not paginated JSON records).
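The write-volume anomaly check can be approximated with two heuristics: body size and structure. This is a hedged sketch with illustrative thresholds, not a complete detection rule; a real deployment would tune these against observed agent behavior.

```python
import json

MAX_BODY_CHARS = 2000  # illustrative: normal ticket updates are short prose

def looks_like_bulk_data(body: str) -> bool:
    """Flag ticket bodies that are oversized or parse as large JSON arrays."""
    if len(body) > MAX_BODY_CHARS:
        return True
    try:
        parsed = json.loads(body)
    except ValueError:
        return False  # ordinary prose: not valid JSON
    return isinstance(parsed, list) and len(parsed) > 10
```

The structural check matters because paginated CRM records serialize as arrays of objects, a shape that support prose essentially never takes.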
Persistence attempt: Creating service accounts requires calls to your identity provider API. That endpoint is not in the support agent's permitted tool list. Documentation writes that include content matching known sensitive data patterns (large JSON blobs, Base64-encoded strings) trigger an alert that pauses the agent pending review.
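The content-pattern check on documentation writes can start from simple regular expressions for the two patterns named above: long Base64-looking runs and sequences of JSON objects. These patterns are illustrative starting points, not a complete DLP ruleset.

```python
import re

# A run of 80+ Base64-alphabet characters, optionally padded.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{80,}={0,2}")
# Two or more comma-separated JSON-style objects, i.e. serialized records.
JSON_BLOB = re.compile(r"\{[^{}]*:[^{}]*\}(\s*,\s*\{[^{}]*\})+")

def should_pause_for_review(content: str) -> bool:
    """Pause the write pending human review if it matches a sensitive-data pattern."""
    return bool(BASE64_RUN.search(content) or JSON_BLOB.search(content))
```

Pausing rather than silently blocking matters here: a documentation write that trips the pattern is either exfiltration or an unusual-but-legitimate edit, and a human can tell the difference in seconds.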
A human attacker with shell access might take hours to enumerate a CRM and exfiltrate data, because they are manually running commands and interpreting results. An AI agent runs the same sequence in under a minute: it has no decision latency and can parallelize tool calls.
Detection systems designed around human-speed attacks will not catch this. By the time a SIEM alert fires on the volume anomaly, the agent may have already written the exfiltrated data to a third-party system. Runtime policy enforcement at the identity layer stops each individual step before it completes, which is why it needs to happen at the boundary rather than in a post-hoc detection layer.
Marcus Chen leads security research at Riptides. Questions: hello@riptidesio.com