Anatomy of an Agent Data Leak
We walk through a real scenario in which a coding agent leaked a database password to a third-party API. Here's exactly how it happened and what would have stopped it.
The setup
Here's a scenario based on something we actually debugged. Names and specifics are changed, but the mechanics are real.
A team is running a coding agent that helps developers with their codebase. The agent has access to a project directory inside a sandbox. A developer asks it to help debug a database connection issue.
Seems harmless.
What the agent did
The agent read the project's .env file to understand the database configuration. Standard move — it needs the connection string to debug the issue.
The .env file contained:
DB_HOST=prod-db.internal.company.com
DB_USER=app_service
DB_PASSWORD=k8s-prod-Xj7!mN2@qR
DB_NAME=customers
OPENAI_API_KEY=sk-proj-abc123...
The agent then sent its analysis request to a third-party LLM completions API. The prompt included the full .env contents as "context" so the model could reason about the connection configuration.
That means DB_PASSWORD=k8s-prod-Xj7!mN2@qR and sk-proj-abc123... left the sandbox in a POST body to a third-party API.
Why this is bad
The sandbox worked perfectly. The agent didn't escape. It didn't write to the host filesystem. It didn't spawn rogue processes.
But a production database password and an API key are now sitting in a third party's request logs. Depending on the provider's retention policy, that data could persist for 30 days or more. If that provider is ever breached, your credentials are in the dump.
What would have caught this
Let's walk through what Declaw's pipeline does with this exact scenario.
Step 1: PII and secrets detection. Before the request leaves the sandbox, the security proxy scans the body. It flags DB_PASSWORD=k8s-prod-Xj7!mN2@qR as a credential and sk-proj-abc123... as an API key.
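To make the detection step concrete, here is a minimal sketch of a pattern-based scan over an outbound request body. It is not Declaw's actual implementation: the names SECRET_PATTERNS and find_secrets and the two regexes are assumptions made for illustration, and a real scanner would need a much larger rule set.

```python
import re

# Hypothetical patterns for illustration only; not an exhaustive rule set.
SECRET_PATTERNS = {
    # KEY=value pairs whose key name ends in PASSWORD, SECRET, or TOKEN
    "credential": re.compile(r"\b[A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN)\s*=\s*(\S+)"),
    # Values that start with the "sk-" prefix seen in the example .env file
    "api_key": re.compile(r"\b(sk-[A-Za-z0-9_\-.]{6,})"),
}


def find_secrets(body: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for every suspected secret in the body."""
    findings = []
    for kind, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(body):
            findings.append((kind, match.group(1)))
    return findings


# The .env contents from the scenario, as they appeared in the request body.
body = (
    "DB_HOST=prod-db.internal.company.com\n"
    "DB_USER=app_service\n"
    "DB_PASSWORD=k8s-prod-Xj7!mN2@qR\n"
    "DB_NAME=customers\n"
    "OPENAI_API_KEY=sk-proj-abc123...\n"
)
findings = find_secrets(body)
```

In this sketch, findings holds one credential entry and one api_key entry, while the host, user, and database name pass through untouched.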
Step 2: Redaction. The values are replaced with tokens: [CREDENTIAL_REDACTED] and [API_KEY_REDACTED]. The mapping is stored locally inside the sandbox.
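The redaction step can be sketched as a simple substitution that also records a token-to-value mapping. This is a rough illustration, not Declaw's code: the redact function is a hypothetical helper, and it assumes at most one secret per kind, since the tokens carry no unique suffix.

```python
def redact(body: str, findings: list[tuple[str, str]]) -> tuple[str, dict[str, str]]:
    """Replace each secret value with a placeholder token and record the mapping."""
    mapping: dict[str, str] = {}
    for kind, value in findings:
        token = f"[{kind.upper()}_REDACTED]"
        body = body.replace(value, token)
        mapping[token] = value  # kept locally; never included in the outbound request
    return body, mapping


body = (
    "DB_NAME=customers\n"
    "DB_PASSWORD=k8s-prod-Xj7!mN2@qR\n"
    "OPENAI_API_KEY=sk-proj-abc123...\n"
)
# Findings as a detection pass might report them: (kind, value) pairs.
findings = [("credential", "k8s-prod-Xj7!mN2@qR"), ("api_key", "sk-proj-abc123...")]
sanitized, mapping = redact(body, findings)
```

The variable names stay in the sanitized body, which is what lets the model keep reasoning about the configuration.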
Step 3: The request goes out. The LLM receives the sanitized version. It can still reason about the database configuration — it sees the variable names, the host, the database name. It just can't see the actual secrets.
Step 4: Response rehydration. If the LLM's response references the redacted values (say, it suggests a connection string), the proxy restores the originals before the agent sees the response. The agent's workflow isn't interrupted.
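Rehydration is the reverse substitution. The sketch below assumes a token-to-value mapping was saved at redaction time. The rehydrate function and the sample model response are hypothetical, and the key=value connection string format is only an example.

```python
def rehydrate(response_text: str, mapping: dict[str, str]) -> str:
    """Swap placeholder tokens in the model's response back to the original values."""
    for token, original in mapping.items():
        response_text = response_text.replace(token, original)
    return response_text


# Mapping saved inside the sandbox when the request was redacted.
mapping = {"[CREDENTIAL_REDACTED]": "k8s-prod-Xj7!mN2@qR"}

# Hypothetical model output that refers to the redacted value by its token.
llm_response = (
    "host=prod-db.internal.company.com user=app_service "
    "password=[CREDENTIAL_REDACTED] dbname=customers"
)
restored = rehydrate(llm_response, mapping)
```

Because the substitution happens on the way back in, the real password only ever exists inside the sandbox.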
Step 5: Audit log. The redaction event is logged with timestamp, request URL, what was redacted, and which sandbox it came from. No secrets in the log — just the fact that redaction happened.
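An audit entry along these lines might look like the sketch below. The field names, the sandbox ID, and the URL are all assumptions for illustration and do not reflect Declaw's actual log schema. The key property is that only the kinds and counts of redacted items are written, never the values.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(sandbox_id: str, url: str, findings: list[tuple[str, str]]) -> str:
    """Build a JSON log line recording which kinds of secrets were redacted."""
    counts = Counter(kind for kind, _value in findings)  # secret values are discarded here
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "redaction",
        "sandbox_id": sandbox_id,
        "request_url": url,
        "redacted": dict(counts),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record(
    "sbx-example",                             # hypothetical sandbox ID
    "https://api.example.com/v1/completions",  # hypothetical request URL
    [("credential", "k8s-prod-Xj7!mN2@qR"), ("api_key", "sk-proj-abc123...")],
)
```

A log in this shape can answer "did any credentials leave?" without becoming a second place where those credentials are stored.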
The non-obvious part
The thing that makes this hard to solve externally is context. An external guardrails service would see the outbound HTTP request, sure. But it wouldn't necessarily know that the body contains file contents from the sandbox. It doesn't have access to the .env file to know those are real credentials vs. example values.
Declaw's proxy runs inside the VM, so it can cross-reference what the agent read from disk with what it's sending over the network. That shared context is what makes the detection accurate, rather than a regex guess at what merely looks like a secret.
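One way to picture that cross-referencing is a tracker that remembers the values read from disk and checks outbound bodies against them. This is a simplified sketch, not how Declaw does it: ReadTracker, its methods, and the minimum-length cutoff are assumptions made for illustration.

```python
class ReadTracker:
    """Remembers values the agent read from disk so outbound bodies can be checked."""

    MIN_LENGTH = 8  # assumed cutoff so short, common values do not match everywhere

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # value -> "path:KEY" it was read from

    def record_env_read(self, path: str, contents: str) -> None:
        """Record every KEY=value pair from a .env-style file the agent opened."""
        for line in contents.splitlines():
            if line.lstrip().startswith("#"):
                continue  # skip comment lines
            key, sep, value = line.partition("=")
            value = value.strip()
            if sep and len(value) >= self.MIN_LENGTH:
                self._seen[value] = f"{path}:{key.strip()}"

    def leaked_in(self, outbound_body: str) -> list[str]:
        """Return the sources of any recorded values that appear in the body."""
        return [src for value, src in self._seen.items() if value in outbound_body]


tracker = ReadTracker()
tracker.record_env_read(".env", "DB_USER=app_service\nDB_PASSWORD=k8s-prod-Xj7!mN2@qR\n")

real = tracker.leaked_in("context: DB_PASSWORD=k8s-prod-Xj7!mN2@qR")
example = tracker.leaked_in("for example, DB_PASSWORD=changeme")
```

Here real identifies .env:DB_PASSWORD as the source, while example comes back empty because the placeholder value never appeared on disk. A regex alone would flag both.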
Practical takeaways
A few things we learned from debugging scenarios like this:
- Agents will always find secrets. If there's a credential in the sandbox filesystem, assume the agent will read it and forward it somewhere. Don't rely on the agent being "smart enough" not to.
- Redaction beats blocking. If you just block the request, the agent retries or fails. Redaction lets the workflow continue while keeping secrets out of third-party systems.
- Audit trails matter more than you think. When a security team asks "did any credentials leave our environment in the last 30 days," you want a definitive answer, not a shrug.
The Declaw SDK handles all of this out of the box. A few lines to create a sandbox with a security policy, and you're covered.
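from declaw import Sandbox, SecurityPolicy, PIIConfig, CodeSecurityConfig

# Turn on PII scanning and code security checks for the sandbox.
policy = SecurityPolicy(
    pii=PIIConfig(enabled=True),
    code_security=CodeSecurityConfig(enabled=True),
)

# Create the sandbox with the policy attached.
sbx = Sandbox.create(security=policy)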
That's it. No external service to configure, no traffic routing to set up.