Benchmarks

How Declaw measures up

Sandbox TTI benchmarked using the computesdk/benchmarks methodology. Guardrails benchmarked against public datasets (gandalf, InjecAgent, deepset, XSTest, AgentDojo, PII-Masking-300k) and documented attack classes, with anonymized industry context.

Sandbox Performance

TTI (time to interactive) measures wall-clock from the sandbox.create() API call to the first command executing inside the sandbox. All results are from the ComputeSDK public leaderboard — 100 iterations per provider, scored as 60% median / 25% p95 / 15% p99, multiplied by success rate. Benchmarks run daily via automated GitHub Actions.

Declaw ranks #1 across all three modes — sequential (99.6), staggered (99.6), and burst (97.4). 40ms median TTI in sequential/staggered, 240ms under 100-sandbox concurrent burst. 100% success rate across all modes, 14 providers benchmarked.

Full results

CategoryBenchmarkDeclawBest public competitorDelta
Sequential TTIComposite score99.6 / 100Tensorlake: 98.0+1.6
Sequential TTIMedian40 msTensorlake: 190 ms−150 ms
Sequential TTIp9540 msTensorlake: 220 ms−180 ms
Sequential TTIp9940 msTensorlake: 220 ms−180 ms
Staggered TTI (200ms apart)Composite score99.6 / 100Tensorlake: 98.0+1.6
Staggered TTIMedian40 msTensorlake: 190 ms−150 ms
Staggered TTIp9540 msTensorlake: 210 ms−170 ms
Staggered TTIp9940 msTensorlake: 220 ms−180 ms
Burst TTI (100 simultaneous)Composite score97.4 / 100Tensorlake: 95.1+2.3
Burst TTIMedian240 msTensorlake: 430 ms−190 ms
Burst TTIp95290 msTensorlake: 560 ms−270 ms
Burst TTIp99290 msTensorlake: 580 ms−290 ms
Burst TTIWall clock (100 sandboxes)0.31 sTensorlake: 0.60 s−0.29 s

Source: ComputeSDK Sandbox Leaderboard, May 9, 2026. 100 iterations per provider, automated daily runs via GitHub Actions. Methodology: github.com/computesdk/benchmarks.

Burst concurrency — 100,000 sandboxes

The ComputeSDK Scale Invitational is a separate burst-concurrency benchmark: launch 100,000 concurrent sandboxes in a single run and measure time-to-target. Only a handful of providers can complete it. Declaw reached 100% of target — all 100,000 concurrent — one of six providers to do so.

100,000 / 100,000 peak concurrent sandboxes reached (1 vCPU configuration), June 18, 2026. Alongside e2b, Modal, TensorLake, Northflank, and Isorun — the six providers that completed the 100k burst.

Source: ComputeSDK 2026 Scale Invitational.

Guardrails

1. Prompt Injection

Declaw's full prompt-injection defense is measured as one black box — payload in, BLOCK or ALLOW out — against the deployed services with every layer active: static signatures, an ML classifier, a policy gate, and an LLM judge. It's evaluated across real-world attacks, indirect (tool-output) injection, balanced detection, and over-refusal on benign prompts.

Results — full deployed defense, measured as one black box

BenchmarkWhat it measuresDeclaw
gandalf — 1,000 real attacksRecall on real, human-authored injection attacks99.9%
InjecAgent — 1,054 casesIndirect-injection recall: poisoned tool output → hijacked egress99.9%
deepset — 662 promptsBalanced detection (recall · precision · F1 · false-positives)0.954 F1 · 0.916 R · ~0.996 P · 0.3% FP
XSTest-safe — 250 promptsOver-refusal: benign prompts wrongly blocked (lower is better)1.6%

Near-total recall on real-world attacks (99.9%) and indirect injection (99.9%), with strong precision — 0.954 F1 at a 0.3% false-positive rate on balanced detection, and just 1.6% over-refusal on benign prompts that look unsafe. For context, on held-out industry comparisons the best-published commercial guardrails score in the mid-90s and open guardrail models around 62–79%; recall-skewed open classifiers reach ~0.86 F1 but at roughly 24% false-positives.

Measured against the deployed services as one fused BLOCK/ALLOW decision per case — static signatures + ML classifier + policy gate + LLM judge, every layer active. Industry context uses anonymized published ranges. deepset is not a fair cross-system board (several published detectors are fine-tuned on its train split), so Declaw is reported on an absolute basis; models reporting ~0.99 there are trained in-distribution.

2. Agentic Indirect Injection (AgentDojo)

The numbers above measure detection on static datasets. The harder question is whether a guardrail actually stops an attack inside a live agent loop, where injected instructions hide inside tool outputs (emails, Slack messages, documents) and the LLM must be prevented from acting on them. AgentDojo (ETH Zürich SPY Lab, NeurIPS 2024) is the canonical benchmark for this: it runs real tool-using agents through curated user tasks with adversarial instructions planted in returned tool data.

Declaw ships as an egress gate: it sits in front of the agent's tool calls and judges each proposed action against the untrusted tool output that preceded it — the same indirect-injection path that scores 99.9% on InjecAgent above. Run inside AgentDojo's live loop (banking suite, gpt-4.1), the gate is utility-neutral — it does not degrade task completion (0.847 vs 0.819 baseline) while gating malicious egress.

Near-total on overt indirect injection, utility held to account. Declaw blocks the poisoned-tool-output → hijacked-egress attacks it's built for almost entirely (InjecAgent 99.9%), without taxing the agent's real work in a live agent loop.

Honest scope: AgentDojo also includes covert action-redirection — a legitimately-shaped action quietly retargeted to the attacker — a harder, different class than the overt indirect injection Declaw blocks near-totally. Targeted-redirection detection is in active development.

3. Invisible-Text Attacks

Invisible-text attacks use Unicode encoding tricks to hide adversarial instructions from reviewers and LLM interfaces. Recent disclosures include Sourcegraph Amp Code and Google Jules coding agent (both patched in 2025), plus documented research from AWS, Cisco Talos, Embrace the Red, and Keysight.

No public benchmark exists for this category. Declaw's coverage is evaluated against a curated corpus of 78 attack payloads drawn from the above disclosures plus 35 benign controls (CJK, Arabic, Hebrew, emoji, URLs, code). Overall F1 = 0.961 at production threshold.

Coverage by documented attack class

Attack classCoverageRepresentative
Unicode Tags block (U+E0000–U+E007F)100% (20/20)Sourcegraph Amp, Google Jules, AWS 2025
Zero-width characters100% (20/20)ZWSP, ZWNJ, ZWJ, BOM
Bidi control overrides100% (10/10)Trojan-source-style RLO
Private Use Area100% (8/8)Custom-glyph smuggling
Combined / in-the-wild payloads100% (10/10)Embrace the Red "ASCII Smuggler"
Soft-hyphen / format controls75% (3/4)Soft hyphen caught; U+FFFC missed
Unassigned codepoints67% (2/3)
Variation selectors0% (by design)Legitimate emoji use

Declaw covers 100% of the documented 2024–2026 in-the-wild invisible-text attack classes (Tags block, zero-width, bidi, PUA). Lakera Guard, ProtectAI LLM Guard, and AWS Bedrock Guardrails do not publish comparable coverage numbers for this category.

4. PII Detection

Evaluation uses the ai4privacy/pii-masking-300k validation set (1,000 samples), with entity-level overlap scoring on the aligned-type intersection (PERSON, EMAIL, PHONE, SSN, IP, CREDIT_CARD, LOCATION, DATE). Declaw targets US-format PII — US SSN (XXX-XX-XXXX), North American phone, plus format-deterministic global types (email, IP, credit card).

Strong categories (format-deterministic, Declaw's product scope)

TypePrecisionRecallF1
EMAIL0.9570.9970.977
IP address0.9721.0000.986
US SSN (US-format subset)1.0001.0001.000

For the types enterprise compliance teams most frequently audit — email, IP, and US SSN — Declaw's detection is on par with or exceeds leading PII-specific solutions. Declaw pairs detection with redaction and a full audit log for SOC2 / HIPAA workflows.