Benchmarks

How Declaw measures up

Sandbox TTI is benchmarked using the computesdk/benchmarks methodology. Guardrails are benchmarked against public datasets (Deepset, Lakera Gandalf, jailbreak-classification, AgentDojo, PII-Masking-300k) and documented attack classes. All results are compared directly to leading industry solutions.

Sandbox Performance

TTI (time to interactive) measures wall-clock time from the sandbox.create() API call to the first command executing inside the sandbox. All results are from the ComputeSDK public leaderboard — 100 iterations per provider, scored as 60% median / 25% p95 / 15% p99, multiplied by success rate. Benchmarks run daily via automated GitHub Actions.

Declaw ranks #1 in all three modes: sequential (99.6), staggered (99.6), and burst (97.4). Median TTI is 40 ms in sequential and staggered modes and 240 ms under a 100-sandbox concurrent burst, with a 100% success rate across all modes and 14 providers benchmarked.
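The composite scoring above (60% median / 25% p95 / 15% p99, multiplied by success rate) can be sketched as follows. The weights and the success-rate multiplier come from the methodology described here; the normalization that maps a latency to a 0–100 subscore is an assumption for illustration, not the leaderboard's actual function.

```python
def composite_score(median_ms, p95_ms, p99_ms, success_rate, normalize):
    """Weighted TTI score: 60% median / 25% p95 / 15% p99,
    multiplied by the run's success rate (0.0-1.0).
    `normalize` maps a latency in ms to a 0-100 subscore."""
    weighted = (0.60 * normalize(median_ms)
                + 0.25 * normalize(p95_ms)
                + 0.15 * normalize(p99_ms))
    return weighted * success_rate

# Illustrative normalization only: full marks at 0 ms, zero at 1000 ms.
norm = lambda ms: max(0.0, 100.0 * (1 - ms / 1000.0))

fast = composite_score(40, 40, 40, 1.00, norm)     # 40 ms across the board
slow = composite_score(190, 220, 220, 1.00, norm)  # slower tail latencies
assert fast > slow
```

Because the tail percentiles carry 40% of the weight, a provider with a fast median but a slow p99 is penalized more than a median-only ranking would suggest.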

Full results

| Category | Benchmark | Declaw | Best public competitor | Delta |
| --- | --- | --- | --- | --- |
| Sequential TTI | Composite score | 99.6 / 100 | Tensorlake: 98.0 | +1.6 |
| Sequential TTI | Median | 40 ms | Tensorlake: 190 ms | −150 ms |
| Sequential TTI | p95 | 40 ms | Tensorlake: 220 ms | −180 ms |
| Sequential TTI | p99 | 40 ms | Tensorlake: 220 ms | −180 ms |
| Staggered TTI (200 ms apart) | Composite score | 99.6 / 100 | Tensorlake: 98.0 | +1.6 |
| Staggered TTI | Median | 40 ms | Tensorlake: 190 ms | −150 ms |
| Staggered TTI | p95 | 40 ms | Tensorlake: 210 ms | −170 ms |
| Staggered TTI | p99 | 40 ms | Tensorlake: 220 ms | −180 ms |
| Burst TTI (100 simultaneous) | Composite score | 97.4 / 100 | Tensorlake: 95.1 | +2.3 |
| Burst TTI | Median | 240 ms | Tensorlake: 430 ms | −190 ms |
| Burst TTI | p95 | 290 ms | Tensorlake: 560 ms | −270 ms |
| Burst TTI | p99 | 290 ms | Tensorlake: 580 ms | −290 ms |
| Burst TTI | Wall clock (100 sandboxes) | 0.31 s | Tensorlake: 0.60 s | −0.29 s |

Source: ComputeSDK Sandbox Leaderboard, May 9, 2026. 100 iterations per provider, automated daily runs via GitHub Actions. Methodology: github.com/computesdk/benchmarks.

Guardrails

1. Prompt Injection

Declaw's prompt-injection detection was evaluated on 2,965 prompts across three public datasets: the canonical Deepset dataset, Lakera's Gandalf user-attack corpus, and the balanced jailbreak-classification set. Lakera's PINT dataset is proprietary; these three constitute the equivalent public benchmark set.

Results at production settings

| Dataset | n | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- |
| deepset/prompt-injections | 662 | 95.92% | 99.58% | 90.08% | 0.946 |
| Lakera gandalf_ignore_instructions | 1,000 | 99.90% | 100.00% | 99.90% | 0.9995 |
| jackhhao/jailbreak-classification | 1,306 | 98.70% | 99.09% | 98.35% | 0.987 |
| Combined | 2,965 | 98.48% | 99.63% | 98.03% | 0.988 |
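The F1 values in the table are the harmonic mean of precision and recall, which can be checked directly from the other two columns:

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Combined row: precision 99.63%, recall 98.03% -> F1 0.988
assert round(f1(0.9963, 0.9803), 3) == 0.988
# Deepset row: precision 99.58%, recall 90.08% -> F1 0.946
assert round(f1(0.9958, 0.9008), 3) == 0.946
```

The harmonic mean punishes imbalance, which is why the Deepset row's 90.08% recall pulls its F1 well below its 99.58% precision.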

Industry comparison — Deepset accuracy

| System | Accuracy | F1 |
| --- | --- | --- |
| Declaw | 95.92% | 0.946 |
| Lakera Guard | 87.91% | 0.823 |
| Vigil | 77.49% | 0.615 |
| Azure Prompt Shield | 77.28% | 0.560 |
| ProtectAI LLM Guard | 76.00% | 0.579 |
| Calypso AI Moderator | 73.56% | 0.572 |
| Rebuff | 72.96% | 0.689 |
| LangKit Similarity | 70.20% | 0.588 |
| LangKit Canary | 61.02% | 0.044 |

Competitor numbers from Palit & Woods (2025), arXiv:2505.13028, Table 5. Declaw evaluated 2026-04-20 on the same Deepset dataset (662 prompts, train+test splits) at production settings.

2. Agentic Indirect Injection (AgentDojo)

The numbers above measure detection on static datasets. The harder question is whether a guardrail actually stops an attack inside a live agent loop, where injected instructions hide inside tool outputs (emails, Slack messages, documents) and the LLM must be prevented from acting on them. AgentDojo (ETH Zürich SPY Lab, NeurIPS 2024) is the canonical benchmark for this: it runs real tool-using agents through curated user tasks with adversarial instructions planted in returned tool data.

Declaw was integrated into AgentDojo's pipeline and evaluated on the full slack suite (105 trials, gpt-4o-mini) under the important_instructions attack — AgentDojo's strongest attack class.

| Variant | Attack success rate | Injections blocked | Task utility |
| --- | --- | --- | --- |
| Baseline (no defense) | 62.9% | 39 / 105 | 51.4% |
| Declaw | 0.0% | 105 / 105 | 17.1% |

Declaw blocks 100% of injections (0 of 105 succeed) versus a 62.9% success rate with no defense — a −62.9 pp reduction in attack success rate on a benchmark explicitly designed to probe live agentic pipelines.

On the utility tradeoff: AgentDojo's evaluator treats an entire tool-output message as tainted once Declaw flags an injection. Because the slack suite intentionally concentrates attacks inside the same messages that carry legitimate data (the attacker's goal is to ride real content), blocking the injection also redacts some useful context — which lowers measured utility. This is a property of the benchmark (adversarial-maximal placement + whole-message redaction), not an ordinary workload. We consider the ASR reduction the primary signal.

3. Invisible-Text Attacks

Invisible-text attacks use Unicode encoding tricks to hide adversarial instructions from human reviewers and LLM interfaces. Recent disclosures include attacks on the Sourcegraph Amp and Google Jules coding agents (both patched in 2025), plus documented research from AWS, Cisco Talos, Embrace the Red, and Keysight.

No public benchmark exists for this category. Declaw's coverage is evaluated against a curated corpus of 78 attack payloads drawn from the above disclosures plus 35 benign controls (CJK, Arabic, Hebrew, emoji, URLs, code). Overall F1 = 0.961 at production threshold.

Coverage by documented attack class

| Attack class | Coverage | Representative |
| --- | --- | --- |
| Unicode Tags block (U+E0000–U+E007F) | 100% (20/20) | Sourcegraph Amp, Google Jules, AWS 2025 |
| Zero-width characters | 100% (20/20) | ZWSP, ZWNJ, ZWJ, BOM |
| Bidi control overrides | 100% (10/10) | Trojan-source-style RLO |
| Private Use Area | 100% (8/8) | Custom-glyph smuggling |
| Combined / in-the-wild payloads | 100% (10/10) | Embrace the Red "ASCII Smuggler" |
| Soft-hyphen / format controls | 75% (3/4) | Soft hyphen caught; U+FFFC missed |
| Unassigned codepoints | 67% (2/3) | |
| Variation selectors | 0% (by design) | Legitimate emoji use |

Declaw covers 100% of the documented 2024–2026 in-the-wild invisible-text attack classes (Tags block, zero-width, bidi, PUA). Lakera Guard, ProtectAI LLM Guard, and AWS Bedrock Guardrails do not publish comparable coverage numbers for this category.
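The attack classes above are all defined by codepoint ranges, so the detection problem can be illustrated with a simple scan. The ranges below come from the Unicode standard; the detector itself is an illustrative sketch, not Declaw's implementation.

```python
# Codepoint classes corresponding to the documented attack classes.
TAGS = range(0xE0000, 0xE0080)                 # Unicode Tags block
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}  # ZWSP, ZWNJ, ZWJ, BOM
BIDI = {0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
        0x2066, 0x2067, 0x2068, 0x2069}        # embedding/override/isolate controls
PUA = range(0xE000, 0xF900)                    # BMP Private Use Area

def invisible_findings(text):
    """Return (index, codepoint, attack class) for each flagged character."""
    findings = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in TAGS:
            findings.append((i, cp, "tags"))
        elif cp in ZERO_WIDTH:
            findings.append((i, cp, "zero-width"))
        elif cp in BIDI:
            findings.append((i, cp, "bidi"))
        elif cp in PUA:
            findings.append((i, cp, "pua"))
    return findings

# A Tags-block payload hiding the letters "hi" after visible text:
# each ASCII character is shifted into U+E0000-U+E007F, rendering invisibly.
payload = "hello" + "".join(chr(0xE0000 + ord(c)) for c in "hi")
assert [f[2] for f in invisible_findings(payload)] == ["tags", "tags"]
assert invisible_findings("plain ascii text") == []
```

A production system needs more nuance than a raw scan — for instance, ZWJ is legitimate inside emoji sequences, which is why variation selectors are excluded by design in the table above.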

4. PII Detection

Evaluation uses the ai4privacy/pii-masking-300k validation set (1,000 samples), with entity-level overlap scoring on the aligned-type intersection (PERSON, EMAIL, PHONE, SSN, IP, CREDIT_CARD, LOCATION, DATE). Declaw targets US-format PII — US SSN (XXX-XX-XXXX), North American phone, plus format-deterministic global types (email, IP, credit card).
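A format-deterministic type like US SSN can be matched with a narrow pattern, which is why such types yield near-perfect precision and recall. This sketch is illustrative, not Declaw's actual detector; the validity rules (no 000/666/9xx area, no 00 group, no 0000 serial) follow SSA issuance conventions.

```python
import re

# US SSN in XXX-XX-XXXX form, with basic never-issued values excluded.
SSN = re.compile(
    r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)

assert SSN.search("ssn: 123-45-6789")
assert not SSN.search("000-12-3456")   # never-issued area number
assert not SSN.search("123-456-789")   # wrong grouping, not an SSN
```

Statistical types like PERSON and LOCATION lack such a fixed shape, which is where entity-level scoring against ai4privacy/pii-masking-300k matters most.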

Strong categories (format-deterministic, Declaw's product scope)

| Type | Precision | Recall | F1 |
| --- | --- | --- | --- |
| EMAIL | 0.957 | 0.997 | 0.977 |
| IP address | 0.972 | 1.000 | 0.986 |
| US SSN (US-format subset) | 1.000 | 1.000 | 1.000 |

For the types enterprise compliance teams most frequently audit — email, IP, and US SSN — Declaw's detection is on par with or exceeds leading PII-specific solutions. Declaw pairs detection with redaction and a full audit log for SOC 2 / HIPAA workflows.