ThreatPrint benchmark
Benchmark claim from the April 13, 2026 run: 96% detection rate, 0% false-positive rate. This page summarizes what has been published and marks every unpublished field as pending publication.
Methodology
The published run used the Pulse-Proxy extended benchmark script against the production proxy, with one corpus of attack prompts and one corpus of clean prompts. An attack prompt counts as detected when the proxy blocks it or warns on it. A clean prompt counts as a false positive when the proxy blocks it or warns on it.
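The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not the benchmark script itself: the verdict strings `"block"`, `"warn"`, and `"allow"` and the function names are assumptions, since the proxy's actual response format is not described on this page.

```python
from collections import Counter

# Hypothetical verdict labels: the real proxy's response values are not
# published on this page.
FLAGGED = {"block", "warn"}


def classify(is_attack: bool, verdict: str) -> str:
    """Map one prompt result to a confusion-matrix cell."""
    flagged = verdict in FLAGGED
    if is_attack:
        # Attack prompts are detected when blocked or warned.
        return "TP" if flagged else "FN"
    # Clean prompts are false positives when blocked or warned.
    return "FP" if flagged else "TN"


def tally(results):
    """Count cells for an iterable of (is_attack, verdict) pairs."""
    counts = Counter({"TP": 0, "FN": 0, "FP": 0, "TN": 0})
    for is_attack, verdict in results:
        counts[classify(is_attack, verdict)] += 1
    return dict(counts)
```

Treating a warning the same as a block means a warn-only response to a clean prompt still counts against the false-positive figure.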
Corpus size
Published summary: 100 attack prompts and 100 clean prompts. If this changes, update the benchmark artifact and this page together.
Confusion matrix
| Outcome | Count | Source |
|---|---|---|
| True positives | 96 | published April 13 run |
| False negatives | 4 | published April 13 run |
| False positives | 0 | published April 13 run |
| True negatives | 100 | published April 13 run |
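As a check, the headline percentages follow directly from the four published counts. The short Python sketch below uses the standard definitions of detection rate (recall) and false-positive rate; the function names are illustrative and are not taken from the benchmark script.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Share of attack prompts that were blocked or warned."""
    total = tp + fn
    if total == 0:
        raise ValueError("no attack prompts in the corpus")
    return tp / total


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of clean prompts that were blocked or warned."""
    total = fp + tn
    if total == 0:
        raise ValueError("no clean prompts in the corpus")
    return fp / total


# Counts from the published April 13 run.
print(f"detection: {detection_rate(96, 4):.0%}")            # detection: 96%
print(f"false positives: {false_positive_rate(0, 100):.0%}")  # false positives: 0%
```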
Latency table
| Percentile | Latency |
|---|---|
| p50 | pending publication |
| p95 | pending publication |
| p99 | pending publication |
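Until figures are published, the sketch below shows one common way these percentiles could be derived from raw per-request timings, using the nearest-rank method. Whether the benchmark script uses nearest-rank or an interpolating method is not stated here, so treat this as an assumption. The function name is hypothetical.

```python
def percentile_nearest_rank(samples, p: int):
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer from 1 to 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With nearest-rank, every reported value is an actually observed timing, which makes p99 easy to trace back to a specific request.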
Reproducibility instructions
- Clone Pulse-Proxy.
- Set PULSE_KEY to a valid test key.
- Run scripts/extended-benchmark.ps1.
- Publish results/benchmark-summary.md and the commit/version used.
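Before running the steps above, a small preflight check can catch the two most likely setup mistakes. The Python sketch below assumes that PULSE_KEY is read from the environment and that the script path is relative to the repository root. Neither detail is confirmed on this page, and `preflight` is a hypothetical helper, not part of Pulse-Proxy.

```python
import os
from pathlib import Path

# Script path as listed in the reproducibility steps.
SCRIPT = Path("scripts") / "extended-benchmark.ps1"


def preflight(repo_root, env=None):
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: the benchmark script reads PULSE_KEY from the environment.
    if not env.get("PULSE_KEY"):
        problems.append("PULSE_KEY is not set")
    if not (Path(repo_root) / SCRIPT).is_file():
        problems.append(f"missing {SCRIPT.as_posix()}")
    return problems
```

Calling `preflight(".")` from the root of the clone returns an empty list when both conditions hold.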
Failure cases
The April 13 run missed 4 of the 100 attack prompts, with misses in encoded/obfuscated categories. Do not claim 100% detection.
Commit/version
The package version and rule bundle are published in the benchmark artifact; the exact release commit for this public page is pending publication.