
Why We Built a Transparent AI Firewall
When you deploy an LLM application, you are effectively giving every user a command line to your backend.
If you're building a chatbot, that command line is restricted—it can only generate text. If you're building an agent with tools, that command line has sudo. It can read files, query databases, send emails, and issue refunds. The only thing standing between a user and those capabilities is a system prompt that says "please don't do bad things."
That's not security. That's a suggestion.
The industry's answer to this problem has been "safety layers"—proprietary APIs that return a binary safe or unsafe verdict. We tried using them. We hated them. Here's why, and what we built instead.
The Black Box Problem
Imagine a Web Application Firewall (WAF) that didn't tell you why it blocked a request. It just returned 403 Forbidden. You check your logs—nothing. You check the vendor dashboard—"Threat Detected." You call support—"Our model flagged it as suspicious."
How do you debug that? You don't. You turn the WAF off, eat the risk, and move on.
That is the state of most AI security tools today. They are proprietary models hidden behind APIs. You send your user's prompt, and you pray the vendor's definition of "unsafe" matches yours.
We saw three fundamental problems with this approach:
1. No explainability. When a legitimate user gets blocked, you can't tell them why. You can't tell your engineering team why. You can't file a meaningful bug report. The vendor's model is a black box, and you're building your product on top of it.
2. No auditability. You're trusting a third party with every prompt your users send. Are they training on your data? Are they logging it? You don't know, because you can't see the code. For regulated industries (healthcare, finance, defense), this is a non-starter.
3. No customization. Every application has different security requirements. A creative writing tool and a banking chatbot have opposite definitions of "safe content." Black-box tools can't be tuned for your specific use case.
What We Built
PromptGuard is an AI firewall designed for transparency. It sits between your application and your LLM provider as a transparent proxy, with open-source SDKs and fully explainable decisions.
Your App → PromptGuard → OpenAI / Anthropic / Gemini / etc.

Integration is one line of code: change your base_url. No SDK required, no code refactoring, no middleware to write.
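In an OpenAI-style client, the swap looks roughly like the sketch below. The proxy URL shown is an assumption for illustration, not a documented endpoint; check the dashboard for the real one.

```python
# Sketch of the one-line base_url swap for an OpenAI-style client.
# "https://api.promptguard.co/v1" is an ASSUMED endpoint, used here
# only to illustrate the shape of the change.

def client_kwargs(api_key: str, use_promptguard: bool = True) -> dict:
    """Return the kwargs you would pass to an OpenAI-style client constructor."""
    base_url = (
        "https://api.promptguard.co/v1"   # routed through the firewall
        if use_promptguard
        else "https://api.openai.com/v1"  # direct to the provider
    )
    return {"api_key": api_key, "base_url": base_url}
```

Everything else (model names, request bodies, streaming) stays unchanged, because the proxy speaks the provider's wire format.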
Seven Threat Detectors
We don't run one generic "is this safe?" model. We run seven specialized detectors, each focused on a specific threat type:
- Prompt Injection: Catches direct overrides ("ignore instructions"), roleplay manipulation, delimiter injection, and semantic evasion techniques.
- PII Detection: Identifies 39+ types of personally identifiable information using layered regex, checksum validation (Luhn, ABA, Verhoeff, and more), ML-based named entity recognition for unstructured PII like names and addresses, and encoded PII detection—then replaces them with safe tokens.
- Data Exfiltration: Detects attempts to extract system prompts, training data, user data, or knowledge base contents.
- Toxicity: Five-model ML ensemble covering hate speech, violence, self-harm, sexual content, and harassment.
- API Key Detection: Catches leaked credentials (OpenAI, AWS, GitHub, Google, generic API keys and tokens).
- Fraud Detection: Identifies social engineering patterns like wire transfer scams, gift card scams, and credential harvesting.
- Malware Detection: Blocks destructive commands (rm -rf, format c:), reverse shells, and encoded PowerShell payloads.
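To make the "layered regex plus checksum validation" idea in the PII bullet concrete, here is a minimal sketch of the Luhn check. A regex match alone (any 16 digits) would flood a PII detector with false positives; the checksum filters out strings that cannot be real card numbers. This is the standard Luhn algorithm, not PromptGuard's internal code.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: one validation layer applied after a regex hit.

    From the rightmost digit, double every second digit, subtracting 9
    when the doubled value exceeds 9; the total must be divisible by 10.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```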
The Hybrid Architecture
Most AI security tools are just another LLM call. They send your prompt to GPT-4 with "is this safe?" and add 500ms to your latency.
We take a different approach. Our detection pipeline combines:
- Regex patterns for known attack signatures (fast, deterministic, zero false negatives on known patterns)
- A 5-model ML ensemble (Llama-Prompt-Guard, DeBERTa, ALBERT, toxic-bert, RoBERTa) running in parallel for semantic understanding
- Confidence calibration via Platt scaling, so our confidence scores are actually calibrated probabilities
Total overhead: approximately 150ms. Not zero, but an order of magnitude faster than an LLM-based security check.
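Platt scaling itself is a small piece of machinery: a two-parameter sigmoid, fit on held-out labeled data, that maps raw ensemble scores to probabilities. A minimal sketch follows; the coefficients here are made-up placeholders, not PromptGuard's fitted values.

```python
import math

def platt_calibrate(raw_score: float, a: float = -4.0, b: float = 2.0) -> float:
    """Map a raw ensemble score to a calibrated probability.

    a and b are ILLUSTRATIVE; in practice they are fit (e.g. by logistic
    regression) on validation labels so the output behaves like a true
    probability rather than an arbitrary model score.
    """
    return 1.0 / (1.0 + math.exp(a * raw_score + b))
```

The point of calibration is that a reported 0.9 should mean "wrong about one time in ten," which makes confidence thresholds meaningful across detectors.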
Three Decisions, Not Two
Most security tools give you ALLOW or BLOCK. We add a third: REDACT.
If a user says "My SSN is 123-45-6789, can you help with my tax return?", the intent is legitimate. Blocking them would be hostile. Instead, we replace the SSN with [SSN_REDACTED], forward the sanitized prompt to the LLM, and the user gets their answer without exposing sensitive data.
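The SSN case above reduces to a substitution pass before the prompt is forwarded. This sketch shows only the regex layer for one PII type; the full pipeline also applies checksum validation and NER, and the pattern and token name here are illustrative.

```python
import re

# Illustrative pattern for the common NNN-NN-NNNN SSN format only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssn(prompt: str) -> tuple[str, bool]:
    """Replace SSN-shaped substrings with a safe token before forwarding.

    Returns the sanitized prompt and whether anything was redacted,
    so the caller can record a REDACT decision instead of ALLOW.
    """
    redacted, n = SSN_RE.subn("[SSN_REDACTED]", prompt)
    return redacted, n > 0
```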
Every decision is returned with full metadata:
- X-PromptGuard-Event-ID: Unique trace ID for debugging
- X-PromptGuard-Decision: ALLOW, BLOCK, or REDACT
- X-PromptGuard-Confidence: Calibrated confidence score (0.0-1.0)
- X-PromptGuard-Threat-Type: Which threat category was detected
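Consuming that metadata client-side takes a few lines. This sketch assumes you already have the response headers as a dict; the header names come from the list above, the defaults are my own assumptions.

```python
def parse_decision(headers: dict) -> dict:
    """Extract PromptGuard decision metadata from response headers (sketch).

    Missing-header defaults here are assumptions for illustration,
    not documented proxy behavior.
    """
    return {
        "event_id": headers.get("X-PromptGuard-Event-ID"),
        "decision": headers.get("X-PromptGuard-Decision", "ALLOW"),
        "confidence": float(headers.get("X-PromptGuard-Confidence", "0.0")),
        "threat_type": headers.get("X-PromptGuard-Threat-Type"),
    }
```

Logging the event ID alongside your own request ID is what makes a blocked request debuggable later.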
Six Use-Case Presets
Every application has different security needs. A support bot needs strict PII protection. A code assistant needs to allow technical content that looks like injection. A creative writing tool needs relaxed toxicity thresholds.
We ship six composable presets: support_bot, code_assistant, rag_system, creative_writing, data_analysis, and default. Each preset can be combined with a strictness level (strict, moderate, permissive) to fine-tune thresholds across all detectors simultaneously.
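One way to picture preset-plus-strictness composition: a preset supplies per-detector confidence thresholds, and strictness shifts them all at once (lower threshold, more blocks). The preset names below come from the list above; every number and the shift semantics are hypothetical, invented for illustration.

```python
# HYPOTHETICAL threshold values; only the preset names are from the docs.
PRESETS = {
    "support_bot":    {"pii": 0.5, "prompt_injection": 0.7},
    "code_assistant": {"pii": 0.7, "prompt_injection": 0.9},
}
# strict lowers thresholds (blocks more), permissive raises them.
STRICTNESS_SHIFT = {"strict": -0.1, "moderate": 0.0, "permissive": 0.1}

def effective_thresholds(preset: str, strictness: str) -> dict:
    """Compose a preset with a strictness level into final thresholds."""
    shift = STRICTNESS_SHIFT[strictness]
    return {
        detector: round(min(1.0, max(0.0, base + shift)), 2)
        for detector, base in PRESETS[preset].items()
    }
```

The appeal of this design is that one knob moves all detectors coherently, instead of forcing users to hand-tune seven thresholds.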
Beyond Detection
PromptGuard isn't just a scanner. It's a complete security platform:
- Multi-provider routing with automatic failover across OpenAI, Anthropic, Gemini, Mistral, Groq, and Azure OpenAI
- Bot detection with behavioral analysis (rate limiting, timing analysis, payload fingerprinting, session analysis, reputation scoring)
- Red team testing with 20 built-in attack vectors you can run from the API or dashboard
- AI agent security with tool call validation, argument inspection, sequence analysis, and velocity limiting
- Webhook and email alerting when threats are detected (Slack-compatible)
- Shadow mode / A/B testing for safely evaluating detection model changes without affecting production traffic
- Feedback-driven model recalibration that automatically adjusts confidence thresholds based on false positive/negative reports
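Of the agent-security features above, velocity limiting is the easiest to picture: cap how many tool calls an agent may make inside a sliding time window, so a hijacked agent cannot fire off hundreds of actions in seconds. A self-contained sketch, with limits and semantics of my own choosing rather than PromptGuard's:

```python
import time
from collections import deque

class VelocityLimiter:
    """Sliding-window cap on agent tool calls (illustrative sketch)."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of allowed calls

    def allow(self, now=None) -> bool:
        """Record the call and return True, or return False if over the cap."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```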
Why Transparency Matters
We designed PromptGuard for one simple reason: security tools must be auditable.
If you're trusting a tool to sit between your users and your AI, you need to understand its decisions. You need to see why a request was blocked, what detector triggered, and what confidence threshold was applied.
When PromptGuard blocks a request, you can trace the decision through the security engine to the specific detector and pattern that triggered. Every decision includes a human-readable reason, threat type, confidence score, and event ID. No black box. No "trust us." No mystery.
Our SDKs are open source (MIT licensed), so you can inspect exactly how the client-side integration works. Enterprise customers who self-host get full access to the platform source code for audit purposes.
The Free Tier
We offer 10,000 requests per month for free. Not a trial. Not "free for 14 days." Free forever.
We do this because we believe AI security shouldn't be gated by budget. A solo developer building a side project deserves the same injection protection as an enterprise team. The free tier includes the two most critical detectors (prompt injection and PII), the dashboard, and the full proxy functionality.
If you need ML and LLM-powered detection, all ten security detectors, custom policies, or webhook alerting, the paid plans start at $49/month.
Try It
You can use the hosted version at promptguard.co—swap your base_url and you're done.
Enterprise customers can self-host the entire stack in their VPC — contact sales@promptguard.co for access. No data leaves your infrastructure. We never see your prompts.
If your AI application has users, it has an attack surface. The question isn't whether to add security—it's whether your security layer will explain itself when things go wrong.
Ours does. Every blocked request comes with a full explanation.