Guardrails AI
Incubating: Add input/output validation and safety rails to LLM calls
Cleanest Python API for defining validators on LLM inputs and outputs. Guards can be defined with a RAIL spec or with Pydantic models. Retry on failure is well-designed. Server mode is needed for production latency.
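The validate-then-retry pattern mentioned above can be sketched in plain Python. This is not the Guardrails AI API: `validate_ticket`, `guarded_call`, and the `priority` field are hypothetical names chosen for illustration, and `llm` stands in for any callable that maps a prompt to a string.

```python
import json
from typing import Callable


def validate_ticket(raw: str) -> dict:
    """Parse the model output and check the one field this sketch cares about."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError("priority must be low, medium, or high")
    return data


def guarded_call(llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> dict:
    """Call the model, validate its output, and re-ask with the error on failure."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw = llm(current_prompt)
        try:
            return validate_ticket(raw)
        except ValueError as err:
            # Feed the validation error back so the model can correct itself.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was rejected: {err}. "
                "Return valid JSON only."
            )
    raise RuntimeError("output failed validation after retries")
```

Each retry costs another model call, so `max_retries` is worth keeping small.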
NeMo Guardrails
Sandbox: NVIDIA toolkit for programmable guardrails via Colang language
Unique dialogue-flow approach using Colang DSL. Best for complex multi-turn conversation policies. Steeper learning curve than Guardrails AI but richer for conversation steering.
Llama Guard 3
Sandbox: Meta's fine-tuned safety classifier for prompt and response screening
Production-deployable content moderation model. Run it as a sidecar to screen every input/output. MLCommons hazard taxonomy built in. Free, open-weight, and fast on a single GPU.
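A sidecar that screens every input and output has to turn the classifier's text reply into a decision. The sketch below assumes the reply is `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1,S10`. That format should be checked against the model card for the deployed version. `Verdict` and `parse_verdict` are hypothetical names, and the model call itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a classifier reply of the assumed form 'safe' or 'unsafe\\nS1,S10'."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    label = lines[0].lower()
    if label == "safe":
        return Verdict(safe=True)
    if label == "unsafe":
        # The second line, when present, lists the violated hazard categories.
        codes = lines[1].split(",") if len(lines) > 1 else []
        return Verdict(safe=False, categories=[c.strip() for c in codes if c.strip()])
    # Fail closed: an unrecognized reply should not be treated as safe.
    raise ValueError(f"unrecognized classifier output: {lines[0]!r}")
```

Raising on unrecognized output means the caller has to decide explicitly what to do when the classifier misbehaves.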
Rebuff
Sandbox: Prompt injection detection API for LLM applications
Purpose-built for prompt injection detection, a real attack vector in RAG systems. Uses a canary token technique alongside an LLM classifier. Early-stage; combine with input sanitization.
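The canary token idea can be illustrated without any library: plant a random marker in the system prompt and treat its appearance in the output as evidence that the prompt leaked. This is a minimal sketch of the general technique, not Rebuff's implementation; `add_canary` and `leaked` are hypothetical helper names.

```python
import secrets


def add_canary(system_prompt: str) -> tuple[str, str]:
    """Return the prompt with a random marker prepended, plus the marker itself."""
    canary = secrets.token_hex(8)  # 16 hex characters, unlikely to occur by chance
    guarded = f"<!-- {canary} -->\n{system_prompt}"
    return guarded, canary


def leaked(canary: str, model_output: str) -> bool:
    """True if the model output reproduces the marker, i.e. the prompt was exfiltrated."""
    return canary in model_output
```

A canary only detects leakage after the fact, which is one reason the entry above recommends pairing it with input sanitization.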
Microsoft Presidio
Graduated: Data protection and anonymization for PII in LLM pipelines
Best open-source PII detection and anonymization library. Detects and replaces SSNs, email addresses, and credit card numbers before text is sent to LLMs. Critical for GDPR/HIPAA compliance in RAG.
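The detect-and-replace step can be sketched with two regular expressions. This is a deliberately simplified stand-in and not Presidio's API; the patterns, the placeholder labels, and the `redact` function are assumptions for illustration and will miss many real-world formats.

```python
import re

# Illustrative patterns only; production detectors cover far more cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each detected entity with a placeholder before the text reaches an LLM."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(redact("Mail jane@example.com, SSN 123-45-6789."))
# prints: Mail <EMAIL>, SSN <SSN>.
```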
Lakera Guard
Incubating: Real-time prompt injection and jailbreak protection API
Sub-millisecond inference for prompt injection detection. Managed API means zero infra. Integrates as middleware. Best for teams that want safety without building classifiers.
Azure AI Content Safety
Graduated: Microsoft's content moderation API for text and images
Enterprise-grade content filtering integrated into Azure OpenAI. Fine-grained severity thresholds for hate, violence, sexual content, and self-harm. Required for regulated use cases on Azure.
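Per-category severity thresholds amount to a small policy check applied to the scores a moderation service returns. The sketch below assumes the scores arrive as a mapping from category name to an integer severity; the category names, the threshold values, and `blocked_categories` are illustrative assumptions, and no Azure SDK call is shown.

```python
# Hypothetical policy: block at or above these severities, per category.
THRESHOLDS = {"Hate": 2, "Violence": 4, "Sexual": 2, "SelfHarm": 2}


def blocked_categories(scores: dict[str, int], thresholds: dict[str, int] = THRESHOLDS) -> list[str]:
    """Return the categories whose severity meets or exceeds the configured threshold."""
    return sorted(
        category
        for category, severity in scores.items()
        if category in thresholds and severity >= thresholds[category]
    )


scores = {"Hate": 0, "Violence": 4, "Sexual": 0, "SelfHarm": 2}
print(blocked_categories(scores))
# prints: ['SelfHarm', 'Violence']
```

Keeping thresholds in data rather than code lets a stricter policy be swapped in for regulated workloads without touching the check itself.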
Anthropic Constitutional AI
Incubating: Principle-based self-improvement for harmlessness and helpfulness
Pioneering approach where the model critiques and revises its own outputs based on a set of principles. Built into Claude models. Influence on the field is massive even if not a standalone product.