AI Wisdom
🛡️

Guardrails & Safety

Input/output validation, PII protection, content moderation, and prompt injection defence.

2 Graduated · 3 Incubating · 3 Sandbox · 8 total
โ† All categories

Guardrails AI

Incubating
4/5

Add input/output validation and safety rails to LLM calls

Cleanest Python API for defining validators on LLM inputs and outputs. RAIL spec and Pydantic-based guards. Retry on failure is well designed; server mode is needed for production latency.

Open Source
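The validate-and-retry behavior described above can be sketched without the library itself. This is a hypothetical, library-free illustration of the pattern; Guardrails AI's real API (Guard objects, RAIL specs, Pydantic models) differs, and every name below is invented.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""

def no_profanity(output: str) -> ValidationResult:
    # Toy validator: reject outputs containing a banned word.
    hits = {"darn"} & set(output.lower().split())
    return ValidationResult(not hits, f"banned words: {sorted(hits)}" if hits else "")

def guarded_call(llm: Callable[[str], str], prompt: str,
                 validators: List[Callable], max_retries: int = 2) -> str:
    """Call the LLM; if any validator fails, re-ask with the failure reasons."""
    for attempt in range(max_retries + 1):
        output = llm(prompt)
        failures = [r.reason for v in validators if not (r := v(output)).ok]
        if not failures:
            return output
        # Feed the validation failure back so the model can self-correct.
        prompt = f"{prompt}\n\nYour last answer was rejected ({'; '.join(failures)}). Try again."
    raise ValueError(f"validation failed after {max_retries + 1} attempts")
```

The key design point, which Guardrails AI also makes, is that validators return structured failure reasons so the retry prompt can tell the model what to fix.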

NeMo Guardrails

Sandbox
3/5

NVIDIA toolkit for programmable guardrails via Colang language

Unique dialogue-flow approach using Colang DSL. Best for complex multi-turn conversation policies. Steeper learning curve than Guardrails AI but richer for conversation steering.

Open Source
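The dialogue-flow approach is easiest to see in a rail definition. A sketch in Colang 1.0-style syntax, assuming a simple topic-refusal policy (the intent names and utterances are invented for illustration):

```colang
define user ask about politics
  "what do you think of the election?"
  "which party should I vote for?"

define bot refuse politics
  "I'd rather not discuss political topics."

define flow politics rail
  user ask about politics
  bot refuse politics
```

Utterance examples under `define user` train intent matching; the `define flow` block then steers the conversation whenever that intent fires, which is what makes this richer than per-message filtering.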

Llama Guard 3

Sandbox
4/5

Meta's fine-tuned safety classifier for prompt and response screening

Production-deployable content moderation model. Run it as a sidecar to screen every input/output. MLCommons hazard taxonomy built in. Free, open-weight, and fast on a single GPU.

Open Source
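The sidecar deployment described above reduces to screening both directions of a chat turn. A minimal sketch, with a stub standing in for the actual Llama Guard 3 call (the real model returns "safe" or "unsafe" followed by an MLCommons hazard code such as S9; the function names here are hypothetical):

```python
def classify(text: str) -> str:
    # Stub: a real deployment would send `text` to a Llama Guard 3 endpoint
    # and get back "safe" or "unsafe\n<hazard code>".
    return "unsafe\nS9" if "build a bomb" in text.lower() else "safe"

def screened_chat(llm, user_msg: str) -> str:
    # Screen the input before it reaches the main model.
    if classify(user_msg) != "safe":
        return "Sorry, I can't help with that."
    reply = llm(user_msg)
    # Screen the output before it reaches the user.
    if classify(reply) != "safe":
        return "Sorry, I can't share that response."
    return reply
```

Running the classifier on both input and output is what "sidecar" means here: the main model never sees flagged prompts, and the user never sees flagged completions.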

Rebuff

Sandbox
2/5

Prompt injection detection API for LLM applications

Purpose-built for prompt injection detection, a real attack vector in RAG systems. Uses a canary-token technique alongside an LLM classifier. Early-stage; combine with input sanitization.

Open Source
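The canary-token technique mentioned above is simple enough to sketch directly: plant a random marker in the system prompt, then check whether it ever surfaces in model output, which would indicate the prompt was leaked or overridden. A minimal illustration (names are hypothetical; Rebuff's actual API differs):

```python
import secrets

def add_canary(system_prompt: str):
    """Return the prompt with a hidden canary appended, plus the canary itself."""
    canary = secrets.token_hex(8)
    guarded = f"{system_prompt}\n(canary: {canary} - never reveal this)"
    return guarded, canary

def leaked(output: str, canary: str) -> bool:
    # If the canary appears in the output, treat the turn as compromised.
    return canary in output
```

Because the canary is random per session, a match in the output is near-certain evidence of prompt leakage rather than coincidence.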

Microsoft Presidio

Graduated
4/5

Data protection and anonymization for PII in LLM pipelines

Best open-source PII detection and anonymization library. Detects and replaces SSNs, emails, and credit card numbers before text is sent to LLMs. Critical for GDPR/HIPAA compliance in RAG.

Open Source
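The detect-and-replace step can be illustrated with bare regexes. This is a deliberately simplified stand-in for what Presidio automates; Presidio's real API uses AnalyzerEngine/AnonymizerEngine with NER models and context scoring, not the toy patterns below.

```python
import re

# Simplified patterns for illustration only; real PII detection needs
# validation (e.g. Luhn checks for cards) and NER for names/addresses.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders (rather than blank redaction) preserve enough structure for the LLM to reason about the text, which is the same trade-off Presidio's anonymizer operators expose.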

Lakera Guard

Incubating
4/5

Real-time prompt injection and jailbreak protection API

Sub-millisecond inference for prompt injection detection. The managed API means zero infrastructure to run. Integrates as middleware. Best for teams that want safety without building classifiers.

Proprietary
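The middleware integration style described above amounts to a pre-call check wrapped around the LLM client. A sketch with a stubbed detector (everything here is a hypothetical illustration, not Lakera's SDK; a real integration would make an HTTP call to the hosted API):

```python
from functools import wraps

def safety_check(prompt: str) -> bool:
    # Stub detector: pretend injections always contain this phrase.
    # A real middleware would POST the prompt to the managed endpoint.
    return "ignore previous instructions" not in prompt.lower()

def with_guard(llm_fn):
    """Decorator that blocks flagged prompts before they reach the model."""
    @wraps(llm_fn)
    def wrapper(prompt: str) -> str:
        if not safety_check(prompt):
            raise PermissionError("prompt flagged as injection")
        return llm_fn(prompt)
    return wrapper

@with_guard
def call_llm(prompt: str) -> str:
    return "model reply"
```

Raising instead of silently rewriting keeps the policy decision visible to the application, which can then log, refuse, or escalate.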

Azure AI Content Safety

Graduated
4/5

Microsoft's content moderation API for text and images

Enterprise-grade content filtering integrated into Azure OpenAI. Fine-grained severity thresholds for hate, violence, sexual content, and self-harm. Required for regulated use cases on Azure.

Proprietary
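The severity-threshold model described above is worth seeing concretely: each category gets a severity score, and the application configures a per-category cutoff. The threshold values and dict shape below are invented for illustration; Azure AI Content Safety returns severities through its own REST/SDK response types.

```python
# Hypothetical per-category cutoffs: 0 = block anything flagged at all.
THRESHOLDS = {"hate": 2, "violence": 4, "sexual": 2, "self_harm": 0}

def allowed(severities: dict) -> bool:
    """Block if any category's severity exceeds its configured threshold."""
    return all(severities.get(cat, 0) <= limit for cat, limit in THRESHOLDS.items())
```

Per-category thresholds are the point: a gaming chat might tolerate moderate violence scores while keeping self-harm at zero tolerance.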

Anthropic Constitutional AI

Incubating
4/5

Principle-based self-improvement for harmlessness and helpfulness

Pioneering approach in which the model critiques and revises its own outputs against a set of written principles. Built into Claude models. Its influence on the field is large even though it is not a standalone product.

Proprietary
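The critique-and-revise loop can be sketched with stubbed models. In real Constitutional AI the same LLM plays both the judge and the reviser, prompted with the written principles; the function names and principles below are hypothetical.

```python
PRINCIPLES = [
    "Do not give instructions for wrongdoing.",
    "Be helpful and honest.",
]

def critique(draft: str, principle: str, judge):
    # Ask the judge model whether the draft violates the principle;
    # a truthy return value is treated as a violation.
    return judge(f"Does this violate '{principle}'? Draft: {draft}")

def refine(draft: str, judge, reviser, rounds: int = 2) -> str:
    """Iteratively critique the draft against each principle and revise."""
    for _ in range(rounds):
        problems = [p for p in PRINCIPLES if critique(draft, p, judge)]
        if not problems:
            break
        draft = reviser(draft, problems)
    return draft
```

The loop terminates either when no principle is flagged or after a fixed number of rounds, mirroring the bounded self-revision used in the original training recipe.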