Guardrails
Guardrails that keep agents safe
Layer policy prompts, automated tests, and escalation paths to reduce risk and drift.
Ship safer agents with policy-first prompts, red-team testing, and structured evaluation loops.
4 guides4 focus areasPolicy rules
Starter kit
- Add a policy-first system message.
- Define disallowed outputs and refusal paths.
- Run evals on every prompt update.
- Set escalation rules for sensitive outputs.
Focus areas
Policy layers
Separate policy logic from task instructions.
Red-team testing
Run adversarial prompts and measure failure rates.
Risk scoring
Score outputs and route to humans for review.
Auditability
Store traces and decisions for compliance.
Guides in this topic
Guardrails guides
Curated recipes, playbooks, and walkthroughs for this topic area.
EvalsQualityAutomation
Eval flywheel for prompt regressions
Generate test cases, score outputs, and track regressions.
Oct 6, 202514 min read
GatewayAuthSecurity
Gateway API authentication guide
Secure your Gateway API integration with proper authentication and scopes.
Sep 20, 202514 min read
FunctionsResponsesAgents
Structured outputs for multi-agent systems
Keep agents aligned with JSON schema validation and repair loops.
Aug 6, 202415 min read
SafetyGuardrailsPrompting
Policy-first prompting
Layer safety policy before task instructions to reduce risk.
May 29, 202411 min read
Start here
Featured in Guardrails
Related topics