Disruptive Rain Cookbook

Markdown view
# Policy-first prompting

Layer safety policy before task instructions to reduce risk.

- Date: May 29, 2024
- Reading time: 11 min
- Level: Beginner
- Tags: Safety, Guardrails, Prompting

## Takeaways
- Separate policy and task instructions.
- Refuse unsafe requests with clear messaging.
- Escalate high risk cases to humans.

## Add a policy layer

Put policy constraints in the system prompt before task instructions.

## Red-team prompts

Pressure test the policy with adversarial prompts and track failure rates.

## Escalation and audit

Route sensitive outputs to human reviewers and log all decisions.