Add a policy layer
Put policy constraints in the system prompt before task instructions.
Red-team prompts
Pressure test the policy with adversarial prompts and track failure rates.
Escalation and audit
Route sensitive outputs to human reviewers and log all decisions.