Guardrails and bypass labels

The set of rules that keeps the AI out of conversations where it shouldn't speak.

By Christopher. Updated May 14, 2026. 3 min read.

Guardrails are the rules that stop the AI from replying when it shouldn't. The most important one is the bypass label list. The rest are per-channel and per-segment overrides, the confidence floor, the quarantine queue, and the spend cap.

Together they form the floor under your AI setup. You can tune everything else aggressively as long as guardrails are in place.

Bypass labels

The bypass label list names topics where the AI never drafts.

Defaults:

bug. Engineering should triage. AI guesses about a bug are worse than silence.
abuse. Threats, harassment. Always a human.
spam. Filtered out, no need to spend tokens.

You can add more bypass labels on AI → Guardrails. Common additions:

legal. Anything from a lawyer or about a contract.
security. Disclosure reports, account takeover claims.
pii. Customer accidentally sent sensitive data.

Bypass labels are checked on every new message, not only when a conversation opens. If a conversation is relabeled bug partway through, the AI stops drafting from the next message onward.
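A minimal sketch of that per-message check, written in Python for illustration. The Conversation class and should_draft function are hypothetical names, not part of the product; only the three default labels come from this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Default bypass labels listed in this article.
DEFAULT_BYPASS_LABELS = frozenset({"bug", "abuse", "spam"})


@dataclass
class Conversation:
    """Hypothetical stand-in for a conversation and its current labels."""
    labels: set[str] = field(default_factory=set)


def should_draft(conversation: Conversation,
                 bypass_labels: frozenset[str] = DEFAULT_BYPASS_LABELS) -> bool:
    """Return False when any current label is on the bypass list.

    Intended to be called once per incoming message, so a label added
    mid-conversation takes effect on the next message.
    """
    return conversation.labels.isdisjoint(bypass_labels)


convo = Conversation(labels={"how-to"})
before_relabel = should_draft(convo)   # no bypass label yet
convo.labels.add("bug")                # relabeled partway through
after_relabel = should_draft(convo)    # bypass label now present
```

Because the check reads the labels as they are at the time of each message, nothing needs to be reset when a label changes.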

Per-channel overrides

Beyond bypass labels, you can turn the AI off, or tighten it, per channel on the Guardrails page.

Common patterns:

AI off entirely on Slack Connect (humans only).
AI off on a specific email alias used for legal.
AI on but confidence floor raised for VIP segments.

The Channel master toggles provide a one-click kill switch per channel.
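One way to picture the per-channel toggles is a lookup from channel to an on/off flag. This is only a sketch: the channel keys, the CHANNEL_AI_ENABLED table, and the choice to default unknown channels to "on" are all assumptions made for the example.

```python
# Hypothetical per-channel switches; the keys are illustrative.
CHANNEL_AI_ENABLED = {
    "email:support": True,
    "email:legal": False,    # alias used for legal, humans only
    "slack_connect": False,  # AI off entirely
}


def ai_enabled_for(channel: str, default: bool = True) -> bool:
    """Return the channel's toggle, falling back to an assumed default."""
    return CHANNEL_AI_ENABLED.get(channel, default)
```

Passing default=False instead would make any channel without an explicit entry humans-only, which is the more conservative choice.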

Confidence floor as a guardrail

The auto-send confidence floor works like a guardrail too. Below the floor, replies do not send. See Confidence thresholds.

If you only ever set one guardrail, set this one.
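The floor amounts to a single comparison. The sketch below assumes confidence is expressed as a fraction between 0 and 1 and uses 0.85, the starting value suggested later in this article; may_auto_send is a hypothetical name.

```python
def may_auto_send(confidence: float, floor: float = 0.85) -> bool:
    """Return True only when confidence is at or above the floor.

    Replies below the floor do not send. Both values are assumed to be
    fractions in the range 0 to 1.
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= floor <= 1.0):
        raise ValueError("confidence and floor must be between 0 and 1")
    return confidence >= floor
```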

Quarantine

Newly promoted Q&A pairs from the Learned Q&A library, brand-new help center articles, and recently edited articles wait in a quarantine state for a short window before the AI uses them on live tickets. This stops a typo or a bad promotion from showing up in customer replies before any reviewer sees it.

You can review and approve or reject quarantined items on the Learned Q&A page. Quarantined articles auto-release after their hold period unless you mark them rejected.
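The release logic can be sketched as follows. The 24-hour hold is a placeholder, since this article does not state the actual window, and the KnowledgeItem class, the review field, and the rule that an approval releases an item early are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder hold period; the real window is not stated in this article.
HOLD_PERIOD = timedelta(hours=24)


@dataclass
class KnowledgeItem:
    """Hypothetical record for a Q&A pair or help center article."""
    title: str
    changed_at: datetime           # when it was promoted, created, or edited
    review: Optional[str] = None   # None, "approved", or "rejected"


def usable_on_live_tickets(item: KnowledgeItem, now: datetime,
                           hold: timedelta = HOLD_PERIOD) -> bool:
    """Rejected items never release; others release once the hold elapses."""
    if item.review == "rejected":
        return False
    if item.review == "approved":  # assumed: approval releases early
        return True
    return now - item.changed_at >= hold


edited = datetime(2026, 5, 14, 9, 0, tzinfo=timezone.utc)
article = KnowledgeItem("Reset your password", changed_at=edited)
during_hold = usable_on_live_tickets(article, edited + timedelta(hours=1))
after_hold = usable_on_live_tickets(article, edited + timedelta(hours=25))
```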

Customer segment overrides

Override AI behavior for specific customer segments. Examples:

VIP segment. Force draft mode regardless of workspace setting. Humans always touch a VIP reply before it goes.
Trial accounts. Auto-send aggressively, since they need fast answers and the stakes are lower.
Churn-risk segment. Silent (humans only).

Set per-segment overrides on the Guardrails page.
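Conceptually, a segment override replaces the workspace-wide mode for customers in that segment. The sketch below assumes three mode names (draft, auto_send, silent) and segment keys chosen for the example; neither is a documented identifier.

```python
from typing import Optional

# Hypothetical override table mirroring the examples above.
SEGMENT_OVERRIDES = {
    "vip": "draft",         # humans always touch the reply
    "trial": "auto_send",   # fast answers, lower stakes
    "churn_risk": "silent", # humans only
}


def effective_mode(workspace_mode: str, segment: Optional[str]) -> str:
    """Use the segment override when one exists, else the workspace mode."""
    return SEGMENT_OVERRIDES.get(segment, workspace_mode)
```

For example, effective_mode("auto_send", "vip") resolves to "draft", while a customer in no segment keeps the workspace setting.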

What never gets sent

A handful of things are blocked at the model layer. The AI will refuse to:

Promise refunds it cannot verify.
Claim a feature exists when the KB says it does not.
Reply to messages that look like impersonation attempts.
Send to addresses on your bounce list.

These are non-configurable safety rails, separate from the bypass labels.

Spend cap as a guardrail

When you hit your monthly spend cap, AI replies pause. The inbox keeps working as a normal helpdesk. See Spend caps and alerts.

This is a useful guardrail against runaway costs from a flood of bot traffic or a misconfigured channel.
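As a rough sketch, the pause condition is a comparison between month-to-date spend and the cap. The function name and the use of integer cents are assumptions made to keep the example free of rounding issues.

```python
def ai_replies_paused(month_to_date_cents: int, monthly_cap_cents: int) -> bool:
    """Return True once spend has reached the cap for the month."""
    if month_to_date_cents < 0 or monthly_cap_cents < 0:
        raise ValueError("amounts must be non-negative")
    return month_to_date_cents >= monthly_cap_cents
```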

Auditing guardrail behavior

Every AI decision (drafted, suppressed, bypassed) shows up in AI receipts with the reason. If a ticket should have been bypassed but was not, the receipt tells you why the bypass did not match.
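If you work with receipts outside the product, a small tally can help you spot patterns. The Receipt fields and reason strings below are hypothetical and do not reflect a documented export format; only the three decision names come from this article.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Receipt:
    """Hypothetical shape of one AI receipt."""
    ticket_id: str
    decision: str  # "drafted", "suppressed", or "bypassed"
    reason: str


def tally_decisions(receipts: Iterable[Receipt]) -> Counter:
    """Count receipts per decision type."""
    return Counter(r.decision for r in receipts)


def with_decision(receipts: Iterable[Receipt], decision: str) -> List[Receipt]:
    """Return the receipts with the given decision, in original order."""
    return [r for r in receipts if r.decision == decision]


sample = [
    Receipt("T-1", "drafted", "no bypass label matched"),
    Receipt("T-2", "bypassed", "label bug is on the bypass list"),
    Receipt("T-3", "suppressed", "confidence below floor"),
    Receipt("T-4", "drafted", "no bypass label matched"),
]
counts = tally_decisions(sample)
```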

What guardrails do not protect against

Bad voice. Guardrails do not edit; they block. If the voice is wrong on tickets that pass guardrails, fix Voice and tone.
Bad facts. Guardrails do not verify the KB. Bad articles produce bad replies. See The brain: KB graph.
Misclassification. If the AI labels a bug as how-to, the bug bypass does not catch it. Confidence floors help here.

Recommended starting setup

Bypass labels: bug, abuse, spam (default), plus legal and security if relevant.
Per-channel: AI off on Slack Connect for the first month.
Confidence floor: 85% for auto-send.
VIP segment: force draft mode.
Spend cap: 2x expected first-month usage.
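The starting setup above can be written down as a single settings object, which makes it easy to review and compare later. The key names and the starting_guardrails function are illustrative only; the values are the ones recommended in this section.

```python
from typing import Any, Dict, Iterable


def starting_guardrails(expected_first_month_spend: float,
                        extra_bypass_labels: Iterable[str] = ()) -> Dict[str, Any]:
    """Build a hypothetical settings dict from the recommended values."""
    return {
        # Defaults, plus optional additions such as legal and security.
        "bypass_labels": ["bug", "abuse", "spam", *extra_bypass_labels],
        # AI off on Slack Connect for the first month.
        "channels": {"slack_connect": {"ai_enabled": False}},
        # 85% auto-send floor, expressed as a fraction.
        "auto_send_confidence_floor": 0.85,
        # VIP replies always go through a human.
        "segments": {"vip": {"mode": "draft"}},
        # Cap at twice the expected first-month usage.
        "monthly_spend_cap": 2 * expected_first_month_spend,
    }


config = starting_guardrails(500.0, extra_bypass_labels=("legal", "security"))
```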

When to loosen guardrails

After two weeks of receipts, look at:

Tickets that were drafted but should have been bypassed. Add bypass labels to cover them.
Tickets that were bypassed but that the AI could have handled. Remove those labels.
VIP segments where the AI performed well. Consider auto-send for them.

Guardrails should evolve. They are not a set-once thing.
