What Is Content Shield?
Content Shield detects sensitive content in player messages — such as self-harm or suicidal intent — and silently escalates the conversation to a human agent. When triggered, zero automated messages reach the player: no greeting, no AI response, no transfer message. Only a human communicates.

Content Shield runs on every inbound message, not just the first. A player might start with a normal question and later express distress — Content Shield catches it regardless of when it appears.
How It Works
- Player sends a message (first or subsequent)
- Before any AI processing, Content Shield checks the message
- If sensitive content is detected → conversation is silently escalated to a human agent
- If not detected → normal flow continues (greeting, AI response, etc.)
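The steps above can be sketched in Python. This is an illustrative sketch only: the function names (`detect`, `escalate_to_human`, `run_normal_flow`), the `suppressOutbound` and `transcript` fields, and the threshold value are assumptions, not the product's actual API.

```python
SELF_HARM_THRESHOLD = 0.8  # assumed workspace-wide threshold (illustrative)

def handle_inbound(ticket, text, detect, escalate_to_human, run_normal_flow):
    """Sketch of the per-message Content Shield check (names are hypothetical)."""
    ticket["transcript"].append(text)          # the message is always preserved
    if ticket.get("suppressOutbound"):         # already flagged: stay silent
        return "suppressed"
    if detect(text) >= SELF_HARM_THRESHOLD:    # runs before any AI processing
        ticket["suppressOutbound"] = True      # permanent for the ticket
        escalate_to_human(ticket)              # silent escalation, no reply sent
        return "escalated"
    return run_normal_flow(text)               # normal flow: greeting, AI, etc.
```

Note that the check happens before any AI processing, so a triggering message never produces an automated reply.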
What the Player Experiences
- If Content Shield triggers: Nothing automated. A human agent picks up the conversation.
- If Content Shield doesn’t trigger: Normal experience — greeting, AI response, etc.
What the Operator Sees
- Ticket is flagged in the escalation queue
- Detection metadata shows the type, confidence score, and matched reference
- The `suppressOutbound` flag prevents any automated outbound for the ticket’s lifetime
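A flagged ticket might carry data shaped roughly like the sketch below. The field names and values are assumptions for illustration, not the product's actual schema.

```python
# Hypothetical shape of a flagged ticket as the operator sees it.
flagged_ticket = {
    "suppressOutbound": True,        # blocks all automated outbound messages
    "detection": {
        "type": "self-harm",         # detection type
        "confidence": 0.94,          # classifier confidence (illustrative value)
        "matched_reference": "ref-example",  # hypothetical matched-reference id
    },
}
```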
Setting Up Content Shield
Content Shield is configured through Automation Rules using a special trigger type.

Create an Automation Rule
Go to Settings → Automation Rules and create a new rule:
- Trigger: Content Detected
- Detection Type: Self-harm
- Action: Escalate to Human (with silent mode enabled)
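One way the rule from these steps might be expressed as data is sketched below; the keys are illustrative assumptions, not the actual Automation Rules schema.

```python
# Hypothetical representation of the Content Shield automation rule.
content_shield_rule = {
    "trigger": "content_detected",
    "detection_type": "self_harm",
    "action": {
        "type": "escalate_to_human",
        "silent": True,  # silent mode: no automated transfer message is sent
    },
}
```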
Assign an Agent Team
Make sure you have a team of human agents configured to receive escalated conversations; these agents will handle every flagged ticket.
Detection
Content Shield uses purpose-built safety classification that works across languages. Detection is fast — messages are evaluated in real time with no noticeable delay to the player.

Key Behaviors
Runs on every message
Not just the first message. A player might start with a withdrawal question and later express distress. Content Shield evaluates every inbound message.
Idempotent
Once a ticket is flagged, Content Shield won’t re-trigger on subsequent messages in the same conversation. The flag is permanent for that ticket’s lifetime.
Zero overhead when not configured
Workspaces without a Content Detected automation rule skip detection entirely. No API calls, no latency.
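Together with the idempotency behavior above, the gating decision can be sketched as a single check. The `content_detected_rule` and `suppressOutbound` names are assumptions for illustration:

```python
def should_run_detection(workspace, ticket):
    """Decide whether to call the safety classifier at all (names hypothetical)."""
    # Zero overhead when not configured: no rule means no API call, no latency.
    if not workspace.get("content_detected_rule"):
        return False
    # Idempotent: once a ticket is flagged, never re-trigger for its lifetime.
    if ticket.get("suppressOutbound"):
        return False
    return True
```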
Message is always preserved
The player’s message is saved to the transcript for completeness, even when intercepted. Only the AI processing is skipped.
Works with all channels
Content Shield works on all supported channels — LiveChat, Zendesk, Zoho, Respond.io, Intercom, Web Messenger, and Web — with full first-message coverage.
Limitations
- Single detection type — currently only self-harm. Additional types (harassment, threats) can be added.
- No per-brand thresholds — a single threshold applies across the workspace.
- Very terse messages may not match — short phrases like “end it” with no context may fall below the detection threshold.
Content Shield is a safety layer, not a replacement for comprehensive responsible gaming detection. It specifically targets self-harm content as an immediate safety measure. For broader RG detection (problem gambling, financial hardship, self-exclusion), the AI agent handles classification during normal conversation flow.