What Is QA Scoring?
QA (Quality Assurance) scoring transforms Cevro into an automated quality assurance system for support conversations. Instead of having QA specialists manually review small samples of chats, the AI evaluates every conversation against your customized scorecard—delivering consistent, granular reviews at scale.

Key benefits:
- 100% coverage — Every conversation is scored, not just a sample
- Consistent evaluation — Same criteria applied to every conversation
- Granular customization — Define your own scoring rules beyond Cevro’s built-in CSAT
- Rich context — See exactly why each score was given, with rule-by-rule breakdowns
- Immediate feedback — Scores available as soon as conversations close
How It Works
When a support conversation closes:
- Conversation synced — Cevro pulls the full transcript
- Context gathered — The AI identifies relevant procedures and fetches player data
- Scorecard evaluation — Each rule in your scorecard is evaluated independently
- Results stored — Scores are saved with per-rule breakdowns
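
Conceptually, the flow can be sketched as a small pipeline. Everything in the example below (function names, field names, return shape) is illustrative only, not Cevro's actual API:

```python
# Illustrative sketch of the scoring flow; all names and shapes are hypothetical.
from typing import Any

def fetch_transcript(conversation_id: str) -> list[dict[str, Any]]:
    """Stub standing in for the synced conversation transcript."""
    return [{"role": "agent", "text": "Hi! How can I help?", "ts": "2024-05-01T10:00:05Z"}]

def gather_context(transcript: list[dict[str, Any]]) -> dict[str, Any]:
    """Stub standing in for the relevant procedures and player data."""
    return {"procedures": ["Withdrawal delays"], "player": {"country": "DE"}}

def evaluate_rule(rule: dict[str, Any], transcript, context) -> dict[str, Any]:
    """Stub standing in for the AI's independent evaluation of one rule."""
    return {"rule": rule["name"], "points_awarded": rule["max_points"],
            "max_points": rule["max_points"], "explanation": "..."}

def score_conversation(conversation_id: str, scorecard: dict[str, Any]) -> dict[str, Any]:
    transcript = fetch_transcript(conversation_id)        # 1. Conversation synced
    context = gather_context(transcript)                  # 2. Context gathered
    results = [evaluate_rule(rule, transcript, context)   # 3. Each rule evaluated independently
               for section in scorecard["sections"]
               for rule in section["rules"]]
    return {"total": sum(r["points_awarded"] for r in results),   # 4. Results stored
            "max": sum(r["max_points"] for r in results),
            "rules": results}
```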
Setting Up Your Scorecard
A scorecard defines exactly how conversations should be evaluated. It’s the rubric your AI QA specialist follows.

Create a Scorecard
Click Create Scorecard and give it a name (e.g., “Support Quality Scorecard”). You can set the total point scale (default is 100 points).
Add Sections
Sections are categories of quality you’re evaluating. Common examples:
- Accuracy — Did the agent provide correct information?
- Soft Skills — Was the tone appropriate? Did they greet properly?
- Process Compliance — Did they follow the documented procedures?
- Resolution — Was the issue fully resolved?
Scorecard Structure
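The exact layout in the product may differ, but a scorecard can be pictured as sections containing rules, each with a point value and evaluation criteria. The field names below are illustrative only, not a Cevro import format:

```python
# A hypothetical scorecard: sections -> rules -> points + criteria.
# Field names are illustrative only, not a Cevro import/export format.
scorecard = {
    "name": "Support Quality Scorecard",
    "max_points": 100,
    "sections": [
        {
            "name": "Accuracy",
            "rules": [
                {
                    "name": "Correct information",
                    "max_points": 20,
                    "criteria": "The agent's statements about bonuses, balances and "
                                "timelines match the Knowledge Base and back-office data.",
                },
            ],
        },
        {
            "name": "Soft Skills",
            "rules": [
                {"name": "Greeting", "max_points": 10,
                 "criteria": "The agent greets the player by name within the first message."},
            ],
        },
    ],
}
```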
Writing Good Evaluation Criteria
The criteria field is crucial—it tells the AI exactly how to evaluate each rule.

Good Criteria Example
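(The wording below is a hypothetical illustration; base real criteria on your own procedures.)

“Award full points if the agent states the withdrawal processing timeline exactly as documented in the Knowledge Base and mentions any outstanding verification steps. Award half points if the timeline is correct but verification is not mentioned. Award zero points if the timeline is incorrect.”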
Bad Criteria Example
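A vague criterion such as the following (again hypothetical wording) gives the AI too little to evaluate against and leads to inconsistent scores:

“The agent was helpful and professional.”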
The QA Tab
Every scored conversation has a QA tab in the right sidebar. This is where you see the full scoring breakdown with rich context.

Score Overview
At the top, you’ll see:
- Total Score — Points awarded out of maximum (e.g., “72/80”)
- Visual indicator — Color-coded to show quality at a glance
Section-by-Section Breakdown
Each section expands to show:
- Section score — Points for this category
- Individual rules — Each rule with:
  - Points awarded (e.g., “8/10”)
  - The AI’s explanation of why this score was given
  - References to specific transcript moments
Reference Sources
The Reference Sources panel shows what context the AI used when scoring:
- AI Procedures — Which procedures were identified as relevant to this conversation
- Knowledge Base — Articles referenced during evaluation
- Back Office Data — Player data fetched (balances, bonus status, etc.)
Feedback
Both AI and human feedback are tracked:
- AI Feedback — The automated score with rule-by-rule breakdown
- Human Feedback — Manual reviews and overrides from your team
Metrics & Analytics
QA Scoring unlocks powerful analytics beyond basic conversation counts. Access these via the Metrics button on the Conversations page.

Key Performance Indicators
The metrics dashboard shows:

| Metric | What It Measures |
|---|---|
| QA Score | Average scorecard score across conversations |
| Average Handling Time | How long conversations take to resolve |
| First Response Time | Time until first agent response |
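
For intuition, the two time-based metrics can be derived from message timestamps roughly as follows. This is an illustrative calculation, not necessarily Cevro's exact formula:

```python
# Illustrative calculation only; Cevro's exact formulas may differ.
from datetime import datetime

messages = [  # hypothetical transcript timestamps
    {"role": "player", "ts": "2024-05-01T10:00:00"},
    {"role": "agent",  "ts": "2024-05-01T10:01:30"},
    {"role": "player", "ts": "2024-05-01T10:03:00"},
    {"role": "agent",  "ts": "2024-05-01T10:04:10"},
]
closed_at = datetime.fromisoformat("2024-05-01T10:06:00")

ts = lambda m: datetime.fromisoformat(m["ts"])
first_player_msg = next(m for m in messages if m["role"] == "player")
first_agent_msg = next(m for m in messages if m["role"] == "agent")

first_response_time = ts(first_agent_msg) - ts(first_player_msg)  # time until first agent response
handling_time = closed_at - ts(first_player_msg)                  # open until resolved; averaged across
                                                                   # conversations in the dashboard
print(first_response_time)  # 0:01:30
print(handling_time)        # 0:06:00
```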
Score Trends
Track quality over time with the Score Over Time chart. Filter by:
- Date range
- Agent (AI vs human)
- Conversation status
Performance by Section
See which areas of quality are strongest and which need attention:

| Section | Average Score | Trend |
|---|---|---|
| Accuracy | 92% | ↑ |
| Soft Skills | 88% | → |
| Resolution | 78% | ↓ |
Granular Export
Export detailed QA data for offline analysis:
- Per-conversation scores
- Rule-by-rule breakdowns
- Trend data over time
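
Once exported (for example as CSV), this data supports straightforward offline analysis. A minimal sketch using pandas, with hypothetical file and column names:

```python
# Minimal sketch of offline analysis on exported QA data.
# The file name and column names here are hypothetical.
import pandas as pd

df = pd.read_csv("qa_export.csv")  # assumed: one row per rule, per conversation

# Average score per section, as a percentage of available points
by_section = (
    df.groupby("section")[["points_awarded", "max_points"]].sum()
      .assign(pct=lambda s: 100 * s["points_awarded"] / s["max_points"])
)

# Weekly trend of the overall QA score
df["closed_at"] = pd.to_datetime(df["closed_at"])
weekly = (
    df.set_index("closed_at")
      .resample("W")[["points_awarded", "max_points"]].sum()
      .assign(pct=lambda s: 100 * s["points_awarded"] / s["max_points"])
)

print(by_section["pct"].round(1))
print(weekly["pct"].round(1))
```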
Manual Review
QA isn’t fully automated—human oversight matters. Administrators can:
- Override AI Scores — Adjust overall or per-section scores
- Add Comments — Document why a score was changed
- Use Custom Rating Scales — Configure specific point values for quick scoring
Configuring the Review Scale
By default, the review scale uses evenly spaced values (e.g., 0, 20, 40, 60, 80 for an 80-point scorecard). If you need a custom review scale to match your existing QA workflow, contact your Customer Success Manager to configure it for your workspace.

Understanding Your Scores
Knowing how the AI evaluates conversations helps you write better scorecard criteria and understand why scores come out the way they do.

What the AI Considers
When evaluating a conversation, the AI scorer has access to more than just the chat transcript. It evaluates in full context:

| Context | How It’s Used |
|---|---|
| Full transcript | What was said by both parties, including timestamps and message order |
| Your scorecard | Every rule and its criteria — the AI evaluates each one independently |
| AI Procedures | Your documented SOPs that define correct agent behavior for each topic |
| Knowledge Base | Your articles, used to verify whether the agent gave correct information |
| Player data | Real account data (balances, transactions, status) from your back-office systems |
| Session info | Player location, language, and device data from the helpdesk |
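
As a rough mental model, the bundle of context behind each evaluation might be pictured like this. The field names are illustrative and simply mirror the table above; they are not a real Cevro API:

```python
# Hypothetical shape of the evaluation context; field names mirror the table above.
from dataclasses import dataclass, field

@dataclass
class ScoringContext:
    transcript: list[dict]                            # full transcript: role, text, timestamp, order
    scorecard: dict                                   # every rule and its criteria
    procedures: list[str]                             # relevant AI Procedures (SOPs)
    knowledge_base: list[str]                         # articles used to verify correctness
    player_data: dict = field(default_factory=dict)   # balances, transactions, status
    session_info: dict = field(default_factory=dict)  # location, language, device
```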
How Data Availability Works
A key concept: the scorer treats all reference data as information that was available to the agent during the conversation. For example, if your back-office integration returns a player’s country, and your SOP says “check the player’s country before proceeding” — the scorer considers that requirement satisfied as long as the agent’s behavior is consistent with the data. The agent doesn’t need to explicitly say “I can see you’re from Germany” for the country to count as “checked.”

What this means for your scorecard criteria: If you want agents to explicitly state something to the player (e.g., “confirm the player’s country out loud”), say so explicitly in your criteria. Otherwise, the AI assumes internal checks are performed correctly when the agent’s actions are consistent with the data.
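
For example, these two hypothetical criteria describe the same requirement but would be scored differently:
- “The agent checks the player’s country before proceeding with the request.” This is treated as satisfied whenever the agent’s actions are consistent with the country in the back-office data, even if the country is never mentioned in the chat.
- “The agent explicitly confirms the player’s country with the player before proceeding with the request.” This requires the confirmation to appear in the transcript.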
Fairness Safeguards
The scoring AI is designed to be fair and consistent. Several safeguards are built in:
- Benefit of the doubt — If a procedure requires an internal check (like looking up account data) and the agent’s response is consistent with the data, the AI assumes the check was performed correctly
- No penalty for short conversations — If the player left abruptly or didn’t engage, agents aren’t penalized for not demonstrating behaviors they had no opportunity to show
- Time-aware accuracy — When verifying data accuracy, the AI accounts for the fact that account data can change. It evaluates based on what was available during the conversation, not what changed after
- Translation-aware — If your workspace uses automatic translation, agents are never penalized for the language of their responses
Tips for Better Scores
Best Practices
Start simple
Begin with 2-3 sections and a few rules each. Add complexity only when needed.
Be specific in criteria
Vague criteria like “was professional” lead to inconsistent scores. Specify exactly what you’re looking for.
Clarify internal vs. external actions
If a procedure requires internal checks (like verifying data in back-office systems), clarify whether the agent needs to mention this to the player or just perform the check.
Test with real conversations
Review a few scored conversations to calibrate expectations before rolling out widely.
Use manual review for calibration
Regularly compare AI scores to your own assessments. Adjust criteria if there’s consistent disagreement.
Troubleshooting
| Issue | What to Check |
|---|---|
| Score seems too low | Check the rule-by-rule breakdown in the QA tab. Which specific rules failed? |
| AI used wrong context | Check Reference Sources. Did it pick the right procedure for this conversation type? |
| Inconsistent scores for similar conversations | Your criteria may be too vague. Add more specific pass/fail examples. |
| Agent penalized for internal actions | Clarify in criteria whether internal steps need to be visible in the transcript. |
Limitations
Back Office Data Timing
When accuracy scoring uses back-office data, this data is fetched at scoring time — which may be minutes after the conversation ended. The AI is designed to account for this: it looks at timestamps in the data and evaluates based on what was available during the conversation, not what changed after. However, if data changes very rapidly (e.g., real-time balance updates), there may be edge cases where the timing matters.

Related Documentation
- AI Procedures — Documenting procedures that ground QA evaluation
- Analytics & Insights — Tracking agent performance over time
- Player Authentication — Setting up back-office data access