What Is QA Scoring?

QA (Quality Assurance) scoring transforms Cevro into an automated quality assurance system for support conversations. Instead of having QA specialists manually review small samples of chats, the AI evaluates every conversation against your customized scorecard—delivering consistent, granular, and scalable quality assurance. Key benefits:
  • 100% coverage — Every conversation is scored, not just a sample
  • Consistent evaluation — Same criteria applied to every conversation
  • Granular customization — Define your own scoring rules beyond Cevro’s built-in CSAT
  • Rich context — See exactly why each score was given, with rule-by-rule breakdowns
  • Immediate feedback — Scores available as soon as conversations close
QA Scoring works for both AI-handled and human agent conversations. Whether Cevro AI resolved the ticket or a human agent did, the same scorecard evaluates quality consistently.

How It Works

When a support conversation closes:
  1. Conversation synced — Cevro pulls the full transcript
  2. Context gathered — The AI identifies relevant procedures and fetches player data
  3. Scorecard evaluation — Each rule in your scorecard is evaluated independently
  4. Results stored — Scores are saved with per-rule breakdowns
The AI acts like an expert QA reviewer with access to your documented procedures, knowledge base, and back-office data.
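
For readers who prefer code, here is a highly simplified, self-contained sketch of that four-step flow in Python. Every function below is a stand-in written for this page; none of the names correspond to Cevro's actual internals or API.

# Simplified stand-in for the scoring flow described above.
# None of these functions or names are Cevro's actual internals or API.

def fetch_transcript(conversation_id: str) -> list[str]:
    # 1. Conversation synced: Cevro pulls the full transcript (stubbed here).
    return ["Player: My bonus is missing", "Agent: Let me check your account."]

def gather_context(transcript: list[str]) -> dict:
    # 2. Context gathered: relevant procedures and player data (stubbed here).
    return {"procedures": ["Bonus issues SOP"], "player_data": {"bonus_active": True}}

def evaluate_rule(rule: dict, transcript: list[str], context: dict) -> dict:
    # 3. Scorecard evaluation: each rule is judged independently.
    #    This stub awards full points; in Cevro the AI does the judging.
    return {"objective": rule["objective"], "awarded": rule["points"], "max": rule["points"]}

def score_conversation(conversation_id: str, rules: list[dict]) -> list[dict]:
    transcript = fetch_transcript(conversation_id)
    context = gather_context(transcript)
    # 4. Results stored: here we simply return the per-rule breakdown.
    return [evaluate_rule(rule, transcript, context) for rule in rules]

print(score_conversation("demo-1", [{"objective": "Complete Answer", "points": 10}]))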

Setting Up Your Scorecard

A scorecard defines exactly how conversations should be evaluated. It’s the rubric your AI QA specialist follows.

1. Navigate to Scorecards

Go to Settings → Scorecards in your workspace.

2. Create a Scorecard

Click Create Scorecard and give it a name (e.g., “Support Quality Scorecard”). You can set the total point scale (the default is 100 points).

3. Add Sections

Sections are categories of quality you’re evaluating. Common examples:
  • Accuracy — Did the agent provide correct information?
  • Soft Skills — Was the tone appropriate? Did they greet properly?
  • Process Compliance — Did they follow the documented procedures?
  • Resolution — Was the issue fully resolved?
Each section has a point value that contributes to the total score.

4. Add Rules to Each Section

Rules are the specific criteria within each section. For each rule, define:
  • Objective — What you’re evaluating (shown in reports)
  • Criteria — Detailed instructions for the AI on how to evaluate
  • Points — Maximum points for this rule

Scorecard Structure

Scorecard (e.g., "Support Quality Scorecard")
├── Total Points: 80

├── Section: Accuracy [30 points]
│   ├── Rule 1: Complete Answer (10 pts)
│   ├── Rule 2: Correct Information (10 pts)
│   └── Rule 3: SOP Compliance (10 pts)

├── Section: Soft Skills [25 points]
│   ├── Rule 1: Greeting (5 pts)
│   ├── Rule 2: Tone (10 pts)
│   └── Rule 3: Closing (10 pts)

└── Section: Performance [25 points]
    ├── Rule 1: Response Time (10 pts)
    └── Rule 2: Resolution (15 pts)
Rule points within a section must add up exactly to the section’s total points. Section points must add up to the scorecard’s total points.
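
These sum constraints are easy to check programmatically before rolling a scorecard out. Below is a minimal sketch in Python; the class and field names are illustrative only and do not reflect Cevro's internal schema.

from dataclasses import dataclass, field

# Illustrative data model mirroring the structure above; these class and
# field names are not Cevro's actual schema.

@dataclass
class Rule:
    objective: str   # what you're evaluating (shown in reports)
    criteria: str    # detailed instructions for the AI
    points: int      # maximum points for this rule

@dataclass
class Section:
    name: str
    points: int
    rules: list[Rule] = field(default_factory=list)

@dataclass
class Scorecard:
    name: str
    total_points: int
    sections: list[Section] = field(default_factory=list)

    def validate(self) -> None:
        # Rule points within a section must equal the section total.
        for section in self.sections:
            rule_sum = sum(rule.points for rule in section.rules)
            if rule_sum != section.points:
                raise ValueError(f"{section.name}: rules total {rule_sum}, expected {section.points}")
        # Section points must equal the scorecard total.
        section_sum = sum(section.points for section in self.sections)
        if section_sum != self.total_points:
            raise ValueError(f"Sections total {section_sum}, expected {self.total_points}")

For the 80-point example above, validation passes: 10+10+10 = 30, 5+10+10 = 25, 10+15 = 25, and 30+25+25 = 80.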

Writing Good Evaluation Criteria

The criteria field is crucial—it tells the AI exactly how to evaluate each rule.

Good Criteria Example

Agent explicitly confirmed they understood the player's issue by
restating it, provided a complete solution addressing all aspects
of the question, and verified the player understood before closing.

Bad Criteria Example

Agent was helpful.
Be specific about what “good” looks like. Include examples of pass/fail scenarios if helpful. The more precise your criteria, the more consistent your scores will be.

The QA Tab

Every scored conversation has a QA tab in the right sidebar. This is where you see the full scoring breakdown with rich context.

Score Overview

At the top, you’ll see:
  • Total Score — Points awarded out of maximum (e.g., “72/80”)
  • Visual indicator — Color-coded to show quality at a glance

Section-by-Section Breakdown

Each section expands to show:
  • Section score — Points for this category
  • Individual rules — Each rule with:
    • Points awarded (e.g., “8/10”)
    • The AI’s explanation of why this score was given
    • References to specific transcript moments
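
Put together, a scored conversation can be thought of as a nested structure like the one below. The field names are hypothetical and only illustrate the shape of the breakdown; they are not Cevro's export format.

# Hypothetical shape of a scored conversation; field names are illustrative only.
qa_result = {
    "total_score": 72,
    "max_score": 80,
    "sections": [
        {
            "name": "Accuracy",
            "score": 26,
            "max": 30,
            "rules": [
                {
                    "objective": "Complete Answer",
                    "awarded": 8,
                    "max": 10,
                    "explanation": "Answered both questions but did not confirm understanding.",
                    "transcript_references": ["message 4", "message 9"],
                },
            ],
        },
    ],
}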

Reference Sources

The Reference Sources panel shows what context the AI used when scoring:
  • AI Procedures — Which procedures were identified as relevant to this conversation
  • Knowledge Base — Articles referenced during evaluation
  • Back Office Data — Player data fetched (balances, bonus status, etc.)
This transparency helps you understand why a score was given and debug unexpected results.

Feedback

Both AI and human feedback are tracked:
  • AI Feedback — The automated score with rule-by-rule breakdown
  • Human Feedback — Manual reviews and overrides from your team

Metrics & Analytics

QA Scoring unlocks powerful analytics beyond basic conversation counts. Access these via the Metrics button on the Conversations page.

Key Performance Indicators

The metrics dashboard shows:
Metric                 | What It Measures
QA Score               | Average scorecard score across conversations
Average Handling Time  | How long conversations take to resolve
First Response Time    | Time until first agent response
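
If you want to reproduce these KPIs offline from exported data, the arithmetic is straightforward. Here is a minimal sketch; the field names are assumptions about the export rather than a documented schema.

# Minimal sketch of the dashboard KPIs computed from exported per-conversation
# data. Field names are assumptions, not a documented export schema.
conversations = [
    {"score": 72, "max_score": 80, "handling_time_s": 540, "first_response_s": 35},
    {"score": 64, "max_score": 80, "handling_time_s": 780, "first_response_s": 60},
]

qa_score = sum(c["score"] / c["max_score"] for c in conversations) / len(conversations)
avg_handling_time = sum(c["handling_time_s"] for c in conversations) / len(conversations)
first_response = sum(c["first_response_s"] for c in conversations) / len(conversations)

print(f"QA Score: {qa_score:.0%}")                                  # 85%
print(f"Average Handling Time: {avg_handling_time / 60:.1f} min")   # 11.0 min
print(f"First Response Time: {first_response:.0f} s")               # 48 s
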
Track quality over time with the Score Over Time chart. Filter by:
  • Date range
  • Agent (AI vs human)
  • Conversation status

Performance by Section

See which areas of quality are strongest and which need attention:
Section      | Average Score
Accuracy     | 92%
Soft Skills  | 88%
Resolution   | 78%
This breakdown helps you focus training and process improvements where they matter most.

Granular Export

Export detailed QA data for offline analysis:
  • Per-conversation scores
  • Rule-by-rule breakdowns
  • Trend data over time
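
One way to use the rule-by-rule export is to reproduce the per-section averages shown under Performance by Section. Below is a sketch assuming a CSV export with section, points_awarded, and points_possible columns; the file name and column names are assumptions, not a documented format.

import csv
from collections import defaultdict

# Sketch: per-section average scores from a rule-by-rule export.
# "qa_export.csv" and its columns are assumptions, not a documented format.
totals = defaultdict(lambda: [0.0, 0.0])  # section -> [awarded, possible]

with open("qa_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals[row["section"]][0] += float(row["points_awarded"])
        totals[row["section"]][1] += float(row["points_possible"])

for section, (awarded, possible) in sorted(totals.items()):
    print(f"{section}: {awarded / possible:.0%}")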

Manual Review

QA isn’t fully automated—human oversight matters. Administrators can:
  • Override AI Scores — Adjust overall or per-section scores
  • Add Comments — Document why a score was changed
  • Use Custom Rating Scales — Configure specific point values for quick scoring
Manual reviews are tracked separately from AI scores, enabling comparison and calibration over time.

Configuring the Review Scale

By default, the review scale uses evenly-spaced values (e.g., 0, 20, 40, 60, 80 for an 80-point scorecard). If you need a custom review scale to match your existing QA workflow, contact your Customer Success Manager to configure it for your workspace.
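
For reference, the default scale is simply the scorecard total split into equal steps. The snippet below shows what those values would be for your own scorecard; the five-step count mirrors the example above and is an assumption, not a fixed setting.

# Evenly spaced review scale, as in the example above (0, 20, 40, 60, 80).
# The five-step count mirrors that example; it is an assumption, not a fixed setting.
total_points = 80
steps = 5
scale = [total_points * i // (steps - 1) for i in range(steps)]
print(scale)  # [0, 20, 40, 60, 80]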

Understanding Your Scores

Knowing how the AI evaluates conversations helps you write better scorecard criteria and understand why scores come out the way they do.

What the AI Considers

When evaluating a conversation, the AI scorer has access to more than just the chat transcript. It evaluates in full context:
Context          | How It’s Used
Full transcript  | What was said by both parties, including timestamps and message order
Your scorecard   | Every rule and its criteria — the AI evaluates each one independently
AI Procedures    | Your documented SOPs that define correct agent behavior for each topic
Knowledge Base   | Your articles, used to verify whether the agent gave correct information
Player data      | Real account data (balances, transactions, status) from your back-office systems
Session info     | Player location, language, and device data from the helpdesk
This means the AI can verify factual accuracy — not just whether the agent sounded right, but whether what they said was actually correct based on the data.

How Data Availability Works

A key concept: the scorer treats all reference data as information that was available to the agent during the conversation. For example, if your back-office integration returns a player’s country, and your SOP says “check the player’s country before proceeding” — the scorer considers that requirement satisfied as long as the agent’s behavior is consistent with the data. The agent doesn’t need to explicitly say “I can see you’re from Germany” for the country to count as “checked.”
What this means for your scorecard criteria: If you want agents to explicitly state something to the player (e.g., “confirm the player’s country out loud”), say so explicitly in your criteria. Otherwise, the AI assumes internal checks are performed correctly when the agent’s actions are consistent with the data.

Fairness Safeguards

The scoring AI is designed to be fair and consistent. Several safeguards are built in:
  • Benefit of the doubt — If a procedure requires an internal check (like looking up account data) and the agent’s response is consistent with the data, the AI assumes the check was performed correctly
  • No penalty for short conversations — If the player left abruptly or didn’t engage, agents aren’t penalized for not demonstrating behaviors they had no opportunity to show
  • Time-aware accuracy — When verifying data accuracy, the AI accounts for the fact that account data can change. It evaluates based on what was available during the conversation, not what changed after
  • Translation-aware — If your workspace uses automatic translation, agents are never penalized for the language of their responses

Tips for Better Scores

Be explicit about visibility. If you want the agent to say something specific to the player, write “Agent must inform the player that…” in your criteria. If the check can be internal, write “Agent must verify…” — the AI will give credit when the agent’s behavior matches the data.
Connect your back-office tools. The more data the AI has access to, the more accurately it can evaluate. Without back-office data, the AI can only judge based on what’s said in the conversation — it can’t verify if the information was correct.

Best Practices

  • Begin with 2-3 sections and a few rules each. Add complexity only when needed.
  • Vague criteria like “was professional” lead to inconsistent scores. Specify exactly what you’re looking for.
  • If a procedure requires internal checks (like verifying data in back-office systems), clarify whether the agent needs to mention this to the player or just perform the check.
  • Review a few scored conversations to calibrate expectations before rolling out widely.
  • Regularly compare AI scores to your own assessments. Adjust criteria if there’s consistent disagreement.

Troubleshooting

Issue                                         | What to Check
Score seems too low                           | Check the rule-by-rule breakdown in the QA tab. Which specific rules failed?
AI used wrong context                         | Check Reference Sources. Did it pick the right procedure for this conversation type?
Inconsistent scores for similar conversations | Your criteria may be too vague. Add more specific pass/fail examples.
Agent penalized for internal actions          | Clarify in criteria whether internal steps need to be visible in the transcript.

Limitations

Back Office Data Timing

When accuracy scoring uses back-office data, this data is fetched at scoring time — which may be minutes after the conversation ended. The AI is designed to account for this: it looks at timestamps in the data and evaluates based on what was available during the conversation, not what changed after. However, if data changes very rapidly (e.g., real-time balance updates), there may be edge cases where the timing matters.