What Is QA Scoring?

QA (Quality Assurance) scoring transforms Cevro into an automated quality assurance system for support conversations. Instead of having QA specialists manually review small samples of chats, the AI evaluates every conversation against your customized scorecard—delivering consistent, granular, and scalable quality assurance. Key benefits:
  • 100% coverage — Every conversation is scored, not just a sample
  • Consistent evaluation — Same criteria applied to every conversation
  • Granular customization — Define your own scoring rules beyond Cevro’s built-in CSAT
  • Rich context — See exactly why each score was given, with rule-by-rule breakdowns
  • Immediate feedback — Scores available as soon as conversations close
QA Scoring works for both AI-handled and human agent conversations. Whether Cevro AI resolved the ticket or a human agent did, the same scorecard evaluates quality consistently.

How It Works

When a support conversation closes:
  1. Conversation synced — Cevro pulls the full transcript
  2. Context gathered — The AI identifies relevant procedures and fetches player data
  3. Scorecard evaluation — Each rule in your scorecard is evaluated independently
  4. Results stored — Scores are saved with per-rule breakdowns
The AI acts like an expert QA reviewer with access to your documented procedures, knowledge base, and back-office data.
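
For readers who prefer code, here is a highly simplified, self-contained sketch of that four-step flow in Python. Every function below is a stand-in written for this page; none of the names correspond to Cevro's actual internals or API.

# Simplified stand-in for the scoring flow described above.
# None of these functions or names are Cevro's actual internals or API.

def fetch_transcript(conversation_id: str) -> list[str]:
    # 1. Conversation synced: Cevro pulls the full transcript (stubbed here).
    return ["Player: My bonus is missing", "Agent: Let me check your account."]

def gather_context(transcript: list[str]) -> dict:
    # 2. Context gathered: relevant procedures and player data (stubbed here).
    return {"procedures": ["Bonus issues SOP"], "player_data": {"bonus_active": True}}

def evaluate_rule(rule: dict, transcript: list[str], context: dict) -> dict:
    # 3. Scorecard evaluation: each rule is judged independently.
    #    This stub awards full points; in Cevro the AI does the judging.
    return {"objective": rule["objective"], "awarded": rule["points"], "max": rule["points"]}

def score_conversation(conversation_id: str, rules: list[dict]) -> list[dict]:
    transcript = fetch_transcript(conversation_id)
    context = gather_context(transcript)
    # 4. Results stored: here we simply return the per-rule breakdown.
    return [evaluate_rule(rule, transcript, context) for rule in rules]

print(score_conversation("demo-1", [{"objective": "Complete Answer", "points": 10}]))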

Setting Up Your Scorecard

A scorecard defines exactly how conversations should be evaluated. It’s the rubric your AI QA specialist follows.

1. Navigate to Scorecards

Go to Settings → Scorecards in your workspace.

2. Create a Scorecard

Click Create Scorecard and give it a name (e.g., “Support Quality Scorecard”). You can set the total point scale (the default is 100 points).

3. Add Sections

Sections are categories of quality you’re evaluating. Common examples:
  • Accuracy — Did the agent provide correct information?
  • Soft Skills — Was the tone appropriate? Did they greet properly?
  • Process Compliance — Did they follow the documented procedures?
  • Resolution — Was the issue fully resolved?
Each section has a point value that contributes to the total score.

4. Add Rules to Each Section

Rules are the specific criteria within each section. For each rule, define:
  • Objective — What you’re evaluating (shown in reports)
  • Criteria — Detailed instructions for the AI on how to evaluate
  • Points — Maximum points for this rule

Scorecard Structure

Scorecard (e.g., "Support Quality Scorecard")
├── Total Points: 80

├── Section: Accuracy [30 points]
│   ├── Rule 1: Complete Answer (10 pts)
│   ├── Rule 2: Correct Information (10 pts)
│   └── Rule 3: SOP Compliance (10 pts)

├── Section: Soft Skills [25 points]
│   ├── Rule 1: Greeting (5 pts)
│   ├── Rule 2: Tone (10 pts)
│   └── Rule 3: Closing (10 pts)

└── Section: Performance [25 points]
    ├── Rule 1: Response Time (10 pts)
    └── Rule 2: Resolution (15 pts)
Rule points within a section must add up exactly to the section’s total points. Section points must add up to the scorecard’s total points.
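
These sum constraints are easy to check programmatically before rolling a scorecard out. Below is a minimal sketch in Python; the class and field names are illustrative only and do not reflect Cevro's internal schema.

from dataclasses import dataclass, field

# Illustrative data model mirroring the structure above; these class and
# field names are not Cevro's actual schema.

@dataclass
class Rule:
    objective: str   # what you're evaluating (shown in reports)
    criteria: str    # detailed instructions for the AI
    points: int      # maximum points for this rule

@dataclass
class Section:
    name: str
    points: int
    rules: list[Rule] = field(default_factory=list)

@dataclass
class Scorecard:
    name: str
    total_points: int
    sections: list[Section] = field(default_factory=list)

    def validate(self) -> None:
        # Rule points within a section must equal the section total.
        for section in self.sections:
            rule_sum = sum(rule.points for rule in section.rules)
            if rule_sum != section.points:
                raise ValueError(f"{section.name}: rules total {rule_sum}, expected {section.points}")
        # Section points must equal the scorecard total.
        section_sum = sum(section.points for section in self.sections)
        if section_sum != self.total_points:
            raise ValueError(f"Sections total {section_sum}, expected {self.total_points}")

For the 80-point example above, validation passes: 10+10+10 = 30, 5+10+10 = 25, 10+15 = 25, and 30+25+25 = 80.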

Writing Good Evaluation Criteria

The criteria field is crucial—it tells the AI exactly how to evaluate each rule.

Good Criteria Example

Agent explicitly confirmed they understood the player's issue by
restating it, provided a complete solution addressing all aspects
of the question, and verified the player understood before closing.

Bad Criteria Example

Agent was helpful.
Be specific about what “good” looks like. Include examples of pass/fail scenarios if helpful. The more precise your criteria, the more consistent your scores will be.

The QA Tab

Every scored conversation has a QA tab in the right sidebar. This is where you see the full scoring breakdown with rich context.

Score Overview

At the top, you’ll see:
  • Total Score — Points awarded out of maximum (e.g., “72/80”)
  • Visual indicator — Color-coded to show quality at a glance

Section-by-Section Breakdown

Each section expands to show:
  • Section score — Points for this category
  • Individual rules — Each rule with:
    • Points awarded (e.g., “8/10”)
    • The AI’s explanation of why this score was given
    • References to specific transcript moments
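
Put together, a scored conversation can be thought of as a nested structure like the one below. The field names are hypothetical and only illustrate the shape of the breakdown; they are not Cevro's export format.

# Hypothetical shape of a scored conversation; field names are illustrative only.
qa_result = {
    "total_score": 72,
    "max_score": 80,
    "sections": [
        {
            "name": "Accuracy",
            "score": 26,
            "max": 30,
            "rules": [
                {
                    "objective": "Complete Answer",
                    "awarded": 8,
                    "max": 10,
                    "explanation": "Answered both questions but did not confirm understanding.",
                    "transcript_references": ["message 4", "message 9"],
                },
            ],
        },
    ],
}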

Reference Sources

The Reference Sources panel shows what context the AI used when scoring:
  • AI Procedures — Which procedures were identified as relevant to this conversation
  • Knowledge Base — Articles referenced during evaluation
  • Back Office Data — Player data fetched (balances, bonus status, etc.)
This transparency helps you understand why a score was given and debug unexpected results.

Feedback

Both AI and human feedback are tracked:
  • AI Feedback — The automated score with rule-by-rule breakdown
  • Human Feedback — Manual reviews and overrides from your team

Metrics & Analytics

QA Scoring unlocks powerful analytics beyond basic conversation counts. Access these via the Metrics button on the Conversations page.

Key Performance Indicators

The metrics dashboard shows:
Metric                 | What It Measures
QA Score               | Average scorecard score across conversations
Average Handling Time  | How long conversations take to resolve
First Response Time    | Time until first agent response
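
If you want to reproduce these KPIs offline from exported data, the arithmetic is straightforward. Here is a minimal sketch; the field names are assumptions about the export rather than a documented schema.

# Minimal sketch of the dashboard KPIs computed from exported per-conversation
# data. Field names are assumptions, not a documented export schema.
conversations = [
    {"score": 72, "max_score": 80, "handling_time_s": 540, "first_response_s": 35},
    {"score": 64, "max_score": 80, "handling_time_s": 780, "first_response_s": 60},
]

qa_score = sum(c["score"] / c["max_score"] for c in conversations) / len(conversations)
avg_handling_time = sum(c["handling_time_s"] for c in conversations) / len(conversations)
first_response = sum(c["first_response_s"] for c in conversations) / len(conversations)

print(f"QA Score: {qa_score:.0%}")                                  # 85%
print(f"Average Handling Time: {avg_handling_time / 60:.1f} min")   # 11.0 min
print(f"First Response Time: {first_response:.0f} s")               # 48 s
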
Track quality over time with the Score Over Time chart. Filter by:
  • Date range
  • Agent (AI vs human)
  • Conversation status

Performance by Section

See which areas of quality are strongest and which need attention:
Section      | Average Score
Accuracy     | 92%
Soft Skills  | 88%
Resolution   | 78%
This breakdown helps you focus training and process improvements where they matter most.

Granular Export

Export detailed QA data for offline analysis:
  • Per-conversation scores
  • Rule-by-rule breakdowns
  • Trend data over time
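
One way to use the rule-by-rule export is to reproduce the per-section averages shown under Performance by Section. Below is a sketch assuming a CSV export with section, points_awarded, and points_possible columns; the file name and column names are assumptions, not a documented format.

import csv
from collections import defaultdict

# Sketch: per-section average scores from a rule-by-rule export.
# "qa_export.csv" and its columns are assumptions, not a documented format.
totals = defaultdict(lambda: [0.0, 0.0])  # section -> [awarded, possible]

with open("qa_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals[row["section"]][0] += float(row["points_awarded"])
        totals[row["section"]][1] += float(row["points_possible"])

for section, (awarded, possible) in sorted(totals.items()):
    print(f"{section}: {awarded / possible:.0%}")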

Manual Review

QA isn’t fully automated—human oversight matters. Administrators can:
  • Override AI Scores — Adjust overall or per-section scores
  • Add Comments — Document why a score was changed
  • Use Custom Rating Scales — Configure specific point values for quick scoring
Manual reviews are tracked separately from AI scores, enabling comparison and calibration over time.

Configuring the Review Scale

By default, the review scale uses evenly-spaced values (e.g., 0, 20, 40, 60, 80 for an 80-point scorecard). If you need a custom review scale to match your existing QA workflow, contact your Customer Success Manager to configure it for your workspace.
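
For reference, the default scale is simply the scorecard total split into equal steps. The snippet below shows what those values would be for your own scorecard; the five-step count mirrors the example above and is an assumption, not a fixed setting.

# Evenly spaced review scale, as in the example above (0, 20, 40, 60, 80).
# The five-step count mirrors that example; it is an assumption, not a fixed setting.
total_points = 80
steps = 5
scale = [total_points * i // (steps - 1) for i in range(steps)]
print(scale)  # [0, 20, 40, 60, 80]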

Understanding Your Scores

Knowing how the AI evaluates conversations helps you write better scorecard criteria and understand why scores come out the way they do.

What the AI Considers

When evaluating a conversation, the AI scorer has access to more than just the chat transcript. It evaluates in full context:
Context          | How It’s Used
Full transcript  | What was said by both parties, including timestamps and message order
Your scorecard   | Every rule and its criteria — the AI evaluates each one independently
AI Procedures    | Your documented SOPs that define correct agent behavior for each topic
Knowledge Base   | Your articles, used to verify whether the agent gave correct information
Player data      | Real account data (balances, transactions, status) from your back-office systems
Session info     | Player location, language, and device data from the helpdesk
This means the AI can verify factual accuracy — not just whether the agent sounded right, but whether what they said was actually correct based on the data.

How Data Availability Works

A key concept: the scorer treats all reference data as information that was available to the agent during the conversation. For example, if your back-office integration returns a player’s country, and your SOP says “check the player’s country before proceeding” — the scorer considers that requirement satisfied as long as the agent’s behavior is consistent with the data. The agent doesn’t need to explicitly say “I can see you’re from Germany” for the country to count as “checked.”
What this means for your scorecard criteria: If you want agents to explicitly state something to the player (e.g., “confirm the player’s country out loud”), say so explicitly in your criteria. Otherwise, the AI assumes internal checks are performed correctly when the agent’s actions are consistent with the data.

Fairness Safeguards

The scoring AI is designed to be fair and consistent. Several safeguards are built in:
  • Benefit of the doubt — If a procedure requires an internal check (like looking up account data) and the agent’s response is consistent with the data, the AI assumes the check was performed correctly
  • No penalty for short conversations — If the player left abruptly or didn’t engage, agents aren’t penalized for not demonstrating behaviors they had no opportunity to show
  • Time-aware accuracy — When verifying data accuracy, the AI accounts for the fact that account data can change. It evaluates based on what was available during the conversation, not what changed after
  • Translation-aware — If your workspace uses automatic translation, agents are never penalized for the language of their responses

Tips for Better Scores

Be explicit about visibility. If you want the agent to say something specific to the player, write “Agent must inform the player that…” in your criteria. If the check can be internal, write “Agent must verify…” — the AI will give credit when the agent’s behavior matches the data.
Connect your back-office tools. The more data the AI has access to, the more accurately it can evaluate. Without back-office data, the AI can only judge based on what’s said in the conversation — it can’t verify if the information was correct.

Best Practices

  • Begin with 2-3 sections and a few rules each. Add complexity only when needed.
  • Vague criteria like “was professional” lead to inconsistent scores. Specify exactly what you’re looking for.
  • If a procedure requires internal checks (like verifying data in back-office systems), clarify whether the agent needs to mention this to the player or just perform the check.
  • Review a few scored conversations to calibrate expectations before rolling out widely.
  • Regularly compare AI scores to your own assessments. Adjust criteria if there’s consistent disagreement.

Troubleshooting

Issue                                         | What to Check
Score seems too low                           | Check the rule-by-rule breakdown in the QA tab. Which specific rules failed?
AI used wrong context                         | Check Reference Sources. Did it pick the right procedure for this conversation type?
Inconsistent scores for similar conversations | Your criteria may be too vague. Add more specific pass/fail examples.
Agent penalized for internal actions          | Clarify in criteria whether internal steps need to be visible in the transcript.

Limitations

Back Office Data Timing

When accuracy scoring uses back-office data, this data is fetched at scoring time — which may be minutes after the conversation ended. The AI is designed to account for this: it looks at timestamps in the data and evaluates based on what was available during the conversation, not what changed after. However, if data changes very rapidly (e.g., real-time balance updates), there may be edge cases where the timing matters.