QA managers and contact center supervisors spend hours reviewing individual calls, yet the metrics on their dashboards rarely connect to coaching decisions. The seven metrics below predict coaching outcomes, giving you a measurable path from call data to behavior change.
Methodology
These seven metrics were selected based on their direct connection to coaching decisions: each one either identifies what to coach, who to coach, or whether coaching worked. Metrics were evaluated across three dimensions:
| Dimension | What It Measures | Why It Matters for Coaching |
|---|---|---|
| Behavioral specificity | Targets one observable behavior | Enables precise coaching conversations |
| Repeatability signal | Shows patterns, not one-off events | Separates incidents from habits |
| Outcome linkage | Connects to downstream performance | Validates that coaching produced change |
According to ICMI's contact center management research, coaching programs grounded in behavioral observation rather than composite performance scores show significantly stronger development outcomes. Manual QA sampling at 3 to 10% of calls creates blind spots in agent performance data; automated coverage of 100% of calls provides the statistical foundation that makes these metrics reliable.
Avoid this common mistake: coaching to composite scores. A rep who needs help with objection handling responds to targeted objection practice. Generic conversations about overall numbers change nothing.
Metric 1: Criterion-Level Score by Agent
Best suited for: supervisors who want to replace general performance conversations with behavior-specific coaching agendas.
Overall QA scores mask the patterns that drive coaching. A rep who averages 72% across 40 calls may be perfect on rapport and product knowledge while failing compliance disclosure on 90% of them.
Key signals to track:
- Bottom three criteria by average score, per rep
- Spread between best and worst criteria (a wide spread means selective failure, not general underperformance)
- Whether the bottom criteria are the same week over week
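The ranking above is straightforward to compute from scored-call data. Here is a minimal sketch, assuming a simple list-of-dicts shape for scored calls; the field names (`rep`, `scores`) are illustrative, not any specific platform's schema:

```python
from collections import defaultdict

def bottom_criteria(scored_calls, n=3):
    """Average each criterion per rep; return the n lowest plus the best-worst spread."""
    # scored_calls: list of dicts like
    # {"rep": "A. Rivera", "scores": {"compliance_disclosure": 10, "rapport": 95, ...}}
    per_rep = defaultdict(lambda: defaultdict(list))
    for call in scored_calls:
        for criterion, score in call["scores"].items():
            per_rep[call["rep"]][criterion].append(score)

    report = {}
    for rep, crits in per_rep.items():
        averages = {c: sum(v) / len(v) for c, v in crits.items()}
        ranked = sorted(averages.items(), key=lambda kv: kv[1])  # worst first
        report[rep] = {
            "bottom": ranked[:n],
            # Wide spread = selective failure, not general underperformance
            "spread": ranked[-1][1] - ranked[0][1],
        }
    return report
```

Running this weekly and diffing the `bottom` list against last week's answers the third signal directly: the same criterion appearing at the bottom week over week is the coaching agenda.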
Insight7 surfaces criterion-level breakdowns for every rep across every scored call automatically, so supervisors can see the coaching agenda rather than build it manually from call notes.
Honest con: Criterion-level data requires well-designed scorecards. First-run AI scores without company-specific context on what "great" and "poor" look like can diverge from human judgment. Tuning to your QA standards typically takes four to six weeks.
Metric 2: Criteria Failure Rate by Call Type
Best suited for: QA leads managing multi-call-type environments where context changes what good looks like.
The same rep may handle inbound service calls well but consistently fail on outbound sales calls. Failure rate segmented by call type reveals whether a performance problem is role-wide or context-specific.
Insight7's dynamic criteria routing automatically applies the correct scorecard per call type, so failure rate data reflects what matters for each interaction, not a one-size scorecard applied to every conversation.
Coaching application: If a rep's failure rate on compliance disclosures spikes specifically on transfer calls, role-play the transfer scenario rather than general compliance training.
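Segmenting failure rate by call type is a grouping exercise. A minimal sketch, assuming each scored call carries a `call_type` label and a pass threshold (the 70-point cutoff here is illustrative):

```python
from collections import Counter

def failure_rates_by_call_type(scored_calls, pass_threshold=70):
    """Failure rate per (call type, criterion); scores below threshold count as failures."""
    fails, totals = Counter(), Counter()
    for call in scored_calls:  # e.g. {"call_type": "transfer", "scores": {"compliance": 40}}
        for criterion, score in call["scores"].items():
            key = (call["call_type"], criterion)
            totals[key] += 1
            if score < pass_threshold:
                fails[key] += 1
    return {key: fails[key] / totals[key] for key in totals}
```

A spike on one (call type, criterion) pair with low rates elsewhere is the context-specific signal this metric exists to surface.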
Metric 3: First-Call Resolution Rate
Best suited for: supervisors whose coaching goals include reducing callback volume and escalations.
First-call resolution (FCR) is the output metric most directly influenced by coaching. Reps who understand the product, handle objections cleanly, and communicate next steps clearly resolve calls on first contact.
Pair FCR by agent with criterion-level data to identify the cause. Low FCR plus low scores on "provides clear next steps" points to communication training. Low FCR plus low scores on "product knowledge" points to content review.
Honest con: FCR measurement requires reliable callback tracking. Centers that cannot match inbound calls to prior contacts will see inaccurate FCR data regardless of the coaching platform.
Metric 4: Talk Ratio
Best suited for: sales and retention teams where rep over-talking correlates with lower conversion.
Talk ratio measures the percentage of each call during which the rep, rather than the customer, is speaking. High rep-side talk ratios on consultative calls typically indicate the rep is pitching instead of diagnosing.
Insight7 captures talk ratio alongside behavioral criteria scores, so you can correlate it directly with outcomes and show reps specific moments in actual transcripts where they over-talked.
Honest con: Talk ratio norms vary by call type. Optimal ranges for outbound sales calls differ from inbound support calls. Establish baselines per call type before using talk ratio as a coaching trigger.
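The calculation itself is simple once you have diarized transcript segments. A minimal sketch, assuming segments arrive as (speaker, start, end) tuples with times in seconds; the `"rep"` speaker label is an assumption about your diarization output:

```python
def talk_ratio(segments, rep_label="rep"):
    """Fraction of total speaking time attributed to the rep.

    segments: list of (speaker, start_sec, end_sec) tuples from a diarized transcript.
    """
    rep_time = sum(end - start for who, start, end in segments if who == rep_label)
    total_time = sum(end - start for _, start, end in segments)
    return rep_time / total_time if total_time else 0.0
```

Compute this per call type first, then set coaching triggers relative to each baseline rather than a single global number.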
Metric 5: Repeat Issue Rate
Best suited for: supervisors who want to distinguish habitual failures from isolated incidents before deciding on coaching intensity.
Repeat issue rate tracks how often the same agent surfaces the same failure across multiple scored calls. A rep who failed to use empathy language once may have had a bad day. A rep who failed on the same criterion across 15 of 20 scored calls has a habit that needs structured practice.
Set a threshold, such as three or more failures on the same criterion in a 30-day window, and trigger automatic coaching assignment. Insight7's auto-suggested training feature does exactly this: when QA scores flag a consistent gap, the platform generates a targeted practice scenario and queues it for supervisor approval before deployment to the rep.
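The threshold rule described above can be sketched in a few lines. This is an illustrative implementation of the "three or more failures on the same criterion in 30 days" trigger, not any platform's internal logic; the failure-record shape is an assumption:

```python
from datetime import date, timedelta

def coaching_triggers(failures, threshold=3, window_days=30, today=None):
    """Return (rep, criterion) pairs with >= threshold failures inside the window.

    failures: list of dicts like
    {"rep": "A", "criterion": "empathy_language", "date": date(2024, 6, 5)}.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for f in failures:
        if f["date"] >= cutoff:  # ignore failures outside the rolling window
            key = (f["rep"], f["criterion"])
            counts[key] = counts.get(key, 0) + 1
    return [key for key, n in counts.items() if n >= threshold]
```

Each returned pair is a candidate for a coaching assignment; the rolling window keeps one-off bad days from accumulating into a trigger.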
Metric 6: Compliance Rate by Disclosure Type
Best suited for: QA leads in regulated industries where aggregate compliance rates hide specific disclosure gaps.
Compliance tracking at the aggregate level tells you your team is hitting 88% compliance. It does not tell you that mini-Miranda disclosures are missed on 34% of outbound calls while TCPA language is near-perfect.
Insight7 supports script-based exact-match scoring for compliance items, checking for the specific language required rather than a general impression. This matters for regulated industries where partial disclosure is still a violation.
Honest con: Script-based exact-match scoring can flag compliant calls where rep paraphrasing accurately conveys required content. Pair exact-match checks with intent-based evaluation for disclosure items that permit reasonable paraphrasing.
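An exact-match check is essentially normalized substring matching against the required script. A minimal sketch, with illustrative mini-Miranda wording standing in for whatever language your legal team mandates; normalization here only smooths case, punctuation, and whitespace, so it still flags paraphrases (which is the honest con above):

```python
import re

# Illustrative disclosure wording; substitute your legal team's exact script.
REQUIRED_DISCLOSURE = (
    "this is an attempt to collect a debt and any information obtained "
    "will be used for that purpose"
)

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace for tolerant matching."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def disclosure_present(transcript):
    """True if the required language appears verbatim after normalization."""
    return normalize(REQUIRED_DISCLOSURE) in normalize(transcript)
```

For disclosure items that permit reasonable paraphrasing, this check would be the first pass, with an intent-based evaluation handling the calls it flags.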
Metric 7: Coaching Completion-to-Score-Improvement Rate
Best suited for: QA managers and L&D leads who need to demonstrate the ROI of their coaching program to leadership.
This metric validates your entire coaching program. It measures the percentage of reps who completed an assigned coaching activity and showed measurable improvement on the targeted criterion in their next QA cycle.
For example: if eight of the 12 reps assigned a specific practice activity last month showed score improvement on that criterion in the following 30-day window, that is a 67% behavioral conversion rate, and it tells you what to examine in the sessions that did not convert.
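The calculation behind that number is a simple ratio over completed assignments. A minimal sketch, assuming each assignment record carries before/after scores on the targeted criterion (the record shape is illustrative):

```python
def conversion_rate(assignments):
    """Share of completed coaching assignments followed by criterion improvement.

    assignments: list of dicts like
    {"rep": "A", "completed": True, "score_before": 55, "score_after": 70}.
    """
    completed = [a for a in assignments if a["completed"]]
    if not completed:
        return 0.0
    improved = sum(1 for a in completed if a["score_after"] > a["score_before"])
    return improved / len(completed)
```

Tracking this per coaching activity, not just program-wide, shows which activities convert and which need redesign.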
Insight7 connects QA scoring directly to coaching assignment and tracks criterion-level scores over time per rep, making this calculation possible without building a separate tracking system.
If/Then Selection Guide
If your team is scoring less than 20% of calls, then automated QA coverage is the prerequisite before any of these metrics become reliable.
If you manage a compliance-heavy operation (collections, insurance, financial services), then prioritize compliance rate by disclosure type as your primary coaching trigger.
If coaching sessions are happening but behavior on live calls is not changing, then coaching completion-to-score-improvement rate will show you where the conversion gap is.
If you run a multi-call-type environment, then criteria failure rate by call type will show context-specific gaps that aggregate scores hide.
How many calls do you need to score before making a coaching decision?
Quality assurance practice generally requires a minimum of 30 scored calls per rep per evaluation period to draw reliable behavioral conclusions. At manual review rates of 3 to 10% of calls, most contact centers cannot reach this threshold for individual agents. Automated scoring across 100% of calls solves the sample size problem entirely. Contact centers processing 30,000 or more calls per month use Insight7 specifically to reach the coverage needed for statistically reliable per-agent coaching data.
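The arithmetic behind the coverage claim is worth making explicit. A minimal sketch of how many total calls a rep must handle before a given sampling rate yields the 30-call minimum:

```python
import math

def calls_needed(sample_rate, minimum_scored=30):
    """Total calls a rep must handle for `minimum_scored` of them to be sampled."""
    return math.ceil(minimum_scored / sample_rate)

# At a 5% manual review rate, a rep must handle 600 calls before 30 are scored;
# at 100% automated coverage, 30 calls are enough.
```

At typical monthly per-agent volumes, the 3 to 10% sampling rates cited above rarely clear that bar, which is the sample-size problem automated scoring removes.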
FAQ
How often should QA metrics be reviewed for coaching purposes?
Criterion-level scores and repeat failure rates should be reviewed weekly by direct supervisors. Compliance rates and FCR trends should be reviewed monthly with team leads. Coaching completion-to-improvement rates are most meaningful over 60- to 90-day windows to allow enough post-coaching calls to accumulate.
What is the difference between a QA metric and a coaching metric?
A QA metric measures whether a call met a quality standard. A coaching metric connects that measurement to a development decision. Compliance rate is a QA metric. Coaching completion-to-score-improvement rate is a coaching metric. The seven metrics in this guide are selected because they do both: they measure performance against standards and point to specific coaching actions.
Can QA metrics identify top performers as well as coaching needs?
Yes. The same criterion-level data that identifies coaching needs also identifies reps who consistently excel on specific behaviors. Those reps are internal case studies for peer coaching and onboarding content. Criterion-level top performers are often more effective coaches for specific skills than managers who do not work the calls themselves.
