How to Use Call Data to Measure Soft Skill Development in Agents

Call data gives managers an objective measure of soft skills that observation-based assessments cannot provide. Where a manager reviewing 5 calls per month sees a sample, call analytics applied to every interaction reveals whether empathy, active listening, and communication behaviors actually appear in the interactions that matter. This guide covers how to use call data to measure soft skill development in agents and how to connect that measurement to coaching interventions that produce lasting behavior change.

Why Soft Skills Are Hard to Measure Without Call Data

Soft skills like empathy, active listening, and ownership language are notoriously difficult to assess because they depend on context. An agent can demonstrate empathy in a calm interaction and fail in a difficult one. Manager observation captures which calls the manager happened to review, not how the agent actually performs under pressure.

Call data changes this by measuring soft skill behaviors across hundreds of interactions rather than a handful. The specific behaviors that define empathy (naming the customer's stated frustration, acknowledging wait time before redirecting), active listening (referencing earlier parts of the conversation, asking follow-up questions based on the customer's responses), and ownership language (using first-person commitment rather than policy deflection) can all be scored at the call level.

According to ATD research on learning measurement, organizations that use behavioral observation data to assess soft skills achieve higher training ROI than those relying on self-assessment or supervisor impression alone.

What Methods Can You Use to Assess Comprehension and Skill Development in Agents?

The most reliable methods for measuring agent skill development combine behavioral scoring rubrics with call data analysis. Rubric-based scoring defines what each skill looks like at each performance level (not just "empathy: yes or no" but specific behavioral anchors at each score level). Applied to a random sample of 10 or more calls per agent, this approach identifies whether skills are present across different interaction types, not just observed calls. Pairing rubric scores with 30-day re-measurement cycles confirms whether coaching produced lasting change or temporary compliance.

Step 1: Translate Soft Skills into Observable Behaviors

Measuring "empathy" is not possible at scale. Measuring "agent names the customer's specific frustration in the first 60 seconds of a complaint call" is. The first step in using call data for soft skill measurement is translating each soft skill into 2 to 3 observable, scoreable behaviors.

For empathy, the scoreable behaviors might include: naming the customer's frustration before moving to resolution, acknowledging wait time when the customer references it, and avoiding policy language as the first response to a complaint. For active listening: referencing what the customer said earlier in the conversation, asking at least one follow-up question based on the customer's response (not from a script), and pausing at least 2 seconds after the customer finishes before responding. These behaviors can be detected in transcripts and scored with behavioral anchors.

Common mistake: Using binary scoring (yes/no) for soft skills. Binary scoring cannot distinguish between an agent who sometimes demonstrates empathy and one who demonstrates it consistently. Use a 1 to 5 scale with behavioral anchors at each level.
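To make the idea of behavioral anchors concrete, here is a minimal sketch of one rubric entry as a data structure. The anchor wording is illustrative (drawn from the empathy behaviors described above), not a prescribed standard:

```python
# Hypothetical rubric entry: one observable behavior with a behavioral
# anchor at each level of a 1-to-5 scale. Anchor text is illustrative.
empathy_rubric = {
    "behavior": "names the customer's specific frustration before redirecting",
    "anchors": {
        1: "no acknowledgment; moves straight to policy or resolution",
        2: "generic acknowledgment ('I understand') without specifics",
        3: "names the frustration, but only after the customer repeats it",
        4: "names the specific frustration within the first 60 seconds",
        5: "names the frustration unprompted and confirms it with the customer",
    },
}

def score_label(rubric, score):
    """Look up the behavioral anchor for a given 1-to-5 score."""
    return rubric["anchors"][score]

print(score_label(empathy_rubric, 4))
# -> names the specific frustration within the first 60 seconds
```

Anchors like these are what let two different managers assign the same score to the same call, which matters for the reliability check in Step 2.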

Step 2: Score a Baseline Sample Across All Agents

Before using call data to measure improvement, establish a baseline. Pull a random sample of 10 calls per agent from the last 30 days. Score each call against your soft skill rubric, focusing on 2 to 3 behaviors per skill dimension rather than attempting to score everything at once.

The baseline serves two purposes. First, it identifies the team-wide average for each behavior, which becomes the benchmark for improvement. Second, it identifies which agents score highest on each soft skill dimension. These agents become peer coaching candidates for the behaviors where they excel.

Target at least 80% inter-rater reliability before using the rubric for formal assessment. Have two managers score the same 5 calls independently. Where they disagree by more than 1 point on a 5-point scale, refine the behavioral anchor for that criterion.
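The inter-rater check above can be sketched in a few lines. This is a simple percent-agreement calculation (scores within 1 point count as agreement), with illustrative scores, not real data:

```python
# Hypothetical sketch: check whether two managers' scores agree within
# 1 point on a 5-point scale for at least 80% of calls.

def inter_rater_agreement(scores_a, scores_b, tolerance=1):
    """Fraction of calls where the two raters differ by <= tolerance points."""
    assert len(scores_a) == len(scores_b)
    agreed = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return agreed / len(scores_a)

# Two managers score the same 5 calls on the empathy dimension.
manager_1 = [4, 3, 5, 2, 4]
manager_2 = [4, 4, 3, 2, 4]  # call 3 differs by 2 points

agreement = inter_rater_agreement(manager_1, manager_2)
print(f"Agreement: {agreement:.0%}")  # -> Agreement: 80%

if agreement < 0.8:
    print("Refine the behavioral anchors before formal assessment.")
```

Percent agreement is the simplest reliability measure; teams that want to correct for chance agreement can use a statistic like Cohen's kappa instead.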

Insight7 applies your custom rubric to every call automatically and generates per-agent scorecards with dimension-level breakdowns. The baseline period requires no additional manager time because scoring happens as calls are processed. According to Insight7 platform data, manual QA programs typically cover 3-10% of calls, while automated coverage applies the same rubric to 100% of volume.

Step 3: Identify Soft Skill Gaps That Are Coaching-Addressable

Not every soft skill gap is a coaching problem. Some patterns are hiring problems (the behavior is absent across a new cohort but present in the rest of the team). Some are process problems (agents skip empathy acknowledgment because the script does not include it). And some are genuine coaching problems (agents know what to do but do not do it under pressure).

Use your baseline data to distinguish among them. A behavior that scores below 2.5 across 80% of the team is likely a process or training problem. A behavior that scores below 2.5 for specific agents while the rest of the team scores 3.5 or above is a coaching problem. Address them differently: process problems need script or workflow changes; coaching problems need targeted roleplay practice.
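The triage logic above can be expressed as a small function. The thresholds (2.5, 3.5, 80% of the team) come from the text; the agent names and scores are illustrative:

```python
# Hypothetical sketch: classify a behavior gap as a process/training
# problem or a per-agent coaching problem from baseline rubric scores.

def classify_gap(agent_scores, low=2.5, healthy=3.5, team_fraction=0.8):
    """agent_scores: dict of agent -> mean rubric score for one behavior."""
    low_scorers = [agent for agent, s in agent_scores.items() if s < low]
    if len(low_scorers) / len(agent_scores) >= team_fraction:
        return "process_or_training_problem", low_scorers
    rest = [s for a, s in agent_scores.items() if a not in low_scorers]
    if low_scorers and rest and min(rest) >= healthy:
        return "coaching_problem", low_scorers
    return "no_clear_gap", []

baseline = {"ana": 2.1, "ben": 3.8, "cho": 4.0, "dee": 3.6, "eli": 3.9}
kind, agents = classify_gap(baseline)
print(kind, agents)  # -> coaching_problem ['ana']
```

Scores that fall between the two patterns (some agents low, the rest middling) return no clear classification; those cases usually need a manual review of the calls before assigning a fix.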

How Do You Measure Soft Skill Improvement Over Time?

Measure soft skill improvement by comparing rubric scores for specific behaviors at three intervals: the baseline period (30 days before any coaching intervention), 30 days after the first coaching cycle, and 60 days after. You are looking for sustained improvement, not a post-coaching bump that decays. If scores return to baseline within 30 days of coaching, the coaching addressed awareness rather than behavior change. Add structured roleplay practice to the next cycle, focusing on the interactions where the behavior fails most consistently.
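The three-interval comparison can be sketched as a simple check. The minimum lift of 0.5 points is an assumed threshold for "measurable improvement," not a figure from the text; tune it to your rubric:

```python
# Hypothetical sketch: compare mean rubric scores at baseline, +30 days,
# and +60 days to distinguish sustained change from a decaying bump.
# min_lift (0.5 points on a 5-point scale) is an assumed threshold.

def assess_improvement(baseline, day_30, day_60, min_lift=0.5):
    """Each argument is the agent's mean score for the coached behavior."""
    if day_30 - baseline < min_lift:
        return "no measurable improvement"
    if day_60 - baseline < min_lift:
        return "post-coaching bump that decayed: add structured roleplay"
    return "sustained improvement"

print(assess_improvement(2.4, 3.6, 2.6))  # -> bump that decayed
print(assess_improvement(2.4, 3.6, 3.5))  # -> sustained improvement
```

The key design choice is that both follow-up intervals are compared against the baseline, not against each other, so a large 30-day jump cannot mask a 60-day return to the starting point.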

Step 4: Connect Skill Scores to Customer Outcomes

Measuring soft skills in isolation produces activity metrics. Connecting soft skill scores to customer outcome data produces evidence of business impact. Pull CSAT scores, first call resolution rates, or complaint escalation rates alongside soft skill rubric scores for the same time periods and agents.

If agents who score above 4 out of 5 on empathy behaviors show CSAT scores that are 8 to 15 points higher than agents scoring below 3, the empathy behaviors are validated as business-relevant. This validation serves two purposes: it confirms coaching is targeting the right behaviors, and it provides the business case for continuing the program.
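A minimal sketch of that comparison, assuming you can export paired (empathy score, CSAT) values per agent for the same period. The sample numbers are illustrative:

```python
# Hypothetical sketch: compare average CSAT between high-empathy and
# low-empathy agents. Group cutoffs (>= 4.0 and < 3.0) follow the text.

def csat_gap(agents, high=4.0, low=3.0):
    """agents: list of (empathy_score, csat) pairs. Returns the CSAT
    difference between the high-empathy and low-empathy groups."""
    high_group = [csat for emp, csat in agents if emp >= high]
    low_group = [csat for emp, csat in agents if emp < low]
    if not high_group or not low_group:
        return None  # not enough data to compare groups
    return sum(high_group) / len(high_group) - sum(low_group) / len(low_group)

sample = [(4.5, 88), (4.2, 85), (2.8, 74), (2.5, 76), (3.5, 80)]
print(csat_gap(sample))  # -> 11.5 (high-empathy agents average 11.5 points higher)
```

A group-average gap like this is evidence of association, not causation; it is enough to validate that the behaviors are worth coaching, but confounders (tenure, call mix) are worth checking before presenting it as ROI.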

Insight7's voice of customer dashboard surfaces customer sentiment patterns alongside agent scores, making it possible to see whether high-empathy agents also produce lower complaint rates or higher satisfaction scores in the same view.

Step 5: Build Roleplay Scenarios from Low-Scoring Call Moments

The most effective coaching uses the specific call moments where soft skills failed as the basis for practice. After identifying which agents score below threshold on specific behaviors, pull the 3 calls where the behavior gap was most pronounced.

Build roleplay scenarios from those exact call moments. For an agent who consistently fails to acknowledge frustration before redirecting, create a scenario with the specific customer profile, trigger, and success criteria from the actual call. The agent practices until they hit the passing threshold. Insight7's AI coaching module generates roleplay sessions directly from call transcripts, so practice scenarios match real customer interaction patterns rather than generic training templates.

Track score progression over time. After 30 days of targeted roleplay, pull the agent's rubric scores for the coached behavior from their most recent 10 calls and compare them to the baseline.

Fresh Prints expanded from QA into the coaching module specifically because they wanted reps to practice flagged skills immediately after receiving QA feedback. Read more on the Fresh Prints case study page.

If/Then Decision Framework

If your agents consistently score above 4 on soft skills during low-volume periods but below 3 during peak volume, then the gap is pressure-based, not skill-based. Practice under simulated high-pressure scenarios, not standard roleplay.

If soft skill scores vary widely by call type (complaints score low, standard inquiries score high), then build separate rubrics for complaint calls and routine calls and track improvement by interaction type.

If you cannot connect soft skill scores to CSAT or first-call resolution data after 90 days, then either your rubric is measuring the wrong behaviors or your CSAT data has a lag. Check CSAT collection timing before assuming the coaching is ineffective.

If you need to measure soft skill development across a hybrid team of 20+ agents with 500+ calls per week, then Insight7 provides automated scoring at the scale needed to generate representative data without additional manager time.

Common Questions

What are the best methods to assess agent skill development?
The most reliable methods combine behavioral scoring rubrics with call data analysis across a random sample of 10 or more calls per agent. Single-observation assessments miss the variation in performance across different interaction types. Platforms like Insight7 score every call automatically, eliminating sampling bias and producing representative skill assessment data without additional manager time.

How do you measure empathy in customer service calls?
Measure empathy by defining 2 to 3 observable behaviors (naming frustration, acknowledging wait time, avoiding policy-first responses), building a 1 to 5 scale with behavioral anchors at each level, and scoring a random sample of calls. Track whether scores improve after coaching. Binary empathy measurement is insufficient because it cannot detect the difference between situational empathy and consistent empathy across interaction types.

How long does it take to see soft skill improvement after coaching?
Most agents show measurable rubric score improvement within 30 days of targeted coaching combined with structured roleplay practice. Sustained improvement at the 60-day re-measurement is the indicator of genuine skill acquisition. See how Insight7 tracks soft skill progression over time.