Tracking soft skills in coaching calls is genuinely hard. Unlike handle time or first-call resolution, empathy, active listening, and adaptability don't appear in a dashboard by default. Yet these behaviors are what separate agents who de-escalate complaints from those who escalate them, and reps who close from those who stall. This guide covers which signals matter, how AI surfaces them from call data, and how to build a feedback loop that changes behavior.
What does active listening look like on a coaching call?
Active listening shows up in measurable signals: the rep paraphrasing the customer's concern before offering a solution, asking clarifying questions before moving to a fix, acknowledging emotional cues, and not interrupting. AI analysis tools flag the presence or absence of these behaviors across every call, not just the 3-10% a human QA team can review.
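To make "presence or absence" concrete, here is a minimal sketch assuming a diarized transcript of (speaker, text) turns. The phrase lists and function name are illustrative placeholders, not any vendor's API, and note that simple phrase matching like this is exactly the keyword-level approach the next section warns about; production tools score intent.

```python
# Minimal sketch: heuristic flags for active-listening behaviors on one call.
# Assumes a diarized transcript of (speaker, text) turns; phrase lists and
# function names are illustrative, not any specific vendor's API.

PARAPHRASE_CUES = ("it sounds like", "if i understand", "what i'm hearing")
CLARIFYING_STARTS = ("just to confirm", "so you're saying", "can you walk me through")

def active_listening_flags(turns):
    """Return presence flags for paraphrasing and clarifying questions."""
    flags = {"paraphrased": False, "clarifying_question": False}
    for speaker, text in turns:
        if speaker != "agent":
            continue
        lowered = text.lower().strip()
        if any(cue in lowered for cue in PARAPHRASE_CUES):
            flags["paraphrased"] = True
        if lowered.endswith("?") and lowered.startswith(CLARIFYING_STARTS):
            flags["clarifying_question"] = True
    return flags

turns = [
    ("customer", "My order never arrived and nobody called me back."),
    ("agent", "It sounds like you've been waiting with no update at all."),
    ("agent", "Just to confirm, was this the order placed last Tuesday?"),
]
print(active_listening_flags(turns))
# -> {'paraphrased': True, 'clarifying_question': True}
```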
Why can't standard QA frameworks capture soft skills?
Most QA frameworks were built to measure compliance: did the rep follow the script, avoid prohibited language, use the required phrase? Soft skills don't fit that model. A rep can say "I understand your frustration" while sounding robotic and impatient. Evaluating whether a behavior actually landed requires intent-based scoring, not just keyword matching.
Step 1: Define Observable Behaviors Per Criterion
Generic rubrics fail: "shows empathy" produces inconsistent scores across reviewers. Before you can track anything at scale, you need behavioral anchors that describe what good, average, and poor empathy look like, stated in terms of specific agent actions observable in a transcript or recording.
For empathy: a good score means the agent named the specific customer situation when acknowledging ("I see you've been waiting since Tuesday"). An average score means a generic phrase was used. A poor score means the agent acknowledged nothing and moved straight to process.
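Expressed as data, those anchors might look like the sketch below. The structure and score mapping are hypothetical, not Insight7's schema; the point is that each level is worded as an action a reviewer (or a model) can observe in a transcript.

```python
# The empathy anchors above expressed as data. The score mapping and
# structure are a hypothetical sketch, not Insight7's schema.
EMPATHY_RUBRIC = {
    3: "Names the customer's specific situation when acknowledging "
       "('I see you've been waiting since Tuesday').",
    2: "Uses a generic acknowledgment phrase "
       "('I understand your frustration').",
    1: "Acknowledges nothing and moves straight to process.",
}
```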
Avoid this mistake: copying a generic soft skills rubric from a training library and applying it without customization. The behaviors that matter for a B2C insurance call are different from those in an outbound sales environment.
Step 2: Track Five Core Soft Skill Signals
Empathy markers: Insight7 found that empathy was used in only 6% of applicable situations at one insurance platform, and correlating empathy usage with conversion improvements gave the team specific coaching targets rather than generic feedback. Look for acknowledgment tied to the customer's specific situation, not scripted openers.
Interruption rate: Agents who consistently interrupt customers before they finish a sentence signal impatience even when their words are polite. Track interruption rate per rep (see the sketch after these five signals for one way to compute it). Target: fewer than 10% of customer statements interrupted.
Question quality: Track the ratio of open to closed questions during discovery phases. Closed questions ("Did you receive the email?") move calls forward but gather less information and make customers feel processed. Open questions build rapport and surface problems before they escalate.
Resolution confidence: Hedging language ("I think," "I'm not sure but") undermines customer trust in the outcome. Track the frequency of confidence-undermining qualifiers per rep, per call. Decision point: if a rep averages more than 3 hedges per call, that's a coaching priority.
Emotional regulation under pressure: Track whether the rep's language becomes more clipped or defensive as a call escalates. This requires tone analysis beyond transcription — evaluating sentiment and tonality of the rep's voice, not just the words used.
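Here is the sketch referenced above: three of the five signals computed from a single diarized transcript. It assumes turns of (speaker, text, start_sec, end_sec); the hedge and open-question word lists are illustrative starting points, and the fifth signal (tone) requires audio analysis a text-only sketch can't show.

```python
# Sketch: interruption rate, hedge count, and open/closed question ratio
# computed from one diarized transcript. Assumes turns carry start and end
# times in seconds; the word lists are illustrative starting points.

HEDGES = ("i think", "i'm not sure", "probably", "might be")
OPEN_STARTS = ("what", "how", "why", "tell me", "walk me through")

def soft_skill_metrics(turns):
    customer_turns = interruptions = hedges = open_qs = closed_qs = 0
    prev = None  # (speaker, end_sec) of the previous turn
    for speaker, text, start, end in turns:
        lowered = text.lower().strip()
        if speaker == "customer":
            customer_turns += 1
        else:  # agent turn
            # Interruption: agent audio starts before the customer's turn ends.
            if prev and prev[0] == "customer" and start < prev[1]:
                interruptions += 1
            hedges += sum(lowered.count(h) for h in HEDGES)
            if lowered.endswith("?"):
                if lowered.startswith(OPEN_STARTS):
                    open_qs += 1
                else:
                    closed_qs += 1
        prev = (speaker, end)
    return {
        "interruption_rate": interruptions / max(customer_turns, 1),  # target < 0.10
        "hedges": hedges,
        "hedge_coaching_flag": hedges > 3,  # decision point from above
        "open_to_closed": (open_qs, closed_qs),
    }
```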
Step 3: Score 100% of Calls, Not a Sample
Manual QA teams typically review 3-10% of calls. That sample misses reps having bad weeks, underestimates how often soft skill failures occur at scale, and creates fairness issues when agents know they're being judged on 5 calls per month.
Automated call scoring that covers 100% of calls gives you statistical accuracy, removes recency bias from coaching conversations, and lets managers spot a deteriorating trend before it hardens into a pattern. A 2-hour call can be processed in minutes using AI analysis tools.
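As a sketch of one thing full coverage enables that sampling can't, the function below compares each rep's recent calls against their own baseline. The window size and drop threshold are illustrative assumptions; any automated scorer's per-call output could feed the input dict.

```python
# Sketch: flag reps whose recent average score dips below their baseline.
# Window and threshold are illustrative, not recommended defaults.
from statistics import mean

def flag_deteriorating_reps(scores_by_rep, recent_n=10, drop=0.5):
    """scores_by_rep maps rep_id -> list of call scores, oldest first."""
    flagged = {}
    for rep, scores in scores_by_rep.items():
        if len(scores) <= recent_n:
            continue  # not enough history to split baseline vs. recent
        baseline = mean(scores[:-recent_n])
        recent = mean(scores[-recent_n:])
        if baseline - recent >= drop:
            flagged[rep] = {"baseline": baseline, "recent": recent}
    return flagged
```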
Don't do this: fall back to manually sampling calls after implementing automated scoring. The value of 100% coverage comes from catching the outliers that sampling misses.
Step 4: Tie Feedback to Specific Call Evidence
Feedback that says "you need to show more empathy" produces defensiveness or confusion. Feedback tied to a timestamped quote — "at 4:12, the customer said she'd been transferred three times, and your response moved directly to account lookup without acknowledging that" — is actionable.
Every soft skill score should trace back to evidence. Insight7's call analytics links every criterion score to the exact quote and transcript location, so coaching conversations start with shared evidence, not contested impressions.
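One way to represent that linkage, assuming a simple record type rather than Insight7's actual schema:

```python
# Sketch of an evidence-linked score record. Field names mirror the idea
# of tying each criterion score to a quote and timestamp; they are
# illustrative, not a specific product's schema.
from dataclasses import dataclass

@dataclass
class CriterionScore:
    criterion: str   # e.g. "empathy"
    score: int       # 1 = poor, 2 = average, 3 = good
    quote: str       # verbatim line the score is based on
    timestamp: str   # position in the recording, e.g. "4:12"

evidence = CriterionScore(
    criterion="empathy",
    score=1,
    quote="I've been transferred three times already.",
    timestamp="4:12",
)
print(f"{evidence.criterion} scored {evidence.score} at {evidence.timestamp}")
```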
Step 5: Build Practice Loops, Not Just Feedback Loops
Identifying a soft skill gap is step one. Step two is giving the agent a structured way to practice the corrected behavior before the next live call. AI roleplay tools let reps practice specific scenarios where their soft skills consistently underperform — a simulated hostile customer, a complex objection, a multi-issue complaint — with scoring and feedback from the practice session itself.
Fresh Prints expanded from QA-only to include AI coaching and saw immediate impact: their QA lead noted that reps could "practice right away rather than wait for the next week's call." Closing the gap between feedback and practice is the bottleneck most teams haven't solved.
Reps can retake roleplay sessions as many times as needed, with scores tracked over time to show an improvement trajectory until they clear the configured pass threshold.
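A minimal sketch of that trajectory tracking, assuming a numeric session score and a configurable pass threshold (both values and field names are illustrative, not a specific product's API):

```python
# Sketch: record each roleplay retake and check it against a pass threshold.
def record_session(history, rep_id, score, pass_threshold=80):
    scores = history.setdefault(rep_id, [])
    scores.append(score)
    improving = len(scores) < 2 or scores[-1] >= scores[-2]
    return {
        "attempts": len(scores),
        "passed": score >= pass_threshold,
        "improving": improving,
    }

history = {}
for attempt in (62, 71, 78, 84):  # four retakes of the same scenario
    status = record_session(history, "rep_042", attempt)
print(status)
# -> {'attempts': 4, 'passed': True, 'improving': True}
```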
If/Then Decision Framework
If your team has no soft skill tracking at all -> start with empathy markers and interruption rate. These are the easiest to define and most correlated with CSAT.
If you're scoring manually and getting inconsistent results -> the problem is criteria definition, not scoring volume. Rewrite your rubric with behavioral anchors before expanding coverage.
If you're scoring 100% of calls but coaching isn't changing behavior -> the feedback loop is broken. Check whether feedback is tied to specific call evidence and whether agents have a practice path, not just a scorecard.
If agents are improving scores but CSAT isn't moving -> your criteria may be measuring compliance with language patterns rather than authentic behavior. Review whether scoring captures intent or just phrasing.
FAQ
How often should soft skills be reviewed in coaching calls?
Soft skill scores should feed into weekly or biweekly 1:1s, not just monthly QA reviews. High-frequency feedback on recent calls is more effective than retrospective reviews of calls from three weeks ago. Trend data — is this rep's empathy score improving over the last four sessions — is more actionable than individual call snapshots.
Can AI accurately detect soft skills like empathy and tone?
AI tools are improving significantly but require configuration. Out-of-the-box sentiment analysis often misclassifies tone. Well-configured tools that score intent rather than keyword presence perform substantially better. According to Insight7's implementation data, criteria tuning to align AI scores with human judgment typically takes 4-6 weeks.
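One way to operationalize that alignment check during the tuning window: double-score a sample of calls with both the AI and a human reviewer, then compute chance-corrected agreement such as Cohen's kappa. The scores and the target value below are illustrative assumptions, not Insight7 figures.

```python
# Sketch: measure AI-vs-human agreement on a double-scored sample.
# Cohen's kappa corrects for chance agreement; the target is illustrative.
from sklearn.metrics import cohen_kappa_score

human = [3, 2, 2, 1, 3, 2, 1, 3]  # human QA scores on sampled calls
ai    = [3, 2, 1, 1, 3, 2, 2, 3]  # AI scores on the same calls

kappa = cohen_kappa_score(human, ai)
print(f"kappa = {kappa:.2f}")  # e.g. aim above 0.7 before trusting AI-only scoring
```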
Running Soft Skill Coaching at Scale
Teams seeing the most improvement from soft skills coaching combine three elements: automated scoring across all calls so no one hides from feedback, evidence-backed coaching conversations so feedback is specific not general, and structured practice via AI roleplay so agents can act on feedback immediately. Insight7 brings these three elements into one platform — call analytics that surface soft skill gaps across your full call volume, coaching modules that convert scorecard feedback into practice scenarios, and trajectory tracking that shows whether agents are actually developing over time.
