Soft skills are invisible until you define what they look like in a real call. A training director who cannot show that empathy scores increased after an empathy training program has a measurement problem, not a training problem. This six-step guide shows you how to configure QA scoring criteria that capture soft skills as observable behaviors, pull a pre-training baseline, and attribute post-training criterion deltas to the specific training that targeted them.

What You Need Before Step 1

Gather these before starting: access to 30 days of call recordings prior to training, your current QA criteria if any exist, and a list of the soft skills your training covers. You also need agreement from your training and QA teams on which behaviors will represent each skill, because ambiguity here invalidates the entire pre/post comparison.

Step 1: Define Soft Skills as Observable Call Behaviors

Every soft skill your training covers needs a behavioral anchor. Empathy is observable when an agent acknowledges the customer's specific concern in their own words before offering a solution. Active listening is observable when an agent asks a follow-up question that references something the customer said earlier in the call. Adaptability is observable when an agent changes their communication approach mid-call in response to a customer signal.

Document each definition at the behavioral anchor level, not the concept level. "Agent demonstrates empathy" is a concept. "Agent repeats the customer's concern using the customer's language within the first 90 seconds, before offering any solution" is a behavioral anchor. The anchor must pass the new hire test: could someone who started today understand exactly what to score?

Common mistake: Defining soft skills with the same criteria you use for compliance. Empathy scored as "agent used at least one empathy phrase from the approved script" measures compliance, not empathy. Intent-based criteria, which evaluate whether the agent achieved the empathic goal regardless of exact phrasing, capture the soft skill more accurately.

Step 2: Configure QA Scoring Criteria That Capture These Behaviors

Build your soft skill criteria into your QA rubric before collecting any pre-training data. Each criterion needs four components: the behavioral anchor, a scale (1 to 5 or 1 to 3 both work for nuanced behaviors), a description of each score level, and a weight relative to other criteria.
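
To make that structure concrete, here is a minimal sketch of one criterion represented as structured data. The class and field names are illustrative, not Insight7's schema or any platform's API:

```python
from dataclasses import dataclass

@dataclass
class SoftSkillCriterion:
    """One soft skill criterion in a QA rubric (illustrative structure)."""
    name: str
    behavioral_anchor: str    # the observable behavior, not the concept
    scale: tuple              # (min, max) score levels
    level_descriptions: dict  # score level -> what that level looks like on a call
    weight: float             # share of the overall QA score, 0.0 to 1.0

empathy = SoftSkillCriterion(
    name="Empathy",
    behavioral_anchor=(
        "Agent repeats the customer's concern using the customer's language "
        "within the first 90 seconds, before offering any solution."
    ),
    scale=(1, 5),
    level_descriptions={
        1: "No acknowledgment of the customer's specific concern",
        3: "Generic acknowledgment, not in the customer's own language",
        5: "Specific restatement in the customer's words before any solution",
    },
    weight=0.10,
)
```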

For a training evaluation, soft skill criteria should carry enough weight to be visible in the overall score movement. If empathy represents 5% of your total score, a 15-point improvement in empathy produces only a 0.75-point improvement overall, which is noise, not signal. Weight the soft skills you are training at 20 to 30% combined during the evaluation period.
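
The weighting arithmetic is worth making explicit, since it determines whether training impact is visible at all. A quick worked example, with numbers mirroring the paragraph above:

```python
def overall_contribution(criterion_delta: float, weight: float) -> float:
    """Points a criterion-level improvement adds to the overall QA score."""
    return criterion_delta * weight

# A 15-point empathy gain at a 5% weight barely moves the overall score...
print(overall_contribution(15, 0.05))  # 0.75 points: noise
# ...but the same gain at a 25% combined weight is clearly visible.
print(overall_contribution(15, 0.25))  # 3.75 points: signal
```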

Insight7 supports both intent-based and script-based criteria with full weighting control. Training directors configure soft skill criteria with behavioral descriptions, set weights, and deploy them to 100% of calls without additional manual review effort.

Decision point: Choose between a dedicated soft skill rubric for the training evaluation period and integrating soft skill criteria into your permanent QA rubric. A dedicated evaluation rubric gives you cleaner pre/post data but requires a rubric swap after the evaluation period. Integrating criteria into your permanent rubric is more sustainable but may dilute the signal. For high-stakes training programs, use a dedicated rubric for 90 days, then integrate the best-performing criteria permanently.

Step 3: Pull a Pre-Training Baseline

Score 15 to 20 calls per employee from the 30 days before training begins. Calculate the average score for each soft skill criterion across the cohort. Document the baseline at both the cohort level (overall average) and the individual level (per-rep average).
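
As a sketch of the baseline math, assuming per-call scores can be exported as (employee, criterion, score) rows; the helper below is hypothetical, not part of any QA platform:

```python
from collections import defaultdict
from statistics import mean

# Sample (employee, criterion, score) rows from pre-training calls.
pre_training_scores = [
    ("rep_01", "empathy", 62), ("rep_01", "empathy", 58),
    ("rep_02", "empathy", 71), ("rep_02", "empathy", 83),
]

def baselines(rows):
    """Per-rep and cohort-level averages for each criterion."""
    per_rep = defaultdict(list)
    for employee, criterion, score in rows:
        per_rep[(employee, criterion)].append(score)
    rep_avg = {key: mean(scores) for key, scores in per_rep.items()}
    cohort = defaultdict(list)
    for (_, criterion), avg in rep_avg.items():
        cohort[criterion].append(avg)
    cohort_avg = {criterion: mean(avgs) for criterion, avgs in cohort.items()}
    return rep_avg, cohort_avg

rep_avg, cohort_avg = baselines(pre_training_scores)
print(rep_avg)     # {('rep_01', 'empathy'): 60, ('rep_02', 'empathy'): 77}
print(cohort_avg)  # {'empathy': 68.5}
```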

The baseline serves two purposes: it establishes the starting point for calculating post-training improvement, and it identifies which employees were already strong before training (who may not show large deltas but whose absolute scores validate the criteria). Employees who score 80%+ on a criterion before training need targeted measurement on a different, more advanced criterion.

Common mistake: Pulling baseline data once training has begun. Even the first day of training changes behavior. Baseline data must come exclusively from the pre-training period.

Step 4: Run Training and Score Post-Training Calls Against the Same Criteria

Run training. Do not change scoring criteria during this period. Beginning two weeks after training completion, score 15 to 20 post-training calls per employee using the exact same behavioral anchors and weights.

Calculate the criterion delta for each soft skill: post-training average minus pre-training average, per employee and for the cohort. A cohort-level delta of 12 percentage points on empathy after an empathy training program is a measurable, attributable outcome. A delta of 2 percentage points may be within normal call variation and should not be reported as training impact.
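
Continuing the sketch, the delta itself is a subtraction per criterion; the noise threshold below is an assumption drawn from the variation range cited in Step 5:

```python
NOISE_THRESHOLD = 5  # pct points; typical call-to-call variation (see Step 5)

def criterion_delta(pre_avg: float, post_avg: float) -> tuple[float, bool]:
    """Return (delta, is_reportable) for one criterion."""
    delta = post_avg - pre_avg
    return delta, delta > NOISE_THRESHOLD

print(criterion_delta(68.5, 80.5))  # (12.0, True): reportable training impact
print(criterion_delta(68.5, 70.5))  # (2.0, False): within normal variation
```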

How Insight7 handles this step: Insight7's QA engine applies your configured criteria to every call automatically, generating per-agent scorecards that show criterion-level scores over time. A training director can view the cohort dashboard, filter by training cohort and date range, and see the criterion delta for every soft skill without manual data aggregation. See how AI coaching tracks behavioral improvement post-training.

Step 5: Measure Criterion Delta and Attribute to Training

Attribution requires more than a pre/post comparison. You need to verify that other explanations for the improvement are implausible: no script changes during the evaluation period, no major product changes, no significant team turnover that would shift the cohort composition.

Document the attribution case: training covered behavior X, criterion X increased by Y percentage points in the 30 days after training, no competing explanations exist, and the delta exceeds normal call-to-call variation (typically 3 to 5 percentage points for stable criteria). This documentation supports your L&D budget case and connects training investment to business outcome data.
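
One way to keep that attribution case auditable is to record it as structured data rather than prose. A minimal sketch with illustrative field names and numbers:

```python
attribution_case = {
    "training_program": "Empathy Fundamentals",
    "targeted_criterion": "empathy",
    "baseline_avg": 68.5,
    "post_training_avg": 80.5,
    "delta_pct_points": 12.0,
    "noise_threshold_pct_points": 5.0,  # normal call-to-call variation
    # Competing explanations ruled out during the evaluation window:
    "script_changes": False,
    "major_product_changes": False,
    "significant_team_turnover": False,
}

# The delta is attributable only if it beats noise AND no rival explains it.
attributable = (
    attribution_case["delta_pct_points"]
    > attribution_case["noise_threshold_pct_points"]
    and not any(
        attribution_case[k]
        for k in ("script_changes", "major_product_changes",
                  "significant_team_turnover")
    )
)
print(attributable)  # True
```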

For skills that did not show improvement, document that too. If your training covered active listening but the active listening criterion did not move, the training either did not address the behavior as defined in the rubric, or the rubric definition does not match what the training taught. Either way, the data identifies a program gap.

Step 6: Track Criterion Scores at 30, 60, and 90 Days Post-Training

Initial post-training scores frequently understate durable improvement. Employees trying to apply new techniques are often more mechanical in weeks one and two than they are at 30 days when behaviors begin to integrate naturally. Schedule three measurement windows and track the trajectory.

At 60 days, identify employees whose scores have plateaued below the cohort improvement threshold. These employees need coaching reinforcement targeted at the specific criterion that plateaued, not re-training on the full program. At 90 days, the employees whose scores continue to improve without additional coaching have internalized the behavior. These are your program success cases.
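
As a sketch of how the 60- and 90-day checks might be automated, assuming each rep's criterion scores at the three windows are available; the 10-point improvement threshold is illustrative:

```python
COHORT_IMPROVEMENT_THRESHOLD = 10  # pct points; illustrative cohort target

def classify_at_90_days(baseline: float, scores: dict) -> str:
    """Classify a rep from 30/60/90-day criterion scores vs. baseline."""
    d30 = scores[30] - baseline
    d60 = scores[60] - baseline
    d90 = scores[90] - baseline
    if d60 < COHORT_IMPROVEMENT_THRESHOLD and d60 <= d30:
        return "plateaued: coach on this criterion"
    if d90 > d60:
        return "still improving: program success case"
    return "holding steady"

print(classify_at_90_days(60, {30: 68, 60: 67, 90: 67}))  # plateaued
print(classify_at_90_days(60, {30: 70, 60: 74, 90: 78}))  # still improving
```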

Insight7's platform generates trend charts by criterion over configurable date ranges. A training director can see which soft skill criteria are tracking upward, which are flat, and which are regressing, without manually reviewing calls.

What Good Looks Like at 90 Days

After 90 days of structured measurement, a training director should see: cohort-level criterion deltas of 10 to 20 percentage points on the behaviors directly targeted by training, clear separation between soft skill improvement (intent-based criteria moving) and compliance improvement (script criteria moving), and a documented attribution case connecting each training program to specific behavior changes in call data.


Frequently Asked Questions

How do you measure soft skill development after training?

Define each soft skill as a behavioral anchor visible in call recordings, configure QA scoring criteria for those behaviors, score a pre-training baseline, then score post-training calls against the same criteria. Calculate the criterion delta for each skill at 30, 60, and 90 days. A delta of 10 or more percentage points on a criterion targeted by training indicates measurable skill development.

What are the best methods to assess comprehension and soft skills after training?

For soft skill assessment specifically, behavioral observation via call scoring outperforms knowledge tests because it captures whether employees apply skills, not just whether they can define them. Configure QA criteria as intent-based rather than script-based to capture genuine behavior rather than phrase mimicry. Pre/post criterion comparison using consistent scoring anchors is the most direct measurement method.

What is the difference between soft skill improvement and compliance improvement in call data?

Compliance improvement reflects score increases on verbatim criteria: required disclosures delivered, scripts followed, process steps completed. Soft skill improvement reflects score increases on intent-based criteria: empathy demonstrated regardless of exact phrasing, questions asked that evidence listening. Configure these as separate criterion types in your QA rubric to distinguish them in post-training analysis.

How long should you track employees after soft skill training?

Track for a minimum of 90 days using three measurement windows: 30 days (initial adoption), 60 days (consolidation), and 90 days (integration). Single post-training snapshots miss the common plateau-and-recovery pattern where scores dip at 60 days before rising again. The 90-day measurement provides the most defensible evidence of durable behavior change.


Training director measuring soft skill development across 20 or more reps? See how Insight7 scores behavioral criteria across 100% of post-training calls automatically. See it in 20 minutes.