Process Evaluation Methods for Contact Center Training Programs

L&D directors and training program managers in customer-facing organizations often reach the end of a training program with no way of knowing whether it worked. Process evaluation gives you the structured methods to find out. This guide covers six steps for applying process evaluation to contact center and customer-facing team training, from defining behavioral outcomes before the program runs to calculating ROI and feeding results back into program design.

What are the 5 levels of training evaluation?

The Kirkpatrick/Phillips model defines five levels of training evaluation. Level 1 (Reaction) measures participant satisfaction immediately after training. Level 2 (Learning) measures knowledge or skill acquisition during the program. Level 3 (Behavior) measures on-the-job behavior change weeks after training ends. Level 4 (Results) measures organizational outcomes such as call quality scores, conversion rates, or compliance rates that the training was designed to move. Level 5 (ROI) compares the monetary value of those results to the cost of the program. For contact center and customer-facing teams, Levels 3 and 4 are the most operationally relevant, because both are directly measurable through call behavior data.
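
If it helps to keep the levels straight while building your evaluation plan, the model is compact enough to hold in a small lookup table. A minimal Python sketch; the example metrics are illustrative, not part of the model itself:

```python
# The five Kirkpatrick/Phillips evaluation levels, paired with an
# example contact center metric for each. Metric names are illustrative.
EVALUATION_LEVELS = {
    1: ("Reaction", "post-session satisfaction survey"),
    2: ("Learning", "end-of-module knowledge check score"),
    3: ("Behavior", "on-call behavior score vs. baseline"),
    4: ("Results", "call quality, conversion, or compliance rate"),
    5: ("ROI", "monetary value of results vs. program cost"),
}

for level, (name, metric) in EVALUATION_LEVELS.items():
    print(f"Level {level} ({name}): measured via {metric}")
```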

Why does process evaluation matter more than outcome evaluation alone?

Outcome evaluation tells you whether results changed. Process evaluation tells you whether the training was delivered as designed and whether the mechanism connecting training to outcomes is working. A program can show improved call scores without the training being the cause, or can fail to show improvement even when it was well-executed, because baseline conditions were not measured or post-training behavior data was not collected. Process evaluation closes that gap by tracking what happened at each stage: how the training was designed, what participants actually did in sessions, and how their on-the-job behavior changed against a documented baseline.

Step 1: Define What Behaviors the Training Was Designed to Change

The most common failure in training program evaluation is measuring the wrong thing. Programs are designed to change behavior, not to improve satisfaction scores. Start by writing behavioral outcomes in observable terms.

A behavioral outcome for a contact center training program might be: "After training, agents ask open-ended discovery questions in the first two minutes of a call at least 80% of the time." That is specific enough to measure against call recordings. Compare it to a vague outcome like "improve communication skills," which cannot be measured or falsified.

Document three to five behavioral outcomes for the program. For each, define the evaluation criterion (what behavior will be observed), the measurement method (call scoring, manager observation, quality review), and the target threshold (what improvement counts as success). This documentation becomes the specification for your baseline measurement in Step 3.
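
This documentation does not need a special tool; a structured record per outcome is enough. Here is a minimal Python sketch, with a hypothetical field layout and example outcome:

```python
from dataclasses import dataclass

@dataclass
class BehavioralOutcome:
    """One behavioral outcome, written in observable, measurable terms."""
    behavior: str            # evaluation criterion: what will be observed
    measurement_method: str  # call scoring, manager observation, QA review
    target_threshold: float  # proportion of calls that counts as success

outcomes = [
    BehavioralOutcome(
        behavior="Asks open-ended discovery question in first two minutes",
        measurement_method="call scoring",
        target_threshold=0.80,
    ),
    # ... document three to five outcomes total
]

for o in outcomes:
    print(f"{o.behavior} | {o.measurement_method} | target {o.target_threshold:.0%}")
```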

Avoid this common mistake: Defining training outcomes after the program has already run. Outcomes defined retroactively are fitted to whatever data exists rather than to what the program was actually designed to do.

Step 2: Select Your Evaluation Method

Different evaluation methods are suited to different program types and organizational contexts. For contact center and customer-facing team training, the following three approaches are most relevant.

Kirkpatrick Levels 3-4 with call scoring is the most direct method for teams with call recording infrastructure. Pre- and post-training call scores on defined behavioral criteria give you a clean before/after comparison. This method produces behavioral evidence rather than self-reported estimates.

Phillips ROI Model extends Kirkpatrick Level 4 by isolating the training's contribution to results (separating it from market conditions, rep tenure, and other factors) and converting outcomes to monetary value. The Phillips ROI Institute methodology is the industry standard for training ROI calculation and requires both a solid behavioral baseline and a method for isolating training effects.

Behavioral observation scoring uses trained observers (managers, QA analysts, or automated scoring tools) to rate target behaviors before and after training. This method works for any customer interaction type, including live chat and video calls, not only phone calls.

Select one primary method and stick with it across cohorts. Changing measurement approaches between training cycles makes it impossible to compare results over time.
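
Of the three methods, Phillips is the only one with an explicit formula: ROI (%) equals net program benefits divided by program costs, times 100, after isolating the training's share of the benefit. A minimal sketch, assuming a single isolation factor has already been estimated (the dollar figures are illustrative):

```python
def phillips_roi(gross_benefit: float, program_cost: float,
                 isolation_factor: float) -> float:
    """Phillips ROI (%): net isolated benefit over program cost.

    isolation_factor is the estimated share of the benefit attributable
    to the training (e.g. 0.6 if 60% is judged training-driven).
    """
    isolated_benefit = gross_benefit * isolation_factor
    return (isolated_benefit - program_cost) / program_cost * 100

# Illustrative numbers only: $120k gross benefit, 60% attributed
# to training, $45k total program cost -> ROI of 60%.
print(f"{phillips_roi(120_000, 45_000, 0.6):.0f}%")
```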

Step 3: Establish a Pre-Training Baseline

A baseline is the measurement of current call behavior before the training program runs. Without a baseline, you have no way to attribute post-training score changes to the program.

Run baseline scoring against the behavioral criteria defined in Step 1 for a minimum of two weeks before training begins. Score the same set of calls or interaction types that will be scored post-training. Document the average score per criterion per agent or team.
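
The per-agent, per-criterion average is a straightforward aggregation. A minimal Python sketch, with hypothetical call records and a 0-100 score scale:

```python
from collections import defaultdict
from statistics import mean

# Each record is one scored call from the baseline window.
# The data here is illustrative.
baseline_calls = [
    {"agent": "A01", "criterion": "open_ended_discovery", "score": 55},
    {"agent": "A01", "criterion": "open_ended_discovery", "score": 62},
    {"agent": "A02", "criterion": "open_ended_discovery", "score": 71},
]

# Average score per (agent, criterion) pair over the baseline window.
scores = defaultdict(list)
for call in baseline_calls:
    scores[(call["agent"], call["criterion"])].append(call["score"])

baseline = {key: mean(vals) for key, vals in scores.items()}
for (agent, criterion), avg in sorted(baseline.items()):
    print(f"{agent} | {criterion} | baseline avg {avg:.1f}")
```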

Insight7 automates this step for teams with call recording infrastructure. The platform scores 100% of calls against configurable evaluation criteria, generating per-agent behavioral baselines without requiring a manual QA analyst to review a sample. Manual QA programs typically cover only 3-10% of calls, which means baseline scores are often drawn from a sample too small to be reliable. A complete call dataset produces a more accurate behavioral baseline.

Store the baseline scores with the training cohort data. The baseline becomes the reference point for your post-training delta calculation.

Step 4: Run the Training Program

Execute the training as designed. Process evaluation requires that you document what actually happened during delivery, not just what was planned.

Track attendance and completion rates by session. Note any content that was skipped, condensed, or supplemented in real time. Document whether roleplay or scenario practice occurred as planned, and how many practice repetitions each participant completed. Collect participant reaction data (Level 1) at the end of each session.
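
A simple per-session log is enough to capture these delivery facts before they are forgotten. A minimal sketch; the field names and values are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class SessionRecord:
    """Delivery log for one training session: what actually happened."""
    session_id: str
    attendees: int
    enrolled: int
    content_deviations: list[str] = field(default_factory=list)
    practice_reps_completed: dict[str, int] = field(default_factory=dict)
    avg_reaction_score: float = 0.0  # Level 1 reaction data

    @property
    def completion_rate(self) -> float:
        return self.attendees / self.enrolled

log = SessionRecord(
    session_id="objection-handling-03",
    attendees=11,
    enrolled=12,
    content_deviations=["roleplay block condensed to 20 minutes"],
    practice_reps_completed={"A01": 3, "A02": 1},
    avg_reaction_score=4.2,
)
print(f"Completion: {log.completion_rate:.0%}; deviations: {log.content_deviations}")
```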

This delivery documentation is the "process" in process evaluation. If post-training behavioral scores do not improve, delivery documentation tells you whether the gap is a design problem (the program was executed correctly but did not work) or a delivery problem (the program was not executed as designed).

For contact center teams using Insight7 AI roleplay, session completion data is tracked automatically. Managers can see which reps completed practice scenarios, how many times they retook a session, and how their roleplay scores progressed before the live training concluded.

Step 5: Measure Post-Training Behavior Against Baseline

Four to six weeks after training completion, run the same behavioral scoring against the same criteria used to establish the baseline. This interval allows behavior to stabilize: early post-training performance often dips before improving as reps apply new techniques under real call pressure.

Calculate the behavioral delta: the difference between post-training scores and baseline scores for each criterion and each participant. A useful output is a criterion-level heatmap showing which behaviors improved, which stayed flat, and which regressed.
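
The delta itself is simple subtraction per (agent, criterion) pair, and the heatmap is just a bucketing of those deltas. A minimal sketch with illustrative scores and thresholds:

```python
# Behavioral delta per agent and criterion: post-training minus baseline.
# All scores are illustrative, on a 0-100 scale.
baseline = {
    ("A01", "open_ended_discovery"): 58.5,
    ("A01", "active_listening"): 70.0,
    ("A02", "open_ended_discovery"): 71.0,
}
post_training = {
    ("A01", "open_ended_discovery"): 74.0,
    ("A01", "active_listening"): 68.5,
    ("A02", "open_ended_discovery"): 80.0,
}

def label(delta: float) -> str:
    """Bucket a delta for a simple text-only heatmap."""
    if delta >= 5:
        return "improved"
    if delta <= -5:
        return "regressed"
    return "flat"

for key in sorted(baseline):
    delta = post_training[key] - baseline[key]
    agent, criterion = key
    print(f"{agent} | {criterion} | {delta:+.1f} | {label(delta)}")
```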

Insight7 generates agent scorecards that cluster calls across a defined time period, making it straightforward to compare a pre-training period scorecard against a post-training period scorecard for the same rep on the same criteria. Because the platform evaluates 100% of calls, the post-training dataset is complete rather than sampled.

Behaviors that did not improve after training are the highest-priority inputs for program redesign. Behaviors that improved significantly are the components to protect in the next program iteration.

Step 6: Calculate Behavioral ROI and Feed Back to Program Design

Behavioral ROI connects the measured behavior change (Step 5) to an operational outcome the organization cares about. For contact center teams, relevant outcomes include call quality scores, customer satisfaction ratings, conversion rates, and compliance pass rates.

Calculate ROI by estimating the monetary value of the behavioral improvement. For example: if post-training call quality scores increased by eight points on average, and internal data shows that each quality score point above threshold correlates with a defined improvement in customer satisfaction or conversion, the training's behavioral outcome can be converted to a revenue or cost-avoidance number. Set this against the cost of the training program (design, facilitation, technology, rep time off production).
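
Here is that arithmetic as a minimal worked sketch. Every figure below, including the per-point dollar value, is an illustrative assumption rather than a benchmark:

```python
# Worked example of the behavioral ROI arithmetic described above.
quality_points_gained = 8.0   # avg post-training quality score delta
value_per_point = 300.0       # $ value per quality point per agent,
                              # derived from internal correlation data
agents_trained = 40

gross_benefit = quality_points_gained * value_per_point * agents_trained

program_cost = sum([
    15_000,   # design and facilitation
    8_000,    # technology
    22_000,   # rep time off production
])

roi_pct = (gross_benefit - program_cost) / program_cost * 100
print(f"Gross benefit: ${gross_benefit:,.0f}; cost: ${program_cost:,.0f}; "
      f"ROI: {roi_pct:.0f}%")
```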

The CDC's training effectiveness framework describes this as the full measurement cycle: from reaction through to results and return. Most organizations stop before completing it.

Feed the ROI calculation and the criterion-level behavioral delta into the next program design cycle. Document which program components correlated with the strongest behavioral improvement and which correlated with flat or negative results. Use this evidence to prioritize redesign effort.

FAQ

What is the difference between process evaluation and outcome evaluation in training?

Outcome evaluation measures whether training produced the intended result, such as improved call scores or higher conversion rates. Process evaluation examines whether the training was delivered as designed and whether the mechanism connecting training to outcomes is functioning. Both are needed: outcome data without process data cannot explain why results did or did not materialize.

How often should contact center training programs be evaluated?

Evaluate behavioral outcomes at two points per training cohort: two weeks post-training for an early signal and six weeks post-training for a stable behavioral measure. Program-level process evaluation should be conducted after every cohort for the first three cycles of a new program, then annually for mature programs unless performance data signals a change in effectiveness.

Can process evaluation methods be applied to on-the-job coaching, not just formal training events?

Yes. The same framework applies to ongoing coaching programs. Define the behavioral criteria the coaching is designed to move, establish a baseline before coaching begins, track coaching delivery (sessions completed, behaviors targeted), measure post-coaching call behavior at a defined interval, and calculate the behavioral delta. Insight7 supports this cycle for call-based coaching by generating ongoing behavioral scorecards that can serve as both coaching inputs and evaluation outputs.