Wednesday, April 13, 2011

In Defense of the Four Levels

By Shelley A. Gable

Over the past year or so, I’ve noticed several comments about how Kirkpatrick’s model of four levels of evaluation is outdated.

I don’t agree.

The debate reappeared on my radar in a Twitter #lrnchat session a couple weeks ago. Evaluation was the discussion topic, and several tweets mentioned that the model originated in the 1950s, a lot has changed since then, and we ought to follow a more current model.

Before digging into its pros and cons, let’s do a quick review of the model.

  • Level 1: Learner opinions. Did they like the training?

  • Level 2: Performance during training. Are learners meeting the stated objectives (typically via quizzes, practice activities, skill assessments, etc.)?

  • Level 3: Performance on the job. Are they demonstrating the expected behaviors on the job? Are they meeting expectations in terms of quality, quantity, etc.?

  • Level 4: Organizational impact. How did training outcomes impact the organization, quantitatively and qualitatively? What was the return on investment?

So now let’s dig in....

Does the model include anything irrelevant?

I don’t think so.

An initial gap analysis should identify specific business needs (level 4) and what is required to fulfill those needs (level 3). So it makes perfect sense that we’d evaluate those same things later and report results when we can.

It also makes sense to assess learners’ knowledge and performance during training (level 2), for the sake of corrective coaching, encouragement, and potentially offering additional support to help learners prepare for on-the-job application.

When it comes to learner satisfaction (level 1), there’s a lot of talk about the tendency to focus too much on this and too little on the other levels. I agree that’s a mistake. That said, I still want to know how learners felt about their training, for the sake of improving the experience, working out bugs, and potentially helping to identify the causes of any results gaps.

Does the model leave anything out?

I can’t think of anything.

Though it’s common to track “butts in seats” and other attendance-related metrics not accounted for in the model, these measures seem more related to staffing and forecasting than to training results.

We talk a lot about the need for greater diligence in our field when it comes to measuring job performance and business results. I agree that we should do this consistently. And so does the model (levels 3 and 4).


Like any model, Kirkpatrick’s four levels have limitations.

A few disclaimers: I’m not trying to suggest that it’s perfect. And I’m not trying to suggest that it covers everything we need to think about related to evaluation (and in fairness, I doubt this was ever the intent). Nor am I suggesting that following the model makes evaluation easy.

Some criticize the model because it seems to focus exclusively on a learning event, when learning is actually an ongoing process. Even if Kirkpatrick was thinking about learning “events” when he introduced the model, I think the levels can apply to learning as an event or as an ongoing process. The model suggests the types of results to measure. It’s up to us to determine the subject we’re evaluating (an event vs. something ongoing) and how to collect and analyze the data.

Some criticize the model because it neglects confounding variables, such as post-training support from learners’ managers and ongoing accountability. These are just a couple of examples, and I agree that factors like these are critical to a training initiative’s success. So perhaps we still measure the things outlined by the four levels, while also investigating how those other key variables helped or hindered the effort (Brinkerhoff’s Success Case Method can help).

Many insist that we should flip the model upside-down, with organizational impact and on-the-job performance as the first levels. I can see how numbering the levels might suggest prioritization or sequence. So with that interpretation, presenting the levels in reverse order makes a lot of sense.

The bottom line...

My view of Kirkpatrick’s model is that it suggests what to measure to help determine how successful a training initiative was, but it doesn’t spell out how to scope and execute the evaluation effort.

So with that in mind, I believe the model is still relevant and useful today, even if we need other information sources to help us with the rest.

Do you agree? Disagree? I’m sure I haven’t thought of everything here, so please take a moment to share your two cents!


  1. Interesting article. Thanks for the link to Brinkerhoff's book.

  2. I still see merit in the model. It has a strong foundation, just like ADDIE does for instructional design. I would also add a Level 5 to consider ROI. That said, it's hard to use the model to measure informal learning.


Thank you for your comments.