Course evaluations: everyone knows them and uses them, but does everyone know what they are good for? Opinions are very much split on how to evaluate the evaluation. University teachers and university administrators, for example, differ in the importance they assign to the results of course evaluations. Whereas faculty are more skeptical, administrators rather confidently believe that the responses to end-of-course assessments represent an accurate description of teacher effectiveness (Morgan, Sneed and Swinney 2003).
The truth of the matter is that we do not know for sure what the impact of these evaluations is. Do they really help improve teaching? Do they help improve learning? And perhaps most interestingly, are course evaluations true? I would like to discuss this last point here, and to rephrase it in a milder form: Do course evaluations deliver the answers expected by teachers and administrators? Or do students respond based on assumptions outside the scope of the course?
Research done in several academic environments suggests that students’ answers cannot be taken as facts, but are raw data in need of interpretation and contextualization. In a study performed at the Hong Kong Polytechnic University, Kwan (1999) concludes that students base their answers on factors external to the course. This is reflected in the following four observations:
1. Humanities courses tend to get better evaluations than science courses, regardless of the variation within the respective curricula;
2. Courses with fewer students (the borderline is at around 20) get much more positive evaluations than large courses;
3. Courses at the advanced level get slightly better evaluations than those at the basic level;
4. Optional courses are better appreciated than obligatory ones.
I do not wish to engage here with the possible explanations of these results, even though it is of course very interesting to investigate why these phenomena occur. My concern here is the accuracy of course evaluations. If a teacher is assigned a mandatory first-year course with one hundred students, she is very likely to get poorer results on the course evaluations than a colleague teaching a smaller, optional course for third-year students. And this is regardless of the actual pedagogical skills and competence of the persons in question!
In another very recent study, this time on Swedish students active on a site equivalent to the US “Rate My Professor”, Karlsson and Lundberg (2012) analyze 98 assessments of faculty from across the universities in Sweden. They come to the conclusion that there is a clear gender and age bias in the ratings provided on the site. Younger teachers tend to obtain lower marks in comparison with more senior faculty. Women teachers also consistently receive poorer ratings in comparison with their male counterparts. The effects are worse if the two negative factors are combined: if you are a young female teacher your evaluations are likely to be significantly below those of a senior male teacher at the same institution.
The Swedish study corroborates the earlier investigation of students at US universities by Sprague and Massoni (2005). They asked almost 300 students the deceptively simple question “who was the best and who was the worst teacher you have ever had?”. The answers reveal, among other things, the “Ginger Rogers effect”: in order to obtain the same level of recognition as a male teacher, a woman teacher needs to invest more energy and emotional commitment in her students. Or, as one of my colleagues put it, as a woman faculty member “one has to do the same dance steps, but in high heels”.
As we have seen, factors extrinsic to the course affect the evaluation results, which therefore do not provide an accurate description of the teacher’s effectiveness. Moreover, evaluations need to be properly situated in their cultural and social context, as students who respond to them often share the general prejudices and stereotypes that are the norm in a given society. Before judging teacher performance, tenure assessment committees should certainly evaluate the course evaluation.
Karlsson, Martin and Erik Lundberg. 2012. “I betraktarens ögon – Betydelsen av kön och ålder för studenters läraromdömen.” Högre utbildning 2:1, 19-32.
Kwan, Kam‐por. 1999. “How Fair are Student Ratings in Assessing the Teaching Performance of University Teachers?” Assessment & Evaluation in Higher Education 24:2.
Morgan, Donald A., John Sneed and Laura Swinney. 2003. “Are student evaluations a valid measure of teaching effectiveness: perceptions of accounting faculty members and administrators.” Management Research News 26:7, 17-32.
Sprague, Joey and Kelley Massoni. 2005. “Student Evaluations And Gendered Expectations: What We Can’t Count Can Hurt Us.” Sex Roles: A Journal of Research 53, 11‐12: 779‐793.
This post was also published in Inside Higher Ed