Problem failing students

In the summer of 1999, I reflected on the clash between junior college students at Chinmin (the immovable object) and grading (the irresistible force).

I am crossposting this rambling message to the AERA-D, TESL-L, FLTEAcH and LTEST-L lists because, although I think it mainly involves generic educational questions about grading, the situation is an (English as a) foreign language one.

My purpose in writing is to clear up for myself some confusions I had grading my students–language majors in a junior college in Taiwan–and to ask for advice about scaling scores over the various assessments I carry out so that I can come up with appropriate final grade distributions.

Last semester was my first at the junior college. Last year I had been teaching at a university in Korea. There I had developed a routine of failing no one and, in Freshman English classes, of mapping the final scores (the sum of attendance + quizzes + midterm + final) one by one onto numerical grades from 60 to 100, so that approximately six percent of the 50-70 students in a class got A’s, 33 percent got B’s, 33 percent got C’s and 16 percent got D’s. I developed this practice mostly in isolation from other teachers’ practices at the school, though the school did request grade distributions in a range around these figures.

When I started teaching at the junior college in Taiwan, the question was what to expect from the students. “Expect” had two meanings: what did I want the students to do (or, more accurately, what would I like them to be able to do), and what did I think they would in fact be able to do?

Before I had much experience with the students, I learned that the other teachers did not expect as much as I did. I didn’t want to lower my expectations, however; I wanted to leave open the possibility that a student could be a good language learner and still a poor academic student.

The rest of the semester was a struggle between the ideal and the reality (the two meanings of expectation), or simply confusion about what my students could do (or a struggle between what I wanted to do and what the students would let me do), as I tried to cope with the demands of my new situation.

These demands and my confusion left me somewhat dissatisfied with how I graded. I gave only two or three quizzes over the semester, for workload reasons. The school required (or requested) that quizzes be recorded out of 100. [I had to compute the average quiz score. The final grade was to be 30 percent quizzes, 30 percent midterm and 40 percent final.]
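The prescribed weighting is simple arithmetic once every component is on a 0-100 scale. A minimal Python sketch, assuming quiz scores are already recorded out of 100 and averaged; the function and constant names are illustrative, not anything the school used:

```python
from statistics import mean

# Weights stated in the text: 30% quizzes, 30% midterm, 40% final.
WEIGHTS = {"quiz": 0.30, "midterm": 0.30, "final": 0.40}

def final_score(quiz_scores, midterm, final):
    """Combine component scores (each on a 0-100 scale) into the course total.

    quiz_scores: list of quiz scores out of 100; their average is used.
    """
    quiz_avg = mean(quiz_scores)
    return (WEIGHTS["quiz"] * quiz_avg
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)

# Example: three quizzes averaging 70, midterm 80, final 90.
print(round(final_score([60, 70, 80], 80, 90), 2))  # 81.0
```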

In Korea I had been giving 0, 1 or 2 for each quiz, for a total of 25. [Because of my confusion about what to expect, not having taught at this level or in Taiwan before, I felt my only choice was to follow the same normal-curve grading practices I had used before.] A score out of 100 for a quiz, on the other hand, didn’t just make the arithmetic more difficult. It could also be read as a prediction of the final grade a student could expect in the course, and I didn’t want to be accountable for the expectations these quizzes might create.

The midterms, however, I had to report straight away, so I felt I had to scale them the way I had scaled final grades at the university in Korea: distributed around a mean of 80 with a standard deviation of 10.
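Under a normal curve with mean 80 and standard deviation 10, the expected share of a class in each 10-point band can be read off the cumulative distribution. A short Python sketch using the standard library’s `statistics.NormalDist` (Python 3.8+); treating scores as exactly normal is an idealization:

```python
from statistics import NormalDist

# Assumed target distribution: mean 80, standard deviation 10.
curve = NormalDist(mu=80, sigma=10)

# 10-point bands; None marks an open-ended lower or upper limit.
bands = [("below 60", None, 60), ("60-70", 60, 70), ("70-80", 70, 80),
         ("80-90", 80, 90), ("90 and above", 90, None)]

for label, lo, hi in bands:
    p_lo = curve.cdf(lo) if lo is not None else 0.0
    p_hi = curve.cdf(hi) if hi is not None else 1.0
    print(f"{label:>12}: {100 * (p_hi - p_lo):4.1f}%")
```

It prints roughly 2.3% below 60, 13.6% in 60-70, 34.1% in each of 70-80 and 80-90, and 15.9% at 90 and above, so even a perfectly normal class of 50 would be expected to have about one score below 60.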

I had been told the school did not require grades to fit any specific distribution, but I had to report the class average; the number of students in each range (below 60, which is failure, 60-70, 70-80, 80-90 and 90-100); and the numbers above and below 80.
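The summary the school asked for can be tallied mechanically. A minimal Python sketch; the text does not say which band a boundary score such as 70 or 80 belongs to, so the convention used here (each band includes its lower bound, and exactly 80 counts as “80 or above”) is an assumption, and `grade_report` is a hypothetical name:

```python
from statistics import mean

# ASSUMPTION: each band includes its lower bound, so a 70 counts in "70-80".
BANDS = [("below 60", float("-inf"), 60), ("60-70", 60, 70), ("70-80", 70, 80),
         ("80-90", 80, 90), ("90-100", 90, float("inf"))]

def grade_report(scores):
    """Average, counts per band, and counts above/below 80 for one class."""
    counts = {label: sum(lo <= s < hi for s in scores) for label, lo, hi in BANDS}
    return {
        "average": round(mean(scores), 1),
        "counts": counts,
        # ASSUMPTION: a score of exactly 80 is counted as "80 or above".
        "80 or above": sum(s >= 80 for s in scores),
        "below 80": sum(s < 80 for s in scores),
    }

print(grade_report([55, 64, 72, 78, 81, 85, 93]))
```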

Afterwards, the dean talked to me about my not failing any students. I said I was satisfied with the work being done by all the students (!?). I did not say that I thought I could only give students F’s when I could give the Taiwan English as a Foreign Language education system a B. She said it might not be academically desirable, but when teachers don’t fail anyone, students don’t study unless they see there is a chance they will fail if they don’t.

I asked how many students I could fail. She said 10; I assumed she meant 10 of the 50 students in each class. She said I should specify what students should be able to do. I said that was the question. But I felt I had to stop bending over backward (cut my work?).

I couldn’t report the students’ raw scores on my tests, because the questions were ones I would have liked them to be able to answer; I had little idea whether they actually could. As it turned out, many raw scores were well below 60.

At about this point I was also being introduced to spreadsheets and [before?] saw that computing things like standard deviations might not be as much hard work as I had thought. So for the final I decided simply to apply a linear transform giving a mean of 80 and a standard deviation of 10, and to let students fail whose total scores (30 percent of quiz score + 30 percent of midterm score + 40 percent of final score) were less than 60.
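A linear transform maps each raw score x to 80 + 10(x − m)/s, where m and s are the class mean and standard deviation of the raw scores; rank order and relative gaps are unchanged. A minimal Python sketch with made-up raw scores; whether to use the population or the sample standard deviation (STDEVP or STDEV in a spreadsheet) is a choice, and the population version is assumed here:

```python
from statistics import mean, pstdev

def linear_scale(raw, target_mean=80.0, target_sd=10.0):
    """Rescale raw scores linearly to target_mean and target_sd.

    Uses the population SD of the class (an assumption; STDEV-style
    sample SD would give a slightly smaller spread after scaling).
    """
    m, sd = mean(raw), pstdev(raw)
    if sd == 0:  # everyone scored the same: put them all at the mean
        return [target_mean for _ in raw]
    return [target_mean + target_sd * (x - m) / sd for x in raw]

raw_final = [22, 35, 41, 48, 50, 57, 63, 70]   # made-up raw scores
scaled = linear_scale(raw_final)
print([round(s, 1) for s in scaled])
print(round(mean(scaled), 1), round(pstdev(scaled), 1))  # 80.0 10.0
```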

As it turned out, no student failed despite this policy, because the standard deviations of the total scores for the various classes were all around 7-8, less than the 10 created for the final grades and assumed for the midterm grades. The correlations between midterms and finals ranged from 0.35 to 0.7 across the classes.
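The shrinkage has a simple arithmetic explanation. If each component is scaled to a standard deviation of 10 and the components are imperfectly correlated, the standard deviation of the weighted total is 10·sqrt(wᵀRw), where w holds the weights and R the correlations; only when every correlation is 1 does it stay at 10. The Python sketch assumes, as a hypothetical simplification, that all three components (quizzes included) are scaled to SD 10 and that every pair correlates at the same value r; the function names are illustrative:

```python
from math import sqrt

WEIGHTS = (0.30, 0.30, 0.40)   # quizzes, midterm, final

def composite_sd(component_sds, corr, weights=WEIGHTS):
    """SD of a weighted sum: sqrt(sum_i sum_j w_i w_j s_i s_j r_ij).

    corr is a full correlation matrix (list of lists, 1.0 on the diagonal).
    """
    n = len(weights)
    var = sum(weights[i] * weights[j] * component_sds[i] * component_sds[j] * corr[i][j]
              for i in range(n) for j in range(n))
    return sqrt(var)

def uniform_corr(r, n=3):
    """Hypothetical simplification: every pair of tests correlates at r."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]

for r in (0.35, 0.5, 0.7, 1.0):
    print(r, round(composite_sd((10, 10, 10), uniform_corr(r)), 2))
```

Under those assumptions the weights 0.3/0.3/0.4 give a total SD of 10·sqrt(0.34 + 0.66r), roughly 7.6 at r = 0.35 and 9.0 at r = 0.7, which is in the same range as the 7-8 observed; with a smaller spread, fewer totals reach down to 60.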

I think this phenomenon is the one commonly described as regression toward the mean.

What I am looking for is advice on how to grade quizzes, midterms and finals so that the final distribution has a mean of 80 and a standard deviation of 10 and the calculation of a grade is transparent. Ideally, the grade on a test would just be the number of questions answered correctly, but such a “ballistic” test procedure is beyond my capability. I think curving the scores on individual tests is a necessary evil. I would like some statistical procedure that lets me retain the transparency of 30 percent quizzes, 30 percent midterm and 40 percent final; I don’t want to curve this composite score as well.
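One procedure that leaves the 30/30/40 sum untouched is to curve each test to mean 80 but with a standard deviation larger than 10, chosen so that the weighted total comes out at 10. Because the weights sum to 1, the total keeps mean 80, and if each test has standard deviation s the total has s·sqrt(wᵀRw); solving gives s = 10 / sqrt(wᵀRw). A minimal Python sketch, assuming the correlations R can be guessed in advance (here a hypothetical 0.5 for every pair); all names are illustrative:

```python
from math import sqrt
from statistics import mean, pstdev

WEIGHTS = (0.30, 0.30, 0.40)   # quizzes, midterm, final (they sum to 1)

def component_sd_for_target(target_sd, corr, weights=WEIGHTS):
    """SD to give every curved test so the weighted total has target_sd.

    If each test has SD s, the total has SD s * sqrt(w' R w),
    so s = target_sd / sqrt(w' R w).
    """
    n = len(weights)
    wrw = sum(weights[i] * weights[j] * corr[i][j]
              for i in range(n) for j in range(n))
    return target_sd / sqrt(wrw)

def curve_test(raw, sd, target_mean=80.0):
    """Linearly rescale one test to target_mean and sd (raw must vary)."""
    m, spread = mean(raw), pstdev(raw)
    return [target_mean + sd * (x - m) / spread for x in raw]

# ASSUMPTION: a guessed correlation of 0.5 between every pair of tests.
r = 0.5
corr = [[1.0 if i == j else r for j in range(3)] for i in range(3)]
s = component_sd_for_target(10.0, corr)
print(round(s, 2))  # 12.22

# Each test is then curved with curve_test(raw, s), and the course total
# is still the plain 0.3 * quiz + 0.3 * midterm + 0.4 * final.
```

With r = 0.5 each test would be spread to an SD of about 12.2. If the actual correlations differ from the guess, the total’s SD will miss 10 accordingly, and the wider spread on each test also puts more individual test scores above 100 or below 60.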

Obviously the possibility of doing this depends on the stability of the correlation between midterms, finals (and quizzes). Assuming that ..

Did I write more?