Monday, February 5, 2018

Oxford dons at war with signal strength and signal clarity

I kept putting off this post because I don't have time to clarify my thoughts, but I keep encountering pertinent articles that seem to call (or signal) for a response. If I don't capture it now then, to paraphrase Omar Khayyám:
The thinking mind quits; and, having quit,
Moves on: nor all thy Piety nor Wit
Shall lure it back to complete half a Line,
Nor all thy Tears recall a Word of it.
The triggering article was "Oxford University gives women more time to pass exams" by Tony Diver of the UK's Daily Telegraph. This made quite a splash on social media, especially on Twitter. Those opposed to social justice feminism were mocking radical victimhood feminists for demanding special treatment by seeking more time on exams in order to get better scores. And indeed, "I am woman, hear me roar" in conjunction with "Slow down, I need more time" is certainly an invitation for mockery.

But the headline is a little misleading.
Students taking maths and computer science examinations in the summer of 2017 were given an extra 15 minutes to complete their papers, after dons ruled that "female candidates might be more likely to be adversely affected by time pressure". There was no change to the length or difficulty of the questions.

It was the first time such steps had been taken. In previous years, the percentage of male students awarded first class degrees was double that of women and in 2016 the board of examiners suggested that the department make changes to improve women's grades.
To be clear, all students were being given an extra fifteen minutes on exams. And while the intention was to narrow the gap between men and women, both benefitted from the extra time, though women to a greater extent than men.

While it is fun to mock and while I disagree with designing processes to ensure equal social outcomes regardless of actual performance, there are a couple of very interesting issues in here which the articles do not address.

For most of us, an exam on academic content is essentially a test intended to reflect mastery of that content. But the Oxford experiment introduces an intriguing issue. Why have a time limit at all? What is the optimal time allowance for achieving the highest scores for the greatest number of people? If you have 1,000 people taking the test, there is presumably some rising curve of average score against time allowed. But presumably that curve eventually plateaus. Under those conditions, all you have to do is set the time limit at the point where extra time makes no further difference. Potentially, there might even be a decline in the curve past a certain amount of time, as people begin to second-guess their answers.
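As a rough illustration of "set the time limit where extra time makes no further difference," here is a minimal sketch. Everything in it is an assumption made for illustration: the saturating curve in `mean_score`, its parameters `max_score` and `tau`, and the `min_gain` cutoff are all hypothetical, not anything Oxford measured. It also ignores the possible decline from second-guessing.

```python
import math

def mean_score(minutes, max_score=100.0, tau=30.0):
    """Hypothetical saturating curve: average score for a given time allowance.

    The exponential shape and both parameters are illustrative assumptions.
    """
    return max_score * (1.0 - math.exp(-minutes / tau))

def plateau_time(score_fn, step=5, max_minutes=300, min_gain=0.5):
    """Return the shortest time limit (in minutes) after which adding another
    `step` minutes raises the average score by less than `min_gain` points."""
    t = step
    while t < max_minutes:
        if score_fn(t + step) - score_fn(t) < min_gain:
            return t
        t += step
    return max_minutes  # no plateau found within the search range

limit = plateau_time(mean_score)
print(f"Suggested time limit: {limit} minutes")
```

In practice the curve would have to be estimated from real exam data rather than assumed, which is where the expense comes in.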

That's reasonably interesting and suggests that each discipline's courses ought to involve some degree of experimentation to find the optimum time allowance. That is expensive and operationally challenging, but it certainly makes sense and would be better than arbitrarily setting time limits by professorial estimation. I wonder what those would look like for different courses?

But what happens if the 1,000 individuals each have their own unique curves? Some people work better under pressure than others. Some people might be more prone to incorrectly second-guessing themselves. Some just process information more slowly. There are all sorts of reasons for differentials in there. When you set a fixed time limit, you are essentially, a priori, benefiting some intellectual styles over others. Why should some students have their unique styles handicapped against others?

But that's not all. What if personal unique curves also vary by SES class, or race, or gender, or religion, or cultural origin, or national origin, or personality profile, or educational background? In Oxford's case, by allowing more time in order for women to score better, perhaps they are reducing the scores of students from cultures with a high degree of self-doubt, or people with a high degree of neuroticism. Again, the university is preferencing one group over another. That does not seem morally correct.

There is another completely different aspect to this as well. Test scores are supposed to be some sort of signal about knowledge domain mastery. You have studied this subject for 16 weeks and this score indicates the degree to which you have mastered the topic. But what happens if tests, as traditionally administered, don't reflect that mastery? That is the implication of research by Richard Arum and Josipa Roksa, who found that nearly half of university students "had no significant gains in learning." If half are experiencing no knowledge gain and yet virtually all are being scored at A and B levels of achievement, perhaps the tests are not actually signaling mastery.

Instead of measuring performance at the end of the semester, where students with good swotting skills and short-term memory capabilities score well, perhaps students should be administered a pop exam sometime in the following semester to capture what they really learned.

But maybe society places a premium on short-term memory, on quick-wittedness and the capacity to load a lot in a brief time frame in order to regurgitate such knowledge simplistically under tight deadlines. Perhaps tests aren't really measuring knowledge acquisition and perhaps that isn't a valued signal. Perhaps what determines long-term success is indeed quick-wittedness, short-term memory, capacity to study assiduously and capacity to work against the clock.

Given Arum and Roksa's findings, perhaps it is the latter signal which is more important than the signal for knowledge acquisition. If that is the case, then Oxford, in seeking to equalize the scores of men and women, is inadvertently undermining the signal to employers. If employers are seeking quick-witted, diligent students who handle pressure well, and that is no longer what the test results signal, then the Oxford brand for both men and women will decline.

This mirrors the issues universities have had with affirmative action. Once employers were no longer able to use IQ tests to screen candidate employees (Griggs v. Duke Power Co.), they switched to using universities as their filter because universities are still allowed to do so (ACT and SAT are both proxies for IQ). If I can't screen candidates for an IQ of 130, then instead I will recruit only at universities whose selection criteria deliver a student body with an average IQ of 130. But if universities start admitting students on criteria other than IQ, they are no longer able to signal that their student body is reliably exceptional.

So mocking Oxford's intentions aside, it sets in train a whole range of intriguing issues for which I have no good answer. All I can do is recognize the complexity of what they are dealing with and muster some thoughts.

I did not intend to post them because of the wordy complexity.

Then I came across "Easy-pass policy fails students" by Joanne Jacobs. Hers has nothing to do with Oxford University, but it does have to do with the same issues of achievement, capability and signaling, and with letting social justice goals supersede actual measurement of real performance.
In last week’s post on “grading floors,” Memphis teachers debated whether giving minimum grades for minimal achievement motivates failing students — or misleads them and their parents.

Emily Langhorne taught in affluent Fairfax County, Virginia before joining the Progressive Policy Institute, she writes on The 74. She became complicit in lowering expectations for students’ achievement and work habits.

District policies discourage teachers from setting “hard” deadlines or “giving a student less than 50 percent on an assignment (regardless of the quality of work or level of completion),” Langhorne writes. Teachers are encouraged to “allow retakes on all major assignments if a student earns less than an 80.”
Not only do these policies create extra work for already overworked teachers, they also promote an attitude of low expectations that does a disservice to our students in the long run. They teach students that deadlines aren’t important, that you can receive half the credit for none of the work, that achievement is detached from practice, and that you can always bank on a second chance.
In theory, students are graded for “ultimate mastery of skills or content knowledge,” she writes. But they’re not. Thanks to “quality points,” a student who earns an A in the first quarter and fails the next three quarters will pass with a D.
Districts across the country have begun to impose similar policies on teachers.
. . . Teachers know it’s unethical, and they know that the students will suffer the consequences when they leave high school misinformed about their abilities and unprepared for college and the workforce.
Only 12 states require graduation exam, writes Langhorne. “We’re afraid that our kids will fail, so, instead, we fail them by sending them off to college and the workforce, knowing that they’re underprepared.”
High schools are desperate to show academic progress and high graduation rates. Many (most?) are achieving those goals by eviscerating objective performance measures. Everyone passes and the signal of relative capacity disappears.
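The "quality points" arithmetic Langhorne describes can be checked with a small sketch. It assumes a conventional 4-point scale (A = 4 down to F = 0) and a simple average across quarters; the quoted passage does not spell out the district's actual scale or rounding rules, so `QUALITY_POINTS` and `final_letter` are hypothetical.

```python
# Assumed 4-point scale; the district's real table may differ.
QUALITY_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def final_letter(quarter_grades):
    """Average the quality points for each quarter and map the result back
    to the highest letter whose point value the average reaches."""
    avg = sum(QUALITY_POINTS[g] for g in quarter_grades) / len(quarter_grades)
    for letter in ("A", "B", "C", "D"):
        if avg >= QUALITY_POINTS[letter]:
            return letter
    return "F"

# One A followed by three failing quarters: (4 + 0 + 0 + 0) / 4 = 1.0
result = final_letter(["A", "F", "F", "F"])
print(result)
```

Under those assumptions a single strong quarter is enough to pass the year, which is the loss of signal at issue.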

Today there is "Toward Better Signals" by economist Robin Hanson. Again, not Oxford, but the same issues.
In simple signaling models, people tend to do too much of the activities they use to signal. This suggests that a better world is one that taxes or limits such activities. Say by taxing or limiting school, hospitals, or sporting contests. However, this is hard to arrange because signaling via political systems tends to create the opposite: subsidies and minimum required levels of such widely admired activities. (Though socializing such activities under limited government budgets is often effective.) Also, if we put most all of our life energy into signaling, then limits or taxes on just signaling activities will mainly result in us diverting our efforts to other signals.

If some signaling activities have larger positive externalities, then it seems an obvious win to use taxes, subsidies, etc. to divert our efforts into those activities. This is plausibly why we try to praise people more for showing off via charity, innovation, or whistleblowing. Similarly, we tend to criticize activities like war and other violence with large negative externalities. We should continue to do these things, and also look for other such activities worthy of extra praise or criticism.

However, on reflection I think the biggest problem with signals today is the quality of our audience. When the audience that we want to impress knows little about how our visible actions connect to larger consequences, then we also need not attend much to such connections. For example, to show an audience that we care enough about someone via helping them to get medicine, we need only push the sort of medicine that our audience thinks is effective. Similarly for using charity to convince an audience we care about the poor, politics to convince an audience we care about our nation, or using creative activities to convince an audience we promote innovation.
He is packing a lot of cognitive implications into some less-than-clear words, but he is dealing with the issue of signal quality and how what we do from a social policy perspective affects that quality.

Competition, transparency and freedom of speech (communication) enable clear quality signaling. Such clarity of signaling runs counter to the postmodernist aspirations of equality of outcome and unity of identity. Quality signaling is at war with social justice. More glibly, social justice is at war with reality.
