Wednesday, May 19, 2021

SATs aren't the problem, you just don't like the reality they reveal

From You Aren't Actually Mad at the SATs: you're mad at what they reveal by Freddie deBoer.  

A number of years ago Nate Silver published an excellent book, The Signal and the Noise: Why So Many Predictions Fail - But Some Don't.  The signal is the meaningful information in the data; the noise is the randomness that can creep in and dominate it.  

If you are interested in an individual's health record, then take their temperature first thing in the morning, at midday and in the evening, at the same times each day, always with the same thermometer, always by the same person, with the thermometer regularly calibrated against an independent and reliable standard. Over time you are likely to get a pretty clear signal (reliable and meaningful data reflecting the underlying health condition).

On the other hand, if it is a different person taking the temperature at different times of day with different measurement devices (mercury thermometer, digital device, back of hand to forehead) which have not been calibrated against one another or against an independent, reliable standard, then your data set is likely to contain a lot of noise.  For example, it is easily conceivable that the mercury thermometer and the digital device differ by as much as a degree even when used at the same moment.
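To make the distinction concrete, here is a minimal Python sketch. The temperatures and device offsets are invented for illustration (degrees Fahrenheit; the 1.0 offset for the digital device simply echoes the "as much as a degree" above), and it assumes each uncalibrated device has a fixed offset that nobody records. The same underlying temperatures look far more erratic once the readings rotate between devices.

```python
from statistics import mean, pstdev

# Hypothetical true morning temperatures (degrees F) for one person over six days.
TRUE_TEMPS = [98.6, 98.7, 98.6, 98.8, 98.6, 98.7]

# Hypothetical fixed offsets for two uncalibrated devices (assumed values).
DEVICE_OFFSETS = {"mercury": 0.0, "digital": 1.0}

def calibrated_readings(true_temps):
    """One calibrated device, same person, same times: readings track the truth."""
    return list(true_temps)

def mixed_device_readings(true_temps, devices):
    """Rotate through uncalibrated devices; each adds its own fixed offset."""
    return [t + DEVICE_OFFSETS[devices[i % len(devices)]]
            for i, t in enumerate(true_temps)]

clean = calibrated_readings(TRUE_TEMPS)
noisy = mixed_device_readings(TRUE_TEMPS, ["mercury", "digital"])

# The spread in the mixed series is mostly device offsets, not the patient.
print(f"calibrated: mean={mean(clean):.2f}, spread={pstdev(clean):.2f}")
print(f"mixed:      mean={mean(noisy):.2f}, spread={pstdev(noisy):.2f}")
```

If you did record which device took each reading, the offsets could be subtracted out; it is the unrecorded mixing that turns them into noise.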

All systems generate noise.  The trick, as Silver argues, is to distinguish the signal from the noise.

This is brought to mind by deBoer's essay.

The University of California system is getting rid of its SAT/ACT requirement. More will follow.

There’s a lot to say. First, we must distinguish between two types of tests, or really two types of testing. When people say “standardized tests,” they think of the SAT, but they also think of state-mandated exams (usually bought, at great taxpayer expense, from Pearson and other for-profit companies) that are designed to serve as assessments of public K-12 schools, of aggregates and averages of students. The SAT, ACT, GRE, GMAT, LSAT, MCAT, and similar tests are oriented towards individual ability or aptitude; they exist to show prerequisite skills to admissions officers. (And, in one of the most essential purposes of college admissions, to employers, who are restricted in the types of testing they can perform thanks to Griggs v. Duke Power Co.) Sure, sometimes researchers will use SAT data to reflect on, for example, the fact that there’s no underlying educational justification for higher graduation rates, but SATs are really about the individual. State K-12 testing is about cities and districts, and exists to provide (typically dubious) justification for changes to education policy. SATs and similar help admissions officers sort students for spots in undergraduate and graduate programs. This post is about those predictive entrance tests like the SAT.

Griggs v. Duke Power Co. was the 1971 Supreme Court decision which effectively precluded companies from using IQ tests when recruiting personnel: a test that produced racially disparate outcomes had to be shown to be job-related, out of concern that it resulted in racial discrimination in effect even if not in intent.  This despite IQ tests being among the most reliable predictors of future performance, tested and validated extensively over decades in many countries.  

This is an example of deliberately sabotaging the utility of a useful signal (IQ tests) in the name of some other goal (the desire to not acknowledge variances in individual and group performance).

DeBoer's article focuses primarily on the evidence that IQ tests are reliable predictors for a broad range of outcomes.  

Why do people have such revulsion towards the SATs? Because they produce unequal results; some students perform better than others on the test. Of course, this is the very function of testing, to reveal underlying inequality, in this case underlying academic aptitude or ability. In fact, the more valid a test is, the more powerful it is, the more inequality it reveals, as it becomes capable of demonstrating finer and finer-grained distinctions between test takers. Most people are bothered by this tendency to reveal inequality because of troublesome and persistent group differences. Traditionally the gender education gap was cited as a source of concern, but because the gender positions have flipped (outside of a few stubborn fields), most progressive people don’t care much. The racial achievement gap, however, is still the singular obsession of the American education politics, policy, and research world, and despite periodic predictions that it will soon close, it remains stubbornly real. And that’s ultimately where the anti-SAT/ACT animus comes from: Black and Hispanic students significantly underperform white and Asian, and this is vexing for obvious reasons.

The racial achievement/performance gap is a curious thing even in the context of an American political discourse that seems to get more bizarre by the day. That the gap exists is, on balance, not controversial. Gaps in performance are observed on essentially every measured academic metric, though the size of the effects vary from context to context, and the general distribution is Asian American students at the top, white students next, then Hispanic, then Black. The Black-white gap in particular has shrunk from the era of (explicitly) segregated schools but progress has not been consistent or linear. Most people in academia and politics admit it exists: prominent Black politicians like Barack Obama and Kamala Harris reference it, every major think tank and foundation operating in the educational space identifies it as a major priority, and the NAACP used to address it often, though their Education and Education Strategy pages have recently disappeared so it’s hard to know where they stand now. These things are faddish but once upon a time every other dissertation written by someone getting a PhD in Education was about the gap. We can observe it even outside of reference to controversial tests, such as noting that the white high school graduation rate is 10% higher than that for Black students. The achievement gap is a thing.

All true.   And with unintended consequences.  When companies could no longer select based on IQ, they turned to a proxy for IQ: which university you attended.  

SATs are a correlated proxy for IQ, and universities were still allowed to use SATs to select their students.  You want to recruit people with an IQ at least two standard deviations above the mean?  You recruit at the most competitive universities: the Ivy League and the most selective state universities.  

This is less efficient than a direct IQ test.  The entering class at the University of Pennsylvania, Harvard or Stanford will have its own distribution.  Not everyone will have an IQ of 130.  Legacy admits, athletes and affirmative action admits will likely have lower IQs than the average of the whole student body.  Conversely, you will miss the 130-IQ students at large state schools, which have a lower overall average but still a distribution that includes many high-IQ students.

For legal and social policy ends, we diluted the signal when Griggs v. Duke Power Co. became the law.  Back to deBoer.

Here is the essence of it: hierarchies of relative academic performance are remarkably stable throughout life, due to differences in inherent or intrinsic academic ability of whatever origin, and the SATs and similar mechanisms reveal those differences in a way that liberal America is increasingly unable to accept. This is the source of all of this angst, not the technical details of whether a test is fair or valid or just, but a liberal intelligentsia that is incapable of honestly confronting the fact that different human beings have fundamentally different intrinsic abilities. I believe in political equality, social equality, equality of rights, equality of dignity, equality of protection under the law. But the notion that all people are equally talented, in academics or anything else, is an absurdity, and as much as people will rush to deny intrinsic difference, I suspect that pretty much everybody knows that they are real. When you were a child you casually assumed that some of your classmates were naturally better at school than others, and you did because it was true.

[snip]

Trying to fight educational inequality by getting rid of the SAT is like trying to fight climate change by getting rid of thermometers. It is as indicative of a heads-in-the-sand attitude as I can possibly imagine. 

The chattering class are trying to hide something which is true by obscuring the signal.  There are real variances, not just in IQ but in behavior, motivation, values, cultural conformity, etc.   All of them are pertinent variances.  The more you hide those variances, the more difficult you make it for companies and individuals to flourish and prosper.  

It is a great irony that those most insistent on removing evidence of variance are the very people who are the most vocal proponents of diversity.  

This all relates to another piece from this past week, Bias Is a Big Problem. But So Is ‘Noise.’ by Daniel Kahneman.  He distinguishes accurate signals from biased ones, and both from noise.  

The word “bias” commonly appears in conversations about mistaken judgments and unfortunate decisions. We use it when there is discrimination, for instance against women or in favor of Ivy League graduates. But the meaning of the word is broader: A bias is any predictable error that inclines your judgment in a particular direction. For instance, we speak of bias when forecasts of sales are consistently optimistic or investment decisions overly cautious.

You really want an accurate forecast of sales.  However, even a biased forecast is useful so long as it is consistently biased.  If the sales forecast is always 10% too high, you merely apply a deflator, dividing the forecast by 1.10.  Signals, whether biased or not, can be useful.  Noise is a different matter.
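Here is a minimal Python sketch of why bias is correctable and noise is not, using made-up quarterly figures. Forecaster A is always 10% high and is fixed exactly by dividing by 1.10; forecaster B's errors vary in size and direction, so removing its average bias barely helps. The numbers and the helper names (`deflate`, `mean_abs_pct_error`) are hypothetical, chosen only to illustrate the point.

```python
# Hypothetical actual sales for five quarters (illustrative numbers only).
ACTUAL = [100.0, 120.0, 90.0, 150.0, 110.0]

# Forecaster A is consistently biased: always 10% too high.
biased = [a * 1.10 for a in ACTUAL]

# Forecaster B is noisy: errors vary in size and direction (hypothetical values).
NOISE = [0.25, -0.15, 0.30, -0.20, 0.05]
noisy = [a * (1 + e) for a, e in zip(ACTUAL, NOISE)]

def deflate(forecasts, bias):
    """Remove a known, constant proportional bias by dividing it out."""
    return [f / (1 + bias) for f in forecasts]

def mean_abs_pct_error(forecasts, actual):
    """Average absolute error as a fraction of the actual value."""
    return sum(abs(f - a) / a for f, a in zip(forecasts, actual)) / len(actual)

# Deflating the biased forecast recovers the actuals (up to float rounding).
print(f"biased, deflated: {mean_abs_pct_error(deflate(biased, 0.10), ACTUAL):.1%}")

# Forecaster B's average bias is 5%; dividing it out leaves most of the error.
avg_bias = sum(NOISE) / len(NOISE)
print(f"noisy, raw:       {mean_abs_pct_error(noisy, ACTUAL):.1%}")
print(f"noisy, deflated:  {mean_abs_pct_error(deflate(noisy, avg_bias), ACTUAL):.1%}")
```

Note that "10% too high" is undone by dividing by 1.10, not by knocking 10% off, which would leave the forecast about 1% low.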

Society has devoted a lot of attention to the problem of bias — and rightly so. But when it comes to mistaken judgments and unfortunate decisions, there is another type of error that attracts far less attention: noise.

While bias is important, noise is equally important.  

Although it is often ignored, noise is a large source of malfunction in society. In a 1981 study, for example, 208 federal judges were asked to determine the appropriate sentences for the same 16 cases. The cases were described by the characteristics of the offense (robbery or fraud, violent or not) and of the defendant (young or old, repeat or first-time offender, accomplice or principal). You might have expected judges to agree closely about such vignettes, which were stripped of distracting details and contained only relevant information.

But the judges did not agree. The average difference between the sentences that two randomly chosen judges gave for the same crime was more than 3.5 years. Considering that the mean sentence was seven years, that was a disconcerting amount of noise.

Noise in real courtrooms is surely only worse, as actual cases are more complex and difficult to judge than stylized vignettes. It is hard to escape the conclusion that sentencing is in part a lottery, because the punishment can vary by many years depending on which judge is assigned to the case and on the judge’s state of mind on that day. The judicial system is unacceptably noisy.

Consider another noisy system, this time in the private sector. In 2015, we conducted a study of underwriters in a large insurance company. Forty-eight underwriters were shown realistic summaries of risks to which they assigned premiums, just as they did in their jobs.

How much of a difference would you expect to find between the premium values that two competent underwriters assigned to the same risk? Executives in the insurance company said they expected about a 10 percent difference. But the typical difference we found between two underwriters was an astonishing 55 percent of their average premium — more than five times as large as the executives had expected.
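Kahneman's yardstick for noise, the typical difference between two people judging the same case, is simple to compute. The sketch below is one plausible reading of it, not the studies' actual procedure: for a single case, average the difference over every pair of judgments, either in absolute terms (years of sentence) or relative to the pair's average (premiums), then average across cases. The helpers `pairwise_noise` and `noise_audit` and the sample premiums are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pairwise_noise(judgments, relative=False):
    """Average difference between two judgments of the same case, over all pairs.

    relative=False: absolute difference (e.g. years of sentence).
    relative=True:  difference as a share of the pair's average (e.g. premiums).
    Assumes at least two judgments, and positive values when relative=True.
    """
    diffs = []
    for a, b in combinations(judgments, 2):
        gap = abs(a - b)
        diffs.append(gap / ((a + b) / 2) if relative else gap)
    return mean(diffs)

def noise_audit(cases, relative=False):
    """Compute pairwise noise for each case, then average across cases."""
    return mean(pairwise_noise(c, relative) for c in cases)

# Hypothetical premiums five underwriters set for the same risk (not study data).
premiums = [9500, 16700, 12000, 7400, 21000]
print(f"typical gap: {pairwise_noise(premiums, relative=True):.0%} of the pair's average premium")
```

Averaging over every pair is the same as asking what you would expect from two judges or underwriters picked at random, which is how both studies frame the question.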

Many other studies demonstrate noise in professional judgments. Radiologists disagree on their readings of images and cardiologists on their surgery decisions. Forecasts of economic outcomes are notoriously noisy. Sometimes fingerprint experts disagree about whether there is a “match.” Wherever there is judgment, there is noise — and more of it than you think.

Prejudice, signal (biased or not) and noise are all important issues.  However, we are in an era when influential parties, convinced that prejudice is widespread, are desperately trying to reduce signal and inflate noise.

Systemically, this will reduce prosperity, efficiency and effectiveness.  It is a gross destruction of the commonweal.  All in the name of social justice and stamping out perceived (versus real) prejudice.

Alarms are already being sounded at universities.  If you remove the most accurate predictor of performance (IQ tests in the form of SATs), greater reliance will be placed on other measures which are dramatically less predictive of performance and which carry their own forms of bias.  

Most expect that personal essays will carry more of the selection burden.  Which applicants are likely to have the writing and cultural versatility to master the effective essay?  Children of the upper middle class.

In order to reduce the possibility of racial prejudice due to group differences in IQ, social justice advocates are likely laying the foundation for a more class-based selectivity.

The fundamental issue, though, is that a modern complex economy and society depends on accurate information, clear signals.  The more you hide the signal, the more you undermine trust and undermine prosperity.  


UPDATE:  Here is another example of suppressing signal, this time in terms of high school graduation rates.  Graduation rates are a useful predictive metric combining both IQ and desirable behavioral traits such as perseverance.  From Why Graduation Rates Are Rising But Student Achievement Is Not by Natalie Wexler.

In the desire to measure school performance, accrediting agencies and government programs began measuring graduation rates.  High schools which could raise their graduation rates benefitted financially and in reputation.  So schools began instituting programs, as described by Wexler, for credit recovery, GEDs, etc., which allowed them to report higher graduation rates without any real increase in what their graduates had learned.  High school graduation is a powerful signal of base competence, and for ideological reasons and through inadvertent measurement effects, we are once again reducing the power of the signal to the disadvantage of real graduates.  

