Published as a ZNet Commentary, August 15, 2002
Have you ever noticed how expensive restaurants go out of their way to fill the air of their bathrooms with the refreshing scent of a pine forest after a gentle rain? Hoping to cover up the smells that would otherwise predominate in such an environment, the keepers of luxury lavatories bombard their patrons with diversionary odors, so as to make the dining experience more pleasant. Frankly, I’ve always perceived such efforts as more than a little inadequate. Crap, after all, even on a pinecone is still crap, and although you can “pretty up” a pig by slapping a dress on it, in the end it’s still a pig.
Such is a lesson we would do well to remember in the wake of the recent announcement that the Educational Testing Service is going to revamp the SAT, ostensibly to make it fairer and more relevant for a 21st-century educational system. Despite their insistence that the new SAT will better predict student ability while reducing unfairness by eliminating culture-bound items like analogies, the announced changes actually overlook the largest problems with standardized tests.
Although eliminating analogies is admirable, since these are often biased against those from non-white, non-upper-middle-class backgrounds (what with questions involving words like regatta), the problems with the SAT were always deeper than that. In fact, whatever cultural bias the ETS has eliminated with the ban on analogies will likely be re-triggered with the addition of a writing section, whose graders will no doubt emphasize Standard English style, marking down students whose writing employs idioms or word patterns more common to communities of color. Poetic license will have no place, one suspects, on the SAT writing test.
Though cultural bias has been observed in testing for years, the bigger issue is that supporters of the SAT presuppose that administering a standardized test to profoundly unstandardized students, from unstandardized schools, and then using results on that test to determine college placement can ever be fair. The fact is, even if such biased items are removed from the SAT, the unequal educational experience of the students taking the test — especially in terms of class and race — all but guarantees a persistent scoring gap between whites on the one hand, and blacks, American Indians or Latinos on the other.
Furthermore, the addition of Algebra II to the test can only cause alarm for those concerned about racial score gaps; after all, tracking in schools is so pernicious that blacks, even when they score at the top of 8th grade achievement test distributions, are less likely than whites to be placed in upper-level math courses in high school, and far more likely to be stuck in remedial classes irrespective of ability. As such, many won’t get around to Algebra II by the time the SAT is taken. But even tracking isn’t the clincher that makes the SAT inherently problematic. The two biggest issues are of a different nature altogether and incapable of being fixed with piecemeal reforms.
The first is what Stanford psychologist Claude Steele has called “stereotype threat.” As Steele and his colleagues have demonstrated in a number of ingenious experiments, black students take standardized tests under a cloud of group suspicion that hinders performance: suspicion by the broader culture that they are less intelligent and capable than others.
Here’s how they explain it: According to a wealth of research, black students are well aware of the negative stereotypes held about them by members of the larger society. As such, when blacks who value educational achievement take a standardized test and expect the results to be used to indicate cognitive ability, the fear of confirming the stereotype in the eyes of others harms their performance. These students may second-guess themselves on easy questions, rush through the test so as to seem more confident than they truly are, or alternately take too much time, trying desperately not to make mistakes. The self-doubt engendered by the racist beliefs of the culture is added to the general anxiety that all test-takers feel, with the result that students who are stigmatized by negative stereotypes suffer a unique disadvantage.
To demonstrate that it is stereotype threat — and not differences in ability or preparation per se — that explains racial gaps on standardized exams, Steele devised experiments to test the theory, in which generally strong black and white students were randomly split into two mixed groups and given the same sample questions from a real standardized test. In the first group, students were given the impression that the test would be graded and would indicate their mastery of concepts critical to success at the next level of schooling: in other words, the perceived stakes were high. In that group, whites outscored black students dramatically. In the second group, students were given the same questions but told that the test was not to be scored, nor would it indicate their ability: in other words, the stakes were low. In that group, there was no real difference between whites and blacks.
Bottom line: when the black students fear that their performance will be used to indicate ability, the fear of confirming often-held negative stereotypes generates anxiety and drives down scores; but when the same questions are given in a low-stakes setting, the stereotype threat is lifted and they perform as well as whites. The same stereotype threat has also been shown to drive down the scores of young women relative to young men on math tests, and of white students when paired with Asians (since the stereotype is that Asians are better at math, and thus whites fear confirming this stereotype when placed in an experimental setting against Asian students).
In other words, so long as racist beliefs about black ability are common, those stigmatized by these beliefs will underperform as a function of the anxiety generated by the stereotype itself. Certainly there is nothing ETS can do to the structure of the test that can alleviate this problem.
The second issue is that racial gaps are a function of the way that tests like the SAT are developed. Indeed, the gaps are all but built-in. As anyone who has taken the SAT or a similar test remembers, there is an experimental section on the exam (either an extra verbal or an extra math section) containing questions that are not counted toward a student’s score. The section exists so as to “pre-test” questions for use on future tests. But as ETS concedes, questions chosen for future use must produce, in the pre-test phase, similar gaps between test-takers as existed in the overall test taken at that time. In other words, questions are rarely if ever selected for future use if students who received lower scores overall answer that question correctly as often as (or more often than) those who scored higher overall.
The racial implications of such a policy are clear. Because blacks, Latinos, and American Indians tend to score lower on these exams than whites and Asians, any question in the pre-test phase that blacks answer correctly as often as (or more often than) whites or Asians would be virtually guaranteed never to appear on an actual exam! In practice, questions answered correctly by blacks more than whites have been routinely excluded from future use on the SAT. Although questions that whites answer correctly 30 percent more often than blacks are allowed to remain, questions answered correctly even 7 percent more often by blacks than whites have been thrown out.
Although the rationale for this practice is not overtly racist (the testing company does not intentionally seek to maintain lower scores for blacks), the thinking has a racist impact. Essentially, the company’s position is that for any question to have predictive validity (and what is called biserial correlation with the test as a whole) it should be answered correctly or incorrectly in proportion to the overall number of correct or incorrect answers given by test-takers. But since general scores exhibit a racial gap, such logic results in the virtual guarantee of maintaining that gap. Ironically, if certain test questions were made less culturally biased, so that the racial gap shrank or disappeared in the pre-test phase, those questions would likely be thrown out, simply because, being less culturally biased, they failed to replicate the racial gaps produced by the rest of the exam.
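For readers who want to see the mechanics, here is a toy sketch of the kind of selection rule described above. It is not ETS’s actual procedure: the eight test-takers, the three pre-test items, the item names, and the 0.2 cutoff are all made-up assumptions, and the correlation is the ordinary point-biserial one (a Pearson correlation between a right-or-wrong answer and the overall score). The point is only that a rule of this shape keeps questions that track the existing score distribution and discards those that do not.

```python
from math import sqrt

def point_biserial(item_correct, total_scores):
    """Pearson correlation between a 0/1 item result and the overall score."""
    n = len(total_scores)
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_correct, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)

# Hypothetical overall scores for eight test-takers (invented numbers).
total_scores = [400, 450, 500, 550, 600, 650, 700, 750]

# Hypothetical pre-test items: 1 = answered correctly, 0 = answered incorrectly.
pretest_items = {
    "tracks_overall_score": [0, 0, 0, 1, 0, 1, 1, 1],
    "favors_lower_scorers": [1, 1, 1, 0, 1, 0, 0, 0],
    "unrelated_to_score":   [1, 0, 0, 1, 1, 0, 0, 1],
}

THRESHOLD = 0.2  # assumed cutoff, chosen only for illustration

def select_items(items, totals, threshold=THRESHOLD):
    """Keep only items whose correlation with the overall score meets the cutoff."""
    return [name for name, answers in items.items()
            if point_biserial(answers, totals) >= threshold]

kept = select_items(pretest_items, total_scores)
print(kept)
```

In this toy data only the item that higher scorers answer correctly more often survives. If overall scores already differ by group, any question on which the lower-scoring group does as well or better correlates weakly or negatively with the total and is filtered out, which is the self-perpetuating loop the paragraph above describes.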
Interestingly, as testing expert Jay Rosner has demonstrated, the makers of the SAT could reduce the racial gap while still maintaining the same level of overall test difficulty by choosing questions that, although equally tough, produce less differentiation between white and black test-takers. That instead they maintain large differences by way of the questions they choose, and that reforms of this nature are not being considered, indicates how unconcerned ETS truly is about test fairness.
Instead of trying to pretty up this pig, persons concerned about educational equity should call for colleges to either eliminate use of the SAT in admissions decisions, or at least massively downplay its importance, given its irrelevance in predicting actual academic ability. SAT gaps of as many as 300 points between two students (or groups of students) can be completely insignificant in terms of indicating ability differences, and gaps of 125 points between students are considered random by the test-makers themselves, and say nothing about the different abilities of the students in question.
That SAT scores have little to do with one’s ability is borne out by a number of studies and even data provided by the test-makers themselves, which indicate that only sixteen percent (at most) of the difference between students in terms of freshman grades can be explained by results on the SAT. Further, the correlation between SAT scores and overall four-year college grades or graduation rates has been so low as to be essentially nonexistent, explaining no more than three percent of the difference between any two students.
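To put those percentages in the more familiar terms of a correlation coefficient, assume (the figures above do not say so explicitly) that “percent of the difference explained” means the share of variance explained, that is, the squared correlation $r^2$. Under that reading, the arithmetic is roughly:

```latex
\begin{align*}
r^2_{\text{freshman grades}} &\le 0.16 &&\Rightarrow\quad |r| \le \sqrt{0.16} = 0.40 \\
r^2_{\text{four-year outcomes}} &\le 0.03 &&\Rightarrow\quad |r| \le \sqrt{0.03} \approx 0.17
\end{align*}
```

On the same assumption, at least 84 percent of the variation in freshman grades, and at least 97 percent of the variation in four-year outcomes, is left unaccounted for by the test.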
If ETS wants to promote fairness — and they insist that they are committed to changing the unequal educational system that helps produce scoring gaps — they must first stop promoting a test battery that replicates and reinforces that inequity. If they wish to provide tests purely for the purpose of gauging how much is being taught and learned in K-12 schools, so be it. But so long as they release test scores prior to college admission, knowing that such scores will be used to dole out opportunities that themselves result in still more opportunities upon graduation, ETS can only be seen as complicit in the maintenance of racial and economic stratification. They are not reformers, but merely gatekeepers for the status quo. And that smells the same, no matter how one tries to cover it up.