rabbit in viscera: when experts seek to mislead

Here's a pretty good example of what happens when a respected scientist gets senile and his senility attracts a band of ne'er-do-well rogue/revisionist followers:

Illuminating, eh?

Last week, this Op-Ed piece showed up in The New York Times. To me, this piece is incredibly dangerous. Its author, E.D. Hirsch, Jr., has a string of publications on education and literacy that might lead a casual reader of said article to believe that he does, indeed, have some expertise in the field of educational assessment, particularly with regard to reading comprehension tests.

But the truth is that Hirsch can't lay claim to even a glimmer of an idea of what he is talking about. Now, I'm reluctant to call myself an expert in the field of reading comprehension assessment, too. Except, well, I kind of am.

I go to some pains to discuss my livelihood only in vagaries and roundabout twiddle-dances on this blog. Virtually every piece of paper I touch over the course of a day contains sensitive, secure material, and I am pretty careful about what I share about what I do with the same public who reads about my feelings on activistic consumerism, my insistence on laissez-faire sexuality and my other quasi-political rants.

But here's what I can tell you: I've worked up one side and down the other of the assessment industry-- from essay-scoring to nitty-gritty test-question development to being a liaison between state departments of education and a test development company. And, as my background is duly credentialed with all the appropriate literature nerd street cred, most of my experience has been with reading comprehension tests. To my continued relief and joy, I can also tell you that my current position is not at all dependent on the regulations defined by the No Child Left Behind law, but my last job lived and died by that ungodly little turd of legislation, and with many of its repercussions, I am intimately familiar.

So, this is just to say, I know a little something about how a standardized reading test is made. Had I any balls at all, I'd get my supervisor's permission to send this post to the NYT, sign it with my official title and the name of my organization, and take Dr. Hirsch to task in a more public forum than this. But, for now, I tend to think it best I not attract too much professional traffic to the more typical fare found in these parts: the Brown Rabbit After-hours Game Hour. Just sayin'.

Nonetheless, I feel compelled to write Dr. Hirsch an open letter. Here goes.

Dear Dr. Hirsch:

Have you ever, for a day in your life, spent any time behind the scenes of a test development company? If I were to hazard a guess about your experience with actual state-level standardized tests, based on the several assumptions you make in your Op-Ed article, I'd be inclined to think perhaps you hadn't so much as encountered a testing industry professional in your life. But I'm not a gambling woman.

That said, you do state a few things as though they were fact when, in actuality, they are pure speculation on your part. Let's start with this first clump of assertions:

The problem is that the reading passages used in these tests are random. They are not aligned with explicit grade-by-grade content standards. Children are asked to read and then answer multiple-choice questions about such topics as taking a hike in the Appalachians even though they’ve never left the sidewalks of New York, nor studied the Appalachians in school.

OK. First of all, where on earth did you get the idea that reading passages are chosen at random? In all honesty, every year, when passage selection time rolls around again, I get a rather dreadful sensation in my solar plexus. Selecting passages for a reading test is easily the most difficult part of the entire test development cycle. To illustrate why I hate passage selection so, allow me to present you, Dr. Hirsch, with a pretend assignment. You shall soon see that passages are, indeed, picked quite carefully to align with state- and grade-specific content standards and to appeal to the largest assortment of demographic groups possible.

Your assignment is to choose 30 reading passages for possible inclusion in a Grade 6 test form. Ready? Go to the library. Go!

What? You need more specific parameters? Well, OK! At your service.

Your 30 passages should be divided roughly evenly between literary (fiction, poetry, literary non-fiction) and informational (instructional pamphlets, short biographies, newspaper or magazine articles of scientific or sociological interest that will not be obsolete for another ten years, business letters, etc.) texts. They should all be intact, authentic texts; they can be excerpted from larger works, but they cannot be cropped or edited in any other way (funny thing-- authors tend to be sticklers for their original intent when it comes to granting copyright permissions). They should be at least 250 words, but no longer than 950 words. Got all that?

Great! Because now you need to make sure that they're all of an appropriate reading level for 6th graders. There are several readability tools at your disposal via the internet, but none of them will agree with each other. Your best bet is to average the Dale-Chall, Lexile and Flesch scores for each of your passages and cross your fingers that you'll wind up with a passage that's somewhere within the 6th grade range. Oh, but make sure there are at least a handful of words in each passage that are 1 or 2 grade levels above Grade 6 (consult your EDL Core Vocabularies book to make sure), because we'll need to write at least 3 vocabulary items per passage. (Uh, sorry, Dr. Hirsch-- when I say "item," I mean "test question." I forget how out of touch with today's testing argot you are.)

While I'm on the subject of testable points, I should also warn you that I'll need you to be reading each passage with an eye to all the different types of questions we can ask based on these short excerpts. See, contrary to your statement about lack of alignment between reading test passages and state content standards, we test developers actually map out standard-specific testable points in each passage before we even begin to write items. Didn't know that, didja?

For a 250-word passage, we'll probably have to write 15 items. And for a 950-word passage, I'll be asking you to write about 25 to 30 items. Therefore, each passage should be rich in figurative language, inferred information, character development, cultural relevance and any number of other topics as defined in each state's content standards, benchmarks and indicators (please see each state's DOE website for specific content standard outlines). In other words, I need you to find 30 short passages that are so dense with meaning and language-level interest that each one is like a Dostoevsky novel. Oh, except that they should be at a 6th grade reading level. And of subject matter that a wide swath of twelve-year-olds will find engaging.

Intimidated yet?

Well, you shouldn't be. Not yet, anyway.

Dr. Hirsch, I'm sure I don't need to tell you that your state's population is diverse. It contains 6th graders who live on farms, and 6th graders who live in cities. It contains boys and girls. It contains people of African heritage, Asian heritage, Middle Eastern heritage and Eastern European heritage. It contains Native Americans, people from all over the Latino diaspora, and, doubtless, a bunch of white folks just like me. So, to that end, I'll need your passages to be split roughly evenly between the genders, in terms of both subject matter and authorship. I'll need you to see to it that none of your passages has subject matter that favors one gender over the other (i.e., nothing about sports, and nothing about the chemical difference between waterproof and non-waterproof mascara). Oh, and I need roughly 80% to be written by or about people of non-white minority populations.

As for your passage about the Appalachians? I'll keep that one. But I'll also need one that speaks to an urban experience. And another that deals with some typically suburban activity (though not driving SUVs, as that would display a distinct socio-economic bias, and we can't have that!). Because, you see, long after you've finished collecting your 30 passages, and hundreds of items have been written across all of them, we then have to build an actual test form that isn't composed just for city kids or just for Latino kids or just for girls, but one that contains a wide variety of passages, some appealing to some kids and others appealing to other kids. This is how we minimize bias. Not by eliminating all passages that refer to rural areas, just because some students happen to live in cities. Are you beginning to see now? We're not maniacal literature slingers. We're really not.

So, now you think you're done, do you? You've gathered 30 passages, rich in testable points, reflecting the widest range of differences in your population that you can conceive of. You're on a roll! Oh, wait. You're not. Now, I'm going to share with you the most challenging part of passage selection. I need you to sit down and go through each of your passages with a proverbial fine-toothed comb. You're looking for any little thing that might, maybe, possibly, in the worst-case scenario, offend someone, somewhere, with the thinnest skin on the planet. Or maybe it doesn't even have to be offensive. It just needs to be capable of distracting a student to a degree that might negatively impact his or her testing experience.

So. No suicide, no dangerous activities (which may or may not include such things as scuba diving, petting dogs, climbing trees, parachuting, cooking with fire, and/or riding bicycles), no divorce, no sex, no drugs, no alcohol, no swear-words, no advice-giving (yes, in case you were wondering, articles on why composting is a good idea or why compact fluorescent lightbulbs save you money or how to write a good job application cover letter constitute "advice-giving"), no illness or hospitalizations, nothing that reflects aging in a negative light, no political or social unrest, no civil injustice (yeah, you try finding a halfway decent, engaging, juicy piece of literature, written by a non-white American, that contains no instances of or references to civil injustice. Just try.), and no death whatsoever.

Basically, I'm looking for the best pieces of writing that you can find that will offend the fewest people possible. But guess what? Artistically sound, interesting, rich literature is designed to stir people up. And it's nearly impossible to find good chunks of text that truly reflect the diversity of any given testing population without somehow alluding to the concerns of that diverse population. Writing reflects life and life ain't always pretty, but a standardized test is not the place to force students to confront the ugliness of the world around them. Or to challenge their belief systems (that's what classroom discussion is for). Standardized tests, by necessity, strand students on their own little proctored islands, without the benefit of classroom discussion of the aforementioned topics. Say what you may about what NCLB and standardized testing have done to the American classroom, but there isn't any way around the fact that students have to take the tests alone, without help, in order to ensure the statistical validity of their scores. Standardized tests aren't going away. The least we can do is attempt to minimize the frustration, alienation and distraction many students feel while undergoing said testing. And one of the ways we attempt to mitigate all those side effects is by presenting students with innocuous, yet engaging, reading passages.

But I am telling you: finding those passages is work. Hard work. And it's work I take seriously, both as a reasonably socially conscious person and as a lover of literature.

So, maybe now you see why passage selection is such a stressful time of year for test developers-- given all the constraints within which we have to work, it's nearly impossible to meet the states' requirements, in terms of volume of passages, every year. Particularly when you have to repeat the process for all grade levels, kindergarten through Grade 8, and then again for high school exit exams.

Doubtless, there's an obvious question in your head, Dr. Hirsch. Am I personally offended by the notion that passages are selected at random? Well, sure. After all the sturm und drang I've put myself through over several successive test development cycles, yeah, I'm offended. In calling the selection of passages "random," you're suggesting that I haven't been doing my job. And that my job is one that exists without adequate thoughtful application of my industry's best standards and practices. However, I do recognize that it is your ignorance that allows you to make such a statement. If anything, passage selection is the most carefully calculated step in the making of a test. Your assumption that it's anything but extremely meticulous, well, just goes to show us all how little you know of what you speak, my friend.

But allow me to put aside my raised hackles for just a moment to discuss Dr. Hirsch's proposed "solution" to the "randomness" of passages on standardized tests. Here's what he says:

Let’s imagine a different situation. Students now must take annual reading tests from third grade through eighth. If the reading passages on each test were culled from each grade’s specific curricular content in literature, science, history, geography and the arts, the tests would exhibit what researchers call “consequential validity” — meaning that the tests would actually help improve education. Test preparation would focus on the content of the tests, rather than continue the fruitless attempt to teach test taking.

A 1988 study indicated why this improvement in testing should be instituted. Experimenters separated seventh- and eighth-grade students into two groups — strong and weak readers as measured by standard reading tests. The students in each group were subdivided according to their baseball knowledge. Then they were all given a reading test with passages about baseball. Low-level readers with high baseball knowledge significantly outperformed strong readers with little background knowledge.

The experiment confirmed what language researchers have long maintained: the key to comprehension is familiarity with the relevant subject. For a student with a basic ability to decode print, a reading-comprehension test is not chiefly a test of formal techniques but a test of background knowledge.

In actuality, many states and even city educational systems already have tests of the kind you describe in place, Dr. Hirsch. They're called "subject matter tests." They are, indeed, knowledge-based tests that have a very different end in mind than reading comprehension tests do. In essence, knowledge-based tests assess how well a student has been taught specific facts, data, methods of literary interpretation, mathematical theorems, etc. In other words, they test how well teachers are doing their jobs. What they don't do is assess the skill level of a particular student. Nor do they assess things like the readiness of a particular student to move on to the next grade level.

If we were to change the modus operandi of reading comprehension tests in the way you suggest, really, you would render them useless. The whole point of a reading comprehension test is to assess how well a student can understand and answer questions about an unfamiliar chunk of text he or she has just read-- and read cold. If we were to front-load our tests with handy little packets of well-canonized literature, the stuff widely taught in classrooms around the country, we'd then be testing how well students are able to regurgitate their classroom discussions about text they didn't even have to re-read to be able to answer questions thereupon. You would, in effect, be testing the teachers. And learning very little about the students. Even your own example about the baseball statistics points to this messy fact. You can't put an article containing a bunch of baseball statistics on a test precisely because it wouldn't necessarily elucidate which students were more skilled at understanding texts, but rather, which students understood the game of baseball. Moreover, you'd likely bore or annoy students who aren't at all interested in baseball, thus falsely suppressing their comprehension scores. It wouldn't tell us a damn thing about which student was flush with the skills necessary for understanding written language and which student was struggling with those skills.

In fact, populating standardized tests with old, familiar texts would actually completely undermine the psychometric validity of pretty much every reading comprehension test across the land. It would so skew the results of these tests in favor of the already higher-performing institutions (often, the ones already on the receiving end of the anti-Robin-Hood-ian NCLB law's monetary gifts) that the statistics generated would be entirely meaningless.

But then, Dr. Hirsch, you also said this weird thing:

This is because the schools have imagined that reading is merely a “skill” that can be transferred from one passage to another, and that reading scores can be raised by having young students endlessly practice strategies on trivial stories. Tragic amounts of time have been wasted that could have been devoted to enhancing knowledge and vocabulary, which would actually raise reading comprehension scores.

Hey, guess what? You may be right that reading itself-- with all its palimpsest of benefits-- might not be "merely a 'skill.'" But reading comprehension-- the gleaning of meaning through the act of reading-- is a cognitive skill. Or rather, a set of cognitive skills. Now, the one point on which I will agree with you is that countless classroom hours have been spent drilling students on whichever skills the state in which a student lives deems necessary. That skill-drilling is possibly a little wrong-headed. Lest anyone think I'm advocating that age-old educational peccadillo known as "teaching to the test" with anything I've said above, let me say that I firmly believe the best way to help a student learn how to understand written language is to give said student a lot of it. And to talk to him or her about it. Then have the student explain back to you what he or she understood about the reading. Pedagogical research aside, as best I can tell, the only way to internalize language and its methods is to constantly bombard yourself with it. And the good news is that a student with strong reading comprehension skills, ones acquired through the read-a-lot-and-talk-about-it method, will do well on the tests I've helped to develop, even if he or she hasn't been coached in test-taking methods. And the tests can also function as broader diagnostic tools for students in need of further skill development.

But Dr. Hirsch, you seem to be falling into the trap that NCLB has set for all of us. The trap that posits all American children hail from Lake Wobegon, I mean. You want all students to do well on reading tests because they will have been fed the passages in their classes before they sit down to take the test. But not every student can be above average. That's just the nature of averages. So, a good test is not one that everyone can pass but one that displays an authentic bell curve, reflective of actual students' abilities. Get used to the idea, Dr. Hirsch. Bell curves, like standardized tests themselves, are not going away.

Now, I think my real beef with you, Dr. Hirsch, is actually personal. You have a slew of well-regarded publications about cultural literacy and language arts education. You're a big fancypants egghead who teaches at a big fancypants university. You've published your opinion in The New York Times. People are bound to take you seriously, despite the fact that you really are talking out your ass with regard to assessment development and the psychometric research that backs all the ingredients of any test, including its reading passages. Meanwhile, I, with all my extensive knowledge of assessment practices and the reasons behind those practices, wallow in bloggish obscurity (which, in the long run, is fine by me, by the way). At least a handful of conscientious NYT-reading citizens are bound to buy your argument. And those conscientious citizens will continue to believe that standardized tests are slung together without concern for statistical validity or demographic representation. And that, Dr. Hirsch, I can assure you, is far from the truth. Your argument ultimately does real damage to the public image of testing and to the thing with which state departments of education struggle year in and year out: student motivation. Thanks for that.

So, shame on you for badmouthing reading test specialists and the products of our toiling in such a public forum. I can't say that the industry to which I contribute is without its misguidance and occasional misdoings. And gods know NCLB has made a real snafu of things. But as an industry insider, I can avow that we are not rash. And we are not random. And we are working our butts off to make the fairest, most valid educational measuring tools possible. With a much more extensive pool of research, and several on-staff psychometricians, to back us up.

Who, again, is backing you up, Dr. Hirsch? Though you did mention one 1988 study, you didn't actually bother to cite one solitary source. Neither have I, really, but I got a bookshelf full of 'em if you're interested. And also, were I to tack my title and professional affiliation to this post (which I will not do), I can assure you that my authority on this matter would trump yours any day of the week.

Thank you, sir, and good day.
Marjorie

2 comments:

Jenna McWilliams said...

Bravo! If everyone who opposed the ideas of Hirsch and his ilk could write like you, we'd be well on our way.

I recently wrote about Hirsch's op-ed on my own blog, at http://jennamcwilliams.blogspot.com/2009/03/couple-three-things-about-standardized.html. I'd love for you to take a look / comment / suggest!

March 31, 2009 at 6:39 AM

brownrabbit said...

Thanks, Jenna--

And, indeed, I found your commentary elucidating as well!

March 31, 2009 at 9:15 AM

Monday, March 30, 2009
