Gas station without pumps

2014 May 3

Spread on SAT2 raw scores

Filed under: Uncategorized — gasstationwithoutpumps @ 12:15
On one of the mailing lists I read, someone asked
Can anyone explain to me how a raw score of 75 and a raw score of 59 are both 800’s on the scale for the physics test?  That seems a huge spread. I see similar stuff on other tests, but nothing spread quite this far.
One can find the distribution of scale scores as percentiles on the College Board web site at http://media.collegeboard.com/digitalServices/pdf/research/SAT-Subject-Tests-Percentile-Ranks-2013.pdf, but finding information about raw scores is harder.  The College Board says
The raw score is converted to the College Board 200- to 800-point scaled score by a statistical process called equating. Equating adjusts for slight differences in difficulty between test editions and ensures that:

  • A student’s score does not depend on the specific test edition she took.
  • A student’s score does not depend on how well others did on the same edition of the test.

[https://professionals.collegeboard.com/testing/sat-subject/scores/scoring]

I’ve found raw-score-to-scaled-score conversions for one version of the SAT test at http://media.collegeboard.com/digitalServices/pdf/research/SAT-RAW-Score-to-Scaled-Score-Ranges-2013.pdf, but I’ve not found them for SAT 2 tests, and I don’t know whether the person had access to more data than I’ve been able to find on the College Board site, or just had a couple of data points for students who both got a standard score of 800.
The scale score on the SAT subject test, like other scale scores, is intended to have the same meaning from year to year, despite differences in the underlying test questions.  Initially, tests are written so that questions span a range of difficulty, with some easy questions and some hard ones. Depending on the purpose of the test, the questions may cluster around a particular level of difficulty: if the test is intended as a pass/no-pass test, the questions hover around the pass threshold. Think of a driving exam, where the questions are intended to separate those who can drive safely from those who can’t.  There is no point in asking esoteric questions that even good drivers can’t answer, nor trivial ones that even bad drivers do well at.

When the point is to spread students out without a single boundary (as for college admissions), there needs to be a wider diversity of difficulty of questions.  You need some that are so difficult that few get them, and some that are so easy that few miss them, and everything in between. There need to be several difficult questions, to compensate for students randomly guessing correctly on one difficult question.
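To put a rough number on the guessing problem, here is a small sketch. It assumes five answer choices per question and purely random, independent guessing; both are simplifying assumptions made for the illustration, and `prob_at_least` is a made-up name.

```python
from math import comb

def prob_at_least(k, n, p=0.2):
    """Chance of guessing at least k of n questions correctly.

    p is the chance of guessing one question right; 0.2 assumes
    five answer choices and blind guessing (an assumption).
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One difficult question: a blind guesser looks like a top student 1 time in 5.
one_question = prob_at_least(1, 1)

# Five difficult questions: guessing all five right is very unlikely,
# and even three of five is uncommon.
all_of_five = prob_at_least(5, 5)
three_of_five = prob_at_least(3, 5)

print(f"1 of 1: {one_question:.5f}")
print(f"5 of 5: {all_of_five:.5f}")
print(f"at least 3 of 5: {three_of_five:.5f}")
```

With a single hard question, luck alone separates one guesser in five from the rest; with several, a run of lucky guesses becomes rare enough that the top of the scale mostly reflects knowledge.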
Because the scale scores are supposed to mean the same from year to year, and the scales are arbitrarily capped at 200 and 800, drift in student education or in who takes the exams can make students pile up at one end of the scale or the other, even if the end scores were initially very rare.  That has happened in math (Level 1 has few students with 800, but Level 2 has 9% of students getting 800), physics (8% of students get an 800), and most of the language exams (where mostly native speakers take the exam; for Chinese, lots of students get an 800).  Student selection seems to play more of a role than education (people who expect to do poorly don’t pay to take the test), so the bottom end of the scale is rarely used, while students pile up at the top end.
A more sensible system would not cap the scale scores (Lexiles, for example, are not capped) so that drifts in student population do not pile up people at one point on the scale.  Many of the underlying SAT subject tests have the ability to measure more at the top end, but the political or marketing (not scientific) decision to cap scores at 800 limits the utility of the tests for making distinctions.
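A small sketch of how much the cap throws away, using the partial conversion figures a commenter reports below from a College Board practice physics test (raw 59 through 75 all map to 800, 58 to 790, 56 and 57 to 780, 55 to 770). The table covers that one practice edition only, and the names in the code are made up for the illustration; real editions are equated differently.

```python
# Partial raw-to-scale table for one practice physics test, as reported
# in the comments; raw scores below 55 were not reported and are left out.
MAX_RAW = 75
TOP_CUTOFF = 59  # lowest raw score that still converts to 800
BELOW_TOP = {58: 790, 57: 780, 56: 780, 55: 770}

def scale_score(raw):
    """Convert a raw score to a scale score using the partial table."""
    if TOP_CUTOFF <= raw <= MAX_RAW:
        return 800
    if raw in BELOW_TOP:
        return BELOW_TOP[raw]
    raise ValueError(f"no conversion reported for raw score {raw}")

# Every raw score in the reported range that lands on the capped value.
capped = [raw for raw in range(55, MAX_RAW + 1) if scale_score(raw) == 800]

print(len(capped), "different raw scores all convert to 800")
print("raw 75 vs raw 59:", scale_score(75), scale_score(59))
```

On this table, 17 distinct raw scores collapse into the single scale score 800, while just below the cap a difference of one or two raw points changes the scale score by 10.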
Note: the ACT has similar artificial score limitations and suffers the same sort of unnecessary ceiling, though it is not clear whether the underlying questions have the ability to make distinctions at the top end. When you get down to differences of one or two questions right, you are looking at noise, not signal.

 

4 Comments »

  1. “Many of the underlying SAT subject tests have the ability to measure more at the top end, but the political or marketing (not scientific) decision to cap scores at 800 limits the utility of the tests for making distinctions.”

    Do you have a reason why you believe this to be true? The mere fact that there are more questions to get right does not seem like enough (say, people who took the test and missed 10 questions vs. 5 aren’t necessarily different from each other because of randomness of which questions were chosen on a particular test, and certainly on different tests). I think of this as the Slumdog Millionaire phenomenon where performance on some set of questions isn’t reflective of the person’s general knowledge because of randomness of the specific questions.

    Comment by zb — 2014 May 3 @ 16:29 | Reply

    • I believe, though I’ve not been able to find definitive data, that the raw points to get the maximum scale score on the physics SAT 2 test is well below 100%—that is, there is a substantial part of the range of the raw scores that is being ignored in producing the scale score. Note that this is not true of the pSAT, where missing one question can drop you a long way in the scale, or even the SAT reasoning test, which has essentially no extra questions at the top end.

      Of course, any test has validity problems, in that only a tiny fraction of the material can be asked about, and luck about what gets asked may play as big a role as knowledge or ability. But if a large fraction of the students are getting 800 on a test and several different raw scores all convert to that same 800 scale score, then information is being lost by lumping all those raw scores together.

      Comment by gasstationwithoutpumps — 2014 May 3 @ 18:36 | Reply

  2. “Can anyone explain to me how a raw score of 75 and a raw score of 59 are both 800’s on the scale for the physics test? That seems a huge spread. I see similar stuff on other tests, but nothing spread quite this far.”

    “I don’t know whether the person had access to more data than I’ve been able to find on the College Board site, or just had a couple of data points for students who both got a standard score of 800.”

    My son just took the Physics SAT2, so those numbers looked familiar. That person was most likely looking at the scaled score conversion table in the “Official Study Guide for All SAT Subject Tests,” published by the College Board. The chart in the book shows a score of 800 for raw scores of 59-75. But then it starts dropping quickly…a raw score of 58=790, 56-57=780, 55=770, and so on.

    They do include a disclaimer in the book that scaled scores are adjusted depending on the difficulty of each edition of the test, so “your scores are likely to differ somewhat from the scores you obtain on the tests in this book.”

    Comment by Linda — 2014 May 14 @ 10:26 | Reply

    • OK. It was the raw scores for a particular practice test issued by College Board, which makes sense to me.

      Comment by gasstationwithoutpumps — 2014 May 14 @ 12:04 | Reply

