In my post Numbers vs. symbols, I summarized and critiqued the preprint How numbers help students solve physics problems, by Eugene Torigoe. In this post I’ll look at three published papers by Torigoe, from his site https://sites.google.com/site/etorigoe/research.
E. Torigoe and G. Gladding, “Same to Us, Different to Them: Numeric Computation Versus Symbolic Representation.” Physics Education Research Conference, AIP Conference Proceedings, 2006. [PDF]
This paper used a large sample of 894 students given different versions of a final exam. There were two questions that represented identical problems, but one form of the exam had the questions in symbolic (algebraic) form and the other had the questions in numeric form. (It would have been better for each form of the exam to have one numeric and one symbolic question—fairer to the students and less likely to pick up an artifact from who got which form of the exam.)
The difference in performance between the numeric and the symbolic forms of the questions was enormous, supporting Torigoe’s conclusion that many students who can correctly formulate and solve a problem given explicit numbers still have difficulty doing the same thing symbolically. (The conclusion could have been stronger if each student had had one symbolic and one numeric problem to solve.)
Because there were only two paired questions, it was not possible in this study to analyze the reasons for the difference in performance, though Torigoe mentions several possibilities.
E. Torigoe and G. Gladding, “Symbols: Weapons of Math Destruction.” Physics Education Research Conference, AIP Conference Proceedings, 2007. [PDF]
This study is a follow-up to the previous one, now with 10 paired questions and 765 students. Unfortunately, the same flawed protocol of making one version of the test all symbolic and the other all numeric seems to have been followed. The unfairness to the students is larger this time (since the numeric questions are already believed to be easier).
With 10 questions, it was possible to examine in more detail what sorts of mathematical questions caused the biggest difficulty. Torigoe also did a more detailed analysis of who was having difficulty with the symbolic versions. It turns out that the top of the class was having little extra difficulty with the symbolic questions—there was a slight drop in performance, particularly on the hardest question, but not really a significant one. At the bottom of the class, the difference in performance was enormous, with the numeric questions being very much easier than the symbolic ones.
Torigoe chose to eliminate the two hardest questions from further analysis. On Question 8, there was no difference between performance on symbolic and numeric versions, and the class as a whole was doing only a little better than guessing, so data from this question was unlikely to be meaningful about anything other than guessing strategies. Question 5, however, had the strongest numeric/symbolic effect, and it is not at all clear why this question was excluded from the study. The performance on Question 5 was completely consistent with the subsequent analysis of the non-excluded questions (that is, it had all the characteristics of the other questions that showed a strong difference in performance), so it wasn’t a matter of excluding an outlier that did not support the hypothesis. Perhaps Torigoe was just trying to be fair—after excluding the hardest problem (which did not match his hypothesis) he excluded the second hardest (even though it did match his hypothesis).
His after-the-fact analysis of the data suggests that the crucial factor is whether a problem involves manipulating a generic equation, which requires students to assign object-specific variables and apply a generic equation appropriately. Students at the bottom of the class often manipulate the equation blindly, without realizing that they have assigned the wrong meaning (average velocity instead of final velocity, for example).
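To make that failure mode concrete, here is a minimal sketch of my own (a hypothetical kinematics question, not one of Torigoe's): a cart starting from rest under constant acceleration covers 20 m in 4 s. A student who blindly substitutes the average velocity d/t where the final velocity is needed is silently off by a factor of two.

```python
# Hypothetical example (mine, not from the paper): cart starts from rest,
# constant acceleration, covers d = 20 m in t = 4 s.
d, t = 20.0, 4.0

v_avg = d / t        # average velocity: 5 m/s
v_final = 2 * d / t  # final velocity under uniform acceleration from rest: 10 m/s

# Plugging v_avg into an equation that calls for the final velocity
# gives an answer that is wrong by a factor of two.
print(v_avg, v_final)
```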
E. Torigoe and G. Gladding, “Connecting Symbolic Difficulties with Success in Physics.” American Journal of Physics, 79(1), pp.133-140 (2011). [PDF] [Supplementary Material]
This paper is a more detailed analysis of the same data as in the 2007 paper. (Why did it take 4 more years to be done?) Torigoe recognized that the 2007 paper was a post hoc analysis and the hypotheses developed there had not been independently confirmed. In this study, he took a couple of the hypotheses and tried to confirm them on an independent data set. He did not generate new tests, but coded questions on older tests according to his classification scheme, and tried to see whether they were discriminating questions. That is, he was looking at structural properties of the math needed for a question to predict whether the question was easy for the top of the class and hard for the bottom of the class. He was mainly interested in distinguishing the bottom 1/4 from the rest, which implies to me that he is an educator of the NCLB type, more interested in getting everyone to a minimal pass level than in pushing the top end up. Of course, since the phenomenon he was studying (difficulty in symbolic manipulation) was not measurable in the top quarter of the class, this was a sensible, pragmatic choice for the data he had.
The main hypothesis he tested was that “equation-priority” questions would be most discriminating. He defined these as having any of the following characteristics:
- Multiple-equation symbolic questions;
- simultaneous-equation numeric questions; or
- single-equation numeric questions where the target unknown appears on opposite sides of the equal sign.
He believed that these characteristics require students to set up a symbolic equation and manipulate it, though the particular example given at the end of the paper does not. (There are some distractors, but the problem boils down to having a 10 N force on a wire accelerating an unknown mass upward at 5 m/s², and asks for the mass.)
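For concreteness, here is the arithmetic behind that example as I work it (my own calculation, assuming g ≈ 9.8 m/s² and ignoring the distractors): Newton's second law on the mass gives T − mg = ma, so m = T / (g + a).

```python
g = 9.8   # m/s^2, gravitational acceleration (assumed value)
T = 10.0  # N, upward force on the wire
a = 5.0   # m/s^2, upward acceleration

# Newton's second law: T - m*g = m*a  =>  m = T / (g + a)
m = T / (g + a)
print(round(m, 2))  # about 0.68 kg
```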
He found that the equation-priority questions were more discriminating than other questions of comparable difficulty (comparable difficulty for the top ¾ of the class, that is). The questions he identified turned out to be among the hardest questions, so comparable difficulty questions were determined in two ways: eliminating easy questions until the remaining questions had the same average score as the equation-priority questions, or matching questions individually by difficulty. It didn’t make any difference which method was used.
The correlation between performance on equation-priority questions and overall performance was not huge (Pearson’s r 0.38, though it isn’t clear from the paper whether that is correlation with total score for the class or just with a bottom-¼/top-¾ indicator variable), but it is much bigger than the correlation for other questions of comparable difficulty (Pearson’s r 0.29). The equation-priority questions were significantly enriched for highly discriminating questions (ones that had a correlation with class performance higher than 0.4).
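For readers unfamiliar with the statistic: the item-total correlation measures how strongly getting one question right tracks doing well on the exam overall, and in this setting a question above the 0.4 threshold counts as highly discriminating. A minimal sketch of the computation, with made-up data for illustration:

```python
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

# Hypothetical data: 1/0 = correct/incorrect on one question,
# paired with each student's total exam score.
item  = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
total = [85, 40, 78, 90, 35, 50, 70, 88, 45, 80]

r = pearson_r(item, total)
print(r > 0.4)  # True here: this made-up question is "highly discriminating"
```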
E. Torigoe, “How Numbers Help Students Solve Physics Problems.” arXiv:1112.3229v1 [Submitted to the American Journal of Physics 12/14/2011]
The preprint I discussed earlier uses fifteen of the students in the same Spring 2007 class used for the previous two papers. These papers look like an attempt to get as many publications as possible out of a single study in a PhD thesis. Personally, I would have preferred to see new tests devised specifically to address the questions raised, rather than going back to mine the same 10 problems over and over. I think that this data set has now been over-analyzed, and any conclusions drawn from it really need independent confirmation with a new data set. The 2011 paper using the 2006 exams addressed the problem partially, but was not able to test many of Torigoe’s hypotheses, because there were not paired versions of the tests with the differences being just the points in question.
Bottom line: Torigoe may have identified some structural characteristics of problems that give the bottom ¼ of large, introductory, calculus-based physics classes particular difficulty. More and better experiments are needed to see whether his analysis captures the phenomenon correctly or is the result of some confounding variable. Most of his analysis is of no relevance for the top ¼ of the class, where the future physics and engineering majors should be concentrated.