In my post Numbers vs. symbols, I summarized and critiqued the preprint How numbers help students solve physics problems, by Eugene Torigoe. In this post I’ll look at three published papers by Torigoe, from his site https://sites.google.com/site/etorigoe/research.

## E. Torigoe and G. Gladding, “Same to Us, Different to Them: Numeric Computation Versus Symbolic Representation.” Physics Education Research Conference, AIP Conference Proceedings, 2006. [PDF]

This paper used a large sample of 894 students given different versions of a final exam. There were two questions that represented identical problems, but one form of the exam had the questions in symbolic (algebraic) form and the other had them in numeric form. (It would have been better for each form of the exam to have one numeric and one symbolic question. That would have been fairer to the students and less likely to pick up an artifact from who got which form of the exam.)

The difference in performance between the numeric and symbolic forms of the questions was enormous, supporting Torigoe’s conclusion that many students who can correctly formulate and solve a problem given explicit numbers still have difficulty doing the same thing symbolically. (The conclusion could have been stronger if each student had had one symbolic and one numeric problem to solve.)

Because there were only 2 paired questions, it was not possible in this study to analyze the reasons for the difference in performance, though Torigoe mentions several possibilities.

## E. Torigoe and G. Gladding, “Symbols: Weapons of Math Destruction.” Physics Education Research Conference, AIP Conference Proceedings, 2007. [PDF]

This study is a followup on the previous one, now with 10 paired questions and 765 students. Unfortunately, the same flawed protocol of making one version of the test all symbolic and the other all numeric seems to have been followed. The unfairness to the students is larger this time (since the numeric questions are already believed to be easier).

With 10 questions, it was possible to examine in more detail what sort of mathematical question caused the biggest difficulty. Torigoe also did a more detailed analysis of who was having difficulty with the symbolic versions. It turns out that the top of the class was having little extra difficulty with the symbolic questions: there was a slight drop in performance, particularly on the hardest question, but not a significant one. At the bottom of the class, the difference in performance was enormous, with the numeric questions being very much easier than the symbolic ones.

Torigoe chose to eliminate the two hardest questions from further analysis. On question 8, there was no difference between performance on the symbolic and numeric versions, and the class as a whole was doing only a little better than guessing, so data from this question was unlikely to be meaningful about anything other than guessing strategies. Question 5, however, had the strongest numeric/symbolic effect, and it is not at all clear why this question was excluded from the study. The performance on Question 5 was completely consistent with the subsequent analysis of the non-excluded questions (that is, it had all the characteristics of the other questions that showed a strong difference in performance), so it wasn’t a matter of excluding an outlier that did not support the hypothesis. Perhaps Torigoe was just trying to be fair: after excluding the hardest problem (which did not match his hypothesis), he excluded the second hardest (even though it did match his hypothesis).

His after-the-fact analysis of the data suggests that the crucial factor is whether a problem involves manipulating a generic equation, which requires students to assign object-specific variables and apply a generic equation appropriately. Students at the bottom of the class often manipulate the equation blindly, without realizing that they have assigned the wrong meaning (average velocity instead of final velocity, for example).

## E. Torigoe and G. Gladding, “Connecting Symbolic Difficulties with Success in Physics.” American Journal of Physics, 79(1), pp.133-140 (2011). [PDF] [Supplementary Material]

This paper is a more detailed analysis of the same data as in the 2007 paper. (Why did it take 4 more years to be done?) Torigoe recognized that the 2007 paper was a post hoc analysis and the hypotheses developed there had not been independently confirmed. In this study, he took a couple of the hypotheses and tried to confirm them on an independent data set. He did not generate new tests, but coded questions on older tests according to his classification scheme, and tried to see whether they were discriminating questions. That is, he was looking at structural properties of the math needed for a question to predict whether the question was easy for the top of the class and hard for the bottom of the class. He was mainly interested in distinguishing the bottom 1/4 from the rest, which implies to me that he is an educator of the NCLB type, more interested in getting everyone to a minimal pass level than in pushing the top end up. Of course, since the phenomenon he was studying (difficulty in symbolic manipulation) was not measurable in the top quarter of the class, this was a sensible, pragmatic choice for the data he had.

The main hypothesis he tested was that “equation-priority” questions would be most discriminating. He defined these as having any of the following characteristics:

- *Multiple-equation symbolic questions;*
- *simultaneous-equation numeric questions; or*
- *single-equation numeric questions where the target unknown appears on opposite sides of the equal sign.*

He believed that these characteristics require students to set up a symbolic equation and manipulate it, though the particular example given at the end of the paper does not. (There are some distractors, but the problem boils down to having a 10 N force on a wire accelerating an unknown mass upward at 5 m/s², and asks for the mass.)
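For concreteness, the physics in that example reduces to one application of Newton’s second law. This sketch assumes the wire is vertical, that tension and gravity are the only forces on the mass, and that g = 9.8 m/s²; none of those details are given in my summary above, and `mass_from_tension` is a hypothetical helper name. With T − mg = ma, the mass is m = T/(g + a).

```python
# Hypothetical helper: mass of an object hung from a vertical wire and
# accelerated upward, from Newton's second law T - m*g = m*a.
G = 9.8  # m/s^2, ASSUMED value of gravitational acceleration


def mass_from_tension(tension_n: float, accel_up: float, g: float = G) -> float:
    """Return the mass (kg) given wire tension (N) and upward acceleration (m/s^2)."""
    # Collect the m terms: T = m*(g + a), so m = T / (g + a).
    return tension_n / (g + accel_up)


if __name__ == "__main__":
    m = mass_from_tension(10.0, 5.0)
    print(f"mass = {m:.3f} kg")  # prints "mass = 0.676 kg" (10 / 14.8)
```

If the intended setup had no gravity acting along the wire (a horizontal pull on a frictionless surface, say), passing `g=0.0` gives 10/5 = 2 kg instead.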

He found that the equation-priority questions were more discriminating than other questions of comparable difficulty (comparable difficulty for the top ¾ of the class, that is). The questions he identified turned out to be among the hardest questions, so comparable difficulty questions were determined in two ways: eliminating easy questions until the remaining questions had the same average score as the equation-priority questions, or matching questions individually by difficulty. It didn’t make any difference which method was used.
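The first matching method (eliminating easy questions until the remaining set has the same average score as the equation-priority set) can be sketched as below. This is my reconstruction of the procedure as described, not Torigoe’s code; the function name and the per-question scores are invented, and “same average” is approximated by stopping once the remaining mean is at or below the target.

```python
from statistics import mean


def match_by_dropping_easiest(other_scores, target_mean):
    """Drop the easiest questions (highest fraction correct) one at a time
    until the mean of the remaining questions is at or below target_mean.

    other_scores: dict mapping question id -> fraction of students correct.
    Returns the list of question ids that remain, hardest first.
    """
    # Sort hardest first so that popping from the end removes the easiest.
    remaining = sorted(other_scores, key=other_scores.get)
    while remaining and mean(other_scores[q] for q in remaining) > target_mean:
        remaining.pop()  # remove the current easiest question
    return remaining


if __name__ == "__main__":
    # Invented fractions correct, for illustration only.
    equation_priority = {"E1": 0.55, "E2": 0.60}
    others = {"Q1": 0.90, "Q2": 0.85, "Q3": 0.70, "Q4": 0.60, "Q5": 0.50}
    target = mean(equation_priority.values())  # about 0.575
    print(match_by_dropping_easiest(others, target))  # ['Q5', 'Q4']
```

Stopping at the first step at or below the target can overshoot (here the kept set averages 0.55, not 0.575); a more careful version might keep whichever step lands closest to the target.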

The correlation between performance on equation-priority questions and overall class performance was not huge (Pearson’s r 0.38, though it isn’t clear from the paper whether that is correlation with total score for the class or just with a bottom-¼/top-¾ indicator variable), but it is much bigger than the correlation for other questions of comparable difficulty (Pearson’s r 0.29). The equation-priority questions were significantly enriched for highly discriminating questions (ones that had a correlation with class performance higher than 0.4).

## E. Torigoe, “How Numbers Help Students Solve Physics Problems.” arXiv:1112.3229v1 [Submitted to the American Journal of Physics 12/14/2011]

The preprint I discussed earlier uses fifteen of the students in the same Spring 2007 class used for the previous two papers. These papers look like an attempt to get as many publications as possible out of a single study in a PhD thesis. Personally, I would have preferred to see new tests devised specifically to address the questions raised, rather than going back to mine the same 10 problems over and over. I think that this data set has now been over-analyzed, and any conclusions drawn from it really need independent confirmation with a new data set. The 2011 paper using the 2006 exams addressed the problem partially, but was not able to test many of Torigoe’s hypotheses, because there were not paired versions of the tests with the differences being just the points in question.

Bottom line: Torigoe may have identified some structural characteristics of problems that give the bottom ¼ of large, introductory, calculus-based physics classes particular difficulty. More and better experiments are needed to see whether his analysis captures the phenomenon correctly or is the result of some confounding variable. Most of his analysis is of no relevance for the top ¼ of the class, where the future physics and engineering majors should be concentrated.



I appreciate that you spent the time to read my prior work. I’m sorry if you didn’t find much value from the experience. My intended audience was physics instructors interested in problem solving. My hope is that these papers demonstrate some of the mechanisms of why symbolic problems tend to be more difficult than analogous numeric problems, and how different question properties/structures influence question difficulty.

I’m sympathetic to your criticism about the top 1/4 students. To be honest it’s not clear to me what relevance this has to the top students, who on average do not distinguish the numeric and symbolic versions. It could be that their ability to solve symbolic problems helps them become better problem solvers, or it could be that brighter students generally do well and are able to pick up the procedures for symbolic problem solving. In either case, from my personal experience a student can’t become a successful physics major without understanding symbolic problem solving. One of my motivations was just to gain a deeper understanding of why it is difficult for students.

Let me address some of your criticisms of the papers.

You mention the possibility that the results may be an artifact of who got what version of the final, but that is very unlikely to be the case. Because there were so many students in the sample (N ~ 900), it is statistically unlikely that there would be any differences between the two groups. In fact we tested this by comparing each group’s midterm grades (both in average and with a histogram binning by midterm average) and found no differences. Similar results were found in other semesters of the course as well.
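The balance check described here, comparing the two exam-form groups’ midterm scores both by average and by histogram, might look something like the sketch below. The Welch t statistic, the bin width of 10, the function names, and the scores are all my assumptions for illustration; the comment does not say which statistical test, if any, was used.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for a difference in means between two groups."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def histogram(scores, bin_width=10):
    """Count scores per bin; each bin is labeled by its lower edge."""
    return Counter((s // bin_width) * bin_width for s in scores)


if __name__ == "__main__":
    # Invented midterm averages for students who got form A or form B.
    form_a = [62, 71, 75, 80, 84, 88, 91]
    form_b = [60, 70, 77, 79, 85, 87, 93]
    print("difference in means:", round(mean(form_a) - mean(form_b), 2))
    print("Welch t:", round(welch_t(form_a, form_b), 2))
    hist_a, hist_b = histogram(form_a), histogram(form_b)
    for edge in sorted(set(hist_a) | set(hist_b)):
        print(edge, hist_a[edge], hist_b[edge])  # bin, count A, count B
```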

We also tried to minimize the unfairness to the students. The two versions of the final were different exams, except that maybe a third of the questions were the same on both versions. We modified 10 questions from an exam that contained 55-60 questions in total. Each final was then curved independently. When we did the 10 question study we intended for there to be half numeric and half symbolic on each version, but because of an oversight the split was something like 7 and 3 (from what I remember). I’m not saying that it is perfectly fair, but fairness was not something that was ignored.

Even though the 2011 AJP article focuses on the bottom 1/4 of students, the correlation coefficient was calculated for each question using the list of total course points for each student. This measure did not use the bottom 1/4, middle 1/2 and top 1/4 groupings.

Thanks again for the time you spent reading my work.

Comment by Eugene Torigoe — 2012 January 3 @ 10:05 |

I am delighted to get your feedback on my comments.

A 7:3 split of the questions sounds better than what I inferred from the papers (one of which implied a 10:0 split without actually saying so).

Curving the two different forms independently also relieves my concerns about fairness. Although these concerns were not germane to the specific question you were addressing in your papers, a little more detail about the testing protocol (as you provided here) would have been welcome in the papers.

I agree that the large sample size and random assignment of test forms makes sample bias unlikely to have large effects. A 5:5 split would have reduced that source of error further, but even a 7:3 split helps.

It would have been good for the paper to say what the correlation was computed against (total class points, rather than class rank or a bottom-1/4 indicator variable). A scatter diagram is often a useful thing to add to a correlation measure, so that one can see whether the correlation is noise around a straight line (which Pearson’s r assumes), a non-linear relationship, or something more complicated.

I found your papers interesting, though not directly relevant to my teaching (at least, not until I teach a large freshman class, which I haven’t done for over a decade).

Comment by gasstationwithoutpumps — 2012 January 3 @ 20:34 |