Are All Wrong FCI Answers Equivalent? is a conference paper that does a cluster analysis of answers to the Force Concept Inventory, identifying 7 different groups of students. They followed that with a hidden Markov analysis of pre-test and post-test results to see which transitions were most probable.

The data set was of respectable size (2275 students), but they did the clustering on only 4 questions (the 4 questions that came out as the first factor in their latent class factor analysis). With 5 possible answers to each question, there are only 5⁴ = 625 possible answer patterns. They did not explain how they clustered the students into 7 groups, though they did describe why they chose 7: it was the smallest number of groups for which deviations of observed values from the values predicted by the model were not significantly different from chance (though they never gave p-values, so I don’t know what criterion they used).
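To make the combinatorics concrete, here is a quick sketch (my own illustration, not anything from the paper) of the size of the answer-pattern space for 4 questions with 5 choices each:

```python
from itertools import product

# Each of the 4 analyzed questions has 5 answer choices (A-E),
# so the space of possible answer patterns is 5**4 = 625.
choices = "ABCDE"
patterns = list(product(choices, repeat=4))
print(len(patterns))  # 625
```

With 2275 students spread over only 625 possible patterns, many patterns will be shared by several students, which is what makes clustering on the raw patterns feasible at all.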

They had a theory about how the students were thinking to explain the patterns of results they saw, but it is not clear that this theory has any predictive value, nor that a different FCI data set would produce the same 7 clusters.

I believe that analyzing the types of wrong answers will reveal more information about students than using a single-bit right/wrong value for each question, but I’m not convinced that their analysis is robust enough to base any pedagogic decisions on.

They hypothesized 5 possible schemata for responses to a question:

- N, Newtonian: a correct answer
- D1, Dominance: larger masses exert larger forces
- D2, Dominance: objects that initiate movement exert larger forces
- PO, Physical Obstacles: physical motion is determined by obstacles in the path of moving objects
- NF, Net Force: an incorrect understanding of net force, in which the net force is the sum of scalars rather than of vectors (though they expressed it differently)

It is not clear whether each of the questions had answers corresponding to all 5 schemata. Indeed, for each of the 4 questions analyzed there seemed to be only 2 probable answers (at least for 5 of their 7 classes of students; they gave up on analyzing classes C6 and C7, other than saying that those students used PO schemata more than the other groups did). Having only 2 common answers means that each answer carried only about one bit of information, not the log₂ 5 ≈ 2.32 bits that a very carefully crafted 5-choice question might. If there are only 2 common answers, a right one and a wrong one, then analyzing the pattern of wrong answers is not going to add much to the analysis.
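As a back-of-the-envelope check (my numbers, not the paper's), the Shannon entropy of a 5-choice question tops out at log₂ 5 ≈ 2.32 bits when all choices are equally likely, but drops to roughly 1 bit when responses pile up on just 2 of the choices:

```python
from math import log2

# Maximum information for a 5-choice question: uniform over 5 answers.
max_bits = log2(5)
print(f"max: {max_bits:.2f} bits")  # max: 2.32 bits

# Hypothetical skewed distribution: two answers dominate, three are rare.
p = [0.55, 0.40, 0.02, 0.02, 0.01]
entropy = -sum(q * log2(q) for q in p)
print(f"skewed: {entropy:.2f} bits")  # about 1.3 bits
```

The particular probabilities are invented for illustration; the point is only that concentrating responses on 2 answers throws away more than half of the question's potential information.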

Question q4 distinguished only between N and D1; q15 between N and either D2 or NF (either schema would produce the same wrong answer); q16 between N or NF and D2 or NF; and q28 between N and D1. They did not show that the rare answers to the questions reflected other schemata, and I don’t have a copy of the FCI to do that analysis myself.

I’m a little confused about how both answer A and answer C of q16 could result from the same NF schema, though it is clearly a problem that the right answer to q16 could result from a wrong schema.

I found their partial reporting of results (like giving just the maximum-likelihood answer for each question for each group) rather frustrating, as it was not enough to do a more careful analysis of the results. Since the most likely answer to each question was the same for C3 as for C1 (the correct answer), it looks like students ended up in C3 because they got one or two of the questions wrong (but not q15, which would have put them in group C2). Group C4 got q16 right but the others wrong, and group C5 got them all wrong.

Overall, I think that they made a valiant effort, but analyzed too few questions to reach any conclusions—not about the 7 clusters, not about their 5 schemata, and not about pre-test/post-test transitions.