Gas station without pumps

2017 May 18

Midterm quiz doesn’t tell me much new

Filed under: Circuits course — gasstationwithoutpumps @ 09:53

I don’t usually give exams in my courses anymore, because I’m more interested in what students can do when they have time and resources than in what they can do on toy problems under resource limitations. But if students don’t do the homework, they don’t learn the material, so I threaten each class that if too many students fail to turn in the homework, I’ll have to add a quiz to the course (worth as much as one of the lab reports, each of which is in turn worth as much as all the homework combined).

This quarter I had to follow through on that threat, because 12% of the class had turned in half or fewer of the homework assignments (and by that I don’t mean answered half the questions—I mean turned in nothing at all for half the assignments). A quarter of the class had skipped at least 25% of the assignments.

I gave the quiz yesterday, with 6 easy questions that only tested the very basic material: single-pole RC filters (passive and active) and negative-feedback amplifiers.  I told students ahead of time (and on the exam) that they could use the Bode approximations (the straight-line approximations to the gain of the RC filters) and we even reviewed them in class last week.  There were 60 points possible on the test, and none of the questions were design questions—they were almost all of the form “what is the corner frequency?” or “what is the gain of this circuit?”.
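
To give a feel for the level of the questions, here is a quick Python sketch of the two basic calculations they asked for—corner frequency and Bode straight-line gain for a single-pole RC low-pass filter. The component values are made up for illustration, not the ones on the quiz:

```python
from math import pi

R = 10e3   # ohms (hypothetical value, not from the quiz)
C = 33e-9  # farads (hypothetical value)

# Corner frequency of a single-pole RC filter: f_c = 1/(2*pi*R*C)
f_c = 1 / (2 * pi * R * C)
print(f"corner frequency: {f_c:.0f} Hz")  # about 482 Hz for these values

def bode_gain_lowpass(f, f_c):
    """Straight-line (Bode) approximation for a single-pole low-pass:
    gain 1 below the corner, falling at -20 dB/decade above it."""
    return 1.0 if f <= f_c else f_c / f

print(bode_gain_lowpass(f_c / 10, f_c))  # well below the corner: gain 1
print(bode_gain_lowpass(10 * f_c, f_c))  # well above the corner: gain 0.1
```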

There are a small number of students in the class whose probity I have reason to question, so I took steps to reduce cheating that I would not normally bother with: I made up two versions of the test (same schematics, but different component values) and alternated them in the piles passed along each row.  I also had the students sit in different rows from usual, reversing front and back of the room, with the front row reserved for latecomers. I’ve noticed a high correlation between good homework grades and people being on-time and in the first two rows, so I had those students sit in the back row, where no one would be able to copy from them.

I normally figure that a test is appropriately long if an expert can do it in about a quarter of the time allotted.  So I made up the keys for the test while the students were taking it.  Working through one form with the Bode approximations took about 5 minutes.  Doing exact computation with the formulas for series and parallel impedances and complex numbers using only real-number arithmetic on my calculator extended that by another 15 minutes.  The students had 63 minutes, so the exam was too easy if the students used the Bode approximations (as they were told) but a little too hard if they worked just from the fundamentals of complex impedance and negative-feedback amplifiers.  As a consequence, I decided to give bonus points for exact computations of the gains that didn’t use the Bode approximations, though the class was not informed of this bonus, because I didn’t want them to waste time on the tiny bonus.  (The differences in answers were small, because I had deliberately asked for gains only at points well away from the corner frequency, so that the Bode approximations would be good.)
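
The exact computation is far less painful in Python than on a calculator, since Python handles complex arithmetic natively. Here is a minimal sketch, again with hypothetical component values; for a passive RC low-pass filter the gain is the magnitude of the voltage-divider ratio Z_C/(R + Z_C):

```python
from math import pi

R = 10e3   # ohms (hypothetical)
C = 33e-9  # farads (hypothetical)
f_c = 1 / (2 * pi * R * C)

def exact_gain_lowpass(f):
    Z_C = 1 / (1j * 2 * pi * f * C)  # capacitor impedance (a complex number)
    return abs(Z_C / (R + Z_C))      # magnitude of the voltage-divider ratio

for f in (f_c / 10, f_c, 10 * f_c):
    print(f"{f:9.1f} Hz: exact gain {exact_gain_lowpass(f):.4f}")
# Well away from the corner, the exact answers differ from the Bode
# approximation by only about 0.5%; at the corner itself the exact gain
# is 1/sqrt(2), where the approximation says 1.
```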

Even if students really didn’t understand complex impedance or RC filters, 39 of the 60 points could be earned with just DC analysis of the negative-feedback amplifiers and knowing that capacitors don’t conduct DC.   So I was hoping that students would do better on these very easy questions than they did on the harder design questions of the homework.  As a confirmed pessimist, though, I expected that students would show almost exactly the same distribution on the test that they showed on the homework, with the middle of the class being around 20 out of 60 points and showing serious misunderstandings of almost everything, with a long tail out to one or two students who would get almost everything right.  I also expected that the correlation between the homework scores and the quiz scores would be high.

So what happened?  First, I saw no evidence of any cheating (not that I had expected any), so that is one worry removed.  Second, my pessimistic assumption that students really were not learning stuff that they had done many times in homework and in lab was confirmed:

Here is a stem-and-leaf plot of the scores:

00: 3
05: 6889
10: 011112444444
15: 555667777899
20: 00111112223344
25: 677999
30: 12224
35: 5678
40: 00444
45: 67
50: 01
55: 
60: 2

The median is indeed 21 out of 60, as I feared. At least no one got a zero, though the scores at the bottom indicate complete failure to apply the basics of the course.

Most students could compute a corner frequency from a resistor and capacitor, but few had any idea what to do with that corner frequency. Many students could compute the DC gain of a non-inverting amplifier, though many could not then apply this knowledge to the DC gain of an active filter (which only requires replacing the capacitors with open circuits). A lot of students forgot the “+1” in the formula of the gain for the non-inverting amplifier.

Inverting amplifiers were even less understood than non-inverting ones, with students forgetting the minus sign or trying to use the formula for non-inverting amplifiers.
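
For reference, the two formulas being confused are \frac{V_{out}}{V_{in}} = 1 + Z_{f}/Z_{i} for the non-inverting amplifier and \frac{V_{out}}{V_{in}} = -Z_{f}/Z_{i} for the inverting amplifier. At DC, a capacitor’s impedance \frac{1}{j \omega C} goes to infinity, so replacing each capacitor with an open circuit and applying these formulas gives the DC gain of an active filter.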

A lot of student answers failed simple sanity checks (some students reported passive RC filters with gain greater than 1, for example).

Very few students used the Bode approximations correctly, and many tried the exact solution but either couldn’t set up the formulas correctly or couldn’t figure out how to use their calculators, often getting numbers that were way, way off. Others seem to have ignored the complex numbers entirely, treating x+jy as if it were x+y.

One disturbing result was how many students failed to recognize or understand a circuit that they had designed in three different labs: a voltage divider and unity-gain buffer to generate Vref, combined with a non-inverting amplifier. I asked for the output voltage as a function of the input voltage (both clearly labeled on the schematic). This was intended to be almost free points for them, since they had used that circuit so many times, and the formula they needed was one of the few formulas on the study sheet: \frac{V_{out}-V_{ref}}{V_{in}-V_{ref}} = 1 + Z_{f}/Z_{i} . That so many students could not fill in the blanks of this formula, for a circuit they have used repeatedly in their own designs, makes me question whether the students are actually learning anything in the course, or whether they are simply copying designs from other students without understanding a thing. (Note: the extremely poor performance and group-think duplication of ludicrously wrong answers on pre-lab homework this year has also led me to the same question.)

Did the quiz tell me anything that the homework had not already told me? Here is the scatter diagram:

Pearson’s r correlation is 0.539 and Kendall’s tau is 0.306, so the homework and quiz scores are highly correlated. There are a few outliers: a diligent student who bombed the quiz, and a student who turned in few of the homework assignments but actually understands at least the easy material. The points have a small amount of noise added, so that duplicate points are visible.
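
For anyone who wants to reproduce this sort of analysis, here is a small Python sketch—using made-up scores, not the real gradebook. scipy provides both correlation measures, and the jitter is just a little uniform noise added before plotting:

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
homework = rng.uniform(20, 100, size=68)           # hypothetical homework percentages
quiz = 0.4 * homework + rng.normal(0, 9, size=68)  # hypothetical quiz scores

r, _ = pearsonr(homework, quiz)
tau, _ = kendalltau(homework, quiz)
print(f"Pearson's r = {r:.3f}, Kendall's tau = {tau:.3f}")

# Add a small uniform jitter so that duplicate (x, y) points stay visible.
jitter = lambda v: v + rng.uniform(-0.5, 0.5, size=v.size)
plt.scatter(jitter(homework), jitter(quiz))
plt.xlabel("homework score")
plt.ylabel("quiz score (out of 60)")
plt.show()
```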

The high correlation between the quiz and the homework mostly confirmed my prior belief that the quiz would not tell me much that is new, and that the homework grades were pretty reflective of what students had learned. I will want to talk with a few of the most extreme outliers, to find out what happened (why were students who mostly understood the material blowing off the homework? and why did diligent students who had been doing moderately well on the homework bomb the quiz—is there undiagnosed test anxiety that should be getting accommodations, for example?).

Most of the points that were earned were from students randomly plugging numbers into a memorized formula and (perhaps accidentally) having chosen the right formula and the right numbers. Only a few students showed real understanding of what they were doing, and only one student saw the quiz as the trivial exercise it was intended to be.

It seems that the hands-on active learning that I have been so enthusiastic about is not working any better at getting students to learn the basics than the traditional (and much cheaper) droning lecture that EE uses. I’m not in complete despair about the course, as there is some evidence that students have picked up some lab skills (using oscilloscopes, multimeters, soldering irons, …) and some writing skills (though many are still not writing at a college level). But I’m trying to teach the students to be engineers, not technicians, so I was aiming at them understanding how to design and debug things, not just implementing other people’s designs. Picking up lab skills is not enough for the course.

I need help. How do I reach the lower half of the class? How do I get them to think about simple electronics instead of randomly applying half-remembered formulas? We’ve only got 3 weeks left—I don’t know how much I can salvage for this cohort, but I certainly would like better outcomes next year.

2014 October 25

Grading based on a fixed “percent correct” scale is nonsense

Filed under: Uncategorized — gasstationwithoutpumps @ 10:12

On the hs2coll@yahoogroups.com mailing list for parents home-schooling high schoolers to prepare for college, parents occasionally discuss grading standards.  One parent commented that grading scales can vary a lot, with the example of an edX course in which 80% or higher was an A, while they were used to scales like those reported by Wikipedia, which gives

The most common grading scales for normal courses and honors/Advanced Placement courses are as follows:

        “Normal” courses          Honors/AP courses
Grade   Percentage   GPA          Percentage   GPA
A       90–100       3.67–4.00    93–100       4.5–5.0
B       80–89        2.67–3.33    85–92        3.5–4.49
C       70–79        1.67–2.33    77–84        2.5–3.49
D       60–69        1.0–1.33     70–76        2.0–2.49
E / F   0–59         0.0–0.99     0–69         0.0–1.99

Because exams, quizzes, and homework assignments can vary in difficulty, there is no reason to suppose that 85% on one assessment has any meaningful relationship to 85% on another assessment. At one extreme we have driving exams, which are often set up so that 85% right is barely passing—people are expected to get close to 100%. At the other extreme, we have math competitions: the AMC 12 math exams have a median score around 63 out of 150, and the AMC 10 exams around 58 out of 150. Getting 85% of the total points on the AMC 12 puts you in the top 1% of test takers. (AMC statistics from http://amc-reg.maa.org/reports/generalreports.aspx ) The Putnam math prize exam is even tougher—the median score is often 0 or 1 out of 120, with top scores in the range 90 to 120. (Putnam statistics from http://www.d.umn.edu/~jgallian/putnam.pdf ) The point of the math competitions is to make meaningful distinctions among the top 1–5% of test takers in a relatively short time, so questions that the majority of test takers can answer are just time wasters.

I’ve never seen the point of having a fixed percentage correct used institution-wide for setting grades—the only thing such a standard does is tell teachers how hard to make their test questions. Saying that 90% or 95% should represent an A merely says that test questions must be easy enough that top students don’t have to work hard, and that distinctions among top students must be buried in the test-measurement noise. Putting the pass level at 70% means that most of the test questions are being used to distinguish between different levels of failure, rather than different levels of success. My own quizzes and exams are intended to have a mean around 50% of possible points, with a wide spread, to maximize the amount of information I get about students at all levels of performance, but I tend to err on the side of making the exams a little too tough (35% mean) rather than much too easy (85% mean), so I generally learn more about the top half of the class than the bottom half.

I’m ok with knowing more about the top half than the bottom half, but my exams also have a different problem: too often the distribution of results is bimodal, with a high correlation between the points earned on different questions. The questions are all measuring the same thing, which is good for measuring overall achievement, but which is not very useful for diagnosing what things individual students have learned or not learned.  This result is not very surprising, since I’m not interested in whether students know specific factoids, but in whether they can pull together the knowledge that they have to solve new problems.  Those who have developed that skill often can show it on many rather different problems, and those who haven’t struggle on any new problem.

Lior Pachter, in his blog post Time to end letter grades, points out that different faculty members have very different understandings of what letter grades mean, resulting in noticeably different distributions of grades for their classes. He looked at very large classes, where one would not expect enormous differences in the abilities of students from one class to another, so large differences in grading distributions are more likely due to differences in the meaning of the grades than in differences between the cohorts of students. He suggests that there be some sort of normalization applied, so that raw scores are translated in a professor- and course-specific way to a common scale that has a uniform meaning.  (That may be possible for large classes that are repeatedly taught, but is unlikely to work well in small courses, where year-to-year differences in student cohorts can be huge—I get large year-to-year variance in my intro grad class of about 20 students, with the top of the class some years being only at the performance level of  the median in other years.)  His approach at least recognizes that the raw scores themselves are meaningless out of context, unlike people who insist on “90% or better is an A”.

People who design large exams professionally generally have training in psychometrics (or should, anyway). Currently, the most popular approach to designing exams that need to be taken by many people is item-response theory (IRT), in which each question gets a number of parameters expressing how difficult the question is and (for the most common 3-parameter model) how good it is at distinguishing high-scoring from low-scoring people and how much to correct for guessing. Fitting the 3-parameter model for each question on a test requires a lot of data (certainly more than could be gathered in any of my classes), but provides a lot of information about the usefulness of a question for different purposes.

Exams for go/no-go decisions, like driving exams, should have questions whose difficulty is concentrated near the decision threshold and that distinguish well between those above and below the threshold. Exams for ranking large numbers of people with no single threshold (like SAT exams for college admissions at many different colleges) should have questions whose difficulty is spread out over the range of thresholds. IRT can be used for tuning a test (discarding questions that are too difficult, too easy, or that don’t distinguish well between high-performing and low-performing students), as well as for normalizing results to a uniform scale despite differences in question difficulty. With enough data, IRT can be used to get uniform-scale results from tests in which individuals are not all presented the same questions (as long as there is enough overlap that the difficulty of the questions can be calibrated fairly), which permits adaptive testing that takes less testing time to reach the same level of precision. Unfortunately, the model fitting for IRT is somewhat sensitive to outliers in the data, so very large sample sizes are needed for meaningful fitting, which means that IRT is not a particularly useful tool for classroom tests, though it is invaluable for large exams like the SAT and GRE.
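
To make the 3-parameter model concrete, here is a small Python sketch of the item response function, with made-up parameter values: a is the discrimination, b the difficulty, and c the guessing floor:

```python
from math import exp

def p_correct_3pl(theta, a, b, c):
    """Probability that an examinee of ability theta answers an item
    correctly under the 3-parameter logistic (3PL) model."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

# A discriminating question centered at the decision threshold (theta=0)
# versus a harder multiple-choice question with a 25% guessing floor:
for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(p_correct_3pl(theta, a=2.0, b=0.0, c=0.0), 3),
          round(p_correct_3pl(theta, a=1.0, b=1.5, c=0.25), 3))
```
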
The bottom line for me is that the conventional grading scales used in many schools (with 85% as a B, for example) are uninterpretable nonsense that does nothing to convey useful information to teachers, students, parents, or anyone else. Without a solid understanding of the difficulty of a given assessment, the scores on it mean almost nothing.

2011 August 16

Some FAQs for PhD students

Filed under: Uncategorized — gasstationwithoutpumps @ 15:40

I maintain our department’s FAQ page for the grad students, and I just added two new entries today. Although some of the comments are specific to our department, many apply to Ph.D. students in any field, so I thought it worthwhile to post them on my blog for those outside the department. I welcome any suggestions for improvements to the answers.

How is the Advancement to Candidacy (ATC) exam structured?

The Advancement to Candidacy Exam in BME is primarily an evaluation of the student’s thesis proposal. The format of the exam is fairly simple:

  1. One month before the exam, the student distributes the written thesis proposal to the committee.  Some faculty prefer paper copies, some prefer electronic copies—the student should check with the individual faculty members. Drafts of the proposal should already have been read and approved for distribution to the committee by the student’s adviser—it is embarrassing for everyone if the adviser tells the committee that they are not willing to supervise the proposed work.
  2. The student presents their thesis proposal publicly in a one-hour oral presentation.  It is the tradition of the department that all grad students and faculty who are in town and do not have conflicting classes scheduled attend the advancement talks.
  3. After the public presentation, the committee grills the student in private, usually in the same room as the public presentation.  This may cover material in the written proposal, the oral presentation, background material that the student should know, research plans, contingency plans, or any other material that the committee needs to know in order to determine whether the student passes.
  4. The committee deliberates in private without the student present. At this time the committee decides whether the student passes, passes contingent on making some corrections to the proposal, or fails.  If the student passes, the committee signs the appropriate paperwork.  In any case, the chair of the committee takes notes, in order to prepare the formal report on the exam, which is a 1- to 2-paragraph summary of the strengths and weaknesses shown.  The formal report is filed with the School of Engineering grad advising staff.
  5. The student is called back in and the results presented to them.  Any contingencies on passing are clearly spelled out (and generally provided by e-mail within a day).

How is the Advancement to Candidacy (ATC) exam judged?

The faculty on the advancement committee have to weigh several considerations:

  • Does the student have enough background knowledge and skill to do the proposed research? For the most part, the BME department relies on the required courses to provide adequate breadth, so the ATC exam concentrates on the knowledge needed for the specific research—but this often entails more breadth than most students realize, so students should be ready for questions on anything that they have (or should have) learned. A student who has completed the required courses but still does not have sufficient knowledge may be required to take more courses.
  • If the proposed research were completed, would it be sufficient for a PhD thesis? A PhD thesis must make some original contribution to science or engineering.  It is not enough to do a better implementation of an existing idea (though that is sufficient for an MS thesis).  Sometimes students propose work that is just routine lab work or programming, with no indication of the novelty of the ideas.  The thesis proposal should start with a clear statement of the new ideas that are being developed.
  • Can the research be completed in a reasonable time frame with available resources? Some students pick projects that would be really exciting if possible, but which can’t realistically be done.  These students usually get contingent passes or fails, and need to rework their research plans to be more reasonable.  Here the committee is the student’s friend, protecting them from themselves.
  • Is the research plan clearly stated, with clear indication of where the PhD thesis ends and post-doc work begins? A lot of students come to the exam with open-ended research plans that could take a lifetime to complete.  This is useless for planning a PhD.  Here the committee is the student’s friend, protecting them from advisers who always want one more experiment.  Having a clear statement about where the thesis research ends is important, and one of the main goals of a successful ATC exam is for the student, the adviser, and the rest of the committee to have agreed on the exact scope of the thesis.
  • Are there contingency plans in case experiments fail? Since PhD research is expected to be cutting-edge, and since the ATC exam is supposed to occur early in the research, there is a high probability that some of the proposed research will not have the expected results.  Both wet-lab and computational experiments can fail.  Students should have prepared some contingency plans for other approaches to the problem should experiments not work out or data that is expected from collaborators not be available.  It is extremely common for students to be required to rework their research plans to include some discussion of alternatives should the rosy best-case plans they initially laid out not be realized.  It is better for students to do this planning before giving the proposal to the committee.
  • Is the thesis proposal written well enough to give the committee confidence that the student can write a Ph.D. thesis, given the same level of assistance that they had for the proposal?  Note that of the 4 C’s of tech writing (clarity, correctness, conciseness, and completeness), Ph.D. theses put the most emphasis on correctness and completeness.
  • Is it clear what the candidate’s contributions are, clearly separated from the research group’s or adviser’s contributions?  The purpose of a Ph.D. thesis is to establish that the individual is capable of significant research contributions—thus it is essential that the proposal make it clear what the student has done and will do.  A thesis or thesis proposal is no place for the generic “we” or passive voice.  Whenever “we” is used, the names of the contributors should be given, and “I” should be used to emphasize the unique contributions of the candidate.
