Gas station without pumps

2015 March 27

Followup on plagiarism

Filed under: Uncategorized — gasstationwithoutpumps @ 08:26

In Plagiarism detected, I mentioned that an article in Nature Biotechnology plagiarizes from my blog, specifically Supplementary Material page 6 from Segmenting noisy signals from nanopores. I got email from the last author this week, explaining the situation:

We saw your recent blog post about our paper and feel that we owe you an explanation.

At the time we read your level-finding blog post we had already implemented a recursive level-finding algorithm that we have been using in our lab.  Our algorithm made comparison of two data segments using a T-test. We came across your blog and found that the logP value was more useful than the T-test.  We wanted to cite your blog, but Nature’s online publication guidelines made it seem that “Only articles that have been published or submitted to a named publication should be in the reference list” (http://www.nature.com/nature/authors/gta/#a5.4). While we wanted to present our methods as transparently as possible, we had no intention of claiming your work as ours. We should have made efforts to contact you and NBT editors about how to best cite your contribution.

I have contacted NBT to see if a post-publication citation to your blog can be made and I will keep you posted on this.

We noted your recent BioarXiv manuscript and will refer to it in future publications using logP-test level-finders.

So one of the two corrections I was seeking has been met (an apology from the authors), and the other (a citation to the blog) is being sought by the authors. It seems that Nature has a very poor policy about citations, discouraging correct attribution.  Yet another reason to consider them a less desirable family of journals (their rip-off pricing for libraries and their preference for sensational articles over careful research are others).

On a related front, referees for our journal submission of the segmenter paper pointed out that several of the ideas are not new (hardly surprising), and that the basic algorithm has been around for quite a while.  They pointed us to a paper by Killick, Fearnhead, and Eckley (http://arxiv.org/pdf/1101.1438.pdf), which supposedly has an exact algorithm that is as efficient as binary segmentation (which only approximates the best breakpoints). I thank the referees for the pointer—that is the sort of thing peer review is supposed to be good for: pointing out to authors where they have missed relevant prior literature.

I’ve only glanced through the paper (I had 16 senior theses to grade in 4 days, plus trying to get a new draft of my book for my applied electronics course done in time for classes starting next Monday), so I can’t say anything about the algorithm they present, but they do give a citation for the binary algorithm that dates back to 1974:

Scott, A. J. and Knott, M. (1974). A cluster analysis method for grouping means in the analysis of variance. Biometrics, 30(3):507–512.

The online version of the journal only goes back to 1999, so I’ve not confirmed that the paper does contain the same algorithm, but it would not surprise me if it did—the binary split method is fairly obvious once the basics of splitting on log-likelihood are understood.  I had looked for papers on the technique and not found them (which surprised me), but I didn’t look as hard as I should have. I did not find the right entry points to the literature—it is scattered over many different disciplines and I relied too much on the one textbook that I did find to give me pointers. And I didn’t read all the textbook, so I may have missed the appropriate pointers—though they do not cite Scott and Knott, so maybe the textbook authors missed an important chunk of the literature, too.
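For readers who have not seen the binary split method, here is a minimal sketch of the general idea: find the single breakpoint that most improves the log-likelihood, accept it only if the improvement exceeds a threshold, and recurse on the two halves. This is an illustration under stated assumptions, not the code from the segmenter posts and not the Killick et al. algorithm. It assumes each segment is Gaussian with its own maximum-likelihood mean and variance, and the names `gaussian_loglik` and `segment`, the threshold of 20, and the minimum segment length of 3 are all hypothetical choices made for the example.

```python
import math


def gaussian_loglik(sum_x, sum_x2, n):
    """Maximized Gaussian log-likelihood of n samples, given their sum and sum of squares."""
    mean = sum_x / n
    var = max(sum_x2 / n - mean * mean, 1e-12)  # floor avoids log(0) on constant data
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def segment(samples, threshold=20.0, min_len=3):
    """Return breakpoint indices (in increasing order) found by recursive binary splitting.

    threshold and min_len are arbitrary illustrative values, not tuned settings.
    """
    n = len(samples)
    # Prefix sums let the log-likelihood of any slice be computed in O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(samples):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def loglik(a, b):
        return gaussian_loglik(ps[b] - ps[a], ps2[b] - ps2[a], b - a)

    breaks = []

    def split(lo, hi):
        if hi - lo < 2 * min_len:
            return
        whole = loglik(lo, hi)
        best_gain, best_k = 0.0, None
        for k in range(lo + min_len, hi - min_len + 1):
            # Gain in log-likelihood from modeling [lo,k) and [k,hi) separately.
            gain = loglik(lo, k) + loglik(k, hi) - whole
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > threshold:
            split(lo, best_k)
            breaks.append(best_k)
            split(best_k, hi)

    split(0, n)
    return breaks
```

Calling `segment(samples)` on a list of floats returns the indices at which a new segment starts. Because each split is chosen greedily, the result only approximates the best set of breakpoints, which is the limitation of binary segmentation that the referees pointed out.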

Now that the Killick et al. paper has given me some useful pointers, I have a lot of reading to do.  I don’t know if I’ll have time before the summer, though—my teaching load starting next week is pretty heavy (I was just noticing that my calendar had 24.5 hours scheduled for the first week, not counting time for prepping for classes, setting up the lab, grading, or revising the book for the electronics class: 7 hours of lecture, 12 hours of lab class, 2 office hours, 1.5 hours meeting with the department manager, 2 hours faculty meeting—and the dean wants to meet with me for half an hour sometime also).

Given that the main idea in our segmenter paper is an old one, for it to be salvageable, we’ll have to shrink the basic algorithm to a brief tutorial (with citations to prior inventors) and concentrate on the little changes made after the basic idea: the parameterization of the threshold setting and the correction for low-pass filtering.  There may be a little bit for applying the idea to stepwise slanting segments using linear regression, but I bet that idea is also an old one, buried somewhere in the literature.

This summer I may want to look at implementing the ideas of the Killick et al. paper (or other similar approaches), to see if they really do produce better segmentation as quickly.

2015 March 14

Plagiarism detected

Filed under: Uncategorized — gasstationwithoutpumps @ 20:33

It has recently come to my attention that an article in Nature Biotechnology: doi:10.1038/nbt.2950 “Decoding long nanopore sequencing reads of natural DNA” plagiarizes from my blog, specifically Supplementary Material page 6 from Segmenting noisy signals from nanopores.  Now, I don’t mind their using my work—I would not have published it in such a public form as posting to my blog if I were trying to keep it secret—but standard scholarly practice requires that sources be cited.  Claiming someone else’s work as one’s own is the academic sin.

I don’t know which of the 13 authors of the Nature Biotechnology article is the plagiarist, but I hold the head of the lab (Jens Gundlach) responsible for the plagiarism, since it seems clear that he did not bother to check that his students and co-workers were citing others’ work appropriately.  It is the job of the head of a lab to create a culture of proper citation; failure to do so is an indication of not doing one’s job as a scholar or as a professor.

I’m undecided about what to do about this plagiarism.  The obvious thing to do would be to complain to the editors, but I have no idea whether that will do any good.  The last time I had a serious plagiarism case like this was when I was in logic minimization, and parts of a paper of mine that had been rejected from the main (almost sole) journal in the field later appeared in a conference article with the editor who had rejected the paper as one of the co-authors.  In that case, complaints to the journal were useless (they just sent the complaints to the editor who had plagiarized from me—thereby ensuring that I would never get any papers published in the field).  I ended up leaving the field in disgust (as several other researchers had done—the field has been pretty stagnant since all new ideas were blocked by the powerful editor) and moving into bioinformatics instead, where rivalries were decided more on the quality of one’s solutions than on publication blocking and theft.

This case is different, though, because the plagiarist is not the editor of the journal, and so the editors may have some leverage to apply to the authors, in order to maintain the credibility of the journal.

The fix I’m looking for is a pretty simple one: an apology from Jens Gundlach for not catching and correcting the plagiarism, and a citation to my blog added to the published article. If they can’t bring themselves to cite me, they could at least cite another source (like Detection of Abrupt Changes: Theory and Application by Michèle Basseville and Igor V. Nikiforov, whom I cited as my inspiration, though Basseville and Nikiforov don’t describe the recursive algorithm I developed).

To complicate matters slightly, I’ve recently submitted a paper to PeerJ based on the same body of work (though including the improved parameterization developed in some of my later blog posts, and including some empirical evidence that the new algorithm works substantially better than filtered-derivative algorithms).  I would not want someone to find Gundlach’s group’s paper and think that I had plagiarized from them, rather than they from me.

I ask my readers—how diligently should I pursue this plagiarism case?  Has anyone had any experience with Nature Biotechnology on such matters? Do they care about plagiarism? Or do they make life hell for anyone who brings up the subject?

Update 27 March 2015: I’ve heard from the lead author; see Followup on plagiarism.

2010 November 25

Comments on “The Shadow Scholar”

Filed under: Uncategorized — gasstationwithoutpumps @ 00:09

The Chronicle of Higher Education recently published an article titled “The Shadow Scholar”, purportedly by a writer who makes his living selling papers for students to turn in as their own. This sort of cheating has a long history, but has gotten easier with the greater ease of communication and search that the Internet provides.  It is now very easy for a lazy student (with money) to find someone to do their work for them.  Because they can draw on unethical writers from all over the world, it is unlikely that they will accidentally purchase a document that the professor will recognize as having seen before.  (I suspect that most term paper services are even less ethical than the Shadow Scholar, who claims to have created custom documents for each client, and that they will resell the same paper repeatedly for any assignment that it comes close to matching.)

In my classes, I’ve never graded a paper that I thought had been purchased.  Most of the papers I read seem to be in the voice of the student submitting them, and usually have direct connection to things done in class, making them hard to fake from a distance.  I do have occasional problems with students plagiarizing from the Web, but even there the problem is more often one of inadequate citation than of deliberate attempt to deceive about authorship.

Many edubloggers have commented on the shadow-scholar article. For example, Mark Guzdial is concerned that

It’s even easier to cheat with code, since there are fewer degrees of freedom.  My guess is that cheating as he describes [it] is even more prevalent in computer science.

Cheating in beginning programming courses is certainly a common problem, but there are some good programs available for detecting submissions that are suspiciously similar to other submissions (from the same or previous years).  I know that the computer science department at our university routinely checks the submissions in the beginning programming classes, and flunks some students for cheating every year.  No matter how many times students are warned both of the high probability of being caught and of the serious consequences of getting caught, there are always idiots who think they are immune.  (They are usually the stupidest, most ego-centric students, so the University is better off catching them early and throwing them out.)

I found Katrin Becker’s comments interesting also:

There are also people I know personally who conveniently look the other way when students (especially their own) produce work that is suspiciously good. One colleague of mine has theorized that fully 1 in 5 faculty members got to where they are now through some form of plagiarism.
1 in 5.
The corporatization of Higher Ed is a significant influence. When everything becomes about money then everything acquires a price tag.

Personally, I doubt that there are that many plagiarists among the faculty, but I do agree that the attempt to make a college education into a commodity has resulted in loss of integrity among students.  Many feel that the large fees they pay for their education entitle them to good grades and a degree, independent of what they actually manage to do.  Many politicians support this view, rewarding universities for having high retention and large percentages of entering students graduating within 4 years.

Some students should not be retained, some should not graduate. Getting a degree from college should not just be a matter of putting in seat time and paying tuition for 4 years.  (Of course, graduating from high school or elementary school should also not just be a matter of putting in seat time, but I fear that battle was lost decades ago.)
