FemaleScienceProfessor recently posted asking about the differences between Web of Science (now grandiosely called Web of Knowledge) and Google Scholar, the two main competitors for finding scientific citations. (PubMed’s “related citations” can be useful for finding similar papers, but it does not follow the citation chain forward the way that Web of Science and Google Scholar do.)
For me, the differences in what the two sources choose to index and the care they take with the indexing matter a great deal, both in raw citation counts and in the number of papers indexed.
According to my CV, I’ve published about 85 papers (42 in journals, 21 in refereed conferences, around 19 or 20 unrefereed tech reports, a book chapter, and a couple of patents).
Google Scholar returns “about 205” entries when I search on my name. Most of these can be mapped to one of my actual papers, but some are citations vague enough that I can understand why Google did not try to merge them. Google Scholar did manage to find one conference paper that I’d forgotten about (it never made it to my CV, and I’m not even sure I have a copy any more). It can’t have been a very important paper, though, as no one ever cited it except me. Some of the low-count citations were simply bogus: students who were too lazy to look up one of my papers and so made up a plausible journal and page numbers (one that I checked was indeed to my work, but they got the wrong title and the wrong journal, with volume numbers that weren’t remotely plausible for the journal and year they claimed). A handful of the papers are by a different author with the same last name, whose middle initial is my first initial. If I specifically remove his papers (using -author: in the search field), I get down to 191 “papers” cited. The h-index that Google Scholar computes is 31; that is, I have 31 papers that have been cited 31 or more times, but I don’t have 32 that have been cited 32 or more times. Four of these 31 are conference papers; the others are in conventional journals.
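The h-index definition above is easy to state as a short computation. What follows is a minimal Python sketch, not the code that Google Scholar or Web of Science actually runs; the function name `h_index` and the citation counts in the example are made up for illustration.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts, not taken from any real profile.
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))  # prints 4
```

Because the result depends only on the per-paper counts, the same list of papers can give quite different h-indexes depending on which citing documents a source chooses to index.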
Web of Science has only 41 papers for me (none of which are bogus) and includes 4 for which they have found no citations. Google Scholar has all 4, and even found citations for one of them. The citation counts in Web of Science are much smaller (my most cited paper has only 596 citations in Web of Science, but 859 in Google Scholar). Web of Science computes my h-index as only 21, mainly due to the lower citation counts it sees (though not indexing the 4 highly cited conference papers doesn’t help). The “distinct author” set for my name has only 26 papers for me. They claim that I can fix this through ResearcherID, but I’ve had 55 papers there for months, and they haven’t fixed it yet.
Interestingly, one of my best-known pieces of work is cited only 251 times according to Google Scholar (plus another 17 references to the associated patent), and 64 times according to Web of Science, but gets about 168,000 hits with Google. Obviously, I’ve not checked all those hits to see what fraction are bogus, but I suspect that well over half of them are real references to the algorithm in the paper. It is in a field that is not much given to academic citation.
Since I am now in bioinformatics, a lot of my stuff should be indexed in PubMed, but they only found 19 papers for me using my full name, and 34 using my last name and first initial, all of which were indeed mine. One conference paper that was not published in a journal was included, but only the bioinformatics papers appear.
Citeseer (which used to be the darling of computer scientists, because it included computer science conferences) does a much poorer job of finding my papers. Even when I try 4 different variants of my name, it gets “350 citations” and the most cited paper has only 31 citations.
Bottom line: Web of Science has the cleanest database, but is missing big chunks of scientific literature. Google Scholar is the most complete, but has not merged slightly erroneous citations, and is cluttered with bogus information.