Gas station without pumps

2010 June 30

Google Scholar vs. Web of Science

Filed under: Uncategorized — gasstationwithoutpumps @ 05:00

FemaleScienceProfessor recently posted asking about the differences between Web of Science (now grandiosely called Web of Knowledge) and Google Scholar, the two main competitors for finding scientific citations.  (PubMed’s “related citations” can be useful for finding similar papers, but it does not follow the citation chain forward the way that Web of Science and Google Scholar do.)

For me, the differences in what the two sources choose to index and the care they take with the indexing make a big difference, both in raw citation counts and in the number of papers indexed.

According to my CV, I’ve published about 85 papers (42 in journals, 21 in refereed conferences, around 19 or 20 unrefereed tech reports, a book chapter, and a couple of patents).

Google Scholar has “about 205” entries when searching with my name.  Most of these can be mapped to one of my actual papers, but some are vague enough citations that I can understand why Google did not try to merge them.  Google Scholar did manage to find one conference paper that I’d forgotten about (it never made it to my CV, and I’m not even sure I have a copy any more).  It can’t have been a very important paper, though, as no one ever cited it except me.  Some of the low-count citations were simply bogus: students who were too lazy to look up one of my papers and so made up a plausible journal and page numbers (one that I checked was indeed a citation to my work, but they got the wrong title and the wrong journal, with volume numbers that weren’t remotely plausible for the journal and year they claimed).  A handful of the papers are by a different author with the same last name, whose middle initial is my first initial.  If I specifically remove his papers (using -author: in the search field), I get down to 191 “papers” cited.  The h-index that Google Scholar computes is 31; that is, I have 31 papers that have been cited 31 or more times, but I don’t have 32 that have been cited 32 or more times.  Four of these 31 are conference papers; the others are in conventional journals.
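
For readers who have not seen the h-index computed, here is a minimal Python sketch of the definition given above. The function name `h_index` and the sample citation counts are made up for illustration; they are not my real numbers, and this is not necessarily how Google Scholar or Web of Science implement the calculation.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts (illustrative only, not the author's data).
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))  # prints 3
```

Because the result depends only on the per-paper citation counts, a database that sees fewer citations or indexes fewer papers will report a lower h-index for the same author.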

Web of Science has only 41 papers for me (none of which are bogus) and includes 4 for which they have found no citations.  Google Scholar has all 4, and even found citations for one of them.  The number of citations is much smaller (my most cited paper has only 596 citations in Web of Science, but 859 in Google Scholar).  Web of Science computes my h-index as only 21, mainly due to the lower citation counts it sees (though not indexing the 4 highly cited conference papers doesn’t help).  The “distinct author” set for my name has only 26 papers for me.  They claim that I can fix this through ResearcherID, but I’ve had 55 papers there for months, and they haven’t fixed it yet.

Interestingly, one of my best-known pieces of work is cited only 251 times according to Google Scholar (plus another 17 references to the associated patent), and 64 times according to Web of Science, but gets about 168,000 hits with Google.  Obviously, I’ve not checked all those hits to see what fraction are bogus, but I suspect that well over half of them are real references to the algorithm in the paper.  It is in a field that is not much given to academic citation.

Since I am now in bioinformatics, a lot of my stuff should be indexed in PubMed, but they only found 19 papers for me using my full name, and 34 using my last name and first initial, all of which were indeed mine.  One conference paper that was not published in a journal was included, but only the bioinformatics papers appear.

Citeseer (which used to be the darling of computer scientists, because it included computer science conferences) does a much poorer job of finding my papers. Even trying 4 different variants of my name, it gets “350 citations” and the most cited paper has only 31 citations.

Bottom line:  Web of Science has the cleanest database, but is missing big chunks of the scientific literature.  Google Scholar is the most complete, but it has not merged slightly erroneous citations and is cluttered with bogus information.


4 Comments »

  1. [...] Uncategorized — gasstationwithoutpumps @ 12:18 Tags: bibliographic databases Yesterday, I compared Google Scholar to Web of Science for getting citations to my publications. Today I’ll compare to Scopus and Scifinder, [...]

    Pingback by Google Scholar vs. Scopus and SciFinder « Gas station without pumps — 2010 July 1 @ 12:25 | Reply

  2. [...] is interesting to look for an equivalent of h-index (which I blogged about in Google Scholar vs. Web of Science and Google Scholar vs. Scopus and SciFinder) for measuring blogs. I don’t think that using [...]

    Pingback by NaBloPoMo is over « Gas station without pumps — 2010 December 1 @ 00:18 | Reply

  3. RT@MatAbraz: New useful search engine that returns full PDF scientific articles not subject to access fees http://www.freefullpdf.com

    FreeFullPDF vs Google Scholar
    1) Find free scientific articles (or a free version) not indexed in Google Scholar.
    Example: “Pattern Recognition in Medical Image Diagnosis”
    2) Scientific Synonyms
    Users’ search queries are automatically expanded with synonyms/acronyms.
    Example: Search query “DNA” has the following equivalent alternatives: “deoxyribonucleic acid”, “desoxyribonucleic acid” or “ADN”.
    3) Refine the search results
    You can refine your search results by article, patent, thesis or poster.
    In conclusion, FreeFullPDF will not replace Google Scholar, but it appears to be a good complementary tool for scientific research.

    Comment by BSK — 2011 June 14 @ 08:52 | Reply

  4. [...] blogged before about Google Scholar as a citation source (Google vs. Scopus and SciFinder, Google Scholar vs. Web of Science).  One of my complaints was about some of the sloppy citation records that got mixed in to the [...]

    Pingback by Google Scholar Citations « Gas station without pumps — 2011 November 26 @ 00:37 | Reply

