From “Show Me the Data” by Rossner, Van Epps, and Hill in The Journal of Cell Biology:
It became clear that Thomson Scientific could not or (for some as yet unexplained reason) would not sell us the data used to calculate their published impact factor. If an author is unable to produce original data to verify a figure in one of our papers, we revoke the acceptance of the paper. We hope this account will convince some scientists and funding organizations to revoke their acceptance of impact factors as an accurate representation of the quality—or impact—of a paper published in a given journal.
Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific’s impact factor, which is based on hidden data. As more publication and citation data become available to the public through services like PubMed, PubMed Central, and Google Scholar®, we hope that people will begin to develop their own metrics for assessing scientific quality rather than rely on an ill-defined and manifestly unscientific number.
What’s the impact?
The RePEc blog chimes in:
Besides the points reiterated and brought up in the Journal of Cell Biology, there are further accuracy issues with Thomson data. For example, to identify authors, they only use initials for their first and middle name. As they pool papers from all fields, this is a more severe error than one might first guess. Thomson reports that Kit Baum (known to Thomson as CF Baum) has publications in the Fordham Law Review (on nuclear waste) and the Sociology of Education (on group leadership).
A further issue is Thomson’s coverage; EconLit lists some 1,240 journals in our field while the last time I checked Thomson covered but a fraction of these. I don’t have recent data for their coverage, but in total Thomson covers 8,700 journals encompassing all academic fields, so it seems doubtful that Thomson has substantially changed its economics coverage.
A further problem plaguing all citation analysis is simply extracting citation data with software. After all, citations are written for people, not machines. I haven’t seen data for Thomson on this (one wonders if it is public), but I do know that CitEc has faced a very real challenge here.
It sounds like a groundswell of opposition, but we know that committees already base “tenure, promotion, and raises” on these data and will continue to do so, in part because the alternative (other than data from Google Scholar or specialized sources like RePEc) is an even noisier estimate based on community perceptions.
Hat tip to Peter Klein over at Organizations & Markets.