It’s that time of year again when the new Impact Factor values are released. This is such a big deal to a lot of folks that it’s pretty hard to avoid hearing about it. We’re not the sort of folks that object to the use of impact factors in general – we are scientists after all, and part of being a scientist is quantifying things. However, if we’re going to quantify things it is incumbent upon us to try to do it well, and there are several things that we need to address if we are going to have faith in our measures of journal quality.
1. Stop using the impact factor; use Eigenfactor-based metrics instead
The impact factor simply counts the citations to a journal’s papers and calculates the average number per paper. This might have been a decent approach when the IF was first invented, but it’s a terrible approach now. The problem is that, as network theory and some important applications thereof (e.g., Google) have shown, it is also important to take into account the importance of the papers/journals that are doing the citing. Fortunately we now have metrics that do this properly: the Eigenfactor and the associated Article Influence Score. These are even reported by ISI right next to the IF.
Here’s a quick way to think about this. You have two papers: one that has been cited 30 times by papers that are never cited, and one that has been cited 30 times by papers that are themselves each cited 30 times. If you think the two papers are equally important, then please continue using the impact-factor-based metrics. If you think that the second paper is more important, then please never mention the words “impact factor” again and start focusing on better approaches for quantifying the influence of nodes in a network.
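To make the intuition concrete, here’s a toy sketch of how a PageRank-style score (the family of methods the Eigenfactor belongs to; this is not the actual Eigenfactor algorithm) weights each citation by the influence of the citing paper, so two papers with identical raw citation counts can rank very differently. The citation graph below is invented purely for illustration.

```python
def pagerank(links, d=0.85, iters=100):
    """links: dict mapping each node to the list of nodes it cites.
    Returns a dict of PageRank-style scores."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = d * score[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its score evenly over all nodes
                for v in nodes:
                    new[v] += d * score[src] / n
        score = new
    return score

# Paper A and paper B are each cited 3 times (identical raw counts).
# A's citers are themselves cited by other papers; B's citers never are.
links = {
    "A": [], "B": [],
    "a1": ["A"], "a2": ["A"], "a3": ["A"],  # citers of A, themselves cited
    "b1": ["B"], "b2": ["B"], "b3": ["B"],  # citers of B, never cited
    "x1": ["a1"], "x2": ["a2"], "x3": ["a3"],
}
scores = pagerank(links)
print(scores["A"] > scores["B"])  # True: A outranks B despite equal counts
```

A plain citation count sees no difference between A and B; the network-aware score does, which is the whole argument for Eigenfactor-style metrics.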
2. Separate reviews (and maybe methods) from original research
We’ve known pretty much forever that reviews are cited more than original research papers, so it doesn’t make sense to compare review journals to non-review journals. While it’s easy to just say that TREE and Ecology are apples and oranges, the real problem is journals that mix reviews and original research. Since reviews are more highly cited, just changing the mix of these two article types can manipulate the impact factor. Sarah Supp and I have a paper on this if you’re interested in seeing some science and further commentary on the issue. The answer is easy: separate the analyses for review papers. It has been suggested that methods papers have higher citation rates as well, but as I admit in my back and forth with Bob O’Hara (the relevant part of which is still awaiting moderation as I’m posting) there doesn’t seem to be any actual research to back this up.
3. Solve the problem of metrics that are strongly influenced by the number of papers
In the citation analysis of individual scientists there has always been the problem of how to deal with the number of papers. The total number of citations isn’t great since one way to get a large number of citations is to write a lot of not particularly valuable papers. The average number of citations per paper is probably even worse because no one would argue that a scientist who writes a single important paper and then stops publishing is contributing maximally to the progress of science.
In journal-level citation analyses these two end points have, until recently, been all we had, with ISI choosing to focus on the average number of citations per paper and Eigenfactor on the total number of citations. The problem is that these approaches encourage journals to game the system by publishing either the most or the fewest papers possible. Since the issues with publishing too many papers are obvious, I’ll focus on the issue of publishing too few. Assuming that journals have the ability to predict the impact of individual papers, the best way to maximize per-article measures like the impact factor is to publish as few papers as possible. Adding additional papers simply dilutes the average citation rate. The problem is that by doing so the journal is choosing to have less influence on the field (by adding more, largely equivalent quality, papers) in favor of having a higher perceived impact. Think about it this way: is a journal that publishes a total of 100 papers that are cited 5 times each really more important than a journal that publishes 200 papers, 100 of which are cited 5 times each and 100 of which are cited 4 times each? I think that the second journal is more important, and that’s why I’m glad to see that Google Scholar is focusing on the kinds of integrative metrics (like the h-index) that we use to evaluate individual researchers.
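The two-journal example above can be worked through directly. This minimal sketch (with the made-up citation counts from the example) shows how a per-paper average penalizes the journal that publishes the extra, still-useful papers, while an h-index-style integrative metric does not:

```python
def mean_citations(cites):
    """Per-paper average, the quantity the impact factor is built on."""
    return sum(cites) / len(cites)

def h_index(cites):
    """Largest h such that at least h papers have >= h citations each."""
    for h, c in enumerate(sorted(cites, reverse=True), start=1):
        if c < h:
            return h - 1
    return len(cites)

journal_1 = [5] * 100              # 100 papers cited 5 times each
journal_2 = [5] * 100 + [4] * 100  # same 100, plus 100 papers cited 4 times

print(mean_citations(journal_1), mean_citations(journal_2))  # 5.0 4.5
print(h_index(journal_1), h_index(journal_2))                # 5 5
```

The average drops from 5.0 to 4.5 when the second journal adds its extra papers, even though those papers add 400 citations’ worth of influence; the h-index holds steady rather than punishing the larger journal.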
The good news is that we do have better metrics, and they are available right now. The first thing that we should do is start promoting those instead of the metric that shall not be named. We should also think about improving these metrics further. If they’re worth talking about, they are worth improving. I’d love to see a combination of the network approaches in Eigenfactor with the approaches to solving the number-of-publications problem taken by Google. Of course, more broadly, we are already in the process of moving away from journal-level metrics and focusing more on the impact of individual papers. I personally prefer this approach and think that it’s good for science, but I’ll leave my thoughts on that for another day.
UPDATE: Point 3 relates to two great pieces in Ideas in Ecology and Evolution, one by Lonnie Aarssen and one by David Wardle.
UPDATE 2: Fixed the broken link to the “Why Eigenfactor?” page.
Both sets of metrics actually include both approaches, with total citations reported by ISI and the Article Influence Score serving as the per-paper equivalent of the Eigenfactor; it’s just that these don’t seem to get as much… um… attention.
And if they didn’t, then all we’re measuring is how well different journals game the system, plus some positive feedback where journals that are known to be highly cited garner more readers and therefore more future citations.
I just wanted to say that the Canadian funding system rocks.
You didn’t expect THAT, did you?! 😉
In seriousness, these are all reasonable suggestions. But I think there are some interesting deeper issues here, on which I’m planning to post at some point. As a hint at my thinking, let me ask a question: do you think of these metrics as estimating some underlying unobservable property of either the author or the article? Or do you think of these metrics as simply describing and summarizing citation patterns?
Alright, you got me (if Jeremy’s first comment doesn’t make any sense to you see these comments).
To your main question – No, none of these measures make much sense as measures of individuals or individual articles. Besides the general problems with using a journal-level average to evaluate individual items, the distributions within journals are so skewed as to make this pretty much ineffective (someone just published a paper on this that I can’t seem to find at the moment). This is pretty well trodden ground. I described the examples at the article level because they tend to make more sense to folks than talking about means/totals for journals, and in most cases given the averaging it’s basically the same thing (for the journal).
I do think that these metrics try to measure the importance/impact of the journal. Look at any of the untold number of blog posts and tweets being put up right now by the journals and you’ll see what I mean. So, yeah, maybe you could argue that it’s just a metric and therefore you can’t object to it per se (though I would note that it is called the Impact Factor), but I’m objecting to its use, and to be honest I can’t see a justification for its existence outside of this use, so I’d definitely need to be convinced.
So just to clarify, do you see metrics as measuring, or potentially measuring, the underlying importance/impact/quality of journals? Or as merely describing the citation patterns of those journals’ papers? (And if you ask “What’s the difference?”, well, you’ll have to wait for my post…)
The metrics themselves, by definition, measure specific citation patterns of journal papers (in the aggregate). The metrics are used by people to attempt to characterize the underlying importance/impact/quality of journals. I object to this use of the IF, not the abstract concept of its existence.
It’s called Goodhart’s law. Look it up. http://en.wikipedia.org/wiki/Goodhart%27s_law Basically, no matter what metric you use, people are going to game it. If you want to see how eigenfactor can be gamed, look no further than the blackhat SEOs who’ve been gaming Google’s pagerank forever.
@anon – My goal in this post wasn’t so much aimed at avoiding gaming as with coming up with a decent metric in the first place, but I definitely agree that any metric can (and eventually will) be gamed. ISI does work to prevent this to some degree by punishing journals that intentionally game the system, much the same way that Google does.
Thanks for making me take a look at the eigenfactor versions, which I largely ignored when they came out (your point #1). I would have to agree that they match my intuitive sense of the quality of journals much better than the impact factors.
Your point about the extreme skew is an extremely important one (recall the modal number of citations to a paper in most any journal or most any individual’s CV is 0). In such a world a single paper can drive a whole journal ranking (indeed this happened a couple of years ago with the Bulletin of the American Museum of Natural History, which was misleadingly ranked in the top 5 in Ecology). And as you say it completely invalidates any idea that the journal ranking reflects on the ranking of a paper in that journal.
An interesting thought … I bet there’s more information in the ranking of a paper within its journal than there is in the ranking of the journal the paper is in. This is something journals could do more to promote as an alternative to IF (although at the cost of making a majority of the papers in their journal look bad).
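One way to operationalize that idea is a citation percentile computed within the paper’s own journal rather than across journals. This is a hedged sketch of the suggestion, with made-up citation counts chosen to mimic the heavy skew discussed above:

```python
def within_journal_percentile(paper_cites, journal_cites):
    """Fraction of the journal's papers that this paper meets or exceeds
    in citation count (1.0 = top of the journal)."""
    matched_or_beaten = sum(1 for c in journal_cites if c <= paper_cites)
    return matched_or_beaten / len(journal_cites)

# A skewed journal: most papers get few or no citations, a handful get many.
journal = [0, 0, 0, 0, 1, 1, 2, 3, 8, 40]

print(within_journal_percentile(8, journal))  # 0.9: near the top of its journal
print(within_journal_percentile(0, journal))  # 0.4: ties the (modal) zero group
```

Under skew this distinguishes papers that a journal-level average lumps together, though as the comment notes, publicizing it would make most of a journal’s papers look bad by construction.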
Getting at Jeremy’s point, as we all know the real solution is for hiring and tenure committees to actually take the time to read the candidate’s work thoughtfully and form an opinion. But we also know that will never happen – we’re all too busy trying to publish in high impact journals!
I’m glad it was useful. I definitely agree that there’s some very interesting information in the rank of a paper within a journal, but I’ve never seen anyone actually dig into it.
I have to admit that I’m not 100% sold on the (very common) idea that “the real solution is for hiring and tenure committees to actually take the time to read the candidate’s work thoughtfully and form an opinion.” My skepticism is based on the fact that there often isn’t anyone qualified to really evaluate a particular researcher’s work at a particular institution (this is certainly true for me at USU), and even if there were, it seems desirable to at least include more of a “whole community” measure of opinion rather than focus on the perspective of a small handful of folks. The value of quantitative measures is that they avoid personal biases with respect to both general research areas and individual researchers. We all know of more than one example of folks not being tenured or hired for all of the wrong reasons, and in at least a couple of cases that I know of these are presented as “problems” with broad-reaching research areas, even when the researchers’ publications were in high profile journals and well received and cited by other folks working in their area. This isn’t to say that there isn’t an important place for careful evaluation of actual papers by hiring and P&T committees (and external reviewers); it’s just that I think there is real value to quantitative measures as well (paper-level in the case we’re talking about here).