Publishing in the Era of Open Science, Part 2
This post is an article usage update for my lab’s PLOS ONE paper (Chen et al), which is now 100 days old. A month ago I posted a Round Table discussion of article-level metrics data on the first 30 days post-publication. Highlights included punctuated bursts of readership fueled by syndicated press coverage, social media (primarily tweets) or an influential science blogger. Since the first month, however, pageviews have declined dramatically. Like last time, I’ve graphed total pageviews over time. Recall that total pageviews consist of HTML views of the article + PDF downloads of the article:
The first bump around day 65 was caused by my lab website going live. A review of Chen et al by neuroscience blogger The Cellular Scale (@cellularscale) stimulated the second, slightly larger bump around day 70. For ease of viewing, here’s the same data graphed as daily pageview counts (note log scale; blank slots are missing data):
I’ll note several things. First, thousand-pageview days are creatures of early buzz; the most my paper could muster since was 224 pageviews in response to that blog review. Second, my lab website going live seems to have stabilized the gradual slide in pageviews that set in after the first month. I see evidence of this new readership from Google Analytics traffic data for my lab website.
Now I’d like to shift gears and examine closely one article-level metric in particular, namely the ratio of HTML views to PDF downloads, hereafter “HTML/PDF.” Recently, Martin Fenner (@mfenner), the PLOS article-level metrics guru, and I exchanged a few thoughts on what HTML/PDF might mean in terms of scholarly impact.
Let’s start with my paper’s HTML/PDF, which is 15 (6748/446). So for every 15 HTML views, only one reader downloaded the PDF. In an ideal world, you might imagine that HTML/PDF would be close to unity. But that only makes sense if the sequence of events is: 1) land on article page, 2) download article PDF. Turns out that sequence describes academic readers pretty well: using traffic data from PubMed only, the HTML/PDF for my paper is 1.1 (63/57).
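For anyone who wants to run the same arithmetic on their own article-level metrics, here is a minimal sketch in Python. The function name `html_pdf_ratio` is my own invention for illustration and is not part of any PLOS tool; the counts are the ones quoted above.

```python
def html_pdf_ratio(html_views: int, pdf_downloads: int) -> float:
    """Return HTML views per PDF download (the "HTML/PDF" metric).

    Hypothetical helper for illustration only.
    """
    if pdf_downloads <= 0:
        # The ratio is undefined when nobody has downloaded the PDF.
        raise ValueError("pdf_downloads must be positive")
    return html_views / pdf_downloads


# All traffic sources: 6748 HTML views, 446 PDF downloads.
overall = html_pdf_ratio(6748, 446)

# PubMed referrals only: 63 HTML views, 57 PDF downloads.
pubmed = html_pdf_ratio(63, 57)

print(f"All sources: {overall:.1f}")  # All sources: 15.1
print(f"PubMed only: {pubmed:.1f}")   # PubMed only: 1.1
```

A value near 1 means nearly every visitor who landed on the article page also downloaded the PDF; larger values mean more visitors left after the HTML view alone.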
But most readers of my paper don’t appear to be academics. Both Fenner and I (and I’m sure others) interpret HTML views as calling cards of casual readers who might exit the article page after perusing the abstract or after scrolling through the text and realizing that the material is way over their heads. On the other hand, a reader who clicks through a second time in order to download the PDF is probably an engaged reader, and I would argue more likely to be an academic or “expert.”
Now let’s try to put my HTML/PDF of 15 into context by looking at a larger sample of PLOS papers, controlling for topic area, in this case cancer genetics.
(Image provided by M. Fenner)
The correlation between HTML views and PDF downloads is stable over a 10-fold range. But the HTML/PDF is much closer to one than my paper’s, indicating many fewer casual readers of cancer genetics papers. Okay, that’s nice. But what about an apples-to-apples comparison? Here’s a plot of PLOS papers in my specific topic area:
(Image provided by M. Fenner)
Interestingly, the HTML/PDF is higher for this group of papers than for the cancer genetics papers. Therefore, pharmacology papers may be more likely to pique the interest of non-experts. But is there any way to test that hypothesis? Take a look at the off-diagonal outliers. The one furthest to the right is my paper. The size of the circles is an indication of how much interest was generated on Facebook, the Mecca of casual readers.
The next stage in the post-publication life cycle of my paper is the (long?) wait for the first citation…