Publishing in the Era of Open Science, Part 2

July 30, 2012

This post is an article usage update for my lab’s PLOS ONE paper (Chen et al), which is now 100 days old. A month ago I posted a Round Table discussion of article-level metrics data on the first 30 days post publication. Highlights included punctuated bursts of readership fueled by syndicated press coverage, social media (primarily tweets) or an influential science blogger. However, since the first month, pageviews have declined dramatically. Like last time, I’ve graphed total pageviews over time. Recall that total pageviews are the sum of HTML views of the article and PDF downloads of the article:

The first bump around day 65 was caused by my lab website going live. A review of Chen et al by neuroscience blogger The Cellular Scale (@cellularscale) stimulated the second, slightly larger bump around day 70. For ease of viewing, here’s the same data graphed as daily pageview counts (note log scale; blank slots are missing data):

I’ll note several things. First, thousand-pageview days are creatures of early buzz; the most my paper could muster since was 224 pageviews in response to that blog review. Second, my lab website going live seems to have stabilized the gradual slide in pageviews that set in after the first month. I see evidence of this new readership from Google Analytics traffic data for my lab website.


Now I’d like to shift gears and examine closely one article-level metric in particular, namely the ratio of HTML views to PDF downloads, hereafter “HTML/PDF.” Recently, Martin Fenner (@mfenner), the PLOS article-level metrics guru, and I exchanged a few thoughts on what HTML/PDF might mean in terms of scholarly impact.


Let’s start with my paper’s HTML/PDF, which is 15 (6748/446). So for every 15 HTML views, only one reader downloaded the PDF. In an ideal world, you might imagine that HTML/PDF would be close to unity. But that only makes sense if the sequence of events is: 1) land on article page, 2) download article PDF. Turns out that sequence describes academic readers pretty well: using traffic data from PubMed only, the HTML/PDF for my paper is 1.1 (63/57).
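For anyone who wants to run the same arithmetic on their own paper, here is a minimal sketch in Python. The counts are the ones quoted above; the function name `html_pdf_ratio` is just an illustrative choice of mine, not part of any PLOS tool.

```python
def html_pdf_ratio(html_views: int, pdf_downloads: int) -> float:
    """Return the number of HTML views per PDF download."""
    if pdf_downloads <= 0:
        raise ValueError("pdf_downloads must be positive")
    return html_views / pdf_downloads


# Counts quoted in the post (all traffic sources, then PubMed referrals only).
all_traffic = html_pdf_ratio(6748, 446)
pubmed_only = html_pdf_ratio(63, 57)

print(f"All traffic: {all_traffic:.1f}")  # prints "All traffic: 15.1"
print(f"PubMed only: {pubmed_only:.1f}")  # prints "PubMed only: 1.1"
```

A ratio near 1 means almost every landing on the article page led to a PDF download, while a large ratio means most visitors stopped at the HTML page.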


But most readers of my paper don’t appear to be academics. Both Fenner and I (and I’m sure others) interpret HTML views as calling cards of casual readers who might exit the article page after perusing the abstract or after scrolling through the text and realizing that the material is way over their head. On the other hand, a reader who clicks through a second time in order to download the PDF is probably an engaged reader, and I would argue more likely to be an academic or “expert.”


Now let’s try to put my HTML/PDF of 15 into context by looking at a larger sample of PLOS papers, controlling for topic area, in this case cancer genetics.

(Image provided by M. Fenner)


The correlation between HTML views and PDF downloads is stable over a 10-fold range. But the HTML/PDF is much closer to one than my paper’s, indicating many fewer casual readers of cancer genetics papers. Okay, that’s nice. But what about an apples-to-apples comparison? Here’s a plot composed of PLOS papers in my specific topic area:

(Image provided by M. Fenner)


Interestingly, the HTML/PDF is higher for this group of papers than for the cancer genetics papers. Therefore, pharmacology papers may be more likely to pique the interest of non-experts. But is there any way to test that hypothesis? Take a look at the off-diagonal outliers. The one furthest to the right is my paper. The size of the circles is an indication of how much interest was generated on Facebook, the Mecca of casual readers.


The next stage in the post publication life cycle of my paper is the (long?) wait for the first citation…




  • RobertDavidSTEELEVivas

    I enjoyed this, and it is being included in tonight’s round up of Open Source Everything Highlights (Twitter hashtag #openall). I’ve always been a fan of citation analysis, and what the above suggests to me is that attention span will soon be plottable 24/7 across all topics, and perhaps in relation to known gender, age, ethnicity, etc.

    • Ethan Perlstein

      Thanks for the pingback, and I agree that usage data will only grow more ubiquitous. And when social and interaction network graphs are overlaid on top, it’ll be fun times.

  • Andrew Miller

    Very interesting to see your analyses Ethan. Thanks for sharing them. I’m not sure if PDF downloads vs HTML correlates with a more specialised readership as both forms impart the same information, just different format. Perhaps they are more indicative of rate of ‘consumption’ of the end user, HTML loading more quickly?

    • Ethan Perlstein

      I agree that there’s still a lot of reading the tea leaves here. I do treat the HTML version and the PDF version as interchangeable in some instances, but overall I tend to read papers most closely in PDF form out of habit. I don’t have the stats but I feel that describes many of my peers in academia. I’m sure the PDF bias will lessen with time as journal article HTML interfaces become more app-like.

      I would love to hear more from academics on their paper-reading ritual.

      • Tomi

        Interesting stuff. Speaking as an engaged non-academic (I’m in publishing), it’s probably fair to say that I’d only download the PDF of a paper if I was going to read it closely, but if I’m reading the HTML version I could be skim-reading or reading in full. It depends partly on how good the platform is. If the HTML is awkward, without naming any publishers in particular, I guess that’ll probably drive more serious readers to the PDF?

        • Ethan Perlstein

          Thanks! I agree that the HTML reading experience on most journal websites is less than optimal — either way too cluttered or way too antiquated.

      • Becky Freeman

        I only download PDFs of articles I intend to cite, so that I can store the PDF in an Endnote library for easy access. Otherwise, I’ll just read the HTML version.

  • tahrey

    Wait, is the PDF available for public download, or do I have to have a PubMed / Athens / etc subscription? Even if it’s free… should I have seen the link and wanted to read more, I’m so used now to the full text of scientific articles that have online abstracts being a closed door to anyone who doesn’t have the necessary account details that I wouldn’t even have bothered clicking through to attempt it… regardless of whether the click would have succeeded. The closed-access model has skinner-boxed me into submission.

    I wonder how many other ex-uni-students who are now firmly in the non academic world but maintaining a passing interest – much like myself – may be skewing your figures in this way?

    • tahrey

      Also, on a more personal note, I deeply hate it when web resources only offer a PDF as the way of accessing certain data, rather than at least having an HTML-formatted version available. In 90+ percent of cases, the PDF doesn’t do anything an HTML one couldn’t have done, other than maybe being slightly more prettified. Excusing your presence, it often seems a lazy cop-out way of just dumping a file already prepared for offline consumption (often, actually optimised for printing, so it’s not even that suitable for reading on-screen) onto the end user instead of clicking “save for web” as well as “save to PDF” in whatever program was used to prepare it.

      Or maybe I’m just sour at the moment because of the number of hokey travel companies that have done just that. The number of price list/etc PDFs I’ve had to wait to download, and that are now cluttering up my downloads folder, is just crazy. Particularly when, if I printed them, the output wouldn’t look materially different from the same information presented in an HTML table (oh, go on, you can use CSS instead if you like) that would have loaded just as quickly as the page on which the PDF link was provided, without any need for a further click, waiting for what is sometimes a very large file (I’ve seen tariff lists reach almost 7MB because of all the pretty backgrounds etc, when the actual data could have fit into 1/1000th the space), and either a browser plugin (that’s perpetually “out of date – do you want to update, or use it this once?”) or an external reader to load (ever so slowly in the latter case) just so the thing will even render.


      Yeah, I know it’s probably demanded by the publishing body because of how they have their site set up, etc, and there’s nothing you can personally do. But even if there wasn’t the previously stated conditioning that “the full text is a local file for local people! there’s nothing for YOU here…”, the additional, albeit minor complication, holdup and data load of having to bugger about loading this non-browser-native thing instead of just reading the same text as a nicely screen-formatted piece of html would also have put me off.

    • tahrey

      (Said overly graphical PDFs are also less accessible for many other reasons, too – ofttimes less compatible with screen readers, Chrome won’t auto-translate those in other languages, more difficult to copy the text out to a manually-operated translation page, won’t auto-adjust the font pitch or word wrapping if you’re on a device of more limited resolution or larger text rendering size than the author originally intended, etc…)