Publishing in the Era of Open Science

June 18, 2012

This post is the first installment in a series dedicated to Open Science (#openscience). It’s the story of my journey into the brave new world of online science communication and research article-level metrics. It is also a call to action to those in the traditional publishing ecosystem who are skeptical about the promise of Open Science – I’m looking at you, Kent Anderson! I hope to convince them that if the false idol of Impact Factor is ever going to be superseded by a more enlightened meritocracy, it begins with scientists responsibly and quantitatively promoting their work within the online scientific community.

My journey began on Wednesday, April 18th, 2012, when Chen et al., my lab’s paper on the bioaccumulation of Zoloft in yeast cells, first appeared in the online Open Access journal PLoS ONE. I will present data on the first month of paper pageviews, and then discuss inferences regarding readership and reach.

The first plot (right) shows the growth of total pageviews, defined as HTML views + PDF downloads + XML downloads. Two obvious features jump out at me. First, site traffic came in spurts. My paper averaged 175 pageviews per day over the observation period, but this number was inflated by three amplifications; a better representation of typical site traffic is the median, 72 pageviews/day. The other neat result was that the amplifications were not created equal.
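To see why the median is the fairer summary when traffic comes in spurts, here is a minimal Python sketch. The daily counts are invented purely for illustration (they are not the actual PLoS ONE numbers), and the function name summarize_traffic is hypothetical.

```python
from statistics import mean, median

def summarize_traffic(daily_pageviews):
    """Return (mean, median) for a list of daily pageview counts."""
    return mean(daily_pageviews), median(daily_pageviews)

# Hypothetical data: mostly quiet days plus a few surge days.
quiet_days = [60, 70, 80, 65, 75, 70, 72]
surge_days = [900, 850, 400]

avg, med = summarize_traffic(quiet_days + surge_days)

# A handful of surge days pulls the mean far above a typical day,
# while the median stays close to the quiet-day level.
print(f"mean = {avg:.1f} pageviews/day, median = {med:.1f} pageviews/day")
```

With a few large spikes in the series, the mean lands well above what any ordinary day looks like, which is the same pattern as the 175 vs. 72 gap described above.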

To explore those differences more closely, I re-plotted the data as daily pageviews (left), and color-coded Saturdays and Sundays in red to highlight the fact that site traffic almost always slowed to a trickle on weekends. An initial surge occurred in the first 48 hours, petering out by the first weekend. Unexpectedly, a second, equally sized surge began on Sunday May 6th (Day 18), but it exhibited a slower decay than the first surge: 3 days vs. 2 days. A third, smaller surge began on Thursday May 17th (Day 29) and faded by the fifth weekend.
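The day numbers and weekend flags used here can be reproduced with a few lines of Python. This is only a sketch: it assumes the publication date counts as Day 0 (which is what the dates in this post imply), and the helper names day_number and is_weekend are hypothetical.

```python
from datetime import date

# Assumption: the publication date (April 18, 2012) is Day 0.
PUBLICATION_DATE = date(2012, 4, 18)

def day_number(d):
    """Days elapsed since publication."""
    return (d - PUBLICATION_DATE).days

def is_weekend(d):
    """True for Saturday (weekday 5) or Sunday (weekday 6)."""
    return d.weekday() >= 5

second_surge = date(2012, 5, 6)
third_surge = date(2012, 5, 17)

for label, d in [("second surge", second_surge), ("third surge", third_surge)]:
    print(label, d.strftime("%A"), "Day", day_number(d), "weekend:", is_weekend(d))
```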

The first and third surges, although differing in amplitude, exhibited comparable decay kinetics. “Newness” and press release syndication drove the first surge, as evidenced by an initial EurekAlert press release and the several popular science news aggregators that propagated it, e.g., PsychCentral. The third surge was caused by a review of my paper by Derek Lowe on his blog “In the Pipeline,” which is appointment reading for a diverse group of scientists from academia and the pharmaceutical industry. To wit: several colleagues independently congratulated me by email within 24 hours of the blog post, attesting to Lowe’s wide sphere of influence.

The slower-decaying second surge is more complex. I think that multiple independent and potentially self-reinforcing catalysts sustained the traffic for an extra 24 hours. The spark appears to have been my tweeting a link to Chen et al. with the hashtag #APAAM12 on the morning of May 6th. Attendees and online followers of the 2012 annual meeting of the American Psychiatric Association in Philadelphia would have seen that tweet. A Phoenix-based mental health advocate and blogger in fact did, and she retweeted my tweet, #APAAM12 hashtag and all. She also alerted me to a blurb about my paper on the psychiatry news aggregator “Mad in America,” and she claimed that it was the source of the buzz. But that blurb preceded the second surge by a few days, which led me to investigate, i.e., to Google search, other sources of amplification.

Turns out there were several good candidates. John Timmer, whom I know from Science Online NYC (@S_O_NYC on Twitter), writes for Ars Technica, and he penned another blurb about my paper, but it also appeared days before the second surge. Still dissatisfied by the lack of a direct cause and effect, I searched more and uncovered yet another blurb about my paper, this time by a Phoenix-based lawyer. His firm was apparently using Chen et al. (without my knowledge, btw) to help solicit plaintiffs for a class action lawsuit claiming that Zoloft caused birth defects, a type of lawsuit that is apparently sweeping the nation. All in all, it appeared to be a perfect storm.

Some of you may be miffed by my apparent conflation of site traffic and readership. Without more sophisticated analytics, I confess that it’s difficult to gauge the background of readers (scientists vs. non-scientists), or how much of the paper they’re actually reading (abstract vs. full text). However, the ratio of PDF downloads to HTML views (PDF/HTML) may be informative here. After the initial surge, PDF/HTML was 1 in 20 and remained there until the second surge, after which it fell to 1 in 30. If we assume that PDF downloads are a proxy for “expert” readership, then the second surge diluted quality readership. Conversely, the third surge lifted the ratio to 1 in 15, with as many as 20% of readers on Day 29, many of whom were presumably academics, choosing to download a PDF version of the paper.
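The “1 in N” figures above come from dividing PDF downloads by HTML views. A minimal Python sketch of that arithmetic follows; the counts are made up so that they land on the ratios quoted in this post, and the function names pdf_share and as_one_in_n are hypothetical.

```python
def pdf_share(pdf_downloads, html_views):
    """PDF downloads per HTML view, used here as a rough proxy for expert readership."""
    if html_views <= 0:
        raise ValueError("html_views must be positive")
    return pdf_downloads / html_views

def as_one_in_n(ratio):
    """Express a ratio such as 0.05 in the '1 in N' form used in the text."""
    return f"1 in {round(1 / ratio)}"

# Hypothetical (PDF downloads, HTML views) counts, chosen only to
# reproduce the ratios mentioned in the post.
examples = {
    "after first surge": (50, 1000),
    "after second surge": (40, 1200),
    "after third surge": (20, 300),
}

for label, (pdfs, htmls) in examples.items():
    print(label, as_one_in_n(pdf_share(pdfs, htmls)))
```

A lower ratio (1 in 30) means fewer PDF downloads per HTML view, so under the proxy assumption a surge that lowers it is bringing in proportionally fewer expert readers.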

In the next post in the Open Science series, I will present results of a sociological experiment in post-publication review, and discuss my recipe for vibrant online commentary.