Quantified Self Publishing

August 19, 2012

If I were going by my Twitter feed alone, I’d say that Open Science is about to level up.

 

Last week several key opinion leaders in the field of genetics, including a collaborator at Princeton, Leonid Kruglyak (@leonidkruglyak), made a splash after posting papers – well, manuscripts technically – on arXiv, the preprint server of choice for physicists and mathematicians for several decades running.

 

Before that, an announcement last month of the partnership between Faculty of 1000 and the data-sharing site figshare heralded the beginning of an era of disruption in scholarly publishing (despite Kent Anderson’s implacable dyspepsia). Before that still, and in the realm of actual policy making, the Finch Report commissioned by the UK government gave its blessing to open access to all publications whose research is supported by public funding.

 

And I would be remiss if I forgot to mention the late-spring social-media-roots campaign spearheaded by John Wilbanks (@wilbanks), which netted over 25,000 signatures in less than two weeks for a White House e-petition calling for the policy just embraced by the UK. (Though months later it’s not clear that the White House has done anything about it.)

 

But in the trenches, I have my concerns about the sustainability of Open Science outside of the Twitter echo chamber. As I learned the hard way from the Occupy movement, adrenaline only takes you so far.

 

Of course I wouldn’t be writing this post if I thought the future would be all gloom and doom. It is my belief that Open Science will jump to a higher orbital once more scientists are convinced that communicating their science with other scientists online, ideally in blog form, is not only worth their time but also an outreach mitzvah. In a tip of the hat to the “quantified self” movement in medicine, I present you with quantified self-publishing, courtesy of Google Analytics: ’cause nothing appeals to the academic’s self-interest more than numerical approbation in the form of lab website traffic data.

 

It obviously helps if you have a shiny new self-publishing platform to play with, as I do with Perlsteinlab.com. My site formally launched on June 28th, and I took a snapshot of site traffic data after one week in the wild. Now that more than a month has elapsed, I wanted to see whether I’d actually amassed an audience, however small, and whether promoting the site almost exclusively via tweeting is effective. What follows are Google Analytics data that attest to both.

 

First, here’s a nice summary of the basic suite of Google Analytics metrics from the time interval spanning 6/28 – 8/11:

Doesn’t seem too shabby, though the low pages/visit figure suggests that on average most of my readers are landing on the home page and then clicking through to only one piece of content (or vice versa). Remember, I’m luring visitors to my site principally by tweeting (referral) and emailing (direct), as evidenced by this breakdown of total site traffic:

 

So how did all those visits shake out day by day? Here’s a plot of unique visits per hour over the ~45-day observation period:

 

 

As expected, there was a flurry of immediate post-launch activity, followed by a long, essentially flat stretch punctuated by blips here and there, and then an upswell of traffic over several days toward the end of the observation window. To make a long story short, the blips were caused by retweet events. And the massive spike, which dwarfed even my post-launch buzz, was triggered by a tastemaker retweet, i.e., a retweet from someone with a massive number of followers.

 

First, the garden-variety retweets. On July 12th, I tweeted out my statement of teaching philosophy, which is part of my application package for junior faculty searches. Two people, one a follower of mine and the other a follower of this follower, each retweeted my link, as shown here:

 

This plot is representative of most days on my site, in that I received approximately 100 unique visits per day, and roughly 10 visitors per hour. The tweet-driven spikes are transient – never lasting more than a few hours and usually just one hour.

 

However, things look a lot different when a retweet comes from a massively followed tastemaker. Case in point is Ben Goldacre, a British science writer who has over 200,000 followers. He tweeted the following:

 

This one innocuous tweet brought tons of eyeballs to my site, as evidenced here:

 

Instead of the doubling of traffic that usually resulted from a garden-variety retweet, Goldacre’s mention caused an order-of-magnitude jump. The U-shape corresponds to the overnight hours on the East Coast. (Remember, Goldacre is in the UK, five hours ahead of the East Coast.) What’s more, the second peak has a “long tail,” as traffic data go.
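For anyone who wants to put a number on the difference between an ordinary retweet blip and a tastemaker spike, here is a minimal Python sketch. The hourly counts are made up for illustration (they are not my Google Analytics export), and the function names and the 2x and 10x thresholds are assumptions I picked to mirror the rough pattern described above, not anything Google Analytics defines.

```python
from statistics import median


def fold_change(hourly_visits):
    """Return the peak hourly count divided by the median hourly count."""
    baseline = median(hourly_visits)
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a fold change")
    return max(hourly_visits) / baseline


def classify_spike(fold, blip_threshold=2.0, tastemaker_threshold=10.0):
    """Label a fold change using two hypothetical cutoffs (2x and 10x)."""
    if fold >= tastemaker_threshold:
        return "tastemaker spike"
    if fold >= blip_threshold:
        return "garden-variety blip"
    return "baseline"


# Hypothetical hourly unique-visit counts, invented for this example.
ordinary_day = [10, 9, 11, 10, 22, 12, 10, 9]
tastemaker_day = [10, 11, 9, 10, 120, 95, 40, 12]

for label, counts in [("ordinary", ordinary_day), ("tastemaker", tastemaker_day)]:
    fold = fold_change(counts)
    print(f"{label}: {fold:.1f}x over baseline -> {classify_spike(fold)}")
```

Using the median rather than the mean as the baseline keeps a single huge hour from inflating the very baseline it is being compared against.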

 

There’s a lot more where this came from, and I’ll be posting future updates as new and interesting patterns emerge. In the meantime, I would love to get feedback from others who’ve analyzed traffic data for their science blogs, or from academics who are contemplating starting their own self-publishing journey.

  • ben de bivort

    I think self-publishing will probably end up winning out, or at least the mathematics model, where posting to arXiv is really what matters but most papers nevertheless find a journal home eventually. But for early adopters, your data strongly suggest that the number of views will be highly stochastic. That is, unless you already happen to have a massive Twitter following.

    • ethanperlstein

      Well, not entirely stochastic. I can do things like get myself followed by people with massive followings, which is admittedly non-trivial. I think my follower with the largest following has 20,000 – 30,000 followers, which is an order of magnitude lower than where I need to be to experience regular 10-fold jumps in site traffic, and presumably a higher steady-state audience size. Obviously, it’s really hard to get followed by a tastemaker, since they’re by definition selective about whom they follow. Slow and steady…

  • http://twitter.com/rubenrellan Ruben Rellan

    It will be interesting to see how many visitors to your site actually went to your publications, downloaded them, and read them. Your site and the questions you deal with here are about much more than your own research. I think altmetrics can bring in some interesting ways to look at “science impact”, but I think we should not be trading citations of your article for “retweets” or “likes”. The former requires (hopefully) that somebody read your paper and found it interesting for her own research. We should not lose perspective.

    • ethanperlstein

      Nowhere did I advocate tossing citations overboard! Citations are fine, though I think they are gamed in some sense because of declining or simply lazy referencing habits, and what appears to be the tendency to front-load glamor references in the introduction, as opposed to the results or discussion, of papers. However, I do have a beef with Impact Factor and its distorting influence.

      Google Analytics opens the door to a broader measure of scientific impact; as you point out, one that begins to quantify the role of science outreach, which is ignored by the current evaluation systems.

      • http://twitter.com/rubenrellan Ruben Rellan

        I agree with your first paragraph: citations have their own pitfalls, mostly because of the bad habits you point out. Still, they are a much better way to define the “impact” of a researcher than journal IF.

        Google Analytics is definitely a great way to measure science outreach. In fact, one of the nicest things you do on your webpage is try to explain to the general public what your research is about. Other kinds of science outreach include going to neighborhood meetings, high school classes, farmer unions, patient associations, etc., to explain why what you are doing is important for everyone else, and this might not be covered by Google Analytics.
        Maybe the problem is that science outreach in general has received much less attention (points) when scientists are evaluated, perhaps because it is difficult to measure or because it is less “attractive” to employers.

  • mel_4_7

    I’m also skeptical about the real impact of a website. I’ve been running our lab website since fall 2008, on a subject that is not really known in the scientific community, and even less so among the general public. No blog, no Twitter. So nothing fancy. As you can see in this diagram, I get around 20 visits/day with some peaks, corresponding to a) the period during or after a conference/meeting or b) periods of advertising for student recruitment. And summers are not a good period! ;-)
    Contrary to you, most of our visits (nearly 75%) come from web searches; the rest come from direct links. Even after 4 years of stats, I’m still wondering who visits the website and, mainly, why. Do they find what they’re looking for? We have some anecdotal comments from colleagues who told us they saw our website, but other than that, I don’t really know what the real impact is. For sure, I will continue to maintain it and, anyway, our web presence is bigger than that of more than 95% of our university colleagues… Maybe I’m still too “old-style” and need some dialogue, through either blogging or Twitter.
    I don’t want to offend you, but as was already pointed out, your traffic comes from blog posts, and I still don’t know what your scientific field is or even which country you are from. “Science” and “social” impact are probably two different things that we will have to measure. So maybe altmetrics is the way of the future and I should work to give visibility to our website before everyone shows up on Twitter… ;-)

    • mel_4_7

      Sorry, the links are not working correctly and cut some text. Here it is again.
      I’m also skeptical about the real impact of a website. I’ve been running our lab website (http://hepato-neuro.ca) since fall 2008, on a subject that is not “trendy” in the scientific community, and even less so among the general public. No blogging, no Twitter, nothing fancy. As you can see in this diagram (http://ubuntuone.com/3qVTd9eKjVsHEL9IzVUCmV), I get around 20 visits/day with some peaks, corresponding to a) the period during or after a conference/meeting or b) periods of advertising for student recruitment. And summers are not a good period! ;-)
      Contrary to you, most of our visits (nearly 75%) come from web searches; the rest come from direct links. Even after 4 years of stats, I’m still wondering who visits the website and, mainly, why. Do they find what they’re looking for? We have some anecdotal comments from colleagues who told us they saw our website, but other than that, I don’t really know what the real impact is. For sure, I will continue to maintain it and, anyway, our web presence is bigger than that of more than 95% of our university colleagues… Maybe I’m still too “old-style” and need some dialogue, through either blogging or Twitter.
      I don’t want to offend you, but as was already pointed out, your traffic comes from blog posts, and I still don’t know what your scientific field is or even which country you are from. “Science” and “social” impact are probably two different things that we will have to measure. So maybe altmetrics is the way of the future and I should work to give visibility to our website before everyone shows up on Twitter… ;-)

      • ethanperlstein

        Interesting perspective. Thanks for the comment!