This is crazy but here’s my preprint, so arXiv maybe

August 28, 2012

The last time we saw so much upheaval in scholarly publishing must have been at the dawn of the Internet Age. Picture it, the early 1990s. I know I can. There I am as a young teenager, riding Spaceship Earth in Epcot Center. Future World and all of its glorious General Electric sponsorship was the rocket fuel for my budding technological utopianism.


Meanwhile, back at the Los Alamos National Laboratory, a band of theoretical high-energy physicists traded the printed page for a web page, specifically a preprint server called arXiv. Paul Ginsparg, the creator of arXiv, chronicled the Internet-driven transition in a fascinating interview in Nature last year. Reflecting on the big picture, Ginsparg recognized that scholarly publishing is an evolutionary process:


“It is heartening, 20 years later, to see a stable and successful arXiv, running some of the original software and providing services to a community nearly a thousand times larger than expected. But at some point a thorough overhaul will be needed to keep pace with new online trends and opportunities.”


Unfortunately, not all scientists got the memo, in particular the non-quantitative biologists who wouldn’t be caught dead posting to arXiv, whether for fear of being scooped by a competitor, because posting a preprint can disqualify the manuscript from consideration at some high Impact Factor journals, e.g., Cell, or just because preprints “don’t count for anything” in The Tenure Games.


But thanks to this cool new science broadcast and alert system I use called Twitter, I can see “new online trends and opportunities” in real-time, as they speciate. Two examples come to mind: the new membership-based business model of publishing startup PeerJ; and F1000 Research flipping peer review on its head.


I’m firmly in the “publish, then filter” camp. Actually, it’s safe to say I’m in a splinter group called “self publish, then filter.” That’s why I refer to my lab website as a self publishing platform. I know for a lot of people the term “self publishing” is a sticking point. Let me be clear: a Tumblr doesn’t cut it!


My site is really just a customized WordPress theme implementing smooth API integrations and responsive design principles. In turn, WordPress is really just a content manager for my research and blogging outputs, which I can make citable and URL-decay proof using tools like Digital Object Identifiers (DOIs) on the data-sharing site figshare. To stimulate speciation, I decided to make the base code freely downloadable so that others can adapt my template.


Two serious critiques of science self-publishing that I’ve come across on Twitter concern 1) long-term storage, and 2) URL decay. However, there are existing solutions to these problems. For example, CLOCKSS, and possibly other dark archives, will help in the crowdsourcing of content preservation over centuries – millennia, I hope. And once content carries the aforementioned DOIs, URLs may come and go, but the identifier and its associated metadata never die.
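For the curious, here is a rough sketch of why a DOI outlives any one URL. It assumes the public resolver at https://doi.org/; the DOI, the two lab-site addresses, and the ToyRegistry class are all made up for illustration and only mimic the indirection a real registration agency provides.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Build the resolver link for a DOI; this link is what gets cited."""
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


class ToyRegistry:
    """Hypothetical stand-in for the DOI-to-URL lookup table."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, doi: str, url: str) -> None:
        # Registering the same DOI again simply repoints it.
        self._targets[doi] = url

    def resolve(self, doi: str) -> str:
        return self._targets[doi]


registry = ToyRegistry()
doi = "10.1234/example.5678"  # made-up DOI, illustration only

registry.register(doi, "http://old-lab-site.example/paper")
citation = doi_to_url(doi)

# The lab site moves: only the registry entry changes, not the citation.
registry.register(doi, "http://new-lab-site.example/paper")

print(citation)
print(registry.resolve(doi))
```

The citation string is identical before and after the move, which is the whole trick: readers follow the identifier, and whoever maintains the record updates where it points.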


And if you’re still that concerned about preserving your content, then maybe you should create a scholarly “will and testament,” spelling out the technologies that will ensure stable, intergenerational transmission to all your scores of future devotees. For now, I’m content to have my content scraped for the real gems that are worth passing down.


There has always been an energetic cost to content preservation. Today, it’s mostly a hidden cost from the perspective of individual academic scientists, though don’t tell that to university librarians, even at Harvard’s library. My argument is it’s only a matter of time before someone, or more likely some collaborative team, successfully innovates a better and cheaper solution, as has been the case repeatedly throughout the history of invention.


George Church is onto something with that DNA repository, maybe.

  • Dave Bridges

    I have a couple problems with self-publishing. One is that it only really works in science if you get robust post-publication review, which as you have pointed out is not sufficiently incentivized. The other concern is self publishing allows for the potential of fraud in the review. People could fake comment how great it is, or the scientist could remove comments which point out valid criticism. If those two points can be addressed then I would be more on board.

    • Benjamin de Bivort

      I more or less agree. If a paper “goes viral” it will receive sufficient scrutiny and be stored in multiple locations, both of which counter fraud. But what about papers receiving less attention, which are the numerical majority of papers? What’s the self-publishing equivalent of all those papers in second-tier specialty journals with no press coverage and <10 citations? Would they get comparable scrutiny if they were self published?

      • Ethan Perlstein

        Too much crap is published as is, so I’m not sure there’s a problem here. Text mining and other forms of content scraping will recover the rare gems in the sea of crappy papers. The point is fewer papers wouldn’t be a bad thing, especially if they’re replaced by smaller publishable units that are released into the open in real-time.

    • Ethan Perlstein

      Robust post-publication review is doable. F1000 Research is banking its business model on it. I showed that it can be actively solicited. Scientists have ceded too much control over the publishing process to scholarly journals. Habits will change slowly in the aggregate but parts of the community can pull this off now.