Monthly Archives: July 2011
The last week has been an interesting one for academic publishing. First, a 24-year-old programmer named Aaron Swartz was arrested for allegedly breaking into MIT’s network and downloading 5 million articles from JSTOR. Given his background, it has been surmised that he planned on making the documents publicly available. He faces up to 35 years in federal prison.
In response to the arrest, Gregory Maxwell, a “technologist” and hobbyist scientist, uploaded nearly 20,000 JSTOR articles from the Philosophical Transactions of the Royal Society to The Pirate Bay, a BitTorrent file-sharing site infamous for facilitating the illegal sharing of music and movies. As explanation for the upload, Maxwell posted a scathing, and generally trenchant, critique of the current academic publishing system that I am going to reproduce here in its entirety so that those uncomfortable with, or blocked from, visiting The Pirate Bay can read it. In it he notes that since all of the articles he posted were published prior to 1923, they are all in the public domain.
This archive contains 18,592 scientific publications totaling 33GiB, all from Philosophical Transactions of the Royal Society and which should be available to everyone at no cost, but most have previously only been made available at high prices through paywall gatekeepers like JSTOR. Limited access to the documents here is typically sold for $19 USD per article, though some of the older ones are available as cheaply as $8. Purchasing access to this collection one article at a time would cost hundreds of thousands of dollars. Also included is the basic factual metadata allowing you to locate works by title, author, or publication date, and a checksum file to allow you to check for corruption. I've had these files for a long time, but I've been afraid that if I published them I would be subject to unjust legal harassment by those who profit from controlling access to these works. I now feel that I've been making the wrong decision. On July 19th 2011, Aaron Swartz was criminally charged by the US Attorney General's office for, effectively, downloading too many academic papers from JSTOR. Academic publishing is an odd system - the authors are not paid for their writing, nor are the peer reviewers (they're just more unpaid academics), and in some fields even the journal editors are unpaid. Sometimes the authors must even pay the publishers. And yet scientific publications are some of the most outrageously expensive pieces of literature you can buy. In the past, the high access fees supported the costly mechanical reproduction of niche paper journals, but online distribution has mostly made this function obsolete. As far as I can tell, the money paid for access today serves little significant purpose except to perpetuate dead business models. The "publish or perish" pressure in academia gives the authors an impossibly weak negotiating position, and the existing system has enormous inertia. 
Those with the most power to change the system--the long-tenured luminary scholars whose works give legitimacy and prestige to the journals, rather than the other way around--are the least impacted by its failures. They are supported by institutions who invisibly provide access to all of the resources they need. And as the journals depend on them, they may ask for alterations to the standard contract without risking their career on the loss of a publication offer. Many don't even realize the extent to which academic work is inaccessible to the general public, nor do they realize what sort of work is being done outside universities that would benefit by it. Large publishers are now able to purchase the political clout needed to abuse the narrow commercial scope of copyright protection, extending it to completely inapplicable areas: slavish reproductions of historic documents and art, for example, and exploiting the labors of unpaid scientists. They're even able to make the taxpayers pay for their attacks on free society by pursuing criminal prosecution (copyright has classically been a civil matter) and by burdening public institutions with outrageous subscription fees. Copyright is a legal fiction representing a narrow compromise: we give up some of our natural right to exchange information in exchange for creating an economic incentive to author, so that we may all enjoy more works. When publishers abuse the system to prop up their existence, when they misrepresent the extent of copyright coverage, when they use threats of frivolous litigation to suppress the dissemination of publicly owned works, they are stealing from everyone else. Several years ago I came into possession, through rather boring and lawful means, of a large collection of JSTOR documents. These particular documents are the historic back archives of the Philosophical Transactions of the Royal Society - a prestigious scientific journal with a history extending back to the 1600s. 
The portion of the collection included in this archive, ones published prior to 1923 and therefore obviously in the public domain, total some 18,592 papers and 33 gigabytes of data. The documents are part of the shared heritage of all mankind, and are rightfully in the public domain, but they are not available freely. Instead the articles are available at $19 each--for one month's viewing, by one person, on one computer. It's a steal. From you. When I received these documents I had grand plans of uploading them to Wikipedia's sister site for reference works, Wikisource - where they could be tightly interlinked with Wikipedia, providing interesting historical context to the encyclopedia articles. For example, Uranus was discovered in 1781 by William Herschel; why not take a look at the paper where he originally disclosed his discovery? (Or one of the several follow on publications about its satellites, or the dozens of other papers he authored?) But I soon found the reality of the situation to be less than appealing: publishing the documents freely was likely to bring frivolous litigation from the publishers. As in many other cases, I could expect them to claim that their slavish reproduction - scanning the documents - created a new copyright interest. Or that distributing the documents complete with the trivial watermarks they added constituted unlawful copying of that mark. They might even pursue strawman criminal charges claiming that whoever obtained the files must have violated some kind of anti-hacking laws. In my discreet inquiry, I was unable to find anyone willing to cover the potentially unbounded legal costs I risked, even though the only unlawful action here is the fraudulent misuse of copyright by JSTOR and the Royal Society to withhold access from the public to that which is legally and morally everyone's property. 
In the meantime, and to great fanfare as part of their 350th anniversary, the RSOL opened up "free" access to their historic archives - but "free" only meant "with many odious terms", and access was limited to about 100 articles. All too often journals, galleries, and museums are becoming not disseminators of knowledge - as their lofty mission statements suggest - but censors of knowledge, because censoring is the one thing they do better than the Internet does. Stewardship and curation are valuable functions, but their value is negative when there is only one steward and one curator, whose judgment reigns supreme as the final word on what everyone else sees and knows. If their recommendations have value they can be heeded without the coercive abuse of copyright to silence competition. The liberal dissemination of knowledge is essential to scientific inquiry. More than in any other area, the application of restrictive copyright is inappropriate for academic works: there is no sticky question of how to pay authors or reviewers, as the publishers are already not paying them. And unlike 'mere' works of entertainment, liberal access to scientific work impacts the well-being of all mankind. Our continued survival may even depend on it. If I can remove even one dollar of ill-gained income from a poisonous industry which acts to suppress scientific and historic understanding, then whatever personal cost I suffer will be justified--it will be one less dollar spent in the war against knowledge. One less dollar spent lobbying for laws that make downloading too many scientific papers a crime. I had considered releasing this collection anonymously, but others pointed out that the obviously overzealous prosecutors of Aaron Swartz would probably accuse him of it and add it to their growing list of ridiculous charges. This didn't sit well with my conscience, and I generally believe that anything worth doing is worth attaching your name to. 
I'm interested in hearing about any enjoyable discoveries or even useful applications which come of this archive. - ---- Greg Maxwell - July 20th 2011 email@example.com Bitcoin: 14csFEJHk3SYbkBmajyJ3ktpsd2TmwDEBb
These stories have been covered widely and the discussion has been heavy on Twitter and in the blogosphere. The important part of this discussion for academic publishing is that it has brought many of the absurdities of the current academic publishing system into the public eye, and a lot of people are shocked and unhappy. This is all happening at the same time that Britain is finally standing up to the big publishing companies as their profits and business models increasingly hamper rather than benefit the scientific process, and serious questions are raised about whether we should be publishing in peer-reviewed journals at all. I suspect that we will look back on 2011 as the tipping point year when academic publishing changed forever.
 In an interview with Wired Campus, JSTOR claimed that these aren’t technically their articles: even though JSTOR did digitize these files, and each file includes an indication of JSTOR’s involvement, the files lack JSTOR’s cover page, so they’re not really JSTOR’s files, they’re the Royal Society’s files. That first made me think “Wow, that’s about the lamest duck and cover excuse I’ve ever heard” and then “Hey, so if I just delete the cover page off a JSTOR file then apparently they surrender all claim to it. Nice!”
 In addition to the questionable legality of the site, some of the advertising there isn’t exactly workplace appropriate.
 I think that given the context he would be fine with us reprinting the entire statement. I’ve done some very minor cleaning up of some junk codes for readability. The original is available here.
 ~$120 million/year for Wiley and ~$1 billion/year for Reed Elsevier (source LibraryJournal.com).
As you may have seen earlier either on Jabberwocky, EEB and Flow, or over at Oikos’ new blog, the most recent piece about how some branch of ecology is ruining ecology has caused some discussion in the blogosphere. Every time one of these comes out, I tell myself I’m going to write a blog post but then I think, “that’s just one cranky person,” and I get distracted doing science that is killing ecology (given the plethora of opinions about what is ruining our field, odds are you too are killing ecology, regardless of what type of science you do). But as these opinion pieces keep emerging, I have increasingly come to feel that these debates on the ‘best’ approach reflect a very limited view of the scientific endeavor. Every approach (field ecology, microcosms, theory, meta-analysis, macroecology, insert your favorite approach that I’ve missed here) is fundamentally limited in its scope, focus, and ability to divine answers from nature, yet has unique strengths in what it allows us to do. Theory is abstracted from nature, but can also provide a concrete set of expectations and processes for empiricists to work with. Microcosms, while similarly critiqued for their abstraction from reality, can also give the clearest indication about whether ideas and theories work (or don’t) under the most ideal scenarios. Field ecology (particularly experimental manipulation) has been considered the gold standard for its ability to show cause and effect in ‘real’ ecosystems, but it is also messy, expensive, and time-consuming (I say this thinking of my own field site; perhaps yours is less so), and in a natural setting it is impossible to have control over all of the important (and potentially confounding) variables. Macroecology and meta-analysis allow us to step back from individual systems and taxa to ask whether patterns and processes are general across nature, general within certain subsets of systems, or unpredictably important (and unimportant). 
However, they lack the ability to manipulate nature directly to tease out cause and effect more definitively. Because all approaches have limitations, the exclusive use of any one approach is guaranteed to give us a limited and possibly flawed view of reality. In the scientific utopia that lives in my head, these different approaches to addressing scientific questions live together harmoniously: results from one approach generate questions best addressed with another approach, and the cumulative evidence from all approaches gives us a more complete understanding of nature. When I read opinion pieces that advocate for a particular approach above all others, I worry that this utopia only exists in my head. After all, those opinion pieces never seem to be balanced by a counter argument for plurality. But then sometimes I read things – often on the internet – and I think: it may be in my head, but maybe my head is not the only one that dream resides in.
There is an excellent post over at EEB & Flow on the empirical divide, inspired by an editorial by David Lindenmayer and Gene Likens in the most recent ESA Bulletin, titled “Losing the Culture of Ecology”. It was great to see some thoughtful and data-driven consideration of the idea that we should choose to emphasize one broad area of ecology over another. I really like their conclusion that these “divides” are really driven by other things:
The tensions between “indoor ecology” and field ecology have been conflated with changes in the philosophy of modern ecology, in the difficulties of obtaining funding and publishing as a modern ecologist, and some degree of thinking the “grass is always greener” in the other field. In fact, the empirical divide may not be as wide as is often suggested.
This post motivated some discussion in the comments, and on Twitter,
Dear senior luminary ecologists,
Please stop writing papers about how your pet research area is discriminated against.
Ethan White (@ethanwhite) July 26, 2011
And a nice follow-up post by Jeremy Fox at the Oikos blog.
It’s all pretty short and well worth the read.
Here at Weecology we’re really into open science and that’s why we’re excited to announce our first serious attempt to facilitate open science beyond the confines of our own research – The Ecological Data Wiki.
The idea behind this project is simple. There is a large and rapidly increasing amount of ecology-related data available thanks to initiatives sponsoring the collection of large-scale data and efforts to increase the publication of already collected datasets. As a result, progress in ecology is increasingly limited by the speed at which we can find and use existing data. The Ecological Data Wiki is intended to serve as a central source for identifying datasets that are useful to the study of ecology and quickly figuring out the best ways to use them. The idea is to use the knowledge and effort of the entire ecological community to compile this information rather than relying on each scientist to contribute information for their own studies. Just think of it as the Wikipedia of ecology data.
We’re just getting things off the ground, but we’d love it if you’d come by, take a look around, and, if you think you can be of help, sign up, learn how to get started, and contribute. We’re currently in private beta, but you can generally expect to have an account activated within about 24 hours.
Let us know what you think about the site and any suggestions you have in the comments. If you’d like to chat about the wiki (or anything else) in person, Ethan will be presenting on this during the Wednesday poster session at ESA.
We are pretty excited about what modern technology can do for science and in particular the potential for increasingly rapid sharing of, and collaboration on, data and ideas. It’s the big picture that explains why we like to blog, tweet, publish data and code, and we’ve benefited greatly from others who do the same. So, when we saw this great talk by Michael Nielsen about Open Science, we just had to share.