Is it OK to cite preprints? Yes, yes it is.
Should you cite preprints in your papers and should journals allow this? This is a topic that gets debated periodically. The most recent round of Twitter debate started last week when Martin Hunt pointed out that the journal Nucleic Acids Research wouldn’t allow him to cite them. A couple of days later I suggested that journals that don’t allow citing preprints are putting their authors at risk by forcing them not to cite relevant work. Roughly forty games of Sleeping Queens later (my kid is really into Sleeping Queens) I reopened Twitter and found a roiling debate over whether citing preprints was appropriate at all.
The basic argument against citing preprints is that they aren’t peer reviewed, and that this could lead to the citation of bad work and the potential decay of science.
There are three reasons I disagree with this argument:
- We already cite lots of non-peer-reviewed things in ecology
- Lots of fields already do this and they are doing just fine
- Responsibility for the citation lies with the citer
We already cite non-peer-reviewed things in ecology
As Auriel Fournier, Stephen Heard, Michael Hoffman, Terry McGlynn and ATMoody pointed out, we already cite lots of things that aren’t peer reviewed, including government agency reports, white papers, and other “grey literature”.
We also cite lots of other really important non-peer-reviewed things like data and software. We’ve been doing this for decades. Ecology hasn’t become polluted with pseudoscience. It will all be OK.
Lots of other fields already do this
One of the things I find amusing/exhausting about biologists debating preprints is their ignorance of the history and use of preprints in other fields. It’s a bit like debating the name of an actor for two hours when you could easily look it up on Google.
In this particular case (as Eric Pedersen pointed out) we know that citation of preprints isn’t going to cause problems for the field, because it hasn’t caused issues in other fields and has almost invariably become standard practice in fields that use preprints. Unless you think Physics and Math are having real issues, it’s difficult to argue that this is a meaningful problem. Just ask a physicist.
You are responsible for your citations
Why hasn’t citing unreviewed work caused the wheels to fall off of science? Because citing appropriate work in the proper context is part of our job. There are good preprints and bad preprints, good reports and bad reports, good data and bad data, good software and bad software, and good papers and bad papers. As Belinda Phipson, Casey Green, Dave Harris and Sebastian Raschka point out it is up to us as the people citing research to make professional judgments about what is good science and should be cited. Casey’s take captures my thoughts on this exactly:
So yes, you should cite preprints and other unreviewed things that are important for your work. That’s called proper attribution. It has worked in ecology and other fields for decades. It will continue to work because we are scientists and evaluating the science we cite is part of our jobs. You can even cite this blog post if you want to.
Thanks to everyone, both linked here and not, for the spirited discussion. Sorry I wasn’t there, but Sleeping Queens is a pretty awesome game.
UPDATE: For those of you new to this discussion, it’s been going on for a long time even in biology. Here is Graham Coop’s excellent post from nearly 4 years ago.
UPDATE: A discussion of why it’s important to put preprint citations in the reference list.
Data Analyst position in ecology research group
The Weecology lab group run by Ethan White and Morgan Ernest at the University of Florida is seeking a Data Analyst to work collaboratively with faculty, graduate students, and postdocs to understand and model ecological systems. We’re looking for someone who enjoys tidying, managing, manipulating, visualizing, and analyzing data to help support scientific discovery.
The position will include:
- Organizing, analyzing, and visualizing large amounts of ecological data, including spatial and remotely sensed data. Modifying existing analytical approaches and data protocols as needed.
- Planning and executing the analysis of data related to newly forming questions from the group. Assisting in the statistical analysis of ecological data, as determined by the needs of the research group.
- Providing assistance and guidance to members of the research group on existing research projects. Working collaboratively with undergraduates, graduate students and postdocs in the group and from related projects.
- Learning new analytical tools and software as needed.
This is a staff position in the group and will be focused on data management and analysis. All members of this collaborative group are considered equal partners in the scientific process and this position will be actively involved in collaborations. Weecology believes in the importance of open science, so most work done as part of this position will involve writing open source code, use of open source software, and production and use of open data.
Weecology is a partnership between the White Lab, which studies ecology using quantitative and computational approaches, and the Ernest Lab, which tends to be more field and community ecology oriented. The Weecology group supports and encourages members interested in a variety of career paths. Former weecologists are currently employed in the tech industry, with the National Ecological Observatory Network, as faculty at teaching-focused colleges, and as postdocs and faculty at research universities. We are also committed to supporting and training a diverse scientific workforce. Current and former group members encompass a variety of racial and ethnic backgrounds from the U.S. and other countries, members of the LGBTQ community, military veterans, people with chronic illnesses, and first-generation college students. More information about the Weecology group and respective labs is available on our website. You can also check us out on Twitter (@skmorgane, @ethanwhite, @weecology), GitHub, and our blog Jabberwocky Ecology.
The ideal candidate will have:
- Experience working with data in R or Python, some exposure to version control (preferably Git and GitHub), and potentially some background with database management systems (e.g., PostgreSQL, SQLite, MySQL) and spatial data.
- Research experience in ecology
- Interest in open approaches to science
- Experience collecting or working with ecological data
That said, don’t let the absence of any of these stop you from applying. If this sounds like a job you’d like to have please go ahead and put in an application.
We currently have funding for this position for 2.5 years. Minimum salary is $40,000/year (which goes a pretty long way in Gainesville), but there is significant flexibility in this number for highly qualified candidates. We are open to the possibility of someone working remotely. The position will remain open until filled, with initial review of applications beginning on May 5th. If you’re interested in applying you can do so through the official UF position page. If you have any questions or just want to let us know that you’re applying you can email Weecology’s project manager Glenda Yenni at email@example.com.
Postdoctoral research position in the Temporal Dynamics of Communities
The Weecology lab group run by Morgan Ernest and Ethan White at the University of Florida is seeking a post-doctoral researcher to study changes in ecological communities through time. This position will primarily involve broad-scale comparative analyses across communities using large time-series datasets and/or in-depth analyses of our own long-term dataset (the Portal Project). Experience with any of the following is useful, but not required: long-term data, macroecology, paleoecology, quantitative/theoretical ecology, and programming/data analysis in R or Python. The successful applicant will be expected to collaborate on lab projects on community dynamics and develop their own research projects in this area according to their interests.
Weecology is a partnership between the Ernest Lab, which tends to be more field and community ecology oriented, and the White Lab, which tends to be more quantitative and computationally oriented. The Weecology group supports and encourages students interested in a variety of career paths. Former weecologists are currently employed in the tech industry, with the National Ecological Observatory Network, as faculty at teaching-focused colleges, and as postdocs and faculty at research universities. We are also committed to supporting and training a diverse scientific workforce. Current and former group members encompass a variety of racial and ethnic backgrounds from the U.S. and other countries, members of the LGBTQ community, military veterans, people with chronic illnesses, and first-generation college students. More information about the Weecology group and respective labs is available on our website. You can also check us out on Twitter (@skmorgane, @ethanwhite, @weecology), GitHub, and our blog Jabberwocky Ecology.
This 2-year postdoc has a flexible start date, but can start as early as June 1st, 2017. Interested applicants should contact Dr. Morgan Ernest (firstname.lastname@example.org) with their CV including a list of three references, a cover letter detailing their research interests/experiences, and one or more research samples (a PDF or link to a scientific product such as a published paper, preprint, software, data analysis code, etc). The position will remain open until filled, with initial review of applications beginning on April 24th.
Fork our course: A semester-long Data Carpentry course for biologists
This post is co-authored by Zack Brym and Ethan White
Over the last year and a half we have been actively developing a semester-long Data Carpentry course designed to be easily customized and integrated into existing graduate and undergraduate curricula.
Data Carpentry for Biologists contains course materials for teaching scientists how to work more effectively with data. The course provides introductions to data management and relational databases, data manipulation and analysis, and data visualization. It covers the same general types of material as a two-day Data Carpentry workshop, but expands the materials and opportunities for practice into a full-length university course. The teaching material uses R and SQLite, with some corresponding materials for Python as well. To help students understand the direct applications to their interests, the examples and exercises focus on biological questions and working with real data. The course emphasizes using best practices to produce reusable and reproducible data analysis.
Active-learning Teaching Materials
Learning computing requires active practice by working through programming problems. Just diving into computing is challenging for most scientists, so the course instruction combines short live-coding introductions to concepts with a related exercise that the students work on immediately afterward. Additional exercises are assigned later for practice. This follows the “I do”, “We do”, “You do” approach to teaching, which leverages the benefits of active learning and flipped classrooms without leaving students who are less comfortable with the material feeling lost. The bulk of class time is spent working on assigned exercises, with the instructor moving around the room helping guide students through things they don’t understand and engaging with students who are thinking about advanced applications of what they’ve learned.
This approach is the result of lots of reading about effective teaching methods and Ethan’s experience teaching this and related courses over the last six years at Utah State University and the University of Florida. It seems to work well for both students that get the material easily and those that find it more challenging. We’ve also tried to make these materials as useful as possible for self-guided students.
Open course development
Software Carpentry and Data Carpentry have shown how powerful collaborative lesson development can be and we’re interested in bringing that to the university classroom. We have designed the course materials to be modular and easy to modify, and the course website easy to clone and set up. All of the teaching materials and associated website files are openly available at the Data Carpentry for Biologists repository on GitHub under CC-BY and MIT licenses. The course materials are all written in Markdown and everything runs on Jekyll through GitHub Pages. Making your own version of the course should take less than an hour. We’ve developed documentation for how to create your own version of the course and how to contribute to development. Exercises and assignments are modular, so changing them simply involves reordering items in a list. Adding a new exercise involves creating a new Markdown file and then adding its title to the list of exercises for an assignment.
If you teach, or want to teach, a course like this, we’d love to get you involved. Here are some useful links for getting started.
– I want to teach the course.
– I have some feedback.
– I want to contribute to the project.
We want to be sure getting involved is as easy as possible. We’ve worked hard to provide documentation and help resources for students and instructors. Students can find all they need to know at our student start guide. Instructors have access to course content and site design documentation.
If you’re having trouble finding something or getting something to work, or simply have some feedback about the course, please open a new issue at GitHub or send us an email.
Development of this course was generously supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.
New release of the EcoData Retriever
We are very excited to announce the newest release of the EcoData Retriever, our software for automating the downloading, cleaning, and installing of ecological and environmental data. Instead of spending hours or days trying to get complicated datasets like the Breeding Bird Survey ready for analysis, the Retriever lets you simply click a button or run a single command from R or the command line, and your computer does the rest.
It’s been over a year since the last Retriever release and there are lots of new features and improvements to be excited about.
- We’ve added 21 new datasets, including major ecological and environmental datasets like eBird, Vertnet, the Global Wood Density Database, and the PRISM climate data.
- To support all of these datasets, we’ve added support for additional data types, including larger-than-memory archive files. We’ve also improved the ability to control where downloaded files are stored and how they are clustered together.
- We’ve significantly improved documentation and now have a new automatically built documentation site at Read The Docs.
- We’ve also made a lot of under the hood improvements.
This is also the first release that has been overseen by Weecology’s new software engineer, Henry Senyondo. We’re excited to have Henry on the team, and now that he’s around, development of both the EcoData Retriever and other lab software projects will be happening more quickly.
A big thanks to the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative for funding this development through Grant GBMF4563 and to the National Science Foundation for funding as part of a CAREER award to Ethan White.
UPDATE: Led by Dan McGlinn we also released a new version of the ecoretriever R interface for the Retriever last fall. This makes using the Retriever from R as simple as:
data <- ecoretriever::fetch("BBS")
GEB adds unlimited data references section to papers
In a big step forward for allowing proper credit to be provided to all of the awesome folks collecting and publishing data, the journal Global Ecology & Biogeography has just announced that they will start supporting an unlimited set of references to datasets used in a paper.
A growing concern in the macroecological community has been that many papers whose data are used in meta-analyses or data-compilation papers have not been getting citation credit because most journals require these papers to only be listed in the supplemental material (which is not indexed by most indexing services). GEB is proud to support the inclusion of a second list of references within the main paper for all data papers used… To our knowledge, GEB is the first journal in the ecological field to do this. And we’ll be working with Wiley to further improve options in this area.
These references will be included immediately following the traditional references section in both the html and pdf versions of the paper. You can see an example in Olds et al. (2016).
What this means is that when you combine data from dozens or hundreds of studies to conduct a synthetic analysis, you can cite all of the sources in a way that will provide citation credit to those collecting the data¹. It also means that scientists using large data compilations can cite the original data sources as well as the compilation itself².
This is important for encouraging the publication of data, since one of the common reasons that scientists don’t publish data is a lack of credit, and citation only in non-indexed supplementary materials sections is a common concern.
Facilitating proper citation of all data sources is something the community has been requesting and it’s great to see GEB taking the lead in this area. Since Wiley, the publisher of GEB, is the largest publisher of ecology journals, it should be straightforward to implement this new approach widely. If other journals follow GEB’s lead, we will enter a new era where citation of data can be as complete as possible, allowing proper credit to everyone who collects and publishes data.
¹GEB will need to make sure that this section gets properly picked up by the indexers, and tweak the presentation as necessary if it isn’t.
²Provided that the compilation provides a method for compiling a citation list of all associated sources.
How technology can help scientists with chronic illnesses (or Technology FTW!)
This is a guest post by Elita Baldridge (@elitabaldridge)
I am currently the remotely working member of Weecology, finishing up my PhD in the lower elevation and better air of Kansas, while the rest of my colleagues are still in Utah, due to developing a chronic illness and finally getting diagnosed with fibromyalgia. The relocation is actually working out really well. I’m in better shape because I’m not having to fight the air too, and I’m finally making real progress toward finishing my dissertation again.
I ruthlessly culled everything that wasn’t directly related to my dissertation. I had planned to attend the Gordon Conference this year; I had heard fantastic things about it for years but hadn’t been ready to go until now, and I had to drop it because I wasn’t physically able to travel. I did not go to ESA, because I couldn’t travel. There are working groups and workshops galore, all involving travel, which I cannot do. Right now, the closest thing that we have to bringing absent scientists to an event is live tweeting, which is not nearly as good as hearing a speaker for yourself, and is pretty heartbreaking if you had to cancel your plans to attend an event because you were too infirm to go.
The tools that I’m using to do science remotely are not just for increasing accessibility for a single chronically ill macroecologist. They are good tools for science in general. I’m using GitHub to version control my code, and Dropbox to share data and figures. Ethan can see what I’m working on as I’m doing it, and I’ve got a clear record of what I was doing and what decisions I made. While my cognitive dysfunction may be a more extreme problem, I know that we’ve all stayed up too late coding and broken something we shouldn’t have, and the ability to wave the magic Git wand and make any poor decisions that I made while my brain was out to lunch go away is priceless.
Open access? Having open access to papers is really important when you are going to be faced shortly with probably not having any institutional access anymore. It’s also important for everyone else who isn’t at a major university with very expensive subscriptions to all the journals. Having open access to data and code is crucial when you can’t collect your own data and are going to be doing research from your home computer on the cheap because you can’t rely on your body to work reliably at any given point in time.
Video conferencing is working well for me to meet with the lab, but could also be great for attending conferences and workshops. This would not only be good for a certain macroecologist, but would also be a good way to include people from smaller universities, etc. who would like to participate in these types of things too, but can’t otherwise due to the travel. I did my master’s degree at Fort Hays State University, and I still love it dearly. This type of increased accessibility would have been great for me while I was a perfectly healthy master’s student. Fort Hays is a primarily undergraduate institution in the middle of Kansas, about four hours away from any major city, and it does not have some of the resources that a larger university would have. No seminar series, no workshops, not much travel money to go to workshops or conferences, which doesn’t mean that good science can’t still be happening.
Many of my labmates are looking for postdocs, or are already in postdoc positions at this point. I’m very excited for all of them, and eagerly await all the stories of the exciting new things they are doing. Having a chronic illness limits what I am capable of doing physically. I am not going to be able to move across the country for a postdoc. That does not mean that I do not want to play science too. I’ve got my home base set up, and I can reach pretty far from here. I still want to be a part of living science; I don’t want to have to get to the party after everyone else has gone home.
And I wonder, why can I not do these things? Is it not the future? Do we not have the internet, with video chat? I get to meet with Ethan and talk science at our weekly meetings. I go to lab meetings with video chat, and get to see what my labmates are doing, and crack jokes, and laugh at other people’s jokes. It wouldn’t be hard to get me to conferences and working groups either.
With technology, I get to be a part of living, breathing science, and it is a beautiful thing.
White Lab PhD openings at the University of Florida
I’m looking for one or more graduate students to join my group next fall. In addition to the official ad (below) I’d like to add a few extra thoughts. As Morgan Ernest noted in her recent ad, we have a relatively unique setup at Weecology in that we interact actively with members of the Ernest Lab. We share space, have joint lab meetings, and generally maintain a very close intellectual relationship. We do this with the goal of breaking down the barriers between the quantitative side of ecology and the field/lab side of ecology. Our goal is to train scientists who span these barriers in a way that allows them to tackle interesting and important questions.
I also believe it’s important to train students for multiple potential career paths. Members of my lab have gone on to faculty positions, postdocs, and jobs in both science non-profits and the software industry.
Scientists in my group regularly both write papers (e.g., these recent papers from dissertation chapters: Locey & White 2013, Xiao et al. 2014) and develop or contribute to software (e.g., EcoData Retriever, ecoretriever, rpartitions & pypartitions), even if they had never coded before joining my lab.
My group generally works on problems at the population, community, and ecosystem levels of ecology. You can find out more about what we’ve been up to by checking out our website. If you’re interested in learning more about where the lab is headed I recommend reading my recently funded Moore Investigator in Data-Driven Discovery proposal.
PH.D. STUDENT OPENINGS IN QUANTITATIVE, COMPUTATIONAL, AND MACRO- ECOLOGY
The White Lab at the University of Florida has openings for one or more PhD students in quantitative, computational, and/or macro- ecology to start fall 2015. The student(s) will be supported as graduate research assistants from a combination of NSF, Moore Foundation, and University of Florida sources depending on their research interests.
The White Lab uses computational, mathematical, and advanced statistical/machine learning methods to understand and make predictions/forecasts for ecological systems using large amounts of data. Background in quantitative and computational techniques is not necessary, only an interest in learning and applying them. Students are encouraged to develop their own research projects related to their interests.
The White Lab is currently at Utah State University, but is moving to the Department of Wildlife Ecology and Conservation at the University of Florida starting summer 2015.
Interested students should contact Dr. Ethan White (email@example.com) by Nov 15th, 2014 with their CV, GRE scores, and a brief statement of research interests.
UPDATE: Added a note that we work at population, community, and ecosystem levels.
EcoData Retriever now supports R and environmental data, and has more datasets
We are very excited to announce the newest release of our EcoData Retriever software and the first release of a supporting R package, ecoretriever. If you’re not familiar with the EcoData Retriever you can read more here.
The biggest improvement to the Retriever in this set of releases is the ability to run it directly from R. Dan McGlinn did a great job leading the development of this package and we got a ton of fantastic help from the folks at rOpenSci (most notably Scott Chamberlain, Gavin Simpson, and Karthik Ram). Now, once you install the main EcoData Retriever, you can run it from inside R by doing things like:
install.packages('ecoretriever')
library(ecoretriever)

# List the datasets available via the Retriever
ecoretriever::datasets()

# Install the Gentry dataset into csv files in your working directory
ecoretriever::install('Gentry', 'csv')

# Download the raw Gentry dataset files, without any processing,
# to the subdirectory named data
ecoretriever::download('Gentry', './data/')

# Install and load a dataset as a list
Gentry <- ecoretriever::fetch('Gentry')
names(Gentry)
head(Gentry$counts)
The other big advance in this release is the ability to have the Retriever directly download files instead of processing them. This allows us to support data that doesn’t come in standard tabular forms. So, we can now include things like environmental data in GIS formats and phylogenetic data such as supertrees. We’ve used this new capability to allow the automatic downloading of the Bioclim data, one of the most widely used climate datasets in ecology, and the supertree for mammals from Fritz et al. 2009.
Finally, we’ve also added the very cool mammalian diet dataset from Dryad.
Weecology is moving to the University of Florida
We are excited to announce that Weecology will be moving to the University of Florida next summer. We were recruited as part of the UF Rising Preeminence Plan, a major hiring campaign to bring together researchers in a number of focal areas including Big Data and Biodiversity. We will both be joining the Wildlife Ecology and Conservation department; Ethan will be part of UF’s new Informatics Institute, and Morgan will be part of UF’s new Biodiversity Initiative.
As excited as we are about the opportunities at Florida, we are also incredibly sad to be saying goodbye to Utah State University. Leaving was not an easy decision. We have amazing colleagues and friends here in Utah that we will greatly miss. We have also felt extremely well treated by Utah State. They were very supportive while we were getting our programs up and running, including helping us solve the two-body problem. They allowed us to take risks in both research and the classroom. They have been incredibly supportive of our desires for work-life balance, and were very accommodating following the birth of our daughter. It was a fantastic place to spend nearly a decade and we will miss it and the amazing people who made it home.
So why are we leaving? It was a many-faceted decision, but at its core was the realization that the scale of the investment and recruiting of talented folks in both of our areas of interest was something we were unlikely to see again in our careers. The University of Florida has always had a strong ecology group, but between the new folks who have already accepted positions and those we know who are being considered, it is going to be such a talented and exciting group that we just had to be part of it!
As part of the move we’ll be hiring for a number of different positions, so stay tuned!
Ecology Letters now allows preprints; and why this is a big deal for ecology
As announced by Noam Ross on Twitter (and confirmed by the Editor in Chief of Ecology Letters), Ecology Letters will now allow the submission of manuscripts that have been posted as preprints. Details will be published in an editorial in Ecology Letters. I want to say a heartfelt thanks to Marcel Holyoak and the entire Ecology Letters editorial board for listening to the ecological community and modifying their policies. Science is working a little better today than it was yesterday thanks to their efforts.
For those of you who are new to the concept of preprints, they are manuscripts that have not yet been published in peer reviewed journals, which are posted to websites like arXiv, PeerJ, and bioRxiv. This process allows for more rapid communication of scientific results and improved quality of published papers through more expansive pre-publication peer review. If you’d like to read more check out our paper on The Case for Open Preprints in Biology.
The fact that Ecology Letters now allows preprints is a big deal for ecology because they were the last of the major ecology journals to make the transition. The ESA journals began allowing preprints just over two years ago and the BES journals made the switch about 9 months ago. In addition, Science, Nature, PNAS, PLOS Biology, and a number of other ecology journals (e.g., Biotropica) all support preprints. This means that all of the top ecology journals, and all of the top general science journals that most ecologists publish in, allow the posting of preprints. As such, there is no longer a reason to avoid posting preprints out of concern about not being able to publish in a preferred journal. This can potentially shave months to years off of the time between discovery and initial communication of results in ecology.
It also means that other ecology journals that still do not allow the posting of preprints are under significant pressure to change their policies. With all of the big journals allowing preprints they have no reasonable excuse for not modernizing their policies, and they risk losing out on papers that are initially submitted to higher profile journals and are posted as preprints.
It’s a good day for science. Celebrate by posting your next manuscript as a preprint.
Sharing in Science: my full reply to Eli Kintisch
A couple of weeks ago Eli Kintisch (@elikint) interviewed me for what turned out to be a great article on “Sharing in Science” for Science Careers. He also interviewed Titus Brown (@ctitusbrown) who has since posted the full text of his reply, so I thought I’d do the same thing.
How has sharing code, data, R methods helped you with your scientific research?
Definitely. Sharing code and data helps the scientific community make more rapid progress by avoiding duplicated effort and by facilitating more reproducible research. Working together in this way helps us tackle the big scientific questions and that’s why I got into science in the first place. More directly, sharing benefits my group’s research in a number of ways:
- Sharing code and data results in the community being more aware of the research you are doing and more appreciative of the contributions you are making to the field as a whole. This results in new collaborations, invitations to give seminars and write papers, and access to excellent students and postdocs who might not have heard about your lab otherwise.
- Developing code and data so that it can be shared saves us a lot of time. We reuse each other's code and data within the lab for different projects, and when a reviewer requests a small change in an analysis we can make a small change in our code and then regenerate the results and figures for the project by running a single program. This also makes our research more reproducible and allows me to quickly answer questions about analyses years after they've been conducted, when the student or postdoc leading the project is no longer in the lab. We invest a little more time up front, but it saves us a lot of time in the long run. Getting folks to work this way is difficult unless they know they are going to be sharing things publicly.
- One of the biggest benefits of sharing code and data is in competing for grants. Funding agencies want to know how the money they spend will benefit science as a whole, and being able to make a compelling case that you share your code and data, and that it is used by others in the community, is important for satisfying this goal of the funders. Most major funding agencies have now codified this requirement in the form of data management plans that describe how the data and code will be managed and when and how it will be shared. Having a well-established track record in sharing makes a compelling argument that you will benefit science beyond your own publications, and I have definitely benefited from that in the grant review process.
What barriers exist in your mind to more people doing so?
There is a lot of fear about openly sharing data and code. People believe that making their work public will result in being scooped or that their efforts will be criticized because they are too messy. There is a strong perception that sharing code and data takes a lot of extra time and effort. So the biggest barriers are sociological at the moment.
To address these barriers we need to do a better job of providing credit to scientists for sharing good data and code. We also need to do a better job of educating folks about the benefits of doing so. For example, in my experience, the time and effort dedicated to developing and documenting code and data as if you plan to share it actually ends up saving the individual researcher time in the long run. This happens because when you return to a project a few months or years after the original data collection or code development, it is much easier to pick things back up if the code and data are in a form that is easy to work with.
How has twitter helped your research efforts?
Twitter has been great for finding out about exciting new research, spreading the word about our research, getting feedback from a broad array of folks in the science and tech community, and developing new collaborations. A recent paper that I co-authored in PLOS Biology actually started as a conversation on Twitter.
How has R Open Science helped you with your work, or why is it important or not?
rOpenSci is making it easier for scientists to acquire and analyze the large amounts of scientific data that are available on the web. They have been wrapping many of the major science related APIs in R, which makes these rich data sources available to large numbers of scientists who don’t even know what an API is. It also makes it easier for scientists with more developed computational skills to get research done. Instead of spending time figuring out the APIs for potentially dozens of different data sources, they can simply access rOpenSci’s suite of packages to quickly and easily download the data they need and get back to doing science. My research group has used some of their packages to access data in this way and we are in the process of developing a package with them that makes one of our Python tools for acquiring ecological data (the EcoData Retriever) easy to use in R.
Any practical tips you’d share on making sharing easier?
One of the things I think is most important when sharing both code and data is to use standard licenses. Scientists have a habit of thinking they are lawyers and writing their own licenses and data use agreements that govern how the data and code can be used. This leads to a lot of ambiguity and difficulty in using data and code from multiple sources. Using standard open source and open data licenses vastly simplifies the process of making your work available and will allow science to benefit the most from your efforts.
And do you think sharing data/methods will help you get tenure? Evidence it has helped others?
I have tenure and I certainly emphasized my open science efforts in my packet. One of the big emphases in tenure packets is demonstrating the impact of your research, and showing that other people are using your data and code is a strong way to do this. Whether or not this directly impacted the decision to give me tenure I don't know. Sharing data and code is definitely beneficial to competing for grants (as I described above) and increasingly to publishing papers, as many journals now require the inclusion of data and code for replication. It also benefits your reputation (as I described above). Since tenure at most research universities is largely a combination of papers, grants, and reputation, I think that sharing at least indirectly increases one's chances of getting tenure.
UPDATE: Added missing link to Titus Brown’s post: http://ivory.idyll.org/blog/2014-eli-conversation.html