Sharing in Science: my full reply to Eli Kintisch
A couple of weeks ago Eli Kintisch (@elikint) interviewed me for what turned out to be a great article on “Sharing in Science” for Science Careers. He also interviewed Titus Brown (@ctitusbrown) who has since posted the full text of his reply, so I thought I’d do the same thing.
How has sharing code, data, R methods helped you with your scientific research?
Definitely. Sharing code and data helps the scientific community make more rapid progress by avoiding duplicated effort and by facilitating more reproducible research. Working together in this way helps us tackle the big scientific questions and that’s why I got into science in the first place. More directly, sharing benefits my group’s research in a number of ways:
- Sharing code and data results in the community being more aware of the research you are doing and more appreciative of the contributions you are making to the field as a whole. This results in new collaborations, invitations to give seminars and write papers, and access to excellent students and postdocs who might not have heard about my lab otherwise.
- Developing code and data so that it can be shared saves us a lot of time. We reuse each others code and data within the lab for different projects, and when a reviewer requests a small change in an analysis we can make a small change in our code and then regenerate the results and figures for the project by running a single program. This also makes our research more reproducible and allows me to quickly answer questions about analyses years after they’ve been conducted when the student or postdoc leading the project is no longer in the lab. We invest a little more time up front, but it saves us a lot of time in the long run. Getting folks to work this way is difficult unless they know they are going to be sharing things publicly.
- One of the biggest benefits of sharing code and data is in competing for grants. Funding agencies want to know how the money they spend will benefit science as a whole, and being able to make a compelling case that you share your code and data, and that it is used by others in the community, is important for satisfying this goal of the funders. Most major funding agencies have now codified this requirement in the form of data management plans that describe how the data and code will be managed and when and how it will be shared. Having a well established track record in sharing makes a compelling argument that you will benefit science beyond your own publications, and I have definitely benefited from that in the grant review process.
What barriers exist in your mind to more people doing so?
There is a lot of fear about openly sharing data and code. People believe that making their work public will result in being scooped or that their efforts will be criticized because they are too messy. There is a strong perception that sharing code and data takes a lot of extra time and effort. So the biggest barriers are sociological at the moment.
To address these barriers we need to be a better job of providing credit to scientists for sharing good data and code. We also need to do a better job of educating folks about the benefits of doing so. For example, in my experience, the time and effort dedicated to developing and documenting code and data as if you plan to share it actually ends up saving the individual research time in the long run. This happens because when you return to a project a few months or years after the original data collection or code development, it is much easier if the code and data are in a form that makes it easy to work with.
How has twitter helped your research efforts?
Twitter has been great for finding out about exciting new research, spreading the word about our research, getting feedback from a broad array of folks in the science and tech community, and developing new collaborations. A recent paper that I co-authored in PLOS Biology actually started as a conversation on twitter.
How has R Open Science helped you with your work, or why is it important or not?
rOpenSci is making it easier for scientists to acquire and analyze the large amounts of scientific data that are available on the web. They have been wrapping many of the major science related APIs in R, which makes these rich data sources available to large numbers of scientists who don’t even know what an API is. It also makes it easier for scientists with more developed computational skills to get research done. Instead of spending time figuring out the APIs for potentially dozens of different data sources, they can simply access rOpenSci’s suite of packages to quickly and easily download the data they need and get back to doing science. My research group has used some of their packages to access data in this way and we are in the process of developing a package with them that makes one of our Python tools for acquiring ecological data (the EcoData Retriever) easy to use in R.
Any practical tips you’d share on making sharing easier?
One of the things I think is most important when sharing both code and data is to use standard licences. Scientists have a habit of thinking they are lawyers and writing their own licenses and data use agreements that govern how the data and code and can used. This leads to a lot of ambiguity and difficulty in using data and code from multiple sources. Using standard open source and open data licences vastly simplifies the the process of making your work available and will allow science to benefit the most from your efforts.
And do you think sharing data/methods will help you get tenure? Evidence it has helped others?
I have tenure and I certainly emphasized my open science efforts in my packet. One of the big emphases in tenure packets is demonstrating the impact of your research, and showing that other people are using your data and code is a strong way to do this. Whether or not this directly impacted the decision to give me tenure I don’t know. Sharing data and code is definitely beneficial to competing for grants (as I described above) and increasingly to publishing papers as many journals now require the inclusion of data and code for replication. It also benefits your reputation (as I described above). Since tenure at most research universities is largely a combination of papers, grants, and reputation, and I think that sharing at least increases one’s chances of getting tenure indirectly.
UPDATE: Added missing link to Titus Brown’s post: http://ivory.idyll.org/blog/2014-eli-conversation.html