A couple of weeks ago Eli Kintisch (@elikint) interviewed me for what turned out to be a great article on “Sharing in Science” for Science Careers. He also interviewed Titus Brown (@ctitusbrown) who has since posted the full text of his reply, so I thought I’d do the same thing.
How has sharing code, data, R methods helped you with your scientific research?
Definitely. Sharing code and data helps the scientific community make more rapid progress by avoiding duplicated effort and by facilitating more reproducible research. Working together in this way helps us tackle the big scientific questions and that’s why I got into science in the first place. More directly, sharing benefits my group’s research in a number of ways:
- Sharing code and data results in the community being more aware of the research you are doing and more appreciative of the contributions you are making to the field as a whole. This results in new collaborations, invitations to give seminars and write papers, and access to excellent students and postdocs who might not have heard about my lab otherwise.
- Developing code and data so that it can be shared saves us a lot of time. We reuse each other’s code and data within the lab for different projects, and when a reviewer requests a small change in an analysis we can make a small change in our code and then regenerate the results and figures for the project by running a single program. This also makes our research more reproducible and allows me to quickly answer questions about analyses years after they’ve been conducted when the student or postdoc leading the project is no longer in the lab. We invest a little more time up front, but it saves us a lot of time in the long run. Getting folks to work this way is difficult unless they know they are going to be sharing things publicly.
- One of the biggest benefits of sharing code and data is in competing for grants. Funding agencies want to know how the money they spend will benefit science as a whole, and being able to make a compelling case that you share your code and data, and that it is used by others in the community, is important for satisfying this goal of the funders. Most major funding agencies have now codified this requirement in the form of data management plans that describe how the data and code will be managed and when and how it will be shared. Having a well established track record in sharing makes a compelling argument that you will benefit science beyond your own publications, and I have definitely benefited from that in the grant review process.
What barriers exist in your mind to more people doing so?
There is a lot of fear about openly sharing data and code. People believe that making their work public will result in being scooped or that their efforts will be criticized because they are too messy. There is a strong perception that sharing code and data takes a lot of extra time and effort. So the biggest barriers are sociological at the moment.
To address these barriers we need to do a better job of providing credit to scientists for sharing good data and code. We also need to do a better job of educating folks about the benefits of doing so. For example, in my experience, the time and effort dedicated to developing and documenting code and data as if you plan to share it actually ends up saving the individual researcher time in the long run. This happens because when you return to a project a few months or years after the original data collection or code development, it is much easier if the code and data are in a form that makes them easy to work with.
How has twitter helped your research efforts?
Twitter has been great for finding out about exciting new research, spreading the word about our research, getting feedback from a broad array of folks in the science and tech community, and developing new collaborations. A recent paper that I co-authored in PLOS Biology actually started as a conversation on twitter.
How has R Open Science helped you with your work, or why is it important or not?
rOpenSci is making it easier for scientists to acquire and analyze the large amounts of scientific data that are available on the web. They have been wrapping many of the major science related APIs in R, which makes these rich data sources available to large numbers of scientists who don’t even know what an API is. It also makes it easier for scientists with more developed computational skills to get research done. Instead of spending time figuring out the APIs for potentially dozens of different data sources, they can simply access rOpenSci’s suite of packages to quickly and easily download the data they need and get back to doing science. My research group has used some of their packages to access data in this way and we are in the process of developing a package with them that makes one of our Python tools for acquiring ecological data (the EcoData Retriever) easy to use in R.
Any practical tips you’d share on making sharing easier?
One of the things I think is most important when sharing both code and data is to use standard licenses. Scientists have a habit of thinking they are lawyers and writing their own licenses and data use agreements that govern how the data and code can be used. This leads to a lot of ambiguity and difficulty in using data and code from multiple sources. Using standard open source and open data licenses vastly simplifies the process of making your work available and will allow science to benefit the most from your efforts.
And do you think sharing data/methods will help you get tenure? Evidence it has helped others?
I have tenure and I certainly emphasized my open science efforts in my packet. One of the big emphases in tenure packets is demonstrating the impact of your research, and showing that other people are using your data and code is a strong way to do this. Whether or not this directly impacted the decision to give me tenure I don’t know. Sharing data and code is definitely beneficial to competing for grants (as I described above) and increasingly to publishing papers as many journals now require the inclusion of data and code for replication. It also benefits your reputation (as I described above). Since tenure at most research universities is largely a combination of papers, grants, and reputation, I think that sharing at least increases one’s chances of getting tenure indirectly.
UPDATE: Added missing link to Titus Brown’s post: http://ivory.idyll.org/blog/2014-eli-conversation.html
Recently a bunch of folks in the biological sciences have started sharing their grant proposals openly. Their reasons for doing so are varied (see the links next to their names below), but part of the common justification is a general interest in opening up science so that all stages of the process can benefit from better interaction and communication, and part of it is to provide examples for younger scientists writing grants. To help accomplish both of these goals I’m going to do what Titus Brown suggested and compile a list of all of the available open proposals in the biological sciences (if you’re looking for math proposals they have a list too). Given the limited number of proposals available at the moment I’m just going to maintain the list here, sorted alphabetically by PI. Another way to find proposals is to look at the ‘grant’ and ‘proposal’ tags on figshare, where several of us have been posting proposals. If you know of more proposals, decide to post some yourself, or have corrections to proposals in the list, just let me know in the comments and I’ll keep the list updated. Enjoy!
- 2014 / Postdoctoral fellowships (several) – Making sense of cancer data: Implications for personalized therapy and cancer biology
- 2008 / Tools and Resources Development Fund Application – pubmed2ensembl: a resource for linking biological literature to genome sequences (BBSRC) *funded
- 2008 / New Investigator Grant Application (NERC) *funded
- 2008 / EMBO Young Investigator Programme Application (EMBO)
- 2007 / Responsive Mode Grant Application (BBSRC)
- 2014 / Children’s Foundation Research Institute – Prediction of Weight Gain in Inbred Mouse Strains *funded
- 2012 / NSF Office of Cyberinfrastructure proposal, Materials and Workshops for Cyberinfrastructure Education in Biology supplement to BEACON. *funded
- 2012 / NSF CAREER proposal, Assembling Extremely Large Metagenomes
- 2012 / NSF BIGDATA proposal, Low-memory Streaming Prefilters for Biological Sequencing Data
- 2012 / Moore Foundation proposal on marine metagenomics
- 2011 / NSF CAREER proposal: “Scaling and Improving de Bruijn graph assembly”
- 2010 / Next-gen course (NIH R25) *funded
- 2009 / Web tools for next-gen sequence analysis (USDA) *funded
- 2007 / Cartwheel
- Kathryn Fuller Doctoral Fellowship application (WWF)
- 2010 / Prairie Biotic Research proposal *funded
- 2009 / Ecological and evolutionary impacts of pollinator sharing between cultivated and wild sunflowers (Norman Hackerman Advanced Research Program)
- 2009 / Lewis and Clark grant proposal (American Philosophical Society)
- Doctoral Dissertation Improvement Grant proposal (NSF)
- Forest Shreeve Award proposal
- Ariel Appleton Research Fellowship Proposal – Ecological Networks
- How do crop-mediated changes in mutualist and antagonist communities affect selection on floral and defense traits?
- 2015 / Marie Skłodowska-Curie Individual Fellowship *funded
- 2011 / “Automated and community synthesis of the tree of life” (NSF AVATOL) *funded
- 2010 / “Towards a comprehensive, community-owned and sustainable repository of reusable phylogenetic knowledge” w/Hilmar Lapp (NSF ABI)
- 2009 / “A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)” w/Hilmar Lapp (NSF INTEROP)
- 2009 / NSF Plant Genome *funded
- 2013 / “Reversing long-term experiments to understand regime shifts” (NSF DEB preproposal) *funded
- 2012 / Understanding range shift model error: The influence of generation time and rate of adaptation on species distribution model predictions. w/Scott Chamberlain (NCEAS proposal).
- 2008 / Evolution under simulated climate change in response to trophic shifts. (NSF DDIG) *funded
- 2010 / Protein Design Using Quantum Mechanics (Danish Center for Supercomputing) *funded
- 2008 / Computational Design of Stable Enzymes (Danish National Science Foundation, DSF-NABIIT) *funded
- 2006 / Modeling pH-Dependence in Drug Design (EU Marie Curie Program) *funded
- 2006 / Computational Prediction and Validation of Protein Structure and Function in Protein Engineering and Rational Drug Design (Danish National Science Foundation, FNU) *funded
- 2006 / Prediction and Interpretation of Protein pKa’s Using QM/MM (US National Science Foundation – MCB; rescinded when I moved to Denmark) *funded
- 2002 / The Prediction and Interpretation of Protein pKa’s Using QM/MM (US National Science Foundation – MCB) *funded
- 2010 / Ontology-enabled reasoning across phenotypes from evolution and model organisms w/Todd Vision (NSF) *funded
- 2013 / NSF Postdoctoral Research Fellowship in Biology
- 2010 / Leakey Foundation General Research Grant
- 2009 / US Student Fulbright
- 2009 / NSF Dissertation Improvement Grant
Heather Piwowar (@researchremix) & Jason Priem (@jasonpriem) (read their thoughts on sharing proposals)
- Uptake proposal (CIHR)
- 2007 / Sxy proposal (CIHR) *funded
- 2001 / CIHR proposal *funded
- 1999 / NIH proposal *funded
- 2009 / USDA/NIFA: “Scanning for yield: high-throughput discovery of candidate agronomic loci for marker-assisted selection in maize” *funded
- 2015 / NSF Plant Genome Research Program: “The genetics of highland adaptation in maize” *funded
- Netherlands organization for scientific research postdoc fellowship
- Netherlands organization for scientific research PhD fellowship
- Netherlands organization for scientific research PhD fellowship
- Netherlands organization for scientific research PhD fellowship *funded
- Netherlands organization for scientific research postdoc fellowship *funded
- 2010 / NSF Graduate Research Fellowship *funded
- 2016 / NSF Postdoctoral Fellowship *funded (associated reviews)
- 2014 / Dynamic macroecology: Globally assessing body size diversity response to environmental change (NSF Postdoctoral Fellowship) *funded
- 2012 / Data Management and Computational Skills Training for LTER Scientists w/Ethan White & Greg Wilson (LTER Training Working Groups Proposal)
- 2011 / Fuelwood, Savannas, and Climate Change: Integrating Modeling, Field Experimentation, and Optical and Radar Remote Sensing (NASA Predoctoral Graduate Fellowship) *funded
- 2014 / NSF Postdoctoral Fellowship – Diversity-stability relationships and coexistence: new theory and empirical tests *funded
- 2012 / Genomic tools to study coral reef resilience (University of Melbourne)
- 2012 / Plastid endosymbiosis: a detailed study of genome dynamics (Australian Research Council)
- 2012 / Evolutionary dynamics of the algae: Understanding adaptive potential under environmental change (Australian Research Council) *funded
- Probing key innovations with next generation sequencing
- 2009 / Macroevolutionary dynamics of marine algae
- 2012 / Sustainable and Scalable Infrastructure for the Publication of Data (NSF) *funded
- 2008 / A Digital Repository for Preservation and Sharing of Data Underlying Published Works in Evolutionary Biology (NSF) *funded
- 2014 / Moore Investigator in Data Driven Discovery proposal *funded
- 2010 / CAREER: Advancing Macroecology Using Informatics and Entropy Maximization (NSF CAREER Award) *funded
- 2005 / Broad-scale patterns of the distribution of body sizes of individuals in ecological communities (NSF Postdoc Fellowship) *funded
- 2008 / Understanding multimodality in animal size distributions (NSF Research Starter Grant) *funded
As I announced on Twitter about a week ago, I am now making all of my grant proposals open access. To start with I’m doing this for all of my sole-PI proposals, because I don’t have to convince my collaborators to participate in this rather aggressively open style of science. At the moment this includes three funded proposals: my NSF Postdoctoral Fellowship proposal, an associated Research Starter Grant proposal, and my NSF CAREER award.
So, why am I doing this, especially with the CAREER award that still has several years left on it and some cool ideas that we haven’t worked on yet? I’m doing it for a few reasons. First, I think that openness is inherently good for science. While there may be benefits for me in keeping my ideas secret until they are published, this certainly doesn’t benefit science more broadly. By sharing our proposals the cutting edge of scientific thought will no longer be hidden from view for several years and that will allow us to make more rapid progress. Second, I think having examples of grants available to young scientists has the potential to help them learn how to write good proposals (and other folks seem to agree) and therefore decrease the importance of grantsmanship relative to cool science in the awarding of limited funds. Finally, I just think that folks deserve to be able to see what their tax dollars are paying for, and to be able to compare what I’ve said I will do to what I actually accomplish. I’ve been influenced in my thinking about this by posts by several of the big open science folks out there including Titus Brown, Heather Piwowar, and Rod Page.
To make my grants open access I chose to use figshare for several reasons.
- Credit. Figshare assigns a DOI to all of its public objects, which means that you can easily cite them in scientific papers. If someone gets an idea out of one of my proposals and works on it before I do, this lets them acknowledge that fact. Stats are also available for views, shares, and (soon) citations, making it easier to track the impact of your larger corpus of research outputs.
- Open Access. All public written material is licensed under CC-BY (basically just cite the original work) allowing folks to do cool things without asking.
- Permanence. I can’t just change my mind and delete the proposal and I also expect that figshare will be around for a long time.
- Version control. For proposals that are not funded, revised, not funded, revised, etc. figshare allows me to post multiple versions of the proposal while maintaining the previous versions for posterity/citation.
During this process I’ve come across several other folks doing similar things and even inspired others to post their proposals, so I’m in the process of compiling a list of all of the publicly available biology proposals that I’m aware of and will post a list with links soon. It’s my hope that this will serve as a valuable resource for young and old researchers alike and will help to lead the way forward to a more open scientific dialogue.
When last we left our intrepid scientists, they were starting to ponder the changes that might result from the new pre-proposal process. In general, we really like the new system because it helps reviewers focus on the value of big picture thinking and potentially reduces the overall workload of both grant writing and grant reviewing. Of course academics are generally nervous about the major shift in the proposal process (and, let’s face it, change in general). Below we’ll talk about: 1) things we like about the new process; 2) concerns that we’ve heard expressed by colleagues and our thoughts on those issues; and 3) modifications to the system that we think are worth considering.
An emphasis on big picture thinking. As discussed in part 1, the 4-page proposal seems to shift the focus of the reader from the details of the project to the overall goals of the study. We are excited by this. The combined pre-proposal/full proposal process – with their different strengths and weaknesses – can potentially generate a strong synergy: the pre-proposal panel assesses which proposals could yield important enough results to warrant further scrutiny and the full-proposal panel assesses whether the research plan is sound enough to yield a reasonable chance of success. In the current reality of limited funding, it seems logical to increase the probability that funds go towards research that is both conceptually important and scientifically sound. Since many of us are more comfortable critiquing work based on specific methodological issues than on ‘general interest’, having a phase in the review that helps focus on the importance of the research seems valuable. However, if reviewers still focus primarily on methodological details (as seemed to be the case on Prof-Like Substance’s panel) then the new system could end up putting even less emphasis on big ideas, because the 4 pages will be entirely filled up with methods. Based on our experience this wasn’t a major concern, but it is definitely a possibility that NSF needs to be aware of.
Reduced reviewer workload: This was the primary motivation for the new system. We feel like we probably spent about as much time pre-panel reading and reviewing proposals, but we enjoyed it more because it involved more thinking about big questions and looking around in the literature and less slogging through 10 pages of methodological details. More importantly, there were no ad hoc reviewers for the pre-proposals, which greatly reduces the overall reviewer burden. The full-proposals will have ad hocs, but because there are fewer of them we should all end up getting fewer requests from NSF.
Reduced grant writer workload: One common concern about the new system is that people who write a successful pre-proposal will then have to also write a 15-page proposal, thus increasing the workload to 20 pages spread across two separate submissions (pre-proposal + proposal). Folks argue that this results in more time grant writing and less time doing science. Our perspective is that while not perfect, the new system is much better than the old system where many people we knew were putting in 1-2 (or even more) 15-page proposals per deadline (i.e., 2-4 proposals/year) with only a 5-10% funding rate (vs. 20-30% for full proposals under the new system). That’s a lot more wasted effort, especially when you consider that much of the prose from the pre-proposal will presumably be used in the full proposal. As grant writers we also really liked that we didn’t need to generate dozens of pages of time-consuming supplemental documents (budgets, postdoc mentoring plans, etc.) until we knew there was at least a reasonable chance of the proposal being funded. The scientific community should definitely have a discussion about how to streamline the process further to optimize the ratio of effort in proposal writing and review to quality of science being funded, but the current system is definitely a step forward in our opinion. If you’re interested in some of the mechanisms for how the PI proposal writing workload could be modified – both Prof-Like Substance’s and Jack’s posts contain some interesting ideas.
New investigators: Everyone, everyone, everyone is concerned about the untenured people. Given the culture among universities that grants = tenure, untenured faculty don’t have the luxury of time, and the big concern is that only having 1 deadline/year gives untenured people fewer chances to get funding before tenure decisions. Since the number of proposals NSF is funding isn’t changing, this isn’t quite as bad as it seems. However, if it takes a new investigator a couple of rounds to make it past the preproposal stage then they may not have very many tries to figure out how to write a successful full proposal. The counterarguments are that the once-yearly deadline gives investigators more time to refine ideas, digest feedback, and obtain friendly reviews from colleagues, and therefore (hopefully) to submit stronger proposals. It also (potentially) restricts the amount of time that untenured folks spend writing grants, therefore freeing up more time to focus on scholarly publications, mentoring students, and creating strong learning environments in our classrooms, which (theoretically) also are important for tenure. We love the ideas behind the counterarguments and if things really play out that way it would be to the betterment of science, but we do worry about how this ideal fares against the grants=tenure mentality.
Collaboration: One of our big concerns (and that of others as well) is the potential impact of the 2 proposal limit on interdisciplinary collaboration. Much of science is now highly interdisciplinary and collaborative and if team size is limited because of proposal limits this will make both justifying and accomplishing major projects more difficult. We have already run into this problem both in having former co-PIs remove themselves from existing proposals and in having to turn down potential collaborations. We have no problem with a limit on the number of lead-PI proposals; in a lot of ways we think it will help improve the balance between proposing science and actually doing it, but the limit on collaboration is a major concern.
In general, we think that the new system is a definite improvement over the old system, but there are clearly still things to be discussed and fine tuned. Possible changes to consider include:
- Find a way to allow full proposals that do well to skip the pre-proposal stage the next year. This will reduce stochasticity and frustration. These proposals could still count towards any limit on the number of proposals.
- Clearly and repeatedly communicate to the pre-proposal panels (let’s face it, faculty don’t tend to listen very well) the desired difference in emphasis between evaluating preliminary proposals and full proposals. This will help maintain the emphasis on interesting ideas and might also help alleviate the angst some panelists felt about what to do about proposals that were missing important details but not obviously flawed.
- Consider making the proposal limit apply only to the number of proposals on which someone is the lead PI. This still discourages excessive submissions without hurting the collaborative, interdisciplinary approach to science that we’ve all been working hard to foster.
So there it is. Our 2-part opinion piece on the new NSF-process. If you were hoping for a pre-proposal magic template, we’re sorry to disappoint, but hopefully you found a lot to think about here while you were looking for it!
UPDATE: If you were hoping for a pre-proposal magic template, check out the nice post over at Sociobiology.
Before we start, this post refers to posts already written on this topic. To make sure no one gets lost, please follow the sequence of operations below:
Step 1: Do you know about the new pre-proposal process at NSF?
Step 2: Have you read Jack Williams’s most excellent post (posted on Jacquelyn Gill’s most excellent blog) about a preproposal panelist’s perspective on the new process?
Step 3: Have you read Prof-like Substance’s post about his experience on a pre-proposal panel? (What? You haven’t read Prof-Like Substance’s blog before?! Go check him out.)
- If Yes, continue to Step 4
- If No, go to The Spandrel Shop and read Prof-like Substance’s post and return.
Step 4: Read our post! Like Jack and Prof-Like Substance, we also have experience with the new pre-proposal panels. The nuts and bolts of our experiences were similar to theirs (i.e., number of proposals read, assigning pre-proposals to one of three categories, etc). The main differences are really in our perceptions of the experience and the implications for the broader field. Please remember, there were a TON of pre-proposal panels this spring in both IOS and DEB. Differences from other panelists may reflect idiosyncratic differences in panels or differences in disciplines or just different takes on the same thing – because of NSF confidentiality rules, we can’t identify anything specific about our experiences – so don’t ask. And, speaking of rules: [start legalese] all opinions expressed within this post (including our comments, but not the comments of others) reflect only the aggregated opinions of Ethan & Morgan – henceforth referred to as Weecology – and do not represent official opinions by any entity other than Morgan & Ethan (even our daughter does not claim affiliation with our opinion…though to be honest, she’s two and she disagrees with everything we say anyway). [end legalese]
1) The Importance of Big Ideas. Our perspective on what made for a successful pre-proposal jibes largely with Jack’s. The scope of the question being asked was really important. The panelists had to believe that the research would be a strong and important contribution to the field as a whole – not just to a specific system or taxon. Not only did the question being proposed need to be one that would have broad relevance to the program’s mission, it needed a logical framework for accomplishing that goal. In our experience, disconnects between what you propose to address and what you’re actually doing become glaringly obvious in 4 pages.
2) Judging Methods. The limited space for methods was tricky for both reviewers and writers. Sometimes the methods are just bad – if a design is flawed in 4 pages, it’ll still be flawed in 40 pages. The challenge was how to judge proposals where nothing was obviously wrong, but important details were missing. After reviewing full-proposals where you are trying to decide whether a proposal should be funded as is, this was a rough transition to make because all the details can’t reasonably be fit into 4 pages. While the panel was cognizant of this, it is still hard to jettison old habits. Sometimes proposals were nixed because of those missing details and sometimes not. We honestly don’t have a good feel for why, but it might reflect a complex algorithm involving: a) how cool the idea was, b) the abilities of the research team – i.e. is there a PI with demonstrated experience related to the unclear area, and c) just how important did those missing details really seem to a panelist.
3) Methods vs. Ideas. Our impression is that the 4-page format seems to alter the focus of the reviewer. In 15 pages, so much of the proposal is the methods – the details of questions, designs, data collection, analyses. It’s only natural for the reader to focus on what takes up most of the proposal. In contrast, the structure of the pre-proposal really shifts the focus of the reviewer to the idea. Discussions with our fellow panelists suggest we weren’t the only ones to perceive this, though it’s important to note that not everyone feels this way – Prof-Like Substance’s post and comments flesh out an alternative to our experience.
4) Reviewers spend more time thinking about your proposal. This was an interesting and unexpected outcome of the short proposals. We both spent more time reading the literature to better understand the relevance of a pre-proposal for the field, looking up techniques, cited literature, etc. There was also a general feeling that panelists were more likely to reread pre-proposals. In our experience, most panelists felt like they spent about as much time reviewing each preproposal as they would a 15-pager, but more of this time was spent reading the literature and thinking about the proposal.
In general, like Jack, we came away with a positive feeling about the ability of the panel to assess the pre-proposals. A common refrain among panelists was that we were generally surprised by how well assessing a 4-page proposal actually worked. However, the differences in how a 4-pager is evaluated could have some interesting implications for the type of science funded – something we will speculate on in our next blog post (yes, this is as close as an academic blog gets to a cliff-hanger….).
UPDATE: If you’re looking for the information for 2014, check out the DEBrief post for links.
UPDATE: If you’re looking for the information for 2013, here’s an updated post.
Since I have now spent far too much time on multiple occasions trying to track down the instructions for the new pre-proposals for NSF DEB and IOS grants, I’m going to post the link here under the assumption that other folks will be looking for this information as well (and also finding it difficult to track down).
Happy post-holiday grant writing to all.
UPDATE 1: Also note that the Biosketches are different for the pre-proposals (changes noted in bold-italics)
Biographical Sketches (2-page limit for each) should be included for each person listed on the Personnel page. It should include the individual’s expertise as related to the proposed research, professional preparation, professional appointments, five relevant publications, five additional publications, and up to five synergistic activities. Advisors, advisees, and collaborators should not be listed on this document, but in a separate table (see below).
UPDATE 2: Though it is not explicitly clear from the link above, Current & Pending Support should NOT be included in pre-proposals (thanks to Alan Tessier for clearing this up).