Author Archives: Ethan White

Open talks and posters from Weecology at #ESA2013

We had a great time at ESA this year and enjoyed getting to interact with lots of both old and new friends and colleagues. Since we’re pretty into open science here at Weecology, it’s probably not surprising that we have a lot of slides (and even scripts) from our many and varied talks and posters posted online, and we thought it might be helpful to aggregate them all in one place. Enjoy.

Thanks to Dan McGlinn for help assembling the links.

Ignite Talk: Big Data in Ecology

Slides and script from Ethan White’s Ignite talk on Big Data in Ecology from Sandra Chung and Jacquelyn Gill’s excellent ESA 2013 session on Sharing Makes Science Better. Slides are also archived on figshare.

Title slide

1.  I’m here to talk to you about the use of big data in ecology and to help motivate a lot of the great tools and approaches that other folks will talk about later in the session.

Photos of field work

2.  The definition of big is of course relative, and so when we talk about big data in ecology we typically mean big relative to our standard approaches based on observations and experiments conducted by single investigators or small teams.

Image of Microsoft Excel

3.  And for those of you who prefer a more precise definition, my friend Michael Weiser defines big data and ecoinformatics as involving anything that can’t be successfully opened in Microsoft Excel.

Map of Breeding Bird Survey

4.  Data can be of unusually large size in two ways. It can be inherently large, like citizen science efforts such as the Breeding Bird Survey, where large amounts of data are collected in a consistent manner.

Images of Dryad, figshare, and Ecological Archives

5.  Or it can be large because it’s composed of a large number of small datasets that are compiled from sources like Dryad, figshare, and Ecological Archives to form useful compilation datasets for analysis.

Dataset logos

6.  We have increasing amounts of both kinds of data in ecology as a result of both major data collection efforts and an increased emphasis on sharing data.

Maps and quote about large scale ecology from NEON

7-8.  But what does this kind of data buy us? First, big data allows us to work at scales beyond those at which traditional approaches are typically feasible. This is critical because many of the most pressing issues in ecology, including climate change, biodiversity, and invasive species, operate at broad spatial and long temporal scales.

Map and results of general analysis

9-10.  Second, big data allows us to answer questions in general ways, so that we get the answer today instead of waiting a decade to gradually compile enough results to reach consensus. We can do this by testing theories using large amounts of data from across ecosystems and taxonomic groups, so that we know that our results are general, and not specific to a single system (e.g., White et al. 2012).

The most interesting man in the world says: I don't always analyze data, but when I do, I prefer a lot of it

11. This is the promise of big data in ecology, but realizing this potential is difficult because working with either truly big data or data compilations is inherently challenging, and we still lack sufficient data to answer many important questions.

Bullet points: 1. Training, 2. Tools, 3. More data.

12. This means that if we are going to take full advantage of big data in ecology we need 3 things: training in computational methods for ecologists, tools to make it easier to work with existing data, and more data.

Logos of groups running training initiatives

13. We need to train ecologists in the computational tools needed for working with big data, and there are an increasing number of efforts to do this including Software Carpentry (which I’m actively involved in) as well as training initiatives at many of the data and synthesis centers.

Logos for DataONE, Dryad, NEON, Morpho, and DataUP

14. We need systems for storing, distributing, and searching data like DataONE, Dryad, NEON’s data portal, as well as the standardized metadata and associated tools that make finding data to answer a particular research question easier.

Screenshot of Ecological Data Wiki

15. We need crowd-sourced systems like the Ecological Data Wiki to allow us to work together on improving insufficient metadata and understanding what kinds of analyses are appropriate for different datasets and how to conduct them rigorously.

rOpenSci and EcoData Retriever logos

16. We need tools for quickly and easily accessing data like rOpenSci and the EcoData Retriever so that we can spend our time thinking and analyzing data rather than figuring out how to access it and restructure it.

Map of Life, GBIF, and EcoData Retriever logos

17. We also need systems that help turn small data into big data compilations, whether it be through centralized standardized databases like GBIF or tools that pull data together from disparate sources like Map of Life.

Screen shot of preprint, and Morpho, DataUP, and CC0 logos

18. And finally we need to continue to share more and more data and share it in useful ways, with the good formats, standardized metadata, and open licenses that make it easy to work with.

Dataset logos

19. And so, what I would like to leave you with is that we live in an exciting time in ecology thanks to the generation of large amounts of data by citizen science projects, exciting federal efforts like NEON, and a shift in scientific culture towards sharing data openly.

River Ernest-White saying "Aw Dad, Big Data is such a buzz word"

20. If we can train ecologists to work with and combine existing tools in interesting ways, it will let us combine datasets spanning the surface of the globe and diversity of life to make meaningful predictions about ecological systems.

[Preprint] Nine simple ways to make it easier to (re)use your data

I’m a big fan of preprints, the posting of papers in public archives prior to peer review. Preprints speed up the scientific dialogue by letting everyone see research as it happens, not 6 months to 2 years later following the sometimes extensive peer review process. They also allow more extensive pre-publication peer review because input can be solicited from the entire community of scientists, not just two or three individuals. You can read more about the value of preprints in our preprint about preprints (yes, really) posted on figshare.

In the spirit of using preprints to facilitate broad pre-publication peer review, a group of weecologists have just posted a preprint on how to make it easier to reuse data that is shared publicly. Since PeerJ’s commenting system isn’t live yet we would like to encourage you to provide feedback about the paper here in the comments. It’s for a special section of Ideas in Ecology and Evolution on data sharing (something else I’m a big fan of) that is being organized by Karthik Ram (someone I’m a big fan of).

Our nine recommendations are:

  1. Share your data
  2. Provide metadata
  3. Provide an unprocessed form of the data
  4. Use standard data formats (including file formats, table structures, and cell contents)
  5. Use good null values
  6. Make it easy to combine your data with other datasets
  7. Perform basic quality control
  8. Use an established repository
  9. Use an established and liberal license
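
As a rough illustration of recommendations 4, 5, and 6, here is a minimal sketch of what well-structured data buys the person reusing it. This example is not taken from the preprint: the two tables, the column names (`site_id`, `species`, `weight_g`), the coordinates, and the helper functions are all invented for the illustration, and it assumes missing values are stored as blank cells rather than as a sentinel like -999. It uses only Python's standard library.

```python
import csv
import io

# Hypothetical data: every name and value here is invented for illustration.
# The missing weight is left blank rather than coded as -999 or similar.
SURVEY_CSV = """site_id,species,weight_g
A1,DM,42.0
A1,PP,
B2,DM,39.5
"""

SITES_CSV = """site_id,latitude,longitude
A1,31.94,-109.08
B2,31.95,-109.10
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_weight(value):
    """Convert a weight cell to float, treating a blank cell as missing (None)."""
    return float(value) if value.strip() else None


def join_on_site(survey_rows, site_rows):
    """Attach site coordinates to each survey row using the shared site_id key."""
    sites = {row["site_id"]: row for row in site_rows}
    combined = []
    for row in survey_rows:
        site = sites[row["site_id"]]
        combined.append({
            "site_id": row["site_id"],
            "species": row["species"],
            "weight_g": parse_weight(row["weight_g"]),
            "latitude": float(site["latitude"]),
            "longitude": float(site["longitude"]),
        })
    return combined


combined = join_on_site(read_rows(SURVEY_CSV), read_rows(SITES_CSV))

# Missing weights are skipped explicitly instead of silently skewing the mean.
weights = [r["weight_g"] for r in combined if r["weight_g"] is not None]
mean_weight = sum(weights) / len(weights)
print(f"{len(combined)} records, mean weight {mean_weight:.2f} g")
```

Because the two tables share a consistently named key and the missing value is unambiguous, combining them takes a few lines. Had the missing weight been recorded as -999, it would have been averaged in as a real measurement unless the person reusing the data happened to read the metadata first.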

Most of this territory has been covered before by a number of folks in the data sharing world, but if you look at the state of most ecological and evolutionary data it clearly bears repeating. In addition, I think that our unique contribution is threefold: 1) We’ve tried hard to stick to relatively simple things that don’t require a huge time commitment to get right; 2) We’ve tried to minimize the jargon and really communicate with the awesome folks who are collecting great data but don’t have much formal background in the best practices of structuring and sharing data; and 3) We contribute the perspective of folks who spend a lot of time working with other people’s data and have therefore encountered many of the most common issues that crop up in ecological and evolutionary data.

So, if you have the time, energy, and inclination, please read the preprint and let us know what you think and what we can do to improve the paper in the comments section.

UPDATE: This manuscript was written in the open on GitHub. You can also feel free to file GitHub issues if that’s more your style.

UPDATE 2: PeerJ has now enabled commenting on preprints, so comments are welcome directly on our preprint as well (https://peerj.com/preprints/7/).

Some alternative advice on how to decide where to submit your paper

Over at Dynamic Ecology this morning Jeremy Fox has a post giving advice on how to decide where to submit a paper. It’s the same basic advice that I received when I started grad school almost 15 years ago and as a result I don’t think it considers some rather significant changes that have happened in academic publishing over the last decade and a half. So, I thought it would be constructive for folks to see an alternative viewpoint. Since this is really a response to Jeremy’s post, not a description of my process, I’m going to use his categories in the same order as the original post and offer my more… youthful… perspective.

  • Aim as high as you reasonably can. The crux of Jeremy’s point is “if you’d prefer for more people to read and think highly of your paper, you should aim to publish it in a selective, internationally-leading journal.” From a practical perspective journal reputation used to be quite important. In the days before easy electronic access, good search algorithms, and social networking, most folks found papers by reading the table of contents of individual journals. In addition, before there was easy access to paper level citation data and alt-metrics, if you needed to make a quick judgment on the quality of someone’s science the journal name was a decent starting point. But none of those things are true anymore. I use searches, filtered RSS feeds, Google Scholar’s recommendations, and social media to identify papers I want to read. I do still subscribe to tables of contents via RSS, but I watch PLOS ONE and PeerJ just as closely as Science and Nature. If I’m evaluating a CV as a member of a search committee or a tenure committee I’m interested in the response to your work, not where it is published, so in addition to looking at some of your papers I use citation data and alt-metrics related to your paper. To be sure, there are lots of folks like Jeremy that focus on where you publish to find papers and evaluate CVs, but it’s certainly not all of us.
  • Don’t just go by journal prestige; consider “fit”. Again, this used to matter more before there were better ways to find papers of interest.
  • How much will it cost? Definitely a valid concern, though my experience has been that waivers are typically easy to obtain. This is certainly true for PLOS ONE.
  • How likely is the journal to send your paper out for external review? This is a strong tradeoff against Jeremy’s point about aiming high since “high impact” journals also typically have high pre-review rejection rates. I agree with Jeremy that wasting time in the review process is something to be avoided, but I’ll go into more detail on that below.
  • Is the journal open access? I won’t get into the arguments for open access here, but it’s worth noting that increasing numbers of us value open access and think that it is important for science. We value open access publications so if you want us to “think highly of your paper” then putting it where it is OA helps. Open access can also be important if you “prefer for more people to read… your paper” because it makes it easier to actually do so. In contrast to Jeremy, I am more likely to read your paper if it is open access than if it is published in a “top” journal, and here’s why: I can do it easily. Yes, my university has access to all of the top journals in my field, but I often don’t read papers while I’m at work. I typically read papers in little bits of spare time while I’m at home in the morning or evenings, or on my phone or tablet while traveling or waiting for a meeting to start. If I click on a link to your paper and I hit a paywall then I have to decide whether it’s worth the extra effort to go to my library’s website, log in, and then find the paper again through that system. At this point unless the paper is obviously really important to my research the activation energy typically becomes too great (or I simply don’t have that extra couple of minutes) and I stop. This is one reason that my group publishes a lot using Reports in Ecology. It’s a nice compromise between being open access and still being in a well regarded journal.
  • Does the journal evaluate papers only on technical soundness? The reason that many of us think this approach has some value is simple: it reduces the amount of time and energy spent trying to get perfectly good research published in the most highly ranked journal possible. This can actually be really important for younger researchers in terms of how many papers they produce at certain critical points in the career process. For example, I would estimate that the average amount of time that my group spends getting a paper into a high profile journal is over a year. This is a combination of submitting to multiple, often equivalent caliber, journals until you get the right roll of the dice on reviewers, and the typically extended rounds of review that are necessary to satisfy the reviewers, which involve not only defending what you’ve done, but also meeting requests for additional analyses that often aren’t critical and changing how you’ve described things so that it sits better with reviewers. If you are finishing your PhD then having two or three papers published in a PLOS ONE style journal vs. in review at a journal that filters on “importance” can make a big difference in the prospect of obtaining a postdoc. Having these same papers out for an extra year accumulating citations can make a big difference when applying for faculty positions or going up for tenure if folks who value paper level metrics over journal name are involved in evaluating your packet.
  • Is the journal part of a review cascade? I don’t actually know a lot of journals that do this, but I think it’s a good compromise between aiming high and not wasting a lot of time in review. This is why we think that ESA should have a review cascade to Ecosphere.
  • Is it a society journal? I agree that this has value and it’s one of the reasons we continue to support American Naturalist and Ecology even though they aren’t quite as open as I would personally prefer.
  • Have you had good experiences with the journal in the past? Sure.
  • Is there anyone on the editorial board who’d be a good person to handle your paper? Having a sympathetic editor can certainly increase your chances of acceptance, so if you’re aiming high then having a well matched editor or two to recommend is definitely a benefit.

To be clear, there are still plenty of folks out there who approach the literature in exactly the way Jeremy does and I’m not suggesting that you ignore his advice. In fact, when advising my own students about these things I often actively consider and present Jeremy’s perspective. However, there are also an increasing number of folks who think like I do and who have a very different set of perspectives on these sorts of things. That makes life more difficult when strategizing over where to submit, but the truth is that the most important thing is to do the best science possible and publish it somewhere for the world to see. So, go forth, do interesting things, and don’t worry so much about the details.

UPDATE: More great discussion here, here, here and here. [If I missed yours just let me know in the comments and I’ll add it]

NSF Preproposal Guidelines 2013

UPDATE: If you’re looking for the information for 2014, check out the DEBrief post for links.

It’s that time of year again when we’re all busy working on preproposals for the National Science Foundation, and just like last year it’s more difficult than you would think to track down the official guidelines using Google. So, for all of you who don’t have a minute to spare, here they are:

Also, remember that Biographical Sketches are different than for full proposals:

Biographical Sketches (2-page limit for each) should be included for each person listed on the Personnel page. It should include the individual’s expertise as related to the proposed research, professional preparation, professional appointments, five relevant publications, five additional publications, and up to five synergistic activities. Advisors, advisees, and collaborators should not be listed on this document, but in a separate table (see below).

And that there is a big stack of things that should not be included at this stage in the process:

Budget, Budget Justification, Facilities, Equipment and Other Resources, Current and Pending Support, Letters of Collaboration, Data Management Plan, Postdoctoral Mentoring Plan, RUI Impact Statement, Certification of RUI Eligibility, or any other Supplementary Documents.

Good luck!

UPDATE: Included separate links for DEB and IOS.

Graduate student opening with Weecology

We’re looking for a new student to join our interdisciplinary research group. The opening is in Ethan’s lab, but the faculty, students, and postdocs in Weecology interact seamlessly among groups. If you’re interested in macroecology, community ecology, or just about anything with a computational/quantitative component to it, we’d love to hear from you. The formal ad is included below (and yes, we did include links to our blog, twitter, and our GitHub repositories in the ad). Please forward this to any students who you think might be a good fit, and let us know if you have any questions.

GRADUATE STUDENT OPENING

The White Lab at Utah State University has an opening for a graduate student with interests in Macroecology, Community Ecology, or Ecological Theory/Modeling.  Active areas of research in the White lab include broad scale patterns related to biodiversity, abundance and body size, ecological dynamics, and the use of sensor networks for studying ecological systems. We use computational, mathematical, and advanced statistical methods in much of our work, so students with an interest in these kinds of methods are encouraged to apply. Background in these quantitative techniques is not necessary, only an interest in learning and applying them. While students interested in one of the general areas listed above are preferred, students are encouraged to develop their own research projects related to their interests. The White Lab is part of an interdisciplinary ecology research group (http://weecology.org) whose goal is to facilitate the broad training of ecologists in areas from field work to quantitative methods. Students with broad interests are jointly trained in an interdisciplinary setting. We are looking for students who want a supportive environment in which to pursue their own ideas. Graduate students are funded through a combination of research assistantships, teaching assistantships, and fellowships. Students interested in pursuing a PhD are preferred. Utah State University has an excellent graduate program in ecology with over 50 faculty and 80+ graduate students across campus affiliated with the USU Ecology Center (http://www.usu.edu/ecology/).

Additional information about the position and Utah State University is available at:
http://whitelab.weecology.org/grad-student-opening

Interested students can find more information about our group by checking out:
Our websites: http://whitelab.weecology.org, http://weecology.org
Our code repositories: http://github.com/weecology
Our blog: http://jabberwocky.weecology.org
And Twitter: http://twitter.com/ethanwhite

Interested students should contact Dr. Ethan White (ethan.white@usu.edu) by December 1st, 2012 with their CV, GPA, GRE scores (if available), and a brief statement of research interests.

ESA journals will now allow papers with preprints

ESA has just announced that it has changed its policy on preprints and will now allow articles that have been posted on major preprint servers, like arXiv, to be considered for publication in its journals.

I am very excited about this change for two reasons. First, as nicely laid out in an INNGE blog post by Philippe Desjardins-Proulx*, a preprint culture has many benefits for science. Preprints make science more accessible, allow researchers to get feedback from the community prior to peer review, and speed up the scientific process by making ideas available to others as quickly as possible. We should take this opportunity as a community to start developing the kind of vibrant preprint culture that has benefited so many other disciplines. Second, I am encouraged by the rapid response of ESA to the concerns expressed by myself and other members of the community, and take it as a sign that my favorite society is open to making the kinds of changes that are necessary to best facilitate science in the modern era. More work is clearly necessary, but this is a very encouraging start.

UPDATE: Carl Boettiger has posted his very nice letter to Don Strong that played a critical role in taking this discussion from a bunch of folks talking over social media to something that effected meaningful change.

—————————————————————————————————————————–

*See also, posts by GCBias and Titus Brown

A list of publicly available grant proposals in the biological sciences

Recently a bunch of folks in the biological sciences have started sharing their grant proposals openly. Their reasons for doing so are varied (see the links next to their names below), but part of the common justification is a general interest in opening up science so that all stages of the process can benefit from better interaction and communication, and part of it is to provide examples for younger scientists writing grants. To help accomplish both of these goals I’m going to do what Titus Brown suggested and compile a list of all of the available open proposals in the biological sciences (if you’re looking for math proposals they have a list too). Given the limited number of proposals available at the moment I’m just going to maintain the list here, sorted alphabetically by PI. Another way to find proposals is to look at the ‘grant’ and ‘proposal’ tags on figshare, where several of us have been posting proposals. If you know of more proposals, decide to post some yourself, or have corrections to proposals in the list, just let me know in the comments and I’ll keep the list updated. Enjoy!

Casey Bergman (@caseybergman)

Titus Brown (@ctitusbrown; read Titus’ thoughts on sharing proposals)

Scott Chamberlain (@recology_)

Karen Cranston (@kcranstn)

Edmund (Ted) Harte (@DistribEcology)

Jan Jensen (@janhjensen; read Jan’s thoughts on sharing proposals)

Paula Mabee

Rod Page (@rdmpage; read Rod’s thoughts on sharing proposals)

David Pappano (@djpappano)

Heather Piwowar (@researchremix) & Jason Priem (@jasonpriem) (read their thoughts on sharing proposals)

Rosie Redfield (@RosieRedfield)

Tracy Teal (@tracykteal)

Andrew Tredennick (@ATredennick)

Heroen Verbruggen

Todd Vision (@tjvision)

Ethan White (@ethanwhite; read Ethan’s thoughts on sharing proposals)

On making my grant proposals open access

As I announced on Twitter about a week ago, I am now making all of my grant proposals open access. To start with I’m doing this for all of my sole-PI proposals, because I don’t have to convince my collaborators to participate in this rather aggressively open style of science. At the moment this includes three funded proposals: my NSF Postdoctoral Fellowship proposal, an associated Research Starter Grant proposal, and my NSF CAREER award.

So, why am I doing this, especially with the CAREER award that still has several years left on it and some cool ideas that we haven’t worked on yet? I’m doing it for a few reasons. First, I think that openness is inherently good for science. While there may be benefits for me in keeping my ideas secret until they are published, this certainly doesn’t benefit science more broadly. By sharing our proposals the cutting edge of scientific thought will no longer be hidden from view for several years and that will allow us to make more rapid progress. Second, I think having examples of grants available to young scientists has the potential to help them learn how to write good proposals (and other folks seem to agree) and therefore decrease the importance of grantsmanship relative to cool science in the awarding of limited funds. Finally, I just think that folks deserve to be able to see what their tax dollars are paying for, and to be able to compare what I’ve said I will do to what I actually accomplish. I’ve been influenced in my thinking about this by posts by several of the big open science folks out there including Titus Brown, Heather Piwowar, and Rod Page.

To make my grants open access I chose to use figshare for several reasons.

  1. Credit. Figshare assigns a DOI to all of its public objects, which means that you can easily cite them in scientific papers. If someone gets an idea out of one of my proposals and works on it before I do, this lets them acknowledge that fact. Stats are also available for views, shares, and (soon) citations, making it easier to track the impact of your larger corpus of research outputs.
  2. Open Access. All public written material is licensed under CC-BY (basically just cite the original work) allowing folks to do cool things without asking.
  3. Permanence. I can’t just change my mind and delete the proposal and I also expect that figshare will be around for a long time.
  4. Version control. For proposals that are not funded, revised, not funded, revised, etc. figshare allows me to post multiple versions of the proposal while maintaining the previous versions for posterity/citation.

During this process I’ve come across several other folks doing similar things and even inspired others to post their proposals, so I’m in the process of compiling a list of all of the publicly available biology proposals that I’m aware of and will post a list with links soon. It’s my hope that this will serve as a valuable resource for young and old researchers alike and will help to lead the way forward to a more open scientific dialogue.

ESA journals do not allow papers with preprints

Over the weekend I saw this great tweet:

by Philippe Desjardins-Proulx and was pleased to see yet another actively open young scientist. Then I saw his follow up tweet:

At first I was confused. I thought ESA’s policy was that preprints were allowed based on the following text on their website (emphasis mine; still available in Google’s cache):

A posting of a manuscript or thesis on a personal or institutional homepage or ftp site will generally be considered as a preprint; this will not be grounds for viewing the manuscript as published. Similarly, posting of manuscripts in public preprint archives or in an institution’s public archive of unpublished theses will not be considered grounds for declaring a manuscript published. If a manuscript is available as part of a digital publication such as a journal, technical series or some other entity to which a library can subscribe (especially if that publication has an ISSN or ISBN), we will consider that the manuscript has been published and is thus not eligible for consideration by our journals. A partial test for prior publication is whether the manuscript has appeared in some entity with archival value so that it is permanently available to reasonably diligent scholars. A necessary test for prior publication is whether the author can legally transfer copyright to ESA.

So I asked Philippe to explain his tweet:

This got me a little riled up so I broadcast my displeasure:

And then Jarrett Byrnes questioned where this was coming from given the stated policy:

So I emailed ESA to check and, sure enough, preprints on arXiv and similar preprint servers are considered prior publication and therefore cannot be submitted to ESA journals, despite the fact that this isn’t a problem for a few journals you may have heard of including Science, Nature, PNAS, and PLoS Biology. ESA (to their credit) has now clarified this point on their website (emphasis mine; thanks to Jaime Ashander for the heads up):

A posting of a manuscript or thesis on an author’s personal or home institution’s website or ftp site generally will not be considered previous publication. Similarly posting of a “working paper” in an institutional repository is allowed so long as at least one of the authors is affiliated with that institution. However, if a manuscript is available as part of a digital publication such as a journal, technical series, or some other entity to which a library can subscribe (especially if that publication has an ISSN or ISBN), we will consider that the manuscript has been published and is thus not eligible for consideration by our journals. Likewise, if a manuscript is posted in a citable public archive outside the author’s home institution, then we consider the paper to be self-published and ineligible for submission to ESA journals. Finally, a necessary test for prior publication is whether the author can legally transfer copyright to ESA.

In my opinion the idea that a preprint is “self-published” and therefore represents prior publication is poorly justified* and not in the best interests of science, and I’m not the only one:

So now I’m hoping that Jarrett is right:

and that things might change (and hopefully soon). If you know someone on the ESA board, please point them in the direction of this post.

UPDATE: Just as I was finishing working on this post ESA responded to the tweet stream from the last few days:

I’m very excited that ESA is reviewing their policies in this area. As I should have said in the original post, I have, up until this year, been quite impressed with ESA’s generally open, and certainly pro-science policies. This last year or so has been a bad one, but I’m hoping that’s just a lag in adjusting to the new era in scientific publishing.

UPDATE 2: ESA has announced that they have changed their policy and will now consider articles with preprints.

———————————————————————————————————————————————————————–

*I asked ESA if they wanted to clarify their justification for this policy and haven’t heard back (though it has been less than 2 days). If they get back to me I’ll update or add a new post.
   