Weecology is moving to the University of Florida

And yes... River is doing the Gator Chomp

We are excited to announce that Weecology will be moving to the University of Florida next summer. We were recruited as part of the UF Rising Preeminence Plan, a major hiring campaign to bring together researchers in a number of focal areas including Big Data and Biodiversity. We will both be joining the Wildlife Ecology and Conservation department; Ethan will be part of UF’s new Informatics Institute, and Morgan will be part of UF’s new Biodiversity Initiative.

As excited as we are about the opportunities at Florida, we are also incredibly sad to be saying goodbye to Utah State University. Leaving was not an easy decision. We have amazing colleagues and friends here in Utah that we will greatly miss. We have also felt extremely well treated by Utah State. They were very supportive while we were getting our programs up and running, including helping us solve the two-body problem. They allowed us to take risks in both research and the classroom. They have been incredibly supportive of our desires for work-life balance, and were very accommodating following the birth of our daughter. It was a fantastic place to spend nearly a decade and we will miss it and the amazing people who made it home.

So why are we leaving? It was a many-faceted decision, but at its core was the realization that the scale of the investment and recruiting of talented folks in both of our areas of interest was something we were unlikely to see again in our careers. The University of Florida has always had a strong ecology group, but between the new folks who have already accepted positions and those we know who are being considered, it is going to be such a talented and exciting group that we just had to be part of it!

As part of the move we’ll be hiring for a number of different positions, so stay tuned!

Ecology Letters now allows preprints; and why this is a big deal for ecology

As announced by Noam Ross on Twitter (and confirmed by the Editor in Chief of Ecology Letters), Ecology Letters will now allow the submission of manuscripts that have been posted as preprints. Details will be published in an editorial in Ecology Letters. I want to say a heartfelt thanks to Marcel Holyoak and the entire Ecology Letters editorial board for listening to the ecological community and modifying their policies. Science is working a little better today than it was yesterday thanks to their efforts.

For those of you who are new to the concept of preprints, they are manuscripts that have not yet been published in peer-reviewed journals and that are posted to websites like arXiv, PeerJ, and bioRxiv. This process allows for more rapid communication of scientific results and improved quality of published papers through more expansive pre-publication peer review. If you’d like to read more check out our paper on The Case for Open Preprints in Biology.

The fact that Ecology Letters now allows preprints is a big deal for ecology because they were the last of the major ecology journals to make the transition. The ESA journals began allowing preprints just over two years ago and the BES journals made the switch about 9 months ago. In addition, Science, Nature, PNAS, PLOS Biology, and a number of other ecology journals (e.g., Biotropica) all support preprints. This means that all of the top ecology journals, and all of the top general science journals that most ecologists publish in, allow the posting of preprints. As such, there is no longer a reason to avoid posting preprints based on the possibility of not being able to publish in a preferred journal. This can potentially shave months to years off of the time between discovery and initial communication of results in ecology.

It also means that other ecology journals that still do not allow the posting of preprints are under significant pressure to change their policies. With all of the big journals allowing preprints they have no reasonable excuse for not modernizing their policies, and they risk losing out on papers that are initially submitted to higher profile journals and are posted as preprints.

It’s a good day for science. Celebrate by posting your next manuscript as a preprint.

Macroecology Meeting in the Works

We macroecologists are scattered across the globe, often with little in-person access to other macroecologists. Often we’re lucky if there’s another person at our institution who has even heard the word macroecology. Sadly, we don’t have a lot of venues for bringing large groups of macroecologists together. Many of the ones that do occur tend to be local in attendance or focused on one area of macroecology. But plans are afoot, my friends! This week, I received the below letter from Sally Keith – a macroecologist at the Danish Centre for Macroecology, Evolution & Climate and a member of the British Ecological Society Macroecology Special Interest Group. Some of the European Macroecology groups are planning a meet up and there is interest in reaching out across the pond. They are in the planning phase and are looking for input on good dates/length of meeting. If you’re interested, read the email below and take the short survey to help them think about logistics!

If you want more info, you should email one of the people who signed the below email (I’ve linked to their websites). I’m not an organizer, just a messenger!


Dear All

We are delighted to announce an upcoming joint meeting of the BES Macroecology Group, the GfO Macroecology Group, and the Center for Macroecology, Evolution & Climate (CMEC). The meeting will be hosted by CMEC in Copenhagen, Denmark during June 2015. We are sure it will provide an exciting opportunity for the members of these groups to share their latest research and ideas, and to initiate new collaborations in the relatively informal atmosphere consistent with the society group meetings. 

To help us find the best dates, length of meeting and a good estimate of participant numbers, we would appreciate it if you could spare a couple of minutes to fill out this very short survey: https://www.surveymonkey.com/s/KKFBMHY

Thanks very much 

The organising committee

CMEC: http://macroecology.ku.dk/

BES Macroecology: http://macroecologyuk.weebly.com/

GfO Macroecology: http://www.gfoe.org/en/gfoe-specialist-groups/macroecology.html

 

UPDATE: Fixed lots of broken links and a couple of typos

Organizing a Gender Balanced Conference

There is a lot of discussion on the internet about highly skewed speaker lists at symposia and conferences. For the past year, I’ve been co-organizing a small conference (~110 people) with Michael Angilletta where we’ve been practicing some of the approaches I developed and blogged about earlier for organizing a seminar series. However, in ecology we know that what works at small scales may not apply to larger scales. So, do I still think organizing a conference that is both strong on research and gender diversity is very doable? Read and find out.

But before I give you my thoughts, first, some stats and background. For this conference, we had a lot of moving pieces: Discussion Leaders who helped organize their sessions, Invited Speakers (for both long and short talks), and a Mentoring Program for Young Scientists, which involved selecting both mentors and mentees. In the end, we ended up with the following numbers for each of these parts (I’m writing this about a week before the conference, so hopefully things don’t change drastically).

Discussion Leaders: 5 men, 4 women

Invited Long-Talk Speakers (40 min talks): 9 men, 9 women

Invited Short-Talk Speakers (12 min talks): 9 men, 7 women

Students/Postdocs in Mentoring Program: 10 men, 10 women

Professors (all ranks) in Mentoring Program: 11 men, 9 women

So across all the slots that we invited people to fill, we have 53% men (44 of 83).
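For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the overall share from the five lists above. The counts are copied from this post; the variable and function names are just illustrative.

```python
# Counts as (men, women), copied from the lists in this post.
slots = {
    "Discussion Leaders": (5, 4),
    "Invited Long-Talk Speakers": (9, 9),
    "Invited Short-Talk Speakers": (9, 7),
    "Students/Postdocs in Mentoring Program": (10, 10),
    "Professors in Mentoring Program": (11, 9),
}


def share_of_men(counts):
    """Return the fraction of all invited slots that are filled by men."""
    men = sum(m for m, _ in counts.values())
    women = sum(w for _, w in counts.values())
    return men / (men + women)


overall = share_of_men(slots)
print(f"{overall:.0%} men across all invited slots")
```

The same function works on any subset of the dictionary, which is handy if you want to look at speakers separately from the mentoring program.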

What did we do? Just like I talked about in my post about the seminar series, we generated a large pool of names. We started by making a big list of people to lead the various sessions at the conference and developed an invite list that was balanced. We then used our balanced group of Discussion Leaders to brainstorm potential speakers. Each Discussion Leader provided a list of people they thought would be excellent for their session. They were given detailed instructions about how to generate their list – diverse perspectives on their topic, diversity of taxa/ecosystems, including domestic and international scientists, and a reminder to be aware of the gender ratios of their list.

From those lists, Mike and I sat down and constructed our dream team speaker list – balancing research areas, topics and taxa/ecosystems, career stages, making sure we had some international representation, and keeping an eye on gender balance in the process. Then we set out to convince these people to come speak at the conference.

For the Mentoring Program, we ran an application process. We advertised on every social media outlet and listserv we could think of. Our pool of applicants was very gender balanced (23 women, 21 men). We selected 20 young scientists, split equally between men and women, again balancing across various dimensions of research & people diversity.

The Mentors and Short Talk Speakers were harder. Most of our Short Talk Speakers are students from the mentoring program but we had some slots leftover to fill. Both the mentor and short talk speakers needed to fill specific topic requirements either for the program or to overlap with already chosen mentees.

So what lessons did I learn?

Gender balancing a conference is hard, but not in the ways I generally heard about before I started. It is not harder to find female speakers – as long as you don’t restrict yourself to only senior female professors. There are lots of kickass women out there, but you need to embrace the fact that they are scattered across career stages. Women were not more likely to say no. I don’t know why we didn’t experience this commonly reported problem. Maybe it was because I was sending the invites. Are women more likely to say yes to another woman asking? I also spent a lot of time sending personalized emails communicating why we thought they in particular would be a good fit for the conference and why we wanted them there (I did this for the men too). What was hard about it was spending the extra time sending personalized emails to communicate clearly why I was inviting them. Did those efforts make a difference? I really don’t know. You’ll have to ask the amazing scientists who said yes to our invites.

Developing at the get-go a diverse pool of people you would like involved is critical. This is another time intensive step. Crowd-sourcing this to our Discussion Leaders helped a lot. Many of them knew speakers (men and women) we hadn’t thought of. When we pooled all those suggestions, we had 123 suggestions for 16 speaking slots. That gave us a ton of flexibility when thinking about the program we wanted to create. It was also really handy when someone said no because all the brainstorming work had already been done. We could sit down with our list and come to consensus quickly on the next invite to send. We often saved up rejections to fill as a group, thus allowing us to manage the diversity better.

The more restricted the slot you’re trying to fill, the harder it is to get gender balance. If your need is 2 kickass people who work in general area X, then gender balance is easy. The more criteria you place or the fewer the number of slots (or both) the harder it gets. Need a senior researcher studying organism X on specific subtopic Y and need another senior researcher studying organisms Z on specific subtopic A in ecosystem Q? Yeah, both those slots are probably going to end up being men, just because of the numbers game. View your program creatively. Be willing to think about different ways people can fit into the program given the diversity of research you’re trying to cover and the multiple facets that everyone has in their research programs.

So my final thoughts on the matter? Making a gender balanced conference is not easy and because of the strong gender skew at the senior levels, it doesn’t just magically happen. It takes work, planning, creativity, and a great team of people helping you brainstorm names. But an 80:20 split in invited speakers is far from the grim ‘reality’ that some might think.

Pre-tenure Advice: Blocking out time for your research

As part of the Carnival that Prof-like Substance is organizing on Pre-tenure advice, I thought I’d throw in a piece of advice that anyone who asks me this question gets from me. Here it is:

Create a calendar and block out time for you.

Sounds simple, and honestly a little stupid, but it’s the best advice I can give. Why?

When you start your job, or a semester, your calendar is empty. You have oodles and oodles of time for you to work on your science. However, you will quickly find that there are a lot of demands on your time. Students want meetings. Faculty want meetings. Collaborators want meetings. Administrators want meetings. You name a type of person (and some you suspect might not be people at all) and they will want you to do something for them. Many of these things will seem quick. The most dangerous words are “This will only take 5 minutes” (It won’t. I promise you, it never does). Next thing you know you have a week chopped up into meetings with only 15 minutes here or there of ‘free time’. That’s now the free time you have to do all that science you were planning on. Trust me, in the 15 minutes you’ll have between meetings and things you have to do to prepare for meetings, you won’t feel like working on your science.

This is where the calendar comes in. Schedule blocks of time during each week for research. Once on your calendar, these blocks are sacrosanct. You wouldn’t cancel that meeting with the Department Chair to meet with Fred down the hall to talk about the seminar committee. Your research is in many ways more important for your future than the Department Chair, so don’t cancel on it.

Why do you need the calendar? Why can’t you just tell yourself that every Monday morning you will focus on research? Because it doesn’t work. When someone comes and asks for a meeting and you think about your week, Monday morning will be ‘free’ in your head. The easiest thing in the world is to fill that ‘empty’ slot and now you’ve just broken up your research time into useless chunks. (Trust me, I’ve been there). On the other hand, if that time is already blocked out on your calendar as ‘busy’ then it’s a reminder that you have something you need to be doing during that time.

As an ancillary note, blocking out time for you and your mental health is also important. Exercise important for keeping you sane? Need time for activities with friends or you go ballistic? Whatever you need to keep you sane and happy, make sure you schedule time for it. Because when you’re insane, your work is never as good as you think it is. Trust me on this.

Which preprint server should I use?

Preprints are rapidly becoming popular in biology as a way to speed up the process of science, get feedback on manuscripts prior to publication, and establish precedence (Desjardins-Proulx et al. 2013). Since biologists are still learning about preprints I regularly get asked which of the available preprint servers to use. Here’s the long-form version of my response.

The good news is that you can’t go wrong right now. The posting of a preprint and telling people about it is far more important than the particular preprint server you choose. All of the major preprint servers are good choices. Of course you still need to pick one and the best way to do that is to think about the differences between available options. Here’s my take on four of the major preprint servers: arXiv, bioRxiv, PeerJ, and figshare.

arXiv

arXiv is the oldest of the science preprint servers. As a result it is the most well established, it is well respected, more people have heard of it than any of the other preprint servers, and there is no risk of it disappearing any time soon. The downside to having been around for a long time is that arXiv is currently missing some features that are increasingly valued on the modern web. In particular there is currently no ability to comment on preprints (though they are working on this) and there are no altmetrics (things like download counts that can indicate how popular a preprint is). The other thing to consider is that arXiv’s focus is on the quantitative sciences, which can be both a pro and a con. If you do math, physics, computer science, etc., this is the preprint server for you. If you do biology it depends on the kind of research you do. If your work is quantitative then your research may be seen by folks outside of your discipline working on related quantitative problems. If your work isn’t particularly quantitative it won’t fit in as well. arXiv allows an array of licenses that can either allow or restrict reuse. In my experience it can take about a week for a preprint to go up on arXiv and the submission process is probably the most difficult of the available options (but it’s still far easier than submitting a paper to a journal).

bioRxiv

bioRxiv is the new kid on the block having launched less than a year ago. It has both commenting and altmetrics, but whether it will become as established as arXiv and stick around for a long time remains to be seen. It is explicitly biology focused and accepts research of any kind in the biological sciences. If you’re a biologist, this means that you’re less likely to reach people outside of biology, but it may be more likely that biology folks come across your work. bioRxiv allows an array of licenses that can either allow or restrict reuse. However, they explicitly override the less open licenses for text mining purposes, so all preprints there can be text-mined. In my experience it can take about a week for a preprint to go up on bioRxiv.

PeerJ Preprints

PeerJ Preprints is another new preprint server that is focused on biology and accepts research from across the biological sciences. Like bioRxiv it has commenting and altmetrics. It is the fastest of the preprint servers, with less than 24 hours from submission to posting in my experience. PeerJ has a strong commitment to open access, so all of its preprints are licensed with the Creative Commons Attribution License. PeerJ also publishes an open access journal, but you can post preprints to PeerJ Preprints without submitting them to the journal (and this is very common). If you do decide to submit your manuscript to the PeerJ journal after posting it as a preprint you can do this with a single click and, should it be published, the preprint will be linked to the paper. PeerJ has the most modern infrastructure of any of the preprint servers, which makes for really pleasant submission, reading, and commenting experiences. You can also earn PeerJ reputation points for posting preprints and engaging in discussions about them. PeerJ is the only major preprint server run by a for-profit company. This is only an issue if you plan to submit your paper to a journal that only allows the posting of non-commercial preprints. I know of only one journal with this restriction, but it is American Naturalist, which can be an important journal in some areas of biology.

figshare

figshare is a place to put any kind of research output including data, figures, slides, and preprints. The benefit of this general approach to archiving research outputs is that you can use figshare to store all kinds of research outputs in the same place. The downside is that because it doesn’t focus on preprints people may be less likely to find your manuscript among all of the other research objects. One of the things I like about this broad approach to archiving anything is that I feel comfortable posting things that aren’t really manuscripts. For example, I post grant proposals there. figshare accepts research from any branch of science and has commenting and altmetrics. There is no delay from submission to posting. Like PeerJ, figshare is a for-profit company and any document posted there will be licensed with the Creative Commons Attribution License.

Those are my thoughts. I have preprints on all three preprint servers + figshare and I’ve been happy with all three experiences. As I said at the beginning, the most important thing is to help speed up the scientific process by posting your work as preprints. Everything else is just details.

UPDATE: It looks like, due to a hiccup with scheduling this post, an early version went out to some folks without the figshare section.

UPDATE: In the comments Richard Sever notes that bioRxiv’s preprints are typically posted within 48 hours of submission and that their interpretation of the text mining clause is that this is covered by fair use. See our discussion in the comments for more details.

Why the Ecology Letters editorial board should reconsider its No vote on preprints

As I’ve argued here, and in PLOS Biology, preprints are important. They accelerate the scientific dialog, improve the quality of published research, and provide both a fair mechanism for establishing precedence and an opportunity for early-career researchers to quickly demonstrate the importance of their research. And I’m certainly not the only one who thinks this.

One of the things slowing the use of preprints in ecology is the fact that some journals still have policies against considering manuscripts that have been posted as preprints. The argument is typically based on the Ingelfinger rule, which prohibits publishing the same original research in multiple journals. However, almost no one actually believes that this rule applies to preprints anymore. Science, Nature, PNAS, the Ecological Society of America, the British Ecological Society, the Royal Society, Springer, Wiley, and Elsevier all generally allow the posting of preprints. In fact, there is only one major journal in ecology that does not consider manuscripts that are posted as preprints: Ecology Letters.

I’ve been corresponding with the Editor in Chief of Ecology Letters for some time now attempting to convince the journal to address their outdated approach to preprints. He kindly asked the editorial board to vote on this last fall and has been nice enough to both share the results and allow me to blog about them.

Sadly, the editorial board voted 2:1 to not allow consideration of manuscripts posted as preprints based primarily on the following reasons:

  1. Authors might release results before they have been adequately reviewed and considered. In particular the editors were concerned that “early career authors might do this”.
  2. Because Ecology Letters is considered to be a quick turnaround journal the need for preprints is lessened.

I’d like to take this opportunity to explain to the members of the editorial board why these arguments are not valid and why it should reconsider its vote.

First, the idea that authors might release results before they have been sufficiently reviewed is not a legitimate reason for a journal to not consider preprinted manuscripts for the following reasons:

  1. This simply isn’t a journal’s call to make. Journals can make policy based on things like scientific ethics, but preventing researchers from making poor decisions is not their job.
  2. Preprints are understood to not have been peer reviewed. We have a long history in science of getting feedback from other scientists on papers prior to submitting them to journals and I’ve personally heard the previous Editor in Chief of Ecology Letters argue passionately for scientists to get external feedback before submitting to the journal. This is one of the primary reasons for posting preprints; to get review from a much broader audience than the 2-3 reviewers that will look at a paper for a journal.
  3. All of the other major ecology and general science journals already allow preprints. This means that any justification for not allowing them would need to explain why Ecology Letters is different from Science, Nature, PNAS, the ESA journals, the BES journals, the Royal Society journals, and several of the major corporate publishers. In addition, since every other major ecology journal allows preprints, this policy would only influence papers that were intended to be submitted to Ecology Letters. This is such a small fraction of the ecology literature that it will have no influence on the stated goal.
  4. We already present results prior to publication in all kinds of forms, the most common of which is at conferences, so unless we are going to disallow presenting results in talks that aren’t already published this won’t accomplish its stated goal.

Second, the idea that preprints are unnecessary because Ecology Letters is so fast doesn’t actually hold for most papers. Most importantly, this argument ignores the importance of preprints for providing prepublication review. In addition, in the best case scenario this reasoning only holds for articles that are first submitted to Ecology Letters and are accepted. Ecology Letters has roughly a 90% rejection rate (the last time I heard a number). Since a lot of the papers that are accepted there are submitted elsewhere first I suspect that the proportion of the papers they handle that this argument works for is <5%. For all other papers the delay will be much longer. For example, let’s say I do some super exciting research (well, at least I think it’s super exciting) that I think has a chance at Science/Nature. Science and Nature are fine with me posting a preprint, but since there’s a chance that it won’t get in there, I still can’t post a preprint because I might end up submitting to Ecology Letters. My paper goes out for review at Science but gets rejected, I send it to Nature where it doesn’t go out for review, and then to PNAS where it goes out again and is rejected. I then send it to Letters where it goes out for 2 rounds of review and is eventually accepted. Give or take this process will take about a year, and that’s not a short period of time in science at all.

So, I am writing this in the hopes that the editorial board will reconsider their decision and take Ecology Letters from a journal that is actively slowing down the scientific process back to its proud history of increasing the speed with which scientific communication happens. If you know members of the Ecology Letters editorial board personally I encourage you to email them a link to this article. If any members of the editorial board disagree with the ideas presented here and in our PLOS Biology paper, I encourage them to join me in the comments section to discuss their concerns.

UPDATE: Added Wiley to the list of major publishers that allow preprints. As Emilio Bruna points out in the comments they are happy to have journals that allow posting of preprints and Biotropica is a great example of one of their journals making this shift.

Sharing in Science: my full reply to Eli Kintisch

A couple of weeks ago Eli Kintisch (@elikint) interviewed me for what turned out to be a great article on “Sharing in Science” for Science Careers. He also interviewed Titus Brown (@ctitusbrown) who has since posted the full text of his reply, so I thought I’d do the same thing.

How has sharing code, data, R methods helped you with your scientific research?

Definitely. Sharing code and data helps the scientific community make more rapid progress by avoiding duplicated effort and by facilitating more reproducible research. Working together in this way helps us tackle the big scientific questions and that’s why I got into science in the first place. More directly, sharing benefits my group’s research in a number of ways:

  1. Sharing code and data results in the community being more aware of the research you are doing and more appreciative of the contributions you are making to the field as a whole. This results in new collaborations, invitations to give seminars and write papers, and access to excellent students and postdocs who might not have heard about my lab otherwise.
  2. Developing code and data so that it can be shared saves us a lot of time. We reuse each other’s code and data within the lab for different projects, and when a reviewer requests a small change in an analysis we can make a small change in our code and then regenerate the results and figures for the project by running a single program. This also makes our research more reproducible and allows me to quickly answer questions about analyses years after they’ve been conducted when the student or postdoc leading the project is no longer in the lab. We invest a little more time up front, but it saves us a lot of time in the long run. Getting folks to work this way is difficult unless they know they are going to be sharing things publicly.
  3. One of the biggest benefits of sharing code and data is in competing for grants. Funding agencies want to know how the money they spend will benefit science as a whole, and being able to make a compelling case that you share your code and data, and that it is used by others in the community, is important for satisfying this goal of the funders. Most major funding agencies have now codified this requirement in the form of data management plans that describe how the data and code will be managed and when and how it will be shared. Having a well established track record in sharing makes a compelling argument that you will benefit science beyond your own publications, and I have definitely benefited from that in the grant review process.
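To make the "regenerate the results by running a single program" idea from point 2 concrete, here is a minimal sketch using only the Python standard library. It is not our lab's actual code: the function names, the `site`/`count` column names, and the `min_count` parameter are all hypothetical, and a real project would also regenerate figures.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def load_counts(path):
    """Read (site, count) pairs from a CSV file with 'site' and 'count' columns."""
    with open(path, newline="") as f:
        return [(row["site"], int(row["count"])) for row in csv.DictReader(f)]


def summarize(rows, min_count=0):
    """Mean count per site, ignoring counts below min_count."""
    by_site = {}
    for site, count in rows:
        if count >= min_count:
            by_site.setdefault(site, []).append(count)
    return {site: statistics.mean(counts) for site, counts in by_site.items()}


def run_all(data_path, out_path, min_count=0):
    """Single entry point: raw data in, results table out."""
    summary = summarize(load_counts(data_path), min_count=min_count)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["site", "mean_count"])
        for site in sorted(summary):
            writer.writerow([site, summary[site]])
    return summary


if __name__ == "__main__":
    # Demo with made-up data in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "counts.csv"
        data.write_text("site,count\nA,2\nA,4\nB,0\n")
        print(run_all(data, Path(tmp) / "results.csv", min_count=1))
```

If a reviewer asks for a different threshold, the only change is the `min_count` argument passed to `run_all`; everything downstream is rebuilt from the raw data.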

What barriers exist in your mind to more people doing so?

There is a lot of fear about openly sharing data and code. People believe that making their work public will result in being scooped or that their efforts will be criticized because they are too messy. There is a strong perception that sharing code and data takes a lot of extra time and effort. So the biggest barriers are sociological at the moment.

To address these barriers we need to do a better job of providing credit to scientists for sharing good data and code. We also need to do a better job of educating folks about the benefits of doing so. For example, in my experience, the time and effort dedicated to developing and documenting code and data as if you plan to share it actually ends up saving the individual researcher time in the long run. This happens because when you return to a project a few months or years after the original data collection or code development, it is much easier if the code and data are in a form that makes it easy to work with.

How has twitter helped your research efforts?

Twitter has been great for finding out about exciting new research, spreading the word about our research, getting feedback from a broad array of folks in the science and tech community, and developing new collaborations. A recent paper that I co-authored in PLOS Biology actually started as a conversation on twitter.

How has R Open Science helped you with your work, or why is it important or not?

rOpenSci is making it easier for scientists to acquire and analyze the large amounts of scientific data that are available on the web. They have been wrapping many of the major science related APIs in R, which makes these rich data sources available to large numbers of scientists who don’t even know what an API is. It also makes it easier for scientists with more developed computational skills to get research done. Instead of spending time figuring out the APIs for potentially dozens of different data sources, they can simply access rOpenSci’s suite of packages to quickly and easily download the data they need and get back to doing science. My research group has used some of their packages to access data in this way and we are in the process of developing a package with them that makes one of our Python tools for acquiring ecological data (the EcoData Retriever) easy to use in R.

Any practical tips you’d share on making sharing easier?

We actually wrote a paper on this for data last year: Nine simple ways to make it easier to (re)use your data.

One of the things I think is most important when sharing both code and data is to use standard licenses. Scientists have a habit of thinking they are lawyers and writing their own licenses and data use agreements that govern how the data and code can be used. This leads to a lot of ambiguity and difficulty in using data and code from multiple sources. Using standard open source and open data licenses vastly simplifies the process of making your work available and will allow science to benefit the most from your efforts.

And do you think sharing data/methods will help you get tenure? Evidence it has helped others?

I have tenure and I certainly emphasized my open science efforts in my packet. One of the big emphases in tenure packets is demonstrating the impact of your research, and showing that other people are using your data and code is a strong way to do this. Whether or not this directly impacted the decision to give me tenure I don’t know. Sharing data and code is definitely beneficial to competing for grants (as I described above) and increasingly to publishing papers, as many journals now require the inclusion of data and code for replication. It also benefits your reputation (as I described above). Since tenure at most research universities is largely a combination of papers, grants, and reputation, I think that sharing at least increases one’s chances of getting tenure indirectly.

UPDATE: Added missing link to Titus Brown’s post: http://ivory.idyll.org/blog/2014-eli-conversation.html

EcoData Retriever: quickly download and clean up ecological data so you can get back to doing science

Retriever Logo

If you’ve ever worked with scientific data, your own or someone else’s, you know that you can end up spending a lot of time just cleaning up the data and getting it into a state that makes it ready for analysis. This involves everything from cleaning up non-standard null values to completely restructuring the data so that tools like R, Python, and database management systems (e.g., MS Access, PostgreSQL) know how to work with it. Doing this for one dataset can be a lot of work, and if you work with a number of different databases like I do, the time and energy can really take away from the time you have to actually do science.

Over the last few years Ben Morris and I have been working on a project called the EcoData Retriever to make this process easier and more repeatable for ecologists. With a click of a button, or a single call from the command line, the Retriever will download an ecological dataset, clean it up, restructure and assemble it (if necessary) and install it into your database management system of choice (including MS Access, PostgreSQL, MySQL, or SQLite) or provide you with CSV files to load into R, Python, or Excel.

Just click on the box to get the data:

retriever_main

Or run a command like this from the command line:

retriever install msaccess BBS --file myaccessdb.accdb

This means that instead of spending a couple of days wrangling a large dataset like the North American Breeding Bird Survey into a state where you can do some science, you just ask the Retriever to take care of it for you. If you work actively with Breeding Bird Survey data and you always like to use the most up to date version with the newest data and the latest error corrections, this can save you a couple of days a year. If you also work with some of the other complicated ecological datasets like Forest Inventory and Analysis and Alwyn Gentry’s Forest Transect data, the time savings can easily be a week.

The Retriever handles things like:

  1. Creating the underlying database structures
  2. Automatically determining delimiters and data types
  3. Downloading the data (and if there are over 100 data files that can be a lot of clicks)
  4. Transforming data into standard structures so that common tools in R and Python and relational database management systems know how to work with it (e.g., converting cross-tabulated data)
  5. Converting non-standard null values (e.g., 999.0, -999, NoData) into standard ones
  6. Combining multiple data files into single tables
  7. Placing all related tables in a single database or schema
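To make a few of these steps concrete, here is a minimal Python sketch of converting non-standard nulls, restructuring cross-tabulated data, and combining multiple files into a single table. This is not the Retriever's actual code: the function names, the column names, and the set of null markers (borrowed from the examples in the list above) are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical null markers, taken from the examples in the list above.
NULL_MARKERS = {"999.0", "-999", "NoData"}


def clean_nulls(row):
    """Replace non-standard null markers in one row with None."""
    return {key: (None if value in NULL_MARKERS else value)
            for key, value in row.items()}


def unpivot(rows, id_column, variable_name="species", value_name="count"):
    """Turn cross-tabulated rows (one column per species) into one row per observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append({id_column: row[id_column],
                              variable_name: column,
                              value_name: value})
    return long_rows


def load_and_combine(csv_texts, id_column):
    """Clean and restructure each file, then stack them into a single table."""
    combined = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        cleaned = [clean_nulls(row) for row in reader]
        combined.extend(unpivot(cleaned, id_column))
    return combined


# Two tiny made-up "files" in cross-tabulated form.
file_a = "site,sp_a,sp_b\n1,3,-999\n"
file_b = "site,sp_a,sp_b\n2,NoData,7\n"
table = load_and_combine([file_a, file_b], id_column="site")
```

The result is one long table with a row per site and species, which is the shape that most R, Python, and database tools expect.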

The EcoData Retriever currently includes a number of large, openly available, ecological datasets (see a full list here). It’s also easy to add new datasets to the EcoData Retriever if you want to. For simple data tables a Retriever script can be as simple as:

name: Name of the dataset
description: A brief description of the dataset of ~25 words.
shortname: A one word name for the dataset
table: MyTableName, http://awesomedatasource.com/dataset
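If you are curious how little information a script like this needs to carry, here is a rough Python sketch that reads the example above into a dictionary. It is only an illustration of the "key: value" layout shown here, not the Retriever's own parser, and the function name parse_simple_script is hypothetical.

```python
def parse_simple_script(text):
    """Parse 'key: value' lines; 'table' lines become table name -> URL entries."""
    script = {"tables": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "table":
            table_name, _, url = value.partition(",")
            script["tables"][table_name.strip()] = url.strip()
        else:
            script[key] = value
    return script


example = """\
name: Name of the dataset
description: A brief description of the dataset of ~25 words.
shortname: A one word name for the dataset
table: MyTableName, http://awesomedatasource.com/dataset
"""
parsed = parse_simple_script(example)
```

Everything the tool needs for a simple dataset is a name, a description, and a URL for each table.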

The Retriever has an installer for Windows, an App for Mac, and a package for Ubuntu/Debian Linux. See the quick explanation of how to get started and then go take it for a spin.

If you’re interested in reading more about the Retriever you can check out the website or read our paper on the project.

We also have some exciting new features on the To Do list including:

  • Automatically cleaning up the taxonomy using existing services
  • Providing detailed tracking of the provenance of your data by recording the date it was downloaded, the version of the software used, and information about what cleanup steps the Retriever performed
  • Integration into R and Python

Let us know what you think we should work on next in the comments.

Why I like this: Martorell and Freckleton (2014)

Martorell, C. & R.P. Freckleton. 2014. Testing the roles of competition, facilitation and stochasticity on community structure in a species-rich assemblage. Journal of Ecology doi:10.1111/1365-2745.12173

At a given location in nature, why are some species present and others absent? Why do some species thrive and have lots of individuals while others are barely eking out an existence? What determines how many species can live together there? These questions have fascinated (some might say obsessed) community ecologists for an almost embarrassing number of decades. They have proven difficult questions to answer and everyone has their favorite process they like to use to answer those questions. Competition for limiting resources is perennially a favorite process used to explain who gets into a community and who does well once they’re in it. But there are also a number of other processes that clearly play important roles. Theory and data are showing that the movement of species from location to location can alter what species exist where and how many individuals they have at a site. The role of facilitation (positive interactions among species) has increasingly been getting play as well, especially in stressful environments. There can also be a random component to the order that species arrive at a particular location. Because it can be difficult for very similar species to coexist, who is already at a location can influence who can then get into that location (this is sometimes referred to as historical or priority effects). I’m sure I missed some processes and I’m equally sure that someone out there right now is upset I didn’t include theirs.

Others might (and by might I mean probably will) disagree with what I’m about to say, but it seems to me that we spend most of our time arguing about which process is most important. It’s competition! No it’s dispersal limitation! Niches! No niches! I have come to find this binary approach to studying communities wearisome. And here’s why. Does competition influence who exists at a particular location? Yes. Does dispersal? Yes. Does facilitation? Yes. Do stochastic processes? Yes. Do priority effects? Yes.

We are at a point in ecology where I think we can feel confident both that these various processes exist and that they affect what we see in nature. Instead, we need to figure out how these processes work together to create the communities we observe. Does the role of a process stay constant through time? Or does it change depending on whether a community has been recently disturbed or is more established? Can we weave together these processes to predict how a community will look through time?

Right about now, you’re wondering if I will ever actually mention the Martorell & Freckleton paper. Here you go. Martorell & Freckleton (2014) take data from a long-term study of plants in Mexico and analyze all the pair-wise interactions among species in order to “document the intensity and demographic importance of interactions and stochasticity in terms of per capita effects, and to set them in a community context”. In effect, they used population models and the spatio-temporal data on plants to assess, for each species observed, how its presence and population growth/abundance were impacted by interactions with other species, interactions with individuals of the same species, variability in the environment, dispersal, and population stochasticity. If you want to know how they did this, you’ll need to read the paper. They found that both competition and facilitation between species played an important role in determining whether a new species could colonize a particular site. Once established, competition and facilitation played less important roles in explaining the abundance of species. Most of the variation in abundance between species can be explained by interactions with other members of the same species and by stochastic events influencing dynamics at a location.*

So why do I like this paper? Because it’s a step towards that integration of processes that I think we need to start doing. Their end message isn’t: process x affects ‘thing I’m interested in’ y. Their end message is about how these processes are working together and when they play a more (or less) important role for determining what species are present and how well they are doing at a site. Their results suggest a model of communities where interactions among species influence who establishes at a particular location (i.e. the species composition in community ecology lingo). However, stochastic events and interactions among members of the same species become important for understanding differences among species in abundances and population growth rates. Only time will tell if this particular integration of processes holds across different types of ecosystems. But right now it allows us to start talking about more sophisticated models of how species come together to create the diversity of species and abundances in a community.

And what does this paper say about predicting the species in a community and their abundances? My interpretation is that it says what I think a growing number of us have suspected for a while. For a specific location there is not a single expected configuration of a community. There are many possible configurations. This means that precisely predicting the species composition of a community will be difficult. But it also makes me wonder whether it might be possible to predict the space of possibilities and how probable those possibilities are. Given this disturbance rate and this pool of possible species, there’s a 60% chance of this configuration of species, but only a 10% chance for this one. I suspect many of my colleagues think that even this level of prediction or forecasting is pure science fiction thinking on my part. But like some of my other blogging colleagues (hi, Brian! hi, Peter!) I believe that pushing our field from one focused on ‘understanding’ to one focused on ‘forecasting’ or ‘predicting’ is one of the greatest challenges our science faces**. Figuring out how and when different processes operate and what aspects of community structure they are controlling is the first step towards forecasting. And that is exactly why I like this article.

****

* Disclaimer: I’ve distilled the paper down to the core message of what I found interesting and why. To understand what Martorell & Freckleton did, all of their results, and what they thought made their results interesting, you should really read the paper.

**Acknowledgments: Sadly, I can’t also link to the long and awesome conversations that Ethan, Allen Hurlbert and I have been having on this topic while on sabbatical. Trust me, they’ve been revolutionary experiences that you wish you were there for.
