Ignite Talk: Big Data in Ecology

Slides and script from Ethan White’s Ignite talk on Big Data in Ecology from Sandra Chung and Jacquelyn Gill’s excellent ESA 2013 session on Sharing Makes Science Better. Slides are also archived on figshare.

Title slide

1.  I’m here to talk to you about the use of big data in ecology and to help motivate a lot of the great tools and approaches that other folks will talk about later in the session.

Photos of field work

2.  The definition of big is of course relative, and so when we talk about big data in ecology we typically mean big relative to our standard approaches based on observations and experiments conducted by single investigators or small teams.

Image of Microsoft Excel

3.  And for those of you who prefer a more precise definition, my friend Michael Weiser defines big data and ecoinformatics as involving anything that can’t be successfully opened in Microsoft Excel.

Map of Breeding Bird Survey

4.  Data can be unusually large in two ways. It can be inherently large, like citizen science efforts such as the Breeding Bird Survey, where large amounts of data are collected in a consistent manner.

Images of Dryad, figshare, and Ecological Archives

5.  Or it can be large because it’s composed of a large number of small datasets that are compiled from sources like Dryad, figshare, and Ecological Archives to form useful compilation datasets for analysis.

Dataset logos

6.  We have increasing amounts of both kinds of data in ecology as a result of both major data collection efforts and an increased emphasis on sharing data.

Maps and quote about large scale ecology from NEON

7-8.  But what does this kind of data buy us? First, big data allows us to work at scales beyond those at which traditional approaches are typically feasible. This is critical because many of the most pressing issues in ecology, including climate change, biodiversity, and invasive species, operate at broad spatial and long temporal scales.

Map and results of general analysis

9-10.  Second, big data allows us to answer questions in general ways, so that we get the answer today instead of waiting a decade to gradually compile enough results to reach consensus. We can do this by testing theories using large amounts of data from across ecosystems and taxonomic groups, so that we know that our results are general, and not specific to a single system (e.g., White et al. 2012).

The most interesting man in the worlds says: I don't always analyze data, but when I do, I prefer a lot of it

11. This is the promise of big data in ecology, but realizing this potential is difficult because working with either truly big data or data compilations is inherently challenging, and we still lack sufficient data to answer many important questions.

Bullet points: 1. Training, 2. Tools, 3. More data.

12. This means that if we are going to take full advantage of big data in ecology we need three things: training in computational methods for ecologists, tools that make it easier to work with existing data, and more data.

Logos of groups running training initiatives

13. We need to train ecologists in the computational tools needed for working with big data, and there are an increasing number of efforts to do this including Software Carpentry (which I’m actively involved in) as well as training initiatives at many of the data and synthesis centers.

Logos for DataONE, Dryad, NEON, Morpho, and DataUP

14. We need systems for storing, distributing, and searching data like DataONE, Dryad, and NEON’s data portal, as well as the standardized metadata and associated tools that make finding data to answer a particular research question easier.

Screenshot of Ecological Data Wiki

15. We need crowd-sourced systems like the Ecological Data Wiki to allow us to work together on improving insufficient metadata and understanding what kinds of analyses are appropriate for different datasets and how to conduct them rigorously.

rOpenSci and EcoData Retriever logos

16. We need tools for quickly and easily accessing data like rOpenSci and the EcoData Retriever so that we can spend our time thinking and analyzing data rather than figuring out how to access it and restructure it.

Map of Life, GBIF, and EcoData Retriever logos

17. We also need systems that help turn small data into big data compilations, whether it be through centralized standardized databases like GBIF or tools that pull data together from disparate sources like Map of Life.

Screen shot of preprint, and Morpho, DataUP, and CC0 logos

18. And finally we need to continue to share more and more data, and to share it in useful ways, with good formats, standardized metadata, and open licenses that make it easy to work with.

Dataset logos

19. And so, what I would like to leave you with is that we live in an exciting time in ecology thanks to the generation of large amounts of data by citizen science projects, exciting federal efforts like NEON, and a shift in scientific culture towards sharing data openly.

River Ernest-White saying "Aw Dad, Big Data is such a buzz word"

20. If we can train ecologists to work with and combine existing tools in interesting ways, it will let us combine datasets spanning the surface of the globe and diversity of life to make meaningful predictions about ecological systems.

Weecology at #ESA2013

It’s that time of year again when we let people know which Weecologists are doing what and where at the annual Ecological Society of America meeting! We have an action-packed schedule this year.

Sunday:

Workshop 12: Software Carpentry for Ecologists

8am-5pm, 101G Minneapolis Convention Center

Ethan White and former Weecology undergrad Ben Morris will be helping out with a Software Carpentry Workshop. Learn cool tools to improve your scientific computing practices! So go become a computing ninja*

*your mileage may vary.

Monday

OOS 3-8 Evaluating a general theory of macroecology using big data

4pm, 101C Minneapolis Convention Center

Ethan White will be speaking in the NEON organized oral session on “Plugging into NEON” about why testing theories with a lot of data is a good thing. Trust me, when he says a lot of data, he means A LOT of data.

Tuesday

OOS 8-1 Biotic responses to shifting ecological drivers in a desert community

8 am, 101D Minneapolis Convention Center

Morgan Ernest is speaking in the LTER/LTREB organized oral session on “Legacies From Long-Term Ecological Studies: Using The (Recent) Past To Inform Future Research”. As the title suggests, she’ll be talking about how short-term and long-term shifts in ecological drivers can reorganize communities. Oh, and for the Portal Project fans out there, this is a Portal Project talk.

IGN 2: Sharing Makes Science Better

Organized by Sandra Chung and Jacquelyn Gill

8am-10am, M100IB Minneapolis Convention Center

Ben Morris and Ethan White will both be talking in this Ignite Session about how and why to set your data free.

IGN 2-1: Ethan White – Big Data in Ecology

IGN 2-2: Ben Morris – EcoData Retriever – automates the tasks of fetching, cleaning up, and storing available data sets.

COS 20-6: Life-history trade-offs among core and transient species regulate local diversity and community structure

9:50 am, 101D Minneapolis Convention Center

Sarah Supp (now a post-doc at Stony Brook University) will be presenting some of her dissertation work exploring the cool idea that communities are actually composed of groups representing two different syndromes of traits.  She is also helping represent the Portal crew this year!

PS 21-48: How species richness and total abundance constrain the distribution of abundance

Exhibit Hall B, Minneapolis Convention Center

Ken Locey will be presenting a poster on his dissertation work. Ever have a sneaking suspicion that there was something about the species abundance distribution that we didn’t understand? Grab a beer (or other beverage of choice), go see Ken’s poster, and let Ken blow your mind.

Wednesday

COS 55-1: Connecting the environment to a maximum entropy prediction of the species-abundance distribution across continents and taxa

8 am, L100C, Minneapolis Convention Center

Dan McGlinn, current Weecology post-doc, will present his latest work applying the Maximum Entropy Theory of Ecology to species-abundance distributions. If Ethan White’s talk on Monday made you crave more Maximum Entropy Theory, then you can get your fix here.

IGN-10 Constraints in Ecology

Organized by Elita Baldridge and Ethan White

1:30-3:30pm, 101C Minneapolis Convention Center

This Ignite session is organized by Weecology graduate student Elita Baldridge & Ethan White (primarily Elita) and is focused on providing a forum for different perspectives on how studying constraints can give us important insights into ecology. It features some Weecologists (Ernest, Locey, McGlinn), but also a bunch of other really outstanding scientists.

IGN 10-1 Ernest – Why constraint based approaches to ecology?

IGN 10-2 Locey – The feasible set: putting pattern in perspective

IGN 10-3 Rominger – Evolutionary Constraints and information entropy in ecology

IGN 10-4 Kaspari, Kay and Powers – Liebig is dead; long live Liebig

IGN 10-5 Lamanna – Constraints on carbon flux in extreme climates

IGN 10-6 McGlinn – Ecological constraints predict the spatial structure of biodiversity

IGN 10-7 Buckley – Thermal and energetic constraints on biogeography in changing environments

IGN 10-8 Diamond – Physiological constraints and predicting

PS 39-65: Evaluating physiological efficiencies of branching structure in low-intensity tart cherry and high-density apple.

Exhibit Hall B, Minneapolis Convention Center

Zack Brym will be presenting a poster on his dissertation research. Do you think apple and cherry trees are just like other trees (except w/ delicious fruit)? Find out why you’re both right and wrong.

Other events:

Finally, last, but certainly not least, we’d like to give a shout out to former Weecology post-doc Kate Thibault, who is now a scientist at NEON. We would list all of her stuff, but she is involved in so many activities and presentations that she would require her own blog post! We are super proud of her though, so go check out: this, this, this, and she’s a co-author on Ethan’s presentation on Monday!

Creating a diverse speaking series: an anecdote

If you have been to a conference recently where speakers are invited, the odds are that you (or someone with you) noticed that the speaker list didn’t really reflect the demographics of the field.

There have been various conversations about a number of recent conferences. For example, check out this hilarious post by Jonathan Eisen. The bottom of the post also contains numerous updates with links to studies and other comments on this issue.

So, with this background in mind, I agreed to organize a small seminar series for the Ecology Center at Utah State University this past year. The seminar series was a monthly lunch meant to bring together the diverse group of people on our campus with scholarly activities related to the environment (which spans multiple departments and colleges). With the increasing importance of interdisciplinary research, the thought was to create a venue to help us understand the diversity of activities here and to eventually foster activities across disciplines. It’s one of many things I probably should have said no to*, but I have a special interest in fostering interdisciplinary communication*** so I decided to give it a try.

For each lunch I invited 3-5 people to come and each give a 5 minute talk about their research. My mission: to create a speaking list for each lunch that was close to a 50:50 gender ratio, at minimum reflected the background ethnic diversity here, and represented multiple departments.

Over the academic year, a total of 28 people gave talks, representing 10 departments and 6 colleges. The gender ratio ended up being 50:50. My ethnic diversity was lower than I wanted: 11% from underrepresented groups. It’s hard to figure out what the ethnic diversity of USU’s faculty really is, but this site suggests that perhaps I wasn’t too far off the background.****

So those are the stats of what I did. Given all the scuttlebutt around the internet about people attending conferences/symposia where the invited speaker lists are heavily biased toward white men, and the recent study that suggests part of the problem is that women say ‘no’ more frequently than men to invites, how did my seminar series end up the way it did?

Short answer: It wasn’t easy.

In reflecting over my past year, here are what I think the important steps were.

Start with a big pool: I generated a pool of possible invitees by going through every department remotely affiliated with the Ecology Center and making a list of names of people who fit the broad theme of the lunch series. Then I talked to colleagues who interacted a lot with other units to get even more names from units that we don’t normally interact with. In the end my list was 54 people. But with that big pool to start with, I had a lot of flexibility as I tried to balance the multiple axes of speaker diversity.

Invitee Categories: When I set out to organize a lunch, I would start by deciding what departments I wanted. Then I would use my list to pick out a gender-balanced set of invitees, with representation from an underrepresented group if that was an option. Because I had already vetted my list so that everyone on it was suitable to speak at the event, being clear upfront about the different diversity axes I was managing made the invitation decisions straightforward.

Managing the rejections: This is where the time investment and the big pool really become important. Let’s face it: most of us are in reactionary crisis management mode, and when our carefully crafted, balanced speaker list gets disrupted by a ‘no’ we just go to the next name that pops into our head. I don’t care what gender/ethnic/other group you belong to, the studies suggest that your knee-jerk response won’t add diversity to your list. Given the low diversity in our field, the truth is that even if you were strategic, there may not be another woman/underrepresented minority available who fits a specific slot. So what do you do? I waited. I waited for all the rejections to come in and then crafted a second invite list organized around who said yes. Did the man from biology and the woman from engineering decline? No other woman on my list from engineering? Invite a woman from biology and a man from engineering. If the original pool of invitees is big enough, this kind of rearrangement on the second round of invites can be accomplished fairly easily.

Persistence: The advantage of a seminar series over a conference is that if a specific date didn’t work for an invitee, I could send them other possible dates and see if one of those did. These people would then become the starting point around which I crafted the rest of the speaker list for some other month.

How do I know these rules worked? Because like you, I’m really busy and sometimes I didn’t follow them. For two months in particular, I let chaos reign. What did I get? One month was all men and 75% were from one department. Interestingly, the other month was all women – though from different departments.  The lesson that I learned from that? Diversity on multiple axes doesn’t just ‘happen’.

My story is, frankly, just that. Maybe I got lucky that my seminar series ended up as diverse as it did. But I have to agree with Edna’s (from The Incredibles) paraphrasing of a famous Louis Pasteur quote:

Image of Edna Mode from The Incredibles

Luck favors the prepared.

*I think there’s a law of academia where the number of requests you get to do stuff can be fit by the following equation**: Sum(number of times your name has been mentioned recently, weighted by whether your name was used in a positive or negative context) + [social aptitude]^(whether you represent an underrepresented group in your field and how underrepresented is that group)

**And to the quantitative folk out there, no I have no idea what that equation would actually look like. My guess: absolute garbage.

*** Getting familiar with another discipline’s vocabulary can be extremely important for communication. For example, when my ecoinformatics husband says “Sudo, review that manuscript for me” I have learned that he is actually attempting to use Jedi-like computer programming mind tricks to make me do what he wants. Fortunately, I am not (yet) a computer.

**** It depends on what one thinks the ‘ethnicity unknown’ and ‘non-resident alien’ groups represent. I like to think the non-resident aliens are from other planets. That would seriously help the diversity problem on our campus.

The Scientific Sweet Spot? [Updated]

We here at Weecology have just recently discovered John Bruno’s blog SeaMonster, and have been getting a great deal of enjoyment out of it. While perusing some of the posts, we ran across one that made Ethan and me both laugh and cringe at the same time: Are unreasonably harsh reviewers retarding the pace of coral reef science? It’s the troubled story of a young manuscript just trying to get a break in this cruel world of academic publishing. In particular it was this part summarizing the reviews the paper has received that caught our attention:

Reviewer 1: This is impossible!  They are clearly wrong!

Reviewer 2: Everyone knows this! The study lacks novelty and impact!

Ethan and I have long had the hypothesis that getting these types of reviews, where your idea is both wrong and trivial, is a sign you’re on to something good. We call it the Charnov Zone. Why, you ask? I spent a couple of years as a postdoc working with Eric Charnov and I learned all sorts of great things from him, some of it scientific and some of it about the more practical aspects of being a scientist. The latter lessons were often delivered as stories, and one of his stories was about him presenting his work at Oxford as a newly minted PhD student[1]:

Eric Charnov went out to Oxford to present his dissertation research on optimal foraging. When he finished his talk, an eminent biologist[2] stood up and proceeded to explain why Ric’s work was deeply and horrifically flawed. After that an extremely eminent evolutionary biologist[3] stood up and explained kindly how the work wasn’t wrong at all, just trivial.

What was this work that was both wrong and trivial?

Charnov, EL. 1976. Optimal foraging, the marginal value theorem. Theoretical population biology 9: 129-136.

And yes, it is a Citation Classic that’s been cited nearly 3000 times according to Google Scholar.

While having a manuscript that lands in the Charnov Zone doesn’t necessarily mean you have a Citation Classic on your hands, it probably does mean you have an idea that is causing cognitive dissonance in your field. This particular brand of cognitive dissonance seems to be an indicator that there’s something in the paper that part of your field takes as uninteresting trivia (often without proof) and another part of your field rejects as impossible (and you must be wrong). Thus you have something the field needs to think very carefully about. So, give your manuscript caught in the Charnov Zone a little love. At Weecology, we think that papers that are paradoxically wrong and trivial are in a scientific sweet spot[4] and well worth the effort.

UPDATE: Eric Charnov emailed with a correction to the story. The paper in question was actually Charnov, E.L. 1976. Optimal foraging: attack strategy of a mantid. American Naturalist 110:141-151. This paper is also well cited (~750 citations) and Ric says that the Current Contents group (who managed Citation Classics and solicited the essays about the papers) gave him a choice of writing up either the mantid paper or the marginal value theorem paper. Ric chose the Marginal Value Theorem. Thus the story generally still stands. For insights into the troubles the Marginal Value Theorem paper had in the review process (which is also a Charnov Zone story), see Ric’s comment below.

[1] Please note that my memory has become less reliable after having a kid, so this story may or may not accurately reflect what was actually told to me ten years ago!

[2] Honestly can’t remember the name, but I assume he was eminent because surely Oxford doesn’t have any other type!

[3] My memory says the man’s name rhymed with Dichard Rawkins.

[4] Admittedly a frustrating one.

[Preprint] Nine simple ways to make it easier to (re)use your data

I’m a big fan of preprints, the posting of papers in public archives prior to peer review. Preprints speed up the scientific dialogue by letting everyone see research as it happens, not 6 months to 2 years later following the sometimes extensive peer review process. They also allow more extensive pre-publication peer review because input can be solicited from the entire community of scientists, not just two or three individuals. You can read more about the value of preprints in our preprint about preprints (yes, really) posted on figshare.

In the spirit of using preprints to facilitate broad pre-publication peer review, a group of weecologists have just posted a preprint on how to make it easier to reuse data that is shared publicly. Since PeerJ’s commenting system isn’t live yet we would like to encourage you to provide feedback about the paper here in the comments. It’s for a special section of Ideas in Ecology and Evolution on data sharing (something else I’m a big fan of) that is being organized by Karthik Ram (someone I’m a big fan of).

Our nine recommendations are:

  1. Share your data
  2. Provide metadata
  3. Provide an unprocessed form of the data
  4. Use standard data formats (including file formats, table structures, and cell contents)
  5. Use good null values
  6. Make it easy to combine your data with other datasets
  7. Perform basic quality control
  8. Use an established repository
  9. Use an established and liberal license
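
As a purely illustrative sketch of what a few of these recommendations (standard formats, explicit missing values, and basic metadata) might look like in practice, here is a small Python example. The file names, column names, and values are hypothetical, not taken from the preprint:

```python
# Minimal sketch: write a tidy, reusable CSV plus a plain-text metadata file.
# All names and values here are hypothetical examples, not from the preprint.
import csv

rows = [
    # One observation per row, one variable per column, ISO 8601 dates,
    # and a blank cell (rather than -999 or "?") for a missing measurement.
    {"site_id": "plot_22", "date": "2008-08-15", "species_code": "ERCI", "count": 3, "mass_g": 42.1},
    {"site_id": "plot_22", "date": "2008-08-15", "species_code": "ESME", "count": 1, "mass_g": ""},
]

with open("survey_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# A simple companion metadata file describing each column and its units.
with open("survey_data_metadata.txt", "w") as f:
    f.write(
        "site_id: unique plot identifier\n"
        "date: sampling date (YYYY-MM-DD)\n"
        "species_code: four-letter species code (see species list)\n"
        "count: number of individuals observed\n"
        "mass_g: total mass in grams; blank = not measured\n"
    )
```

This is only a sketch; the preprint itself covers each recommendation in much more detail.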

Most of this territory has been covered before by a number of folks in the data sharing world, but if you look at the state of most ecological and evolutionary data it clearly bears repeating. In addition, I think that our unique contribution is threefold: 1) We’ve tried hard to stick to relatively simple things that don’t require a huge time commitment to get right; 2) We’ve tried to minimize the jargon and really communicate with the awesome folks who are collecting great data but don’t have much formal background in the best practices of structuring and sharing data; and 3) We contribute the perspective of folks who spend a lot of time working with other people’s data and have therefore encountered many of the most common issues that crop up in ecological and evolutionary data.

So, if you have the time, energy, and inclination, please read the preprint and let us know what you think and what we can do to improve the paper in the comments section.

UPDATE: This manuscript was written in the open on GitHub. You can also feel free to file GitHub issues if that’s more your style.

UPDATE 2: PeerJ has now enabled commenting on preprints, so comments are welcome directly on our preprint as well (https://peerj.com/preprints/7/).

Some alternative advice on how to decide where to submit your paper

Over at Dynamic Ecology this morning Jeremy Fox has a post giving advice on how to decide where to submit a paper. It’s the same basic advice that I received when I started grad school almost 15 years ago and as a result I don’t think it considers some rather significant changes that have happened in academic publishing over the last decade and a half. So, I thought it would be constructive for folks to see an alternative viewpoint. Since this is really a response to Jeremy’s post, not a description of my process, I’m going to use his categories in the same order as the original post and offer my more… youthful… perspective.

  • Aim as high as you reasonably can. The crux of Jeremy’s point is “if you’d prefer for more people to read and think highly of your paper, you should aim to publish it in a selective, internationally-leading journal.” From a practical perspective journal reputation used to be quite important. In the days before easy electronic access, good search algorithms, and social networking, most folks found papers by reading the table of contents of individual journals. In addition, before there was easy access to paper level citation data, and alt-metrics, if you needed to make a quick judgment on the quality of someone’s science the journal name was a decent starting point. But none of those things are true anymore. I use searches, filtered RSS feeds, Google Scholar’s recommendations, and social media to identify papers I want to read. I do still subscribe to tables of contents via RSS, but I watch PLOS ONE and PeerJ just as closely as Science and Nature. If I’m evaluating a CV as a member of a search committee or a tenure committee I’m interested in the response to your work, not where it is published, so in addition to looking at some of your papers I use citation data and alt-metrics related to your paper. To be sure, there are lots of folks like Jeremy who focus on where you publish to find papers and evaluate CVs, but it’s certainly not all of us.
  • Don’t just go by journal prestige; consider “fit”. Again, this used to matter more before there were better ways to find papers of interest.
  • How much will it cost? Definitely a valid concern, though my experience has been that waivers are typically easy to obtain. This is certainly true for PLOS ONE.
  • How likely is the journal to send your paper out for external review? This is a strong tradeoff against Jeremy’s point about aiming high since “high impact” journals also typically have high pre-review rejection rates. I agree with Jeremy that wasting time in the review process is something to be avoided, but I’ll go into more detail on that below.
  • Is the journal open access? I won’t get into the arguments for open access here, but it’s worth noting that increasing numbers of us value open access and think that it is important for science. We value open access publications, so if you want us to “think highly of your paper” then publishing it somewhere open access helps. Open access can also be important if you “prefer for more people to read… your paper” because it makes it easier to actually do so. In contrast to Jeremy, I am more likely to read your paper if it is open access than if it is published in a “top” journal, and here’s why: I can do it easily. Yes, my university has access to all of the top journals in my field, but I often don’t read papers while I’m at work. I typically read papers in little bits of spare time while I’m at home in the morning or evenings, or on my phone or tablet while traveling or waiting for a meeting to start. If I click on a link to your paper and I hit a paywall then I have to decide whether it’s worth the extra effort to go to my library’s website, log in, and then find the paper again through that system. At this point unless the paper is obviously really important to my research the activation energy typically becomes too great (or I simply don’t have that extra couple of minutes) and I stop. This is one reason that my group publishes a lot using Reports in Ecology. It’s a nice compromise between being open access and still being in a well regarded journal.
  • Does the journal evaluate papers only on technical soundness? The reason that many of us think this approach has some value is simple: it reduces the amount of time and energy spent trying to get perfectly good research published in the most highly ranked journal possible. This can actually be really important for younger researchers in terms of how many papers they produce at certain critical points in the career process. For example, I would estimate that the average amount of time that my group spends getting a paper into a high profile journal is over a year. This is a combination of submitting to multiple, often equivalent caliber, journals until you get the right roll of the dice on reviewers, and the typically extended rounds of review that are necessary not only to satisfy the reviewers about what you’ve done, but also to address requests for additional analyses that often aren’t critical and to change how one has described things so that it sits better with reviewers. If you are finishing your PhD then having two or three papers published in a PLOS ONE style journal vs. in review at a journal that filters on “importance” can make a big difference in the prospect of obtaining a postdoc. Having these same papers out for an extra year accumulating citations can make a big difference when applying for faculty positions or going up for tenure if folks who value paper level metrics over journal name are involved in evaluating your packet.
  • Is the journal part of a review cascade? I don’t actually know a lot of journals that do this, but I think it’s a good compromise between aiming high and not wasting a lot of time in review. This is why we think that ESA should have a review cascade to Ecosphere.
  • Is it a society journal? I agree that this has value and it’s one of the reasons we continue to support American Naturalist and Ecology even though they aren’t quite as open as I would personally prefer.
  • Have you had good experiences with the journal in the past? Sure.
  • Is there anyone on the editorial board who’d be a good person to handle your paper? Having a sympathetic editor can certainly increase your chances of acceptance, so if you’re aiming high then having a well matched editor or two to recommend is definitely a benefit.

To be clear, there are still plenty of folks out there who approach the literature in exactly the way Jeremy does and I’m not suggesting that you ignore his advice. In fact, when advising my own students about these things I often actively consider and present Jeremy’s perspective. However, there are also an increasing number of folks who think like I do and who have a very different set of perspectives on these sorts of things. That makes life more difficult when strategizing over where to submit, but the truth is that the most important thing is to do the best science possible and publish it somewhere for the world to see. So, go forth, do interesting things, and don’t worry so much about the details.

UPDATE: More great discussion here, here, here and here. [If I missed yours just let me know in the comments and I’ll add it.]

Do macroecological patterns respond to altered species interactions? [Research Summary]

Communicating research more broadly is not only important for outreach to the public, but with the rapidly expanding literature, we think it’ll also be important for communicating to other scientists. Back in 2012 we started a post type called [Research Summary] which is based on the idea that people might not have time to read a multi-page paper but might be willing to read a <1000 word post conveying the ideas in the paper in a more casual format. Ethan did one of these for an Ecology paper his group published last year. Below, one of our graduate students, Sarah Supp, has taken up the challenge to communicate about her first-authored Ecology paper that just came out and has written the guest post below.

Now, introducing, Sarah Supp (@srsupp for those on twitter):

***************************************

This is a research summary of: S. R. Supp, X. Xiao, S. K. M. Ernest, and E. P. White. 2012. An experimental test of the response of macroecological patterns to altered species interactions. Ecology 93: 2505-2511. doi:10.1890/12-0370.1

While many ecologists focus on why individuals, species, or ecological habitats differ, macroecologists are often fascinated by similarities among different groups or ecosystems. This focus on similarities emerges because macroecologists treat individuals, populations and species as ecological particles, and identify patterns in the structure of these particles to understand ecological systems and organization.

Despite a long history of documenting macroecological patterns, an understanding of what determines pattern behavior, why patterns are so easily predicted by so many different models, and how we should go about addressing real ecological problems using a macroecological approach has still not been reached.

Three common macroecological patterns, and the focus for our study, include: the species abundance distribution (SAD; distribution of abundance across species), the species-area relationship (SAR; accumulation of species across spatial scales), and the species-time relationship (STR; accumulation of species through time). Since these patterns exhibit regular behavior across taxonomic groups and ecological habitats, they are increasingly being used to make inferences about local-scale ecological processes and to inform management decisions.

Diagrams of three macroecological patterns and how they could potentially respond to changes in the removal of seed-eating rodents. Left: Species-Abundance distribution (SAD), Middle: Species-Area Relationship (SAR), Right: Species-Time Relationship (STR)

Recently, discussion on how to do ecology has sometimes presented a dichotomy between two groups: “species identity matters” vs. “species identity is unimportant”.  A more useful way of discussing the problem likely lies in asking more nuanced questions such as: How important are species identities for my specific question? When does species identity impact ecological organization? At what spatial/temporal scales? When are species identities necessary for prediction? For example, recent models suggest that the identity of species within a community or ecosystem is unimportant for predicting the shape of macroecological patterns. Instead, these models suggest that some macroecological patterns may only be sensitive to changes in the species richness or total abundance of the ecosystem being examined. This idea is pretty radical (in our opinion), but untested.

We took an approach that we felt could address a few problems simultaneously: 1) Does species identity play a role in determining the form of macroecological patterns, or are patterns only sensitive to changes in species richness or total abundance?  2) Can we effectively synthesize our detailed knowledge of a system with a macroscopic approach in order to link pattern with process?

We used 15 years of data from the Portal Project, our lab’s long-term research site in southeastern Arizona, to evaluate the response of the summer and winter annual plant communities to selective removal of rodent seed predators. It is not known if altered species identity alone (changes in species composition caused by manipulating an important interaction, seed predation [above]) can lead to shifts in the form of these patterns, in the absence of other changes (such as species richness and total abundance). At the Portal site, interactions within and among rodent and plant communities are well studied and we felt that this made our site an ideal experimental venue for this project. Among experimental treatments (control, kangaroo rat removal, and removal of all rodents), we compared changes in plant species composition, species richness (S), total abundance (N), and the form of each of the macroecological patterns (SAD, SAR, and STR).  Below are examples of the data for each pattern from plot 22, a control plot, and a photo of a plant sampling quadrat. In the photo, California poppy (Escholtzia mexicana) and Stork’s bill (Erodium cicutarium) dominate.

Example of sampling quadrat and observed empirical patterns from a specific experimental plot in 2008.

We found that plant species composition was always influenced by the removal of kangaroo rats and by the removal of all rodent seed predators. Interestingly, we also found that removing kangaroo rats (keystone species) did not influence plant species richness or total abundance. This suggests that compensatory dynamics are at work. Finally, when we compared the macroecological patterns among the experimental treatments, we found that differences in the macro-patterns occurred only when plant species richness or total abundance was altered, not when composition alone changed. (Below: Where the parameters do not cross the dotted line, there were significant differences among the paired manipulations [R-C: total rodent removals vs. controls; K-C: kangaroo rat removals vs. controls]. Note that the only place this occurs is in the winter annual community when species richness [S] and total abundance [N] are affected by the removal of all rodent seed predators. The rodent pictured is Merriam’s Kangaroo Rat, Dipodomys merriami.)

Does removing seed-eating rodents influence the shapes of macroecological patterns in plant communities? Depends on whether removing rodents influences species richness and/or total abundance of plants in the community. (Dashed line represents no difference between treatments)

So what does this mean? Let’s revisit our initial questions:

1)    Does species identity play a role in determining the form of macroecological patterns, or are patterns only sensitive to changes in species richness or total abundance?  Our research suggests that the key to interpreting and predicting patterns lies in our ability to make realistic predictions of community level variables, such as species richness or total abundance. Although changes in species richness or total abundance were the most important predictors of change in our macroecological patterns, we are not suggesting that changes in species identity are inconsequential. In fact, we believe that our findings suggest an important, but indirect, role for species interactions in determining macroecological patterns. Species interactions may facilitate or hinder compensatory dynamics, which in turn may lead to shifts in the number of species or the total number of individuals in response to manipulation. In a recent paper, Brian McGill suggests that what drives S and N is a central unanswered question in ecology.

2)    Can we effectively synthesize our detailed knowledge of a system with a macroscopic approach in order to link pattern with process? We feel that our approach of combining small-scale experimental field data with macroecology (which has largely relied on large-scale observational data) provides a potentially powerful framework for improving our understanding of the linkages between pattern and process. Studies using a similar approach may help bridge important gaps between pattern and process, between local and regional scale ecology, and between basic and applied science.

NSF Preproposal Guidelines 2013

UPDATE: If you’re looking for the information for 2014, checkout the DEBrief post for links.

It’s that time of year again when we’re all busy working on preproposals for the National Science Foundation, and just like last year it’s more difficult than you would think to track down the official guidelines using Google. So, for all of you who don’t have a minute to spare, here they are:

Also, remember that Biographical Sketches are different than for full proposals:

Biographical Sketches (2-page limit for each) should be included for each person listed on the Personnel page. It should include the individual’s expertise as related to the proposed research, professional preparation, professional appointments, five relevant publications, five additional publications, and up to five synergistic activities. Advisors, advisees, and collaborators should not be listed on this document, but in a separate table (see below).

And that there is a big stack of things that should not be included at this stage in the process:

Budget, Budget Justification, Facilities, Equipment and Other Resources, Current and Pending Support, Letters of Collaboration, Data Management Plan, Postdoctoral Mentoring Plan, RUI Impact Statement, Certification of RUI Eligibility, or any other Supplementary Documents.

Good luck!

UPDATE: Included separate links for DEB and IOS.

The NSF Proposal Revolution: The DEB Data

Over the past year, you can’t get two scientists together who submit to the BIO Directorate at NSF without the conversation drifting to the radical changes in the proposal process. If you have no idea what I’m talking about, I’ve added some links at the bottom of the post for you to check out.  For everyone else, suffice it to say that there has been immense speculation about the possible impacts of these changes on the scientific process.  Well, over the winter break, DEB released its data to date (IOS did this a little earlier and comparisons between IOS and DEB are discussed here). So let’s see what happened!

Table 1. Basic Stats on Funding Rates

Preproposals Submitted 1624
Preproposal Invites for Full Submission 380
Full proposals recommended for funding 259
*^Number of proposals to be funded 83.6
Preproposal Invitation Rate 23.4%
New Investigator Preproposal Invitation Rate 20.4%
Full Proposal Panel Recommendation Rate 68%
Early Career Investigator Full Proposal Panel Recommendation Rate 35%
*Anticipated Overall Fund Rate on Full Proposal Panel 22%
*^Overall fund rate from preproposal pool 5.1%

^ numbers I’ve estimated given the statistics provided by NSF
* value complicated by uncertain fund rate

You’ll notice some of the items in the table are starred. That’s because things get a little…complicated in the full proposal funding data. When DEB released the data, funding decisions weren’t finalized, so they only had an estimate of funding rates. Also, some full proposals didn’t require preproposals to DEB (e.g. CAREER, OPUS, RCN, co-reviewed proposals from other divisions), so the starred items have two possible sources of fuzziness: non-preproposal proposals and uncertain fund rates. The NSF info doesn’t make some of this transparent. For example, NSF reports a full proposal ‘success rate’ of 35% of the 82 full proposals submitted by early career investigators through the preproposal process. However, the accompanying table (see below) on success rates over the past 5 years shows the 2012 data as 16% out of 181 proposals. I assume the numbers don’t match due to the proposals submitted outside the preproposal process (i.e. CAREERs). It’s also unclear to me whether ‘success rate’ is ‘recommended for funding’ or actual funding.

Table 2. Statistics for Early Career Investigators over past 5 years:

Fiscal Year Success Rate # proposals % total submissions
2007 15.3% 308 23.7
2008 13.8% 320 24.2
2009 17.6% 289 22.8
2010 12.4% 363 24.5
2011 12.3% 350 24.8
2012 16.0% 181 22.7

Interesting Stats to Chew on:

1)      Preproposal Funding rates: Let’s assume funding rates for full proposals did not differ among CAREERs, RCNs, and invited preproposals (an assumption that is probably wrong). If that’s the case, then in Table 1 I estimated the funding rate of the preproposals at 5.1% (i.e. 5.1% of preproposals eventually got funded as a full proposal). It’s important to note that 5.1% is probably wrong, but how wrong is unclear as it hinges on how different the fund rate is for CAREERs, etc. My guess is that the preproposal fund rate is a little higher because things like CAREERs have a lower fund rate and thus bring down the overall average. However, I’d be surprised if the difference takes preproposals above 10%.
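
For readers who want to check the arithmetic, here is a small sketch of how the back-of-the-envelope rates quoted above follow from the Table 1 numbers (the 83.6 figure is NSF’s own estimate of the number of proposals to be funded; the variable names are mine):

```python
# Back-of-the-envelope rates derived from the Table 1 numbers above.
preproposals_submitted = 1624
full_proposal_invites = 380
recommended_for_funding = 259
estimated_to_be_funded = 83.6  # NSF's estimate; final decisions weren't finalized

invite_rate = full_proposal_invites / preproposals_submitted            # ~23.4%
recommendation_rate = recommended_for_funding / full_proposal_invites   # ~68%
overall_fund_rate = estimated_to_be_funded / preproposals_submitted     # ~5.1%

print(f"Invitation rate: {invite_rate:.1%}")
print(f"Recommendation rate: {recommendation_rate:.0%}")
print(f"Estimated overall preproposal fund rate: {overall_fund_rate:.1%}")
```

That ~5.1% overall rate is the number referred to in the rest of this post; as noted above, it almost certainly shifts somewhat once CAREERs and other non-preproposal submissions are separated out.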

2)      Quality of Full Proposals: 68% of proposals made a funding category (i.e. not allocated to the ‘Do Not Fund’ category). I’d be interested in seeing the data from previous years, but 68% seems high from my limited experience.

3)      Early Career vs overall funding rates: Focusing on the preproposal process data only (i.e. not Table 2 data), my interpretation is that the young people fared well through the preproposal process but took a serious hit in the full proposal process (35% of young investigators recommended for funding vs. an overall recommendation rate of 68%). As a disclaimer, the preproposal data is post-portfolio balancing while the full proposal data seems to be pre-portfolio balancing, so it’s possible that the preproposal panels were equally hard on the youngsters but that the Program Directors corrected for it.

4)      Early Career funding rates: I’ve been studiously ignoring Table 2 (as you might have noticed). The truth is even if funding rates were equal between established and early career scientists, 5.1% success rates (or even 10%) mean that anyone who needs a grant to have a job (whether to get tenure or because they are on soft money) or to keep research going (labs that need techs or uninterrupted data collection) is in a tough spot right now. Additional biases in funding rates clearly exacerbate the situation for our young scientists and this is something we should all be aware of when our young colleagues ask for our help or come up for tenure.

Summary:

There’s enough nebulous stuff here that I’m going to hold off on any grandiose statements until NSF releases its full report in early 2013. But the following are things that the data made me start thinking about:

1)      There’s nothing in the NSF data thus far that changes my opinion about the preproposal revolution: until NSF has more money to fund science, 5.1% funding rates are the real enemy of science. What NSF is doing is more akin to shuffling the deck chairs than to blowing a hole in the hull.

2)      The preproposals per se don’t seem to be filtering out the young people. It’s the full proposal process that seems to be the big hurdle to funding. I suspect that is not a novel result of the new system, but has been true all along. The interesting insight that the preproposal data might suggest is that the lower funding rates have nothing to do with the ideas of the young scientists and more to do with either the methodologies or with how those methodologies are being communicated.

3)      There are clearly two things that will help our younger scientists: a) increasing funding rates overall (not a solution in NSF’s power) and b) figuring out why the bias in the full proposal rates exists and how to fix it (something we can all try to work on). Assuming that the lower recommendation rate for full proposals from early career scientists is due to their proposals and not a bias against young scientists (i.e. lower name recognition), then this might be a legitimate argument for how the new system could hurt young people: young people may need more submissions of full proposals (and more panel feedback) before managing to get a proposal recommended for funding.

Additional Links about the Changes at NSF:

Prof Like Substance: NSF BIO decides to screw new investigators, What I learned at an NSF BIO preproposal panel

Contemplative Mammoth: Inside NSF-DEB’s New Pre-Proposals: A Panelist’s Perspective

Jabberwocky Ecology: Changes in NSF process for submissions to DEB and IOS*, NSF Proposal Changes – Follow-up

Graduate student opening with Weecology

We’re looking for a new student to join our interdisciplinary research group. The opening is in Ethan’s lab, but the faculty, students, and postdocs in Weecology interact seamlessly among groups. If you’re interested in macroecology, community ecology, or just about anything with a computational/quantitative component to it, we’d love to hear from you. The formal ad is included below (and yes, we did include links to our blog, twitter, and our GitHub repositories in the ad). Please forward this to any students who you think might be a good fit, and let us know if you have any questions.

GRADUATE STUDENT OPENING

The White Lab at Utah State University has an opening for a graduate student with interests in Macroecology, Community Ecology, or Ecological Theory/Modeling.  Active areas of research in the White lab include broad scale patterns related to biodiversity, abundance and body size, ecological dynamics, and the use of sensor networks for studying ecological systems. We use computational, mathematical, and advanced statistical methods in much of our work, so students with an interest in these kinds of methods are encouraged to apply. Background in these quantitative techniques is not necessary, only an interest in learning and applying them. While students interested in one of the general areas listed above are preferred, students are encouraged to develop their own research projects related to their interests. The White Lab is part of an interdisciplinary ecology research group (http://weecology.org) whose goal is to facilitate the broad training of ecologists in areas from field work to quantitative methods. Students with broad interests are jointly trained in an interdisciplinary setting. We are looking for students who want a supportive environment in which to pursue their own ideas. Graduate students are funded through a combination of research assistantships, teaching assistantships, and fellowships. Students interested in pursuing a PhD are preferred. Utah State University has an excellent graduate program in ecology with over 50 faculty and 80+ graduate students across campus affiliated with the USU Ecology Center (http://www.usu.edu/ecology/).

Additional information about the position and Utah State University is available at:
http://whitelab.weecology.org/grad-student-opening

Interested students can find more information about our group by checking out:
Our websites: http://whitelab.weecology.org, http://weecology.org
Our code repositories: http://github.com/weecology
Our blog: http://jabberwocky.weecology.org
And Twitter: http://twitter.com/ethanwhite

Interested students should contact Dr. Ethan White (ethan.white@usu.edu) by December 1st, 2012 with their CV, GPA, GRE scores (if available), and a brief statement of research interests.
