EcoData Retriever: quickly download and clean up ecological data so you can get back to doing science
If you’ve ever worked with scientific data, your own or someone else’s, you know that you can end up spending a lot of time just cleaning up the data and getting it into a state that makes it ready for analysis. This involves everything from cleaning up non-standard null values to completely restructuring the data so that tools like R, Python, and database management systems (e.g., MS Access, PostgreSQL) know how to work with them. Doing this for one dataset can be a lot of work, and if you work with a number of different databases like I do, the time and energy can really take away from the time you have to actually do science.
Over the last few years Ben Morris and I have been working on a project called the EcoData Retriever to make this process easier and more repeatable for ecologists. With a click of a button, or a single call from the command line, the Retriever will download an ecological dataset, clean it up, restructure and assemble it (if necessary) and install it into your database management system of choice (including MS Access, PostgreSQL, MySQL, or SQLite) or provide you with CSV files to load into R, Python, or Excel.
Just click on the box to get the data:
Or run a command like this from the command line:
retriever install msaccess BBS --file myaccessdb.accdb
This means that instead of spending a couple of days wrangling a large dataset like the North American Breeding Bird Survey into a state where you can do some science, you just ask the Retriever to take care of it for you. If you work actively with Breeding Bird Survey data and you always like to use the most up to date version with the newest data and the latest error corrections, this can save you a couple of days a year. If you also work with some of the other complicated ecological datasets like Forest Inventory and Analysis and Alwyn Gentry’s Forest Transect data, the time savings can easily be a week.
The Retriever handles things like:
- Creating the underlying database structures
- Automatically determining delimiters and data types
- Downloading the data (and if there are over 100 data files that can be a lot of clicks)
- Transforming data into standard structures so that common tools in R and Python and relational database management systems know how to work with it (e.g., converting cross-tabulated data)
- Converting non-standard null values (e.g., 999.0, -999, NoData) into standard ones
- Combining multiple data files into single tables
- Placing all related tables in a single database or schema
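To give a flavor of two of these steps (converting non-standard nulls and un-crossing cross-tabulated data), here is a minimal Python sketch. It is not the Retriever's actual code: the set of null markers, the column names, and the toy table are all made up for illustration, and real datasets will need their own choices.

```python
import csv
import io

# Hypothetical set of non-standard null markers; real datasets vary.
NULL_MARKERS = {"999.0", "-999", "NoData", ""}


def clean_value(value):
    """Return None for a non-standard null marker, else the stripped value."""
    value = value.strip()
    return None if value in NULL_MARKERS else value


def crosstab_to_long(rows, id_column, value_name="count"):
    """Turn one-column-per-species rows into one row per (id, species) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append({
                id_column: row[id_column],
                "species": column,
                value_name: clean_value(value),
            })
    return long_rows


# Toy cross-tabulated table, invented for illustration only.
raw = "site,sp_a,sp_b\n1,3,-999\n2,NoData,7\n"
rows = list(csv.DictReader(io.StringIO(raw)))
long_rows = crosstab_to_long(rows, "site")

for record in long_rows:
    print(record)
```

Each site-by-species cell becomes its own row, and the placeholder values turn into a proper null (None here), which is the shape most R, Python, and database tools expect.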
The EcoData Retriever currently includes a number of large, openly available, ecological datasets (see a full list here). It’s also easy to add new datasets to the EcoData Retriever if you want to. For simple data tables a Retriever script can be as simple as:
name: Name of the dataset
description: A brief description of the dataset of ~25 words.
shortname: A one word name for the dataset
table: MyTableName, http://awesomedatasource.com/dataset
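If it helps to see how little information such a script carries, here is a rough Python sketch that reads it into a dictionary of metadata and a mapping of table names to URLs. This is only my reading of the format as shown above, assuming one "key: value" field per line. The function name parse_simple_script is hypothetical and this is not the Retriever's own parser.

```python
def parse_simple_script(text):
    """Parse 'key: value' lines into metadata plus a table-name -> URL mapping.

    Hypothetical helper for illustration; assumes one field per line.
    """
    metadata = {}
    tables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "table":
            # A table entry is 'TableName, URL'.
            table_name, _, url = value.partition(",")
            tables[table_name.strip()] = url.strip()
        else:
            metadata[key] = value
    return metadata, tables


script = """\
name: Name of the dataset
description: A brief description of the dataset of ~25 words.
shortname: A one word name for the dataset
table: MyTableName, http://awesomedatasource.com/dataset
"""

metadata, tables = parse_simple_script(script)
print(metadata["shortname"])
print(tables)
```

Three descriptive fields and one table line are the whole script in the simple case.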
We also have some exciting new features on the To Do list including:
- Automatically cleaning up the taxonomy using existing services
- Providing detailed tracking of the provenance of your data by recording the date it was downloaded, the version of the software used, and information about what cleanup steps the Retriever performed
- Integration into R and Python
Let us know what you think we should work on next in the comments.
Martorell, C. & R.P. Freckleton. 2014. Testing the roles of competition, facilitation and stochasticity on community structure in a species-rich assemblage. Journal of Ecology doi:10.1111/1365-2745.12173
At a given location in nature, why are some species present and others absent? Why do some species thrive and have lots of individuals while others are barely eking out an existence? What determines how many species can live together there? These questions have fascinated (some might say obsessed) community ecologists for an almost embarrassing number of decades. They have proven difficult questions to answer and everyone has their favorite process they like to use to answer those questions. Competition for limiting resources is perennially a favorite process used to explain who gets into a community and who does well once they’re in it. But there are also a number of other processes that clearly play important roles. Theory and data are showing that the movement of species from location to location can alter what species exist where and how many individuals they have at a site. The role of facilitation (positive interactions among species) has increasingly been getting play as well, especially in stressful environments. There can also be a random component to the order that species arrive at a particular location. Because it can be difficult for very similar species to coexist, who is already at a location can influence who can then get into that location (this is sometimes referred to as historical or priority effects). I’m sure I missed some processes and I’m equally sure that someone out there right now is upset I didn’t include theirs. Others might (and by might I mean probably will) disagree with what I’m about to say, but it seems to me that we spend most of our time arguing about which process is most important. It’s competition! No, it’s dispersal limitation! Niches! No niches! I have come to find this binary approach to studying communities wearisome. And here’s why. Does competition influence who exists at a particular location? Yes. Does dispersal? Yes. Does facilitation? Yes. Do stochastic processes? Yes. Do priority effects? Yes.
We are at a point in ecology where I think we can feel confident both that these various processes exist and that they affect what we see in nature. What we need now is to figure out how these processes work together to create the communities we observe. Does the role of a process stay constant through time? Or does it change depending on whether a community has been recently disturbed or is more established? Can we weave together these processes to predict how a community will look through time?
Right about now, you’re wondering if I will ever actually mention the Martorell & Freckleton paper. Here you go. Martorell & Freckleton (2014) take data from a long-term study of plants in Mexico and analyze all the pair-wise interactions among species in order to “document the intensity and demographic importance of interactions and stochasticity in terms of per capita effects, and to set them in a community context”. In effect, they used population models and the spatio-temporal data on plants to assess for each species observed how its presence and population growth/abundance was impacted by interactions with other species, interactions with individuals of the same species, variability in the environment, dispersal, and population stochasticity. If you want to know how they did this, you’ll need to read the paper. They found that both competition and facilitation between species played an important role in determining whether a new species could colonize a particular site. Once established, competition and facilitation played less important roles in explaining the abundance of species. Most of the variation in abundance between species can be explained by interactions with other members of the same species and by stochastic events influencing dynamics at a location.*
So why do I like this paper? Because it’s a step towards that integration of processes that I think we need to start doing. Their end message isn’t: process x affects ‘thing I’m interested in’ y. Their end message is about how these processes are working together and when they play a more (or less) important role for determining what species are present and how well they are doing at a site. Their results suggest a model of communities where interactions among species influence who establishes at a particular location (i.e. the species composition in community ecology lingo). However, stochastic events and interactions among members of the same species become important for understanding differences among species in abundances and population growth rates. Only time will tell if this particular integration of processes holds across different types of ecosystems. But right now it allows us to start talking about more sophisticated models of how species come together to create the diversity of species and abundances in a community.
And what does this paper say about predicting the species in a community and their abundances? My interpretation is that it says what I think a growing number of us have suspected for a while. For a specific location there is not a single expected configuration of a community. There are many possible configurations. This means that precisely predicting the species composition of a community will be difficult. But it also makes me wonder whether it might be possible to predict the space of possibilities and how probable those possibilities are. Given this disturbance rate and this pool of possible species, there’s a 60% chance of this configuration of species, but only a 10% chance for this one. I suspect many of my colleagues think that even this level of prediction or forecasting is pure science fiction thinking on my part. But like some of my other blogging colleagues (hi, Brian! hi, Peter!) I believe that pushing our field from one focused on ‘understanding’ to one focused on ‘forecasting’ or ‘predicting’ is one of the greatest challenges our science faces**. Figuring out how and when different processes operate and what aspects of community structure they are controlling is the first step towards forecasting. And that is exactly why I like this article.
* Disclaimer: I’ve distilled the paper down to the core message of what I found interesting and why. To understand what Martorell & Freckleton did, all of their results, and what they thought made their results interesting, you should really read the paper.
**Acknowledgments: Sadly, I can’t also link to the long and awesome conversations that Ethan, Allen Hurlbert and I have been having on this topic while on sabbatical. Trust me, they’ve been revolutionary experiences that you wish you were there for.
Engaging in Art and Science Collaborations
This is a guest post by Zack Brym (@ZackBrym). He is a graduate student in our group interested in the form and function of orchard trees. He has also developed an interest in scientific communication. He is sharing his recent experience translating his Ph.D. research into dance and what he learned from it.
Why is it that some scientists experience large swings between accomplishing a lot all at once and drought-like periods of inactivity? Successful scientists maintain a manageable pace of activity and work efficiently around deadlines, but sometimes the ups and downs of productivity are dictated by something more intangible. Wisely, an adviser helped explain to me that “Science is a creative process. You just don’t have it in you all the time.” I very much agree with him and have embraced the opportunity to explore creativity in science.
There is a place for creativity within all steps of the scientific process. Innovation is the result of imagining new experimental designs, analytical techniques, or revolutionary ideas. Creativity can also be expressed while troubleshooting an idea or figuring out how to communicate key findings. I believe creativity is especially required for scientists working at the forefront of their discipline and I personally seek it out just as much as I do more conventional academic stimuli.
Communicating science clearly and broadly to the public through creative outreach tools is of increasing importance to scientists. Naturally, there are a number of ways that this is currently happening. Mark Brunson and Michele Baker are working hard at improving visibility of science at Utah State University with their Translational Ecology Curriculum. At the University of Florida, the Creative Campus Committee developed a speaker series to explore “Analogous Thinking in the Arts and Sciences”. Part of the program included a biology professor, Jamie Gillooly, who actively engaged as a scholar in residence at the School of Art. Nalini Nadkarni at the University of Utah has developed the Research Ambassador Program which provides resources to scientists to reach out to diverse public audiences like prisoners, religious groups, the elderly, and urban youth. Participants in a successful outreach program are given the opportunity to engage creatively with science, which promotes curiosity about complex systems and critical thinking skills.
At the mid-point of my Ph.D., I used creativity to help express the fundamental concepts of my research by working to convey those ideas clearly through art. This year, I joined the ranks of Dance Your Ph.D. entrants with my video Prune to Wild. Dance Your Ph.D. is an international video competition. The challenge is to communicate a key concept of your Ph.D. research using an interpretive dance. Doing the Dance Your Ph.D. video gave me two great benefits. I produced an amazing outreach tool for my science and I gained new insight and perspective about my research while developing the project. People are constantly asking me more about my research after seeing the video and telling others about it (e.g., Salt Lake Tribune, Herald Journal). And because of this project I have a better idea of what I want to say to them.
My communication skills benefited during the Dance Your Ph.D. project because I had to formulate a simple message about the fundamental concepts of my Ph.D. research. To produce a meaningful dance video I was forced to describe my science in a way that my choreographer (Stephanie White) and filmmaker (Andy Lorimer) could translate into a dance video. Moreover, we were inspired by the Dance Your Ph.D. judging instructions to take an unconventional approach to making the video and emphasize artistry equally with science. The contest rules read, “The judges can penalize videos that rely heavily on elements that do not involve dance at all, such as written text and graphics.” Most entries use words and text in association with dance, but we chose to exclude words or text from the video aside from the title and credits.
If you are only using dance, it becomes extremely difficult to convey a strong message. I provide the intended message of the dance on my professional website for folks who might be less confident with their interpretation. Interestingly, it seems that viewers underestimate the amount of information they get from my video when they first approach me about it. I am generally pleased to confirm their basic understanding as correct and then we get to continue on in greater detail.
Making the video has been such a positive exercise in developing a simple research message that I have continued to seek out opportunities to reconstruct and simplify my thesis. Can you describe your science using only the 1000 most common words? My attempt is:
Wood grows to hold leaves to the sun. Some wood also holds food for people. Humans manage this wood to grow less and be short, but to make more food. I want to know more about the growing of wood so I can learn how to make more food with less wood, work, water, and whatever. (Developed at UpGoer5)
The message of my video was a bit more complex in word choice, but arguably as simple.
My goal is to develop an understanding of fruit trees so that I can recommend a management strategy that acknowledges the physiological constraints imposed on fruit trees through evolution while also producing an economically viable fruit. The resulting tree represents a “natural” tree architecture that actually uses fewer resources to produce fruit by achieving maximum physiological efficiency.
This text is a direct result of my Dance Your Ph.D. project. The video became a truly collaborative effort and I am very grateful for the contributions of artists in creatively describing my research. Collaborations between artists and scientists are becoming more recognized for their benefits. Support for these efforts is formalized by the “STEM to STEAM” movement, which is gaining traction in education policy. What’s more, through this project I am ever more sure that science and its creative discoveries should be presented in an open and transparent manner so everyone has the chance to engage in the scientific process and contribute back in meaningful ways.
This is a guest post by Elita Baldridge (@elitabaldridge). She is a graduate student in our group who has been navigating the development of a chronic illness during graduate school. She is sharing her story to help spread awareness of the challenges faced by graduate students with chronic illnesses. She wrote an excellent post on the PhDisabled blog about the initial development of her illness that I encourage you to read first.
During my time as a Ph.D. student, I developed a host of bizarre, productivity-eating symptoms, and have been trying to make progress on my dissertation while also spending a lot of time at doctors’ offices trying to figure out what is wrong with me. I wrote an earlier blog post about dealing with the development of a chronic illness as a graduate student at the PhDisabled Blog.
When the rheumatologist handed me a yellow pamphlet labeled “Fibromyalgia”, I felt a great sense of relief. My mystery illness had a diagnosis, so I had a better idea of what to expect. While chronic, at least fibromyalgia isn’t doing any permanent damage to joints or brain. However, there isn’t a lot known about it, the treatment options are limited, and the primary literature is full of appallingly small sample sizes.
There are many symptoms, which basically consist of feeling like you have the flu all the time, with all the associated aches and pains. The worst one for me, because it interferes with my highly prized ability to think, is the cognitive dysfunction, or, in common parlance, “fibro fog”. This is a problem when you are actively trying to get research done, as sometimes you remember what you need to do, but can’t quite figure out how navigating to your files in your computer works, what to do with the mouse, or how to get the computer on. I frequently finish sentences with a wave of my hand and the word “thingy”. Sometimes I cannot do simple math, as I do not know what the numbers mean, or what to do next. Depending on the severity, the cognitive dysfunction can render me unable to work on my dissertation as I simply cannot understand what I am supposed to do. I’m not able to drive anymore, due to the general fogginess, but I never liked driving that much anyway. Sometimes I need a cane, because my balance is off or I cannot walk in a straight line, and I need the extra help. Sometimes I can’t be in a vertical position, because verticality renders me so dizzy that I vomit.
I am actually doing really well for a fibromyalgia patient. I know this, because the rheumatologist who diagnosed me told me that I was doing remarkably well. I am both smug that I am doing better than average, because I’m competitive that way, and also slightly disappointed that this level of functioning is the new good. I would have been more disappointed, only I had a decent amount of time to get used to the idea that whatever was going on was chronic and “good” was going to need to be redefined. My primary care doctor had already found a medication that relieved the aches and pains before I got an official diagnosis. Thus, before receiving an official diagnosis, I was already doing pretty much everything that can be done medication wise, and I had already figured out coping mechanisms for the rest of it. I keep to a strict sleep schedule, which I’ve always done anyway, and I’ve continued exercising, which is really important in reducing the impact of fibromyalgia. I should be able to work up my exercise slowly so that I can start riding my bicycle short distances again, but the long 50+ mile rides I used to do are probably out.
Fortunately, my research interests have always been well suited to a macroecological approach, which leaves me well able to do science when my brain is functioning well enough. I can test my questions without having to collect data from the field or lab, and it’s easy to do all the work I need to from home. My work station is set up right by the couch, so I can lie down and rest when I need to. I have to be careful to take frequent breaks, lest working too long in one position cause a flare up. This is much easier than going up to campus, which involves putting on my healthy person mask to avoid sympathy, pity, and questions, and either a long bus ride or getting a ride from my husband. And sometimes, real people clothes and shoes hurt, which means I’m more comfortable and spending less energy if I can just wear pajamas and socks, instead of jeans and shoes.
Understand that I am not sharing all of this because I want sympathy or pity. I am sharing my experience as a Ph.D. student developing and being diagnosed with a chronic illness because I, unlike many students with any number of other short term or long term disabling conditions, have a lot of support. Because I have a great deal of family support, departmental support, and support from the other Weecologists and our fearless leaders, I should be able to limp through the rest of my Ph.D. If I did not have this support, it is very likely that I would not be able to continue with my dissertation. If I did not have support from ALL of these sources, it is also very likely that I would not be able to continue. While I hope that I will be able to contribute to science with my dissertation, I also think that I can contribute to science by facilitating discussion about some of the problems that chronically ill students face, and hopefully finding solutions to some of those problems. To that end, I have started an open GitHub repository to provide a database of resources that can help students continue their training and would welcome additional contributions. Unfortunately, there doesn’t seem to be a lot. Many medical Leave of Absence programs prevent students from accessing university resources, which frequently include access to subsidized health insurance and potentially the student’s doctor, and also remove the student from deferred student loans.
I have fibromyalgia. I also have contributions to make to science. While I am, of course, biased, I think that some contribution is better than no contribution. I’d rather be defined by my contributions than by my limitations, and I’m glad that my university and my lab aren’t defining me by my limitations, but are rather helping me to make contributions to science to the best of my ability.
As some of you may know, I’ve been working with Michael Angilletta for the past year on organizing a Gordon Research Conference. I announced the mentoring program that is affiliated with the conference last week, but here is the official info on the conference itself. Please forgive a little repetition from the mentoring program post.
Application Deadline: June 22, 2014
When and Where: July 20-25, 2014 at the University of New England, Biddeford, Maine
Conference Topic: Many of the impacts humans have on nature affect patterns and processes at multiple spatial, temporal, or organizational scales. Thus predicting the response of nature to human impacts is challenging because changes in one scale can have profound impacts on patterns and processes at other scales of nature. Because ecology has traditionally been focused on patterns and processes at single scales, we have few approaches that allow us to understand cross-scale feedbacks that can influence the patterns and processes we are interested in predicting. The Gordon Research Conference on ‘Unifying Ecology Across Scales: the role of nutrients, metabolism, and physiology’ is a small conference focused on exploring how the availability, acquisition, and transference of energy and nutrients can link patterns and processes across spatial, organizational, and temporal scales. Our goal is to provide a venue for people interested in this topic to discuss the current state of the field and promising avenues of future research. Research interests of participants span the diverse areas of ecology, evolution, and physiology, but are united in an interest to use energy and nutrients to unify different areas and approaches to ecology.
What is a Gordon Research Conference?: Gordon Research Conferences (GRC) are well known in some fields, but the number of ecology-related GRCs is low, so many of us haven’t heard of one before. A Gordon Research Conference is a small conference (< 200 people) focused on a specific topic. In our case, the topic is trying to link patterns and processes across scales using nutrients, metabolism, and physiology. Speakers at GRCs are by invitation only, but there is a poster session almost every afternoon for attendees to present their research. The poster session is not just for the junior people to present. Well known senior people tack up posters and stand by them too.
The structure of a GRC is also pretty unique. Talks occur in the mornings and evenings, leaving the afternoons free for informal discussions, formation of collaborations, and recreational activities (our conference site has kayaking as well as other organized opportunities). Attendees all sleep in the same dorm and eat at the same cafeteria, further creating opportunities for interactions and discussions.
Applying to attend: Registration is now open.
GRCs have a unique approach to the application process. You have to submit an application which the conference chairs (that’s me and Michael Angilletta) can then decide to accept or reject. Then you’ll get an ‘invitation’ to actually register. Don’t let the fear of rejection stop you from applying though. We have historically had space for everyone who wants to come.
Special events for graduate students and postdocs: We have a Gordon Research Seminar (GRS) associated with our conference focused on “understanding the drivers of biological systems by integrating metabolism, physiology, and macroecology”. Gordon Research Seminars provide opportunities for graduate students and postdocs to present their research and network with their peers and a small number of senior scientist mentors before the main conference. Feedback from people who have attended these has been universally positive. In fact, when we didn’t have these one year, there was a huge outcry to bring them back. You have to apply for the GRS separately from the GRC. The conference chairs for the GRS are Sarah Supp and Sarah Diamond. The GRS registration process is also currently open. Dates for the GRS are July 19-20, 2014.
This year we are also excited to announce we have a mentoring program at the conference that graduate students and postdocs who plan on attending the conference can apply for. We have limited slots for this (approximately 20). Details can be found here.
Who is Speaking?: To (hopefully) get you even more excited about attending, here is the list of session topics, speakers, and discussion leaders for the conference. UPDATED: We’ve added a number of lightning talks (short talks). Those speakers have now been added below. If you want titles as well, the full schedule for the conference (with talk titles) is available here.
Session Topic 1: Developing Unified Theories of Ecology
Leader Name: Pablo Marquet
Session Topic 2: Macrophysiology Meets Macroecology
Leader Name: Lauren Buckley
Session Topic 3: Biogeography of Environmental Tolerance
Leader Name: Jennifer Sunday
Lightning Talks: Lacy Chick / Richard Feldman
Session Topic 4: Metabolic Adaptation to Changing Environments
Leader Name: Craig White
Session Topic 5: Mechanistic Basis of Macroecological Patterns
Leader Name: Brian Enquist
Session Topic 6: Linking Organismal Traits to Community Dynamics
Leader Name: Elena Litchman
Session Topic 7: Using Stoichiometry to Link Organisms and Ecosystems
Leader Name: Susan Kilham
Session Topic 8: Predicting Diversity across Scales
Leader Name: Brian McGill
Session Topic 9: Integrating Ecological Processes at the Macroscale
Leader Name: James Brown
The British Ecological Society has announced that it will now allow the submission of papers with preprints (formal language here). This means that you can now submit preprinted papers to Journal of Ecology, Journal of Animal Ecology, Methods in Ecology and Evolution, Journal of Applied Ecology, and Functional Ecology. By allowing preprints, BES joins the Ecological Society of America, which instituted a pro-preprint policy last year. While BES’s formal policy is still a little more vague than I would like*, they have confirmed via Twitter that even preprints with open licenses are OK as long as they are not updated following peer review.
Preprints are important because they:
- Speed up the progress of science by allowing research to be discussed and built on as soon as it is finished
- Allow early career scientists to establish themselves more rapidly
- Improve the quality of published research by allowing a potentially large pool of reviewers to comment on and improve the manuscript (see our excellent experience with this)
BES getting on board with preprints is particularly great news because the number of ecology journals that do not allow preprints is rapidly shrinking to the point that ecologists will no longer need to consider where they might want to submit their papers when deciding whether or not to post preprints. The only major blocker at this point to my mind is Ecology Letters. So, my thanks to BES for helping move science forward!
*Which is why I waited 3 weeks for clarification before posting.
Graduate Student and Postdoctoral Mentoring Program
Gordon Research Conference: Unifying Ecology across Scales
ADDED BELOW: Who can apply is added under financial support (why it’s there will make more sense when you read it)
Time and Place:
July 19-25, 2014 at the University of New England in Biddeford, Maine
Ecological patterns and processes occur at multiple scales of space, time, and organization. This complexity makes predicting ecological responses challenging because changes in one scale can have profound impacts on patterns and processes at other scales. Because subdisciplines have traditionally focused on one or two scales, we have few approaches that enable us to predict the connections and feedbacks across scales that shape biodiversity. This Gordon Research Conference, titled “Unifying ecology across scales: the roles of nutrients, metabolism, and physiology” will bring a small group of experts together to explore how the flow of energy and nutrients can be used to understand patterns and processes across scales. Research interests of the participants will span diverse areas of ecology, evolution, and physiology, but are united by the goal of using energetics and stoichiometry to unify subdisciplines of ecology. The schedule includes a 5-day research conference (co-chaired by Morgan Ernest & Michael Angilletta) preceded by a 2-day research seminar oriented toward students and postdocs (co-chaired by Sarah Supp & Sarah Diamond).
The National Science Foundation will support a mentoring program at this conference aimed at graduate students and postdoctoral researchers. This program will provide participants with the following opportunities: 1) presenting their research as either a short ~10 minute ‘lightning’ talk during the Gordon Research Conference or a full-length 30 minute talk during the Gordon Research Seminar, 2) one-on-one interactions with a more senior researcher at the conference who will serve as a career mentor, and 3) group discussions on topics pertaining to success as an early-career scientist. Applicants must commit to attending the 2-day seminar and the 5-day conference.
Graduate students and postdocs accepted into the program will receive up to $1000 for registration fees and up to $300 for travel expenses. Registration for all events includes meals and housing.
ADDED: While all current graduate students and postdocs are encouraged to apply, financial support is slightly restricted for non-US residents. We cannot fund foreign travel for non-US citizens, but can reimburse travel expenses incurred within the US. Both US and non-US students/postdocs qualify for the registration fee funds. We are allowed to pay for foreign travel for US citizens.
How to Apply:
Graduate students or postdoctoral researchers interested in participating in the conference mentoring program should send their current curriculum vitae and an abstract of their proposed talk for the conference (≤ 250 words) to Morgan Ernest at firstname.lastname@example.org. Both items should be combined in a single PDF file.
Deadline: 5 pm EST on Feb 1, 2014
For more information:
Gordon Research Conference
Email: Morgan Ernest (email@example.com) or Michael Angilletta (firstname.lastname@example.org)
Gordon Research Seminar
Email: Sarah Supp (email@example.com) or Sarah Diamond (firstname.lastname@example.org)
This is a guest post by Dan McGlinn, a weecology postdoc (@DanMcGlinn on Twitter). It is a Research Summary of: McGlinn, D.J., X. Xiao, and E.P. White. 2013. An empirical evaluation of four variants of a universal species–area relationship. PeerJ 1:e212 http://dx.doi.org/10.7717/peerj.212. These posts are intended to help communicate our research to folks who might not have the time, energy, expertise, or inclination to read the full paper, but who are interested in a <1000 word general language summary.
It is well established in ecology that if the area of a sample is increased you will in general see an increase in the number of species observed. There are a lot of different reasons why larger areas harbor more species: larger areas contain more individuals, habitats, and environmental variation, and they are likely to cross more barriers to dispersal – all things that allow more species to exist together in an area. We typically observe relatively smooth and simple looking increases in species number with area. This observation has mystified ecologists: How can a pattern that should be influenced by many different and biologically idiosyncratic processes appear so similar across scales, taxonomic groups, and ecological systems?
Recently a theory was proposed (Harte et al. 2008, Harte et al. 2009) which suggests that detailed knowledge of the complex processes that influence the increase in species number may not be necessary to accurately predict the pattern. The theory proposes that ecological systems tend to simply be in their most likely configuration. Specifically, the theory suggests that if we have information on the total number of species and individuals in an area then we can predict the number of species in smaller portions of that area.
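To make the idea concrete, here is a toy sketch (not the code or the full method from the paper). It assumes we know each species' abundance in the whole plot, and it covers only the simplest case, a single halving of the area. The first function assumes that every way of splitting a species' n0 individuals between the two halves is equally likely; this assumption is made here for illustration and mirrors the "most likely configuration" idea without being the full calculation. The second function uses simple random placement of individuals, a different baseline model included only for comparison. The function names and the example abundances are hypothetical.

```python
def species_in_half_equal_split(abundances):
    """Expected number of species found in one half of a plot.

    Assumes every split of a species' n0 individuals between the two
    halves (0..n0 in the focal half) is equally likely, so the species
    is absent from the focal half with probability 1 / (n0 + 1).
    """
    return sum(n0 / (n0 + 1) for n0 in abundances)


def species_in_fraction_random_placement(abundances, fraction):
    """Expected number of species in a given fraction of the plot when
    each individual lands in that fraction independently."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return sum(1 - (1 - fraction) ** n0 for n0 in abundances)


# Hypothetical community: four species with 1, 1, 3, and 9 individuals.
abundances = [1, 1, 3, 9]
print(species_in_half_equal_split(abundances))
print(species_in_fraction_random_placement(abundances, 0.5))
```

In both calculations the rare species are the ones most likely to drop out as the area shrinks, so the expected count in half the plot is below four. The equal-split assumption implies more spatial clustering than random placement, so it predicts fewer species in the half plot.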
Published work on this new theory suggests that it has potential for accurately predicting how species number changes with area; however, it has not been appreciated that there are actually four different ways that the theory can be operationalized to make a prediction. We were interested to learn:
- Can the theory accurately predict how species number changes with area across many different ecological systems, and
- Do some versions of the theory consistently perform better than others?
To answer these questions we needed data. We searched online and made requests to our colleagues for datasets that documented the spatial configuration of ecological communities. We were able to pull together a collection of 16 plant community datasets. The communities spanned a wide range of systems including hyper-diverse, old-growth tropical forests, a disturbance prone tropical forest, temperate oak-hickory and pine forests, a Mediterranean mixed-evergreen forest, a low diversity oak woodland, and a serpentine grassland.
Fig 1. A) Results from one of the datasets, the open circles display the observed data and the lines are the four different versions of the theory we examined. B) A comparison of the observed and predicted number of species across all areas and communities we examined for one of the versions of the theory.
Across the different communities we found that the theory was generally quite accurate at predicting the number of species (Fig 1 above), and that one of the versions of the theory was typically better than the others in terms of the accuracy of its predictions and the quantity of information it required to make predictions. There were a couple of noteworthy exceptions in our results. The low diversity oak woodland and the serpentine grassland both displayed unusual patterns of change in richness. The species in the serpentine grassland were more spatially clustered than was typically observed in the other communities and thus better described by the versions of the theory that predicted stronger clustering. Abundance in the oak woodland was primarily distributed across two species whereas the other 5 species were only observed once or twice. This unusual pattern of abundance resulted in a rather unique S-shaped relationship between the number of species and area and required inputting the observed species abundances to accurately model the pattern.
The two key findings from our study were:
- The theory provides a practical tool for accurately predicting the number of species in sub-samples of a given site using only information on the total number of species and individuals in that entire area.
- The different versions of the theory do make different predictions and one appears to be superior.
Of course there are still a lot of interesting questions to address. One question we are interested in is whether or not we can predict the inputs of the theory (total number of species and individuals for a community) using a statistical model and then plug those predictions into the theory to generate accurate fine-scaled predictions. This kind of application would be important for conservation applications because it would allow scientists to estimate the spatial pattern of rarity and diversity in the community without having to sample it directly. We are also interested in future development of the theory that provides predictions for the number of species at areas that are larger (rather than smaller) than the reference point which may have greater applicability to conservation work.
The accuracy of the theory also has the potential to help us understand the role of specific biological processes in shaping the relationship between species number and area. Because the theory didn’t include any explicit biological processes, our findings suggest that specific processes may only influence the observed relationship indirectly through the total number of species and individuals. Our results do not suggest that biological processes are not shaping the relationship but only that their influence may be rather indirect. This may be welcome news to practitioners who rely on the relationship between species number and area to devise reserve designs and predict the effects of habitat loss on diversity.
Harte, J., A. B. Smith, and D. Storch. 2009. Biodiversity scales from plots to biomes with a universal species-area curve. Ecology Letters 12:789–797.
Harte, J., T. Zillio, E. Conlisk, and A. B. Smith. 2008. Maximum entropy and the state-variable approach to macroecology. Ecology 89:2700–2711.
Doing science in academia involves a lot of rejection and negative feedback. Between grant agencies' single-digit funding rates, pressure to publish in a few "top" journals all of which have rejection rates of 90% or higher [1], and the growing gulf between the number of academic jobs and the number of graduate students and postdocs [2], spending even a small amount of time in academia pretty much guarantees that you’ll see a lot of rejection. In addition, even when things are going well we tend to focus on providing as much negative feedback as possible. Paper reviews, grant reviews, and most university evaluation and committee meetings are focused on the negatives. Even students with awesome projects that are progressing well and junior faculty who are cruising towards tenure have at least one meeting a year where someone in a position of power will try their best to enumerate all of the things they could be doing better [3]. This isn’t always a bad thing [4] and I’m sure it isn’t restricted to academia or science (these are just the worlds I know), but it does make keeping a positive attitude and reasonable sense of self-worth a bit… challenging.
One of the things that I do to help me remember why I keep doing this is my Why File. It’s a file where I copy and paste reminders of the positive things that happen throughout the year [5]. These typically aren’t the sort of things that end up on my CV. I have my CV for tracking that sort of thing and frankly the number of papers I’ve published and grants I’ve received isn’t really what gets me out of bed in the morning. My Why File contains things like:
- Email from students in my courses, or comments on evaluations, telling me how much of an impact the skills they learned have had on their ability to do science
- Notes from my graduate students, postdocs, and undergraduate researchers thanking me for supporting them, inspiring them, or giving them good advice
- Positive feedback from mentors and people I respect that help remind me that I’m not an impostor
- Tweets from folks reaffirming that an issue or approach I’m advocating for is changing what they do or how they do it
- Pictures of thank you cards or creative things that people in my lab have done
- And even things that in a lot of ways are kind of silly, but that still make me smile, like screen shots of being retweeted by Jimmy Wales or of Tim O’Reilly plugging one of my papers.
If you’ve said something nice to me in the past few years, be it in person, by email, on Twitter, or in a handwritten note, there’s a good chance that it’s in my Why File helping me keep going at the end of a long week or a long day. And that’s the other key message of this post. We often don’t realize how important it is to say thanks from time to time to the folks who are having a positive influence on us. Or, maybe we feel uncomfortable doing so because we think these folks are so talented and awesome that they don’t need it, or won’t care, or might see this positive feedback as silly or disingenuous. Well, as Julio Betancourt once said, "You can’t hug your reprints", so don’t be afraid to tell a mentor, a student, or a colleague when you think they’re doing a great job. You might just end up in their Why File.
What do you do to help you stay sane in academia, science, or any other job that regularly reminds you of how imperfect you really are?
[1] The idea that where you publish, not what you publish, is what matters is a problem, but not the subject of this post.
[2] There are lots of great ways to use a PhD, but unfortunately not everyone takes that to heart.
[3] Of course the people doing this are (at least sometimes) doing so with the best intentions, but I personally think it would be surprisingly productive to just say, "You’re doing an awesome job. Keep it up." every once in a while.
[4] There is often a goal to the negativity, e.g., helping a paper or person reach their maximum potential, but again I think we tend to undervalue the consequences of this negativity in terms of motivation [4b].
[4b] Hmm, apparently I should write a blog post on this since it now has two footnotes worth of material.
[5] I use a Markdown file, but a simple text file or an MS Word document would work just fine as well for most things.
Sam Scheiner published a piece recently on ecology’s lack of engagement with theory. Frankly, the title pretty much tells you his conclusion: “The ecological literature: an idea free distribution”, but if you want to know more, either read the original piece (it’s short) or EEB & Flow’s nice write up on it. The empirical-theoretical divide is a topic I’ve been pondering for a while. A long time ago (I was a postdoc), in a galaxy far far away (New Mexico), I read an awesome book called “The Making of the Atomic Bomb”*. It’s a wonderful history of the discovery of the atom and the race to harness its energy in the midst of World War II. In the book, a tight interaction between theorists and empiricists is portrayed, with empiricists poring over the latest theories trying to figure out how to test them and theorists poring over the latest results trying to understand what they might mean theoretically. It’s a gripping tale. In contrast to the scientific process portrayed in the book, ecology lacks the same tight integration between theoretical development and empirical testing. What is going on in ecology that might be impeding this scientific give and take? I have some ideas, though they are admittedly from the perspective of an empirical ecologist.
1) Empiricists and math literacy. This is the one that will have the theoreticians nodding vigorously. In ecology, empiricists often lack a basic level of comfort and literacy with math. In my graduate level class, we read a lot of primary literature. Some of those papers are math heavy. Without fail, my students see an equation and freeze up. They don’t even know how to think about what that equation might mean. And – let’s be honest here – it’s not just students that this happens to. As I tell my students, math is another language. You don’t necessarily need to be able to speak it fluently, but to be a literate scientist you at least need to be able to ask where the bathroom is. I’d say that right now, many empiricists can’t ask for the bathroom. If we’re going to bridge the empirical-theoretical divide, empiricists need to get more comfortable with seeing and interpreting equations.
2) Theoreticians and ecological literacy. This is the one that will have the empiricists nodding. Theory papers are often more focused on the mathematical aspects of the theory than the ecological meaning of the assumptions, variables, parameters, and predictions. I don’t think it’s a coincidence that some of the theories that have received the most empirical attention were formulated by people with a strong empirical component to their research; examples include: R* coexistence theory (Tilman, long-term field experiments at Cedar Creek), Metabolic Theory of Ecology (Brown, long-term field experiments at Portal), neutral theory (Hubbell, long-term research at Barro Colorado Island), and Chessonian coexistence (Chesson, long-term field experiments in SE Arizona)**. These authors have tried to communicate their theories in biological terms. Given the limited math literacy of empiricists, we need theorists to be better at communicating the ecology captured by the math in order to get empiricists engaged with the theory. Even though the most precise and concise way of providing directions to the bathroom is to provide a latitude, longitude, and datum, it’s really better to tell someone to take a left on the Champs Elysees.
3) Communication between empiricists and theoreticians. Given the two points above, it should come as no surprise that we have relatively limited communication between the groups. All sorts of pathologies can arise when two groups don’t know how to communicate with each other. For example, we have separate theory sessions at the annual Ecological Society of America meetings! I’ve always found that odd. Like theory is its own subdiscipline studying things of little relevance to the other population, community, and ecosystem ecologists at the meeting! If we are not communicating, then empiricists are unaware of relevant theories and theoreticians are unaware of new empirical developments that can improve existing theory or point towards the need for new theories. Without communication, our intellectual progress is severely hampered. We end up with piles of data that are only used for understanding a specific system at a specific point in time. We also end up with piles of theories that serve as little more than mathematical ornaments, because they have not been tested. Maybe I’m alone in this, but I think this is something that needs to be remedied.
4) Testing theories is hard. In ecology, testing theories is often hard. It’s rare that a theory will make predictions that simply require us to document that X impacts Y (e.g., does fire impact nitrogen levels in soils?). Coexistence theories like those developed by Peter Chesson are a great example of this problem. The storage effect and stabilizing vs. equalizing mechanisms for coexistence are big complex concepts that require a lot of thought and effort to test in useful ways. We need a class of creative empiricists, who can engage actively with theory, assess the key aspects of the theory that are testable, and figure out how to design those tests. We also need theorists who communicate broadly about the key predictions of their models, important underlying assumptions, and explicitly describe what good tests of their theories would entail, so that empiricists are correctly testing those models.
So, assuming that a better integration of theory and empirical research is desired, how do we accomplish it? Honestly, I don’t have a prescription for fixing this right now. But I do think there are some key elements that we need to be thinking about:
1) More context-specific exposure to mathematics for our undergrads and grads. Shipping them off to Calculus 101 in the Mathematics Department and hoping they pick it up there is clearly not working.
2) Better communication between theorists and empiricists. There’s lots of ways to work on this. In our group, we house my empirically minded students with Ethan’s more quantitative students and also run joint lab meetings. We’ve been pleased with the results, but how to scale this up to whole programs is less clear to me. Another possibility is a series of workshops or even a center whose mission is to bring together theoreticians and empiricists interested in similar questions. The one thing I do know is that this isn’t something we can just fix through the literature. The current barriers are such that we will need venues for in-person exchanges as the two groups learn each other’s languages.
3) Broad conversations about how we test (and improve) theories. As a field, we’ve spent a lot of time talking about how to rigorously test cause-effect relationships and assessing whether patterns in nature are real or can be explained by null models. Our conversations about how to create a good dynamic for designing theory, rigorously testing it, and using the empirical results to improve the theory have – as far as I am aware – not been very vigorous in ecology.
Addressing these key elements might not create a “Golden Age of Ecology”, but I steadfastly believe that no single approach is sufficient for addressing the complicated questions facing ecology. In that context, improving how theoreticians and empiricists interact can only be a plus.
* Note to the NSA, who undoubtedly had a red light go off somewhere when that precise combination of words crossed their giga-computers sucking in the internet: “The Making of the Atomic Bomb” by Richard Rhodes is a history book, not a how-to manual.
**I think the fact that all of them have long-term field programs is very very interesting, but a topic for another day.