A friend of mine once joked that doing ecological informatics meant working with data that was big enough that you couldn’t open it in an Excel spreadsheet. At the time (~6 years ago) that meant a little over 64,000 rows in a table. Times have changed a bit since then. We now talk about “big data” instead of “informatics”, Excel can open a table with a little over 1,000,000 rows of data, and most importantly there is an ever increasing amount of publicly available ecological, evolutionary, and environmental data that we can use for tackling ecological questions.
I’ve been into using relatively big data since I entered graduate school in the late 1990s. My dissertation combined analyses of the Breeding Bird Survey of North America (several thousand sites) and assembling hundreds of other databases to understand how patterns varied across ecosystems and taxonomic groups.
One of the reasons that I like using large amounts of data is that it has the potential to give us general answers to ecological questions quickly. The typical development of an ecological idea over the last few decades can generally be characterized as:
- Come up with an idea
- Test it with one or a few populations, communities, etc.
- Publish (a few years ago this would often come even before Step 2)
- In a year or two test it again with a few more populations, communities, etc.
- Either find agreement with the original study or find a difference
- Debate generality vs. specificity
- Lather, rinse, repeat
After a few rounds of this, taking roughly a decade, we gradually started to have a rough idea of whether the initial result was general and if not how it varied among ecosystems, taxonomic groups, regions, etc.
This is fine, and in cases where new data must be generated to address the question this is pretty much what we have to do, but wouldn’t it be better if we could ask and answer the question more definitively with the first paper? This would allow us to make more rapid progress as a science because instead of repeatedly testing and reevaluating the original analysis we would be moving forward and building on the known results. And even if it still takes time to get to this stage, as with meta-analyses that build on decades of individual tests, using all of the available data still provides us with a general answer that is clearer and more (or at least differently) informative than simply reading the results of dozens of similar papers.
So, to put it simply, one of the benefits of using “big data” is to get the most general answer possible to the question of interest.
Now, it’s clear that this idea doesn’t sit well with some folks. Common responses to the use of large datasets (or compilations of small ones) include concerns about the quality of large datasets or the ability of individuals who haven’t collected the data to fully understand it. My impression is that these concerns stem from a tendency to associate “best” with “most precise”. My personal take is that being precise is only half of the problem. If I collect the best dataset imaginable for characterizing pattern/process X, but it only provides me with information on a single taxonomic group at a single site, then, while I can have a lot of confidence in my results, I have no idea whether or not my results apply beyond my particular system. So, precision is great, but so is getting generalizable results, and these two things trade off against one another.
Which leads me to what I increasingly consider to be the ideal scenario for areas of ecological research where some large datasets (either inherently large or assembled from lots of small datasets) can be applied to the question of interest. I think the ideal scenario is a combination of “high quality” and “big” data. By analyzing these two sets of data separately, and determining if the results are consistent, we can have the maximum confidence in our understanding of the pattern/process. This is of course not trivial to do. First, it requires a clear idea of what is high quality for a particular question and what isn’t. In my experience folks rarely agree on this (which is why I built the Ecological Data Wiki). Second, it further increases the amount of time, effort, and knowledge that goes into the ideal study, and finding the resources to identify and combine these two kinds of data will not be easy. But, if we can do this (and I think I remember seeing it done well in some recent ecological meta-analyses that I can’t seem to find at the moment) then we will have the best possible answer to an ecological question.
- Big data and the future of ecology
- The new bioinformatics: integrating ecological data from the gene to the biosphere
- Statistical machismo (for more on the tradeoffs inherent in being more precise)
As a budding macroecologist, I have thought a lot about what skills I need to acquire during my Ph.D. This is my model of the four basic attributes for a macroecologist, although I think it is more generally applicable to many ecologists as well:
Data:
- Knowledge of SQL
- Dealing with proper database format and structure
- Finding data
- Appropriate treatments of data
- Understanding what good data are
Stats:
- Monte Carlo methods
- Maximum likelihood methods
- Power analysis
Math:
- Higher level calculus
- Should be able to derive analytical solutions for problems
Programming:
- Should be able to write programs for analysis, not just simple statistics and simple graphs.
- Able to use version control
- Once you can program in one language, you should be able to program in other languages without much effort, but should be fluent in at least one language.
Achieve expertise in at least 2 out of the 4 basic areas, but be able to communicate with people who have skills in the other areas. However, if you are good at collaboration and come up with really good questions, you can make up for skill deficiencies by collaborating with others who possess those skills. Start with smaller collaborations with the people in your lab, then expand outside your lab or increase the number of collaborators as your collaboration skills improve.
Achieving proficiency in an area is best done by using it for a project that you are interested in. The more you struggle with something, the better you understand it eventually, so working on a project is a better way to learn than trying to learn by completing exercises.
The attribute should be generalizable to other problems: For example, if you need to learn maximum likelihood for your project, you should understand how to apply it to other questions. If you need to run an SQL query to get data from one database, you should understand how to write an SQL query to get data from a different database.
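As a rough illustration of what that kind of transfer looks like for SQL, here is a minimal sketch using Python’s built-in `sqlite3` module. Both databases, their table and column names, and the handful of rows in them are made up for this example; the point is only that the same query pattern (count distinct species per site) carries over once you know how a new database is laid out.

```python
import sqlite3


def species_per_site(conn, table, site_col, species_col):
    """Count distinct species at each site.

    Table and column names are interpolated directly into the query,
    so only pass names you trust (fine for a sketch, not for user input).
    """
    query = (
        f"SELECT {site_col}, COUNT(DISTINCT {species_col}) "
        f"FROM {table} GROUP BY {site_col} ORDER BY {site_col}"
    )
    return conn.execute(query).fetchall()


# Hypothetical bird survey database: one row per species per route.
birds = sqlite3.connect(":memory:")
birds.execute("CREATE TABLE counts (route_id INTEGER, species TEXT, n INTEGER)")
birds.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [(1, "robin", 4), (1, "wren", 2), (2, "robin", 7)],
)

# Hypothetical plant plot database: one row per stem, different names throughout.
plants = sqlite3.connect(":memory:")
plants.execute("CREATE TABLE stems (plot TEXT, taxon TEXT)")
plants.executemany(
    "INSERT INTO stems VALUES (?, ?)",
    [("A", "oak"), ("A", "oak"), ("A", "pine"), ("B", "oak")],
)

# Same question, same query pattern, two differently structured databases.
bird_richness = species_per_site(birds, "counts", "route_id", "species")
plant_richness = species_per_site(plants, "stems", "plot", "taxon")
print(bird_richness)
print(plant_richness)
```

The query itself does not change between the two calls; only the knowledge of which table and columns hold the sites and species does, which is the part you have to learn for each new database.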
In graduate school:
Someone who wants to compile their own data or work with existing data sets needs to develop a good intuitive feel for data; even if they cannot write SQL code, they need to understand what good and bad databases look like and develop a good sense for questionable data, and how known issues with data could affect the appropriateness of data for a given question. The data skill is also useful if a student is collecting field data, because a little bit of thought before data collection goes a long way toward preventing problems later on.
A student who is getting a terminal master’s and is planning on using pre-existing data should probably be focusing on the data skill (because working with data is a highly marketable skill, and understanding data prevents major mistakes). If the data are not coming from a central database, like the BBS, where the quality of the data is known, additional time will be needed to compile the data, clean the data, figure out if the data can be used responsibly, and fill holes in the data.
Master’s students who want to go on for a Ph.D. should decide what questions they are interested in and should try to pick a project that focuses on learning a good skill that will give them a head start: more empirical (programming or stats), more theoretical (math), or more applied (math (e.g., for developing models), stats (e.g., applying pre-existing models and evaluating models, etc.), or programming (e.g., making tools for people to use)).
Ph.D. students need to figure out what types of questions they are interested in, and learn those skills that will allow them to answer those questions. Don’t learn a skill because it is trendy or you think it will help you get a job later if you don’t actually want to use that skill. Conversely, don’t shy away from learning a skill if it is essential for you to pursue the questions you are interested in.
Right now, as a Ph.D. student, I am specializing in data and programming. I speak enough math and stats that I can communicate with other scientists and learn the specific analytical techniques I need for a given project. For my interests (testing questions with large datasets), I think that by the time I am done with my Ph.D., I will have the skills I need to be fairly independent with my research.
We had a great time at ESA this year and enjoyed getting to interact with lots of both old and new friends and colleagues. Since we’re pretty into open science here at Weecology, it’s probably not surprising that we have a lot of slides (and even scripts) from our many and varied talks and posters posted online, and we thought it might be helpful to aggregate them all in one place. Enjoy.
- Morgan Ernest’s Ignite talk on Why Constraint Based Approaches to Ecology (with script)
- Morgan Ernest’s talk on Biotic Responses to shifting Ecological Drivers in a Desert Community
- Ethan White’s Ignite talk on Big Data in Ecology (with script)
- Ethan White’s talk on Evaluating a General Theory of Macroecology
- Dan McGlinn’s Ignite talk on Constraint Based Species-Area Relationships
- Dan McGlinn’s talk on Modeling Geographic Patterns in the Species Abundance Distribution
- Ken Locey’s Ignite talk on The Feasible Set: Putting Pattern Into Perspective
Thanks to Dan McGlinn for help assembling the links.
Slides and script from Morgan Ernest’s Ignite talk on Why constraint based approaches to ecology from Elita Baldridge and Ethan White’s thought provoking ESA 2013 session on Constraints in Ecology. Slides are also archived on figshare.
As this coral reef food web so aptly demonstrates, nature is complex. There is a dizzying array of diversity across species, across ecosystems and even across individuals within species and ecosystems.
To grapple with this complexity, ecology has a long tradition of using a reductionist approach. The hallmark of this approach is the belief that if we can understand the dynamics of each of the pieces of this complicated machinery, then we can understand how the machine as a whole works.
But as we delve into these complex systems, we have generally found that this is easier said than done. This food web only documents the direct trophic interactions.
If we wanted to completely model this ecosystem, then on top of this feeding network, we would need to add population regulation for each of the species, competitive interactions, predator/prey relationships, mutualisms, indirect interactions, abiotic dynamics….
But say we do this. Say we completely model all the biotic interactions in an ecosystem. What then? Well, we generally have to start all over again if we want to study a completely different coral reef, much less a desert or grassland or tropical forest.
What if instead of modelling all the complexity, it was possible to distill a complicated system down to a few core principles. Principles that constrain the possible states that an ecosystem can even exhibit? What if those constraints not only limit the possible diversity we have to think about, but actually help us better understand how and why a system can seem to break those constraints?
So let’s talk briefly about how a constraint works. In biology, a common constraint arises because something important is finite. The trick with something finite is that it sets a constraint on how much is available to be used.
There are lots of things in biology that are finite. Unless you have a time machine or an ability to defy the laws of thermodynamics, typically time and resources are finite. Amount constraints are important because they often limit what an individual can do or the productivity of ecosystems.
If there is an amount constraint it will often also give rise to a partitioning constraint. As your finite amount of stuff gets allocated, it’s not available for other uses. To allocate more time to knitting, I’d have to take time away from other activities.
The only way to increase investment in one activity w/o negatively impacting another is by increasing the size of the pie. Sadly, my time machine has been acting up lately.
Biologically, this holds at both the individual and the ecosystem levels. For an organism, investment in reproduction may reduce resources available for other functions. At the ecosystem level, resources used by one species aren’t available for other species to use.
So, how can thinking about a constraint change your view of your favorite system? Here’s an example from my favorite system involving desert rodents. In 2003, Ethan White, Kate Thibault and I began to wonder why our long-term field site was always setting new records for rodent abundance. When we plotted the data it was clear that the number of rodents caught in a year has been increasing since the study started in 1977.
But if we try to estimate the amount of resources being used by the community, by summing an index of metabolic rate across all individuals, it hasn’t been increasing at all. This suggests that somehow the community is violating what I just told you. They are supporting more individuals on the same amount of resources.
The answer goes back to the pie. When you divide up the same pie into smaller pieces, the pie can support more slices. In our case, the community has shifted from large to small species. Small individuals require fewer resources per individual than large ones… hence more individuals on the same amount of resource.
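For anyone who wants to see the arithmetic behind the pie, here is a minimal sketch. It assumes the per-individual metabolic index scales as body mass to the 0.75 power, which is a common allometric choice but not necessarily the index we used, and the body masses and counts are invented round numbers, not Portal data.

```python
import math


def metabolic_index(mass_g, exponent=0.75):
    """Per-individual metabolic index. The 0.75 exponent is an assumption."""
    return mass_g ** exponent


def community_energy(masses_g):
    """Sum the metabolic index across every individual in the community."""
    return sum(metabolic_index(m) for m in masses_g)


# Hypothetical communities: a few large-bodied rodents vs. many small-bodied ones.
large_bodied = [81.0] * 8    # 8 individuals at 81 g each
small_bodied = [16.0] * 27   # 27 individuals at 16 g each

# Same-sized pie (total energy use), cut into more, smaller slices.
same_pie = math.isclose(
    community_energy(large_bodied), community_energy(small_bodied)
)
print(len(large_bodied), round(community_energy(large_bodied), 1))
print(len(small_bodied), round(community_energy(small_bodied), 1))
print("same total energy use:", same_pie)
```

With these made-up numbers, more than three times as many individuals are supported on the same summed metabolic index, simply because each small individual takes a smaller slice.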
One of the cool things about constraints is that in biology they’re kinda like the pirate code in Pirates of the Caribbean: they are guidelines that evolution cleverly tries to get around. Need more nitrogen? Make friends w/ a microbe that can fix atmospheric nitrogen for you. Need more resources to devote to reproduction? Convince your relatives to help out.
And that’s why I really love thinking about constraints in biology. They can really help us do two things: take a bewildering array of complexity and provide an ordered expectation of how the world should look.
And by understanding how the world should look, it helps us better understand and examine those individuals, species, or ecosystems that seem to be doing things a little differently. It is those who do things differently that can provide us with the best insights into cool biology.
The talks that come after me will explain different types of constraints and the cool things that understanding those constraints has allowed them to ask. And hopefully by the end of this session we will have convinced you to start looking at your system through constraint-based eyes and see if cool new questions pop out at you too!
Slides and script from Ethan White’s Ignite talk on Big Data in Ecology from Sandra Chung and Jacquelyn Gill’s excellent ESA 2013 session on Sharing Makes Science Better. Slides are also archived on figshare.
1. I’m here to talk to you about the use of big data in ecology and to help motivate a lot of the great tools and approaches that other folks will talk about later in the session.
2. The definition of big is of course relative, and so when we talk about big data in ecology we typically mean big relative to our standard approaches based on observations and experiments conducted by single investigators or small teams.
3. And for those of you who prefer a more precise definition, my friend Michael Weiser defines big data and ecoinformatics as involving anything that can’t be successfully opened in Microsoft Excel.
4. Data can be of unusually large size in two ways. It can be inherently large, like citizen science efforts such as Breeding Bird Survey, where large amounts of data are collected in a consistent manner.
5. Or it can be large because it’s composed of a large number of small datasets that are compiled from sources like Dryad, figshare, and Ecological Archives to form useful compilation datasets for analysis.
6. We have increasing amounts of both kinds of data in ecology as a result of both major data collection efforts and an increased emphasis on sharing data.
7-8. But what does this kind of data buy us? First, big data allows us to work at scales beyond those at which traditional approaches are typically feasible. This is critical because many of the most pressing issues in ecology including climate change, biodiversity, and invasive species operate at broad spatial and long temporal scales.
9-10. Second, big data allows us to answer questions in general ways, so that we get the answer today instead of waiting a decade to gradually compile enough results to reach consensus. We can do this by testing theories using large amounts of data from across ecosystems and taxonomic groups, so that we know that our results are general, and not specific to a single system (e.g., White et al. 2012).
11. This is the promise of big data in ecology, but realizing this potential is difficult because working with either truly big data or data compilations is inherently challenging, and we still lack sufficient data to answer many important questions.
12. This means that if we are going to take full advantage of big data in ecology we need 3 things: training in computational methods for ecologists, tools to make it easier to work with existing data, and more data.
13. We need to train ecologists in the computational tools needed for working with big data, and there are an increasing number of efforts to do this including Software Carpentry (which I’m actively involved in) as well as training initiatives at many of the data and synthesis centers.
14. We need systems for storing, distributing, and searching data like DataONE, Dryad, NEON’s data portal, as well as the standardized metadata and associated tools that make finding data to answer a particular research question easier.
15. We need crowd-sourced systems like the Ecological Data Wiki to allow us to work together on improving insufficient metadata and understanding what kinds of analyses are appropriate for different datasets and how to conduct them rigorously.
16. We need tools for quickly and easily accessing data like rOpenSci and the EcoData Retriever so that we can spend our time thinking and analyzing data rather than figuring out how to access it and restructure it.
17. We also need systems that help turn small data into big data compilations, whether it be through centralized standardized databases like GBIF or tools that pull data together from disparate sources like Map of Life.
18. And finally we need to continue to share more and more data and share it in useful ways, with the good formats, standardized metadata, and open licenses that make it easy to work with.
19. And so, what I would like to leave you with is that we live in an exciting time in ecology thanks to the generation of large amounts of data by citizen science projects, exciting federal efforts like NEON, and a shift in scientific culture towards sharing data openly.
20. If we can train ecologists to work with and combine existing tools in interesting ways, it will let us combine datasets spanning the surface of the globe and diversity of life to make meaningful predictions about ecological systems.
It’s that time of year again where we let people know which Weecologists are doing what and where at the annual Ecological Society of America meeting! We have an action packed schedule this year.
8am-5pm, 101G Minneapolis Convention Center
Ethan White and former Weecology undergrad Ben Morris will be helping out with a Software Carpentry Workshop. Learn cool tools to improve your scientific computing practices! So go become a computing ninja*
*your mileage may vary.
4pm, 101C Minneapolis Convention Center
Ethan White will be speaking in the NEON organized oral session on “Plugging into NEON” about why testing theories with a lot of data is a good thing. Trust me, when he says a lot of data, he means A LOT of data.
8 am, 101D Minneapolis Convention Center
Morgan Ernest is speaking in the LTER/LTREB organized oral session on “Legacies From Long-Term Ecological Studies: Using The (Recent) Past To Inform Future Research”. As the title suggests, she’ll be talking about how short-term and long-term shifts in ecological drivers can reorganize communities. Oh, and for the Portal Project fans out there, this is a Portal Project talk.
Organized by Sandra Chung and Jacquelyn Gill
8am-10am, M100IB Minneapolis Convention Center
Ben Morris and Ethan White will both be talking in this Ignite Session on how and why to set your data free.
9:50 am, 101D Minneapolis Convention Center
Sarah Supp (now a post-doc at Stony Brook University) will be presenting some of her dissertation work exploring the cool idea that communities are actually composed of groups representing two different syndromes of traits. She is also helping represent the Portal crew this year!
Exhibit Hall B, Minneapolis Convention Center
Ken Locey will be presenting a poster on his dissertation work. Ever have a sneaking suspicion that there was something about the species abundance distribution that we didn’t understand? Grab a beer (or other beverage of choice), go see Ken’s poster, and let Ken blow your mind.
8 am, L100C, Minneapolis Convention Center
Dan McGlinn, current Weecology post-doc, will present his latest work applying the Maximum Entropy Theory of Ecology to species-abundance distributions. If Ethan White’s talk on Monday made you crave more Maximum Entropy Theory, then you can get your fix here.
Organized by Elita Baldridge and Ethan White
1:30-3:30pm, 101C Minneapolis Convention Center
This Ignite session is organized by Weecology graduate student Elita Baldridge & Ethan White (primarily Elita) and is focused on providing a forum for different perspectives on how studying constraints can give us important insights into ecology. It features some Weecologists (Ernest, Locey, McGlinn), but also a bunch of other really outstanding scientists.
IGN 10-1 Ernest – Why constraint based approaches to ecology?
IGN 10-2 Locey – The feasible set: putting pattern in perspective
IGN 10-3 Rominger – Evolutionary Constraints and information entropy in ecology
IGN 10-4 Kaspari, Kay and Powers – Liebig is dead; long live Liebig
IGN 10-5 Lamanna – Constraints on carbon flux in extreme climates
IGN 10-6 McGlinn – Ecological constraints predict the spatial structure of biodiversity
IGN 10-7 Buckley – Thermal and energetic constraints on biogeography in changing environments
IGN 10-8 Diamond – Physiological constraints and predicting
Exhibit Hall B, Minneapolis Convention Center
Zack Brym will be presenting a poster on his dissertation research. Do you think apple and cherry trees are just like other trees (except w/ delicious fruit)? Find out why you’re both right and wrong.
Finally, last, but certainly not least, we’d like to give a shout out to former Weecology post-doc Kate Thibault, who is now a scientist at NEON. We would list all of her stuff, but she is involved in so many activities and presentations that she would require her own blog post! We are super proud of her though, so go check out: this, this, this, and she’s a co-author on Ethan’s presentation on Monday!
If you have been to a conference recently where speakers are invited, the odds are that you (or someone with you) noticed that the speaker list didn’t really reflect the demographics of the field.
There have been various conversations about a number of recent conferences. For an example, check out this hilarious post by Jonathan Eisen. The bottom of the post also contains numerous updates containing links to studies and other comments on this issue.
So, with this background in mind, I agreed to organize a small seminar series for the Ecology Center at Utah State University this past year. The seminar series was a monthly lunch meant to bring together the diverse group of people on our campus with scholarly activities related to the environment (which spans multiple departments and colleges). With the increasing importance of interdisciplinary research, the thought was to create a venue to help us understand the diversity of activities here and to eventually foster activities across disciplines. It’s one of many things I probably should have said no to*, but I have a special interest in fostering interdisciplinary communication*** so I decided to give it a try. For each lunch I invited 3-5 people to come and each give a 5 minute talk about their research. My mission: to create a speaking list for each lunch that was close to 50:50 gender ratio, at minimum reflected the background ethnic diversity here, and represented multiple departments. Over the academic year, a total of 28 people gave talks, representing 10 departments and 6 colleges. The gender ratio ended up being 50:50. My ethnic diversity was lower than I wanted: 11% from underrepresented groups. It’s hard to figure out what the ethnic diversity of USU’s faculty really is, but this site suggests that perhaps I wasn’t too far off the background****
So that’s the stats of what I did. Given all the scuttlebutt around the internet about people attending conferences/symposia where the invited speakers are highly white male biased, and the recent study that suggests part of the problem is that women say ‘no’ more frequently than men to invites, how did my seminar series end up the way it did?
Short answer: It wasn’t easy.
In reflecting over my past year, here are what I think the important steps were.
Start with a big pool: I generated a pool of possible invitees by going through every department remotely affiliated with the Ecology Center and making a list of names of people who fit the broad theme of the lunch series. Then I talked to colleagues who interacted a lot with other units to get even more names from units that we don’t normally interact with. In the end my list was 54 people. But with that big pool to start with, I had a lot of flexibility as I tried to balance the multiple axes of speaker diversity.
Invitee Categories: When I would set out to organize a lunch, I would start by deciding what departments I wanted. Then I would use my list to pick out a gender balanced list w/ representation from an underrepresented group if that was an option. Being clear upfront about the different diversity axes I was managing made the invitation decisions straightforward, because I had already vetted my list so that everyone on it was suitable to speak at the event.
Managing the rejections: This is where the time investment and the big pool really become important. Let’s face it: most of us are in reactionary crisis management mode and when our carefully crafted balanced speaker list gets disrupted by a ‘no’ we just go to the next name that pops in our head. I don’t care what gender/ethnic/other group you belong to, the studies suggest that your knee jerk response won’t add diversity to your list. Given the low diversity in our field, the truth is that even if you were strategic, there may not be another woman/underrepresented minority available that fits a specific type of slot. So what do you do? I waited. I waited for all the rejections to come in and then crafted a second invite list organized around who said yes. Did the man from Biology and the woman from Engineering decline? Not another woman on my list from engineering? Invite a woman from biology and a man from engineering. If the original pool of invitees is big enough, this kind of rearrangement on the second round of invites can be accomplished fairly easily.
Persistence: The advantage of a seminar series over a conference is that if a specific date didn’t work, I could send them other possible dates and see if one of those did work. These people would then be the starting point that I crafted the rest of the speaker list around for some other month.
How do I know these rules worked? Because like you, I’m really busy and sometimes I didn’t follow them. For two months in particular, I let chaos reign. What did I get? One month was all men and 75% were from one department. Interestingly, the other month was all women – though from different departments. The lesson that I learned from that? Diversity on multiple axes doesn’t just ‘happen’.
My story is, frankly, just that. Maybe I got lucky that my seminar series ended up as diverse as it did. But I have to agree with Edna’s (from the Incredibles) paraphrasing of a famous Louis Pasteur quote:
Luck favors the prepared.
*I think there’s a law of academia where the number of requests you get to do stuff can be fit by the following equation**: Sum(number of times your name has been mentioned recently, weighted by whether your name was used in a positive or negative context) + [social aptitude]^(whether you represent an underrepresented group in your field and how underrepresented is that group)
**And to the quantitative folk out there, no I have no idea what that equation would actually look like. My guess: absolute garbage.
*** Getting familiar with another discipline’s vocabulary can be extremely important for communication. For example, when my ecoinformatics husband says “Sudo, review that manuscript for me” I have learned that he is actually attempting to use Jedi-like computer programming mind tricks to make me do what he wants. Fortunately, I am not (yet) a computer.
**** It depends on what one thinks the ‘ethnicity unknown’ and ‘non-resident alien’ groups represent. I like to think the non-resident aliens are from other planets. That would seriously help the diversity problem on our campus.
We here at Weecology have just recently discovered John Bruno’s blog SeaMonster, and have been getting a great deal of enjoyment out of it. While perusing some of the posts, we ran across one that made Ethan and me both laugh and cringe at the same time: Are unreasonably harsh reviewers retarding the pace of coral reef science? It’s the troubled story of a young manuscript just trying to get a break in this cruel world of academic publishing. In particular it was this part summarizing the reviews the paper has received that caught our attention:
Reviewer 1: This is impossible! They are clearly wrong!
Reviewer 2: Everyone knows this! The study lacks novelty and impact!
Ethan and I have long had the hypothesis that getting these types of reviews, where your idea is both wrong and trivial, is a sign you’re on to something good. We call it the Charnov Zone. Why you ask? I spent a couple of years as a postdoc working with Eric Charnov and I learned all sorts of great things from him, some of it scientific and some of it about the more practical aspects of being a scientist. The latter lessons were often delivered as stories, and one of his stories was about him presenting his work at Oxford as a newly minted PhD student1:
Eric Charnov went out to Oxford to present his dissertation research on optimal foraging. When he finished his talk, an eminent biologist2 stood up and proceeded to explain why Ric’s work was deeply and horrifically flawed. After that an extremely eminent evolutionary biologist3, stood up and explained kindly how the work wasn’t wrong at all, just trivial.
What was this work that was both wrong and trivial?
Charnov, EL. 1976. Optimal foraging, the marginal value theorem. Theoretical population biology 9: 129-136.
And yes, it is a Citation Classic that’s been cited nearly 3000 times according to Google Scholar.
While having a manuscript that lands in the Charnov Zone doesn’t necessarily mean you have a Citation Classic on your hands, it probably does mean you have an idea that is causing cognitive dissonance in your field. This particular brand of cognitive dissonance seems to be an indicator that there’s something in the paper that part of your field takes as uninteresting trivia (often without proof) and another part of your field rejects as impossible (and you must be wrong). Thus you have something the field needs to think very carefully about. So, give your manuscript caught in the Charnov Zone a little love. At Weecology, we think that papers that are paradoxically wrong and trivial are in a scientific sweet spot4 and well worth the effort.
UPDATE: Eric Charnov emailed with a correction to the story. The paper in question was actually Charnov, E.L. 1976. Optimal foraging: attack strategy of a mantid. American Naturalist 110:141-151. This paper is also well cited (~750 citations) and Ric says that the Current Contents group (who managed Citation Classics and solicited the essays about the papers) gave him a choice of writing up either the mantid paper or the marginal value theorem paper. Ric chose the Marginal Value Theorem. Thus the story generally still stands. For insights into the troubles the Marginal Value Theorem paper had in the review process (which is also a Charnov Zone story), see Ric’s comment below.
1 Please note that my memory has become less reliable after having a kid, so this story may or may not accurately reflect what was actually told to me ten years ago!
2 Honestly can’t remember the name, but I assume he was eminent because surely Oxford doesn’t have any other type!
3 My memory says the man’s name rhymed with Dichard Rawkins.
4 Admittedly a frustrating one
I’m a big fan of preprints, the posting of papers in public archives prior to peer review. Preprints speed up the scientific dialogue by letting everyone see research as it happens, not 6 months to 2 years later following the sometimes extensive peer review process. They also allow more extensive pre-publication peer review because input can be solicited from the entire community of scientists, not just two or three individuals. You can read more about the value of preprints in our preprint about preprints (yes, really) posted on figshare.
In the spirit of using preprints to facilitate broad pre-publication peer review, a group of weecologists have just posted a preprint on how to make it easier to reuse data that is shared publicly. Since PeerJ’s commenting system isn’t live yet, we would like to encourage you to provide feedback about the paper here in the comments. It’s for a special section of Ideas in Ecology and Evolution on data sharing (something else I’m a big fan of) that is being organized by Karthik Ram (someone I’m a big fan of).
Our nine recommendations are:
- Share your data
- Provide metadata
- Provide an unprocessed form of the data
- Use standard data formats (including file formats, table structures, and cell contents)
- Use good null values
- Make it easy to combine your data with other datasets
- Perform basic quality control
- Use an established repository
- Use an established and liberal license
Most of this territory has been covered before by a number of folks in the data sharing world, but if you look at the state of most ecological and evolutionary data it clearly bears repeating. In addition, I think that our unique contribution is threefold: 1) We’ve tried hard to stick to relatively simple things that don’t require a huge time commitment to get right; 2) We’ve tried to minimize the jargon and really communicate with the awesome folks who are collecting great data but don’t have much formal background in the best practices of structuring and sharing data; and 3) We contribute the perspective of folks who spend a lot of time working with other people’s data and have therefore encountered many of the most common issues that crop up in ecological and evolutionary data.
So, if you have the time, energy, and inclination, please read the preprint and let us know what you think and what we can do to improve the paper in the comments section.
UPDATE 2: PeerJ has now enabled commenting on preprints, so comments are welcome directly on our preprint as well (https://peerj.com/preprints/7/).
Over at Dynamic Ecology this morning Jeremy Fox has a post giving advice on how to decide where to submit a paper. It’s the same basic advice that I received when I started grad school almost 15 years ago and as a result I don’t think it considers some rather significant changes that have happened in academic publishing over the last decade and a half. So, I thought it would be constructive for folks to see an alternative viewpoint. Since this is really a response to Jeremy’s post, not a description of my process, I’m going to use his categories in the same order as the original post and offer my more… youthful… perspective.
- Aim as high as you reasonably can. The crux of Jeremy’s point is “if you’d prefer for more people to read and think highly of your paper, you should aim to publish it in a selective, internationally-leading journal.” From a practical perspective journal reputation used to be quite important. In the days before easy electronic access, good search algorithms, and social networking, most folks found papers by reading the table of contents of individual journals. In addition, before there was easy access to paper level citation data and alt-metrics, if you needed to make a quick judgment on the quality of someone’s science the journal name was a decent starting point. But none of those things are true anymore. I use searches, filtered RSS feeds, Google Scholar’s recommendations, and social media to identify papers I want to read. I do still subscribe to tables of contents via RSS, but I watch PLOS ONE and PeerJ just as closely as Science and Nature. If I’m evaluating a CV as a member of a search committee or a tenure committee I’m interested in the response to your work, not where it is published, so in addition to looking at some of your papers I use citation data and alt-metrics related to your paper. To be sure, there are lots of folks like Jeremy that focus on where you publish to find papers and evaluate CVs, but it’s certainly not all of us.
- Don’t just go by journal prestige; consider “fit”. Again, this used to matter more before there were better ways to find papers of interest.
- How much will it cost? Definitely a valid concern, though my experience has been that waivers are typically easy to obtain. This is certainly true for PLOS ONE.
- How likely is the journal to send your paper out for external review? This is a strong tradeoff against Jeremy’s point about aiming high since “high impact” journals also typically have high pre-review rejection rates. I agree with Jeremy that wasting time in the review process is something to be avoided, but I’ll go into more detail on that below.
- Is the journal open access? I won’t get into the arguments for open access here, but it’s worth noting that increasing numbers of us value open access and think that it is important for science. We value open access publications so if you want us to “think highly of your paper” then putting it where it is OA helps. Open access can also be important if you “prefer for more people to read… your paper” because it makes it easier to actually do so. In contrast to Jeremy, I am more likely to read your paper if it is open access than if it is published in a “top” journal, and here’s why: I can do it easily. Yes, my university has access to all of the top journals in my field, but I often don’t read papers while I’m at work. I typically read papers in little bits of spare time while I’m at home in the morning or evenings, or on my phone or tablet while traveling or waiting for a meeting to start. If I click on a link to your paper and I hit a paywall then I have to decide whether it’s worth the extra effort to go to my library’s website, log in, and then find the paper again through that system. At this point unless the paper is obviously really important to my research the activation energy typically becomes too great (or I simply don’t have that extra couple of minutes) and I stop. This is one reason that my group publishes a lot using Reports in Ecology. It’s a nice compromise between being open access and still being in a well regarded journal.
- Does the journal evaluate papers only on technical soundness? The reason that many of us think this approach has some value is simple: it reduces the amount of time and energy spent trying to get perfectly good research published in the most highly ranked journal possible. This can actually be really important for younger researchers in terms of how many papers they produce at certain critical points in the career process. For example, I would estimate that the average amount of time that my group spends getting a paper into a high profile journal is over a year. This is a combination of submitting to multiple, often equivalent caliber, journals until you get the right roll of the dice on reviewers, and the typically extended rounds of review that are necessary not only to satisfy the reviewers about what you’ve done, but also to address requests for additional analyses that often aren’t critical and to change how things are described so that they sit better with reviewers. If you are finishing your PhD then having two or three papers published in a PLOS ONE style journal vs. in review at a journal that filters on “importance” can make a big difference in the prospect of obtaining a postdoc. Having these same papers out for an extra year accumulating citations can make a big difference when applying for faculty positions or going up for tenure if folks who value paper level metrics over journal name are involved in evaluating your packet.
- Is the journal part of a review cascade? I don’t actually know a lot of journals that do this, but I think it’s a good compromise between aiming high and not wasting a lot of time in review. This is why we think that ESA should have a review cascade to Ecosphere.
- Is it a society journal? I agree that this has value and it’s one of the reasons we continue to support American Naturalist and Ecology even though they aren’t quite as open as I would personally prefer.
- Have you had good experiences with the journal in the past? Sure.
- Is there anyone on the editorial board who’d be a good person to handle your paper? Having a sympathetic editor can certainly increase your chances of acceptance, so if you’re aiming high then having a well matched editor or two to recommend is definitely a benefit.
To be clear, there are still plenty of folks out there who approach the literature in exactly the way Jeremy does and I’m not suggesting that you ignore his advice. In fact, when advising my own students about these things I often actively consider and present Jeremy’s perspective. However, there are also an increasing number of folks who think like I do and who have a very different set of perspectives on these sorts of things. That makes life more difficult when strategizing over where to submit, but the truth is that the most important thing is to do the best science possible and publish it somewhere for the world to see. So, go forth, do interesting things, and don’t worry so much about the details.