Characterizing the species-abundance distribution with only information on richness and total abundance [Research Summary]

This is the first of a new category of posts here at Jabberwocky Ecology called Research Summaries. We like the idea of communicating our research more broadly than to the small number of folks who have the time, energy, and interest to read through entire papers. So, for every paper that we publish we will (hopefully) also do a blog post communicating the basic idea in a manner targeted towards a more general audience. As a result these posts will intentionally skip over a lot of detail (technical and otherwise), and will intentionally use language that is less precise, in order to communicate more broadly. We suspect that it will take us quite a while to figure out how to do this well. Feedback is certainly welcome.

This is a Research Summary of: White, E.P., K.M. Thibault, and X. Xiao. 2012. Characterizing species-abundance distributions across taxa and ecosystems using a simple maximum entropy model. Ecology. http://dx.doi.org/10.1890/11-2177.1*

The species-abundance distribution describes the number of species with different numbers of individuals. It is well known that within an ecological community most species are relatively rare and only a few species are common, and understanding the detailed form of this distribution of individuals among species has been of interest in ecology for decades. This distribution is considered interesting both because it is a complete characterization of the commonness and rarity of species and because the distribution can be used to test and parameterize ecological models.

Numerous mathematical descriptions of this distribution have been proposed and much of the research into this pattern has focused on trying to figure out which of these descriptions is “the best” for a particular group of species at a small number of sites. We took an alternative approach to this pattern and asked: Can we explain broad scale, cross-taxonomic patterns in the general shape of the abundance distribution using a simple model that requires only knowledge of the species richness and total abundance (summed across all species) at a site?

To do this we used a model that basically describes the most likely form of the distribution if the average number of individuals in a species is fixed (which turns out to be a slightly modified version of the classic log-series distribution; see the paper or John Harte’s new book for details). As a result this model involves no detailed biological processes and if we know richness and total abundance we can predict the abundance of each species in the community (i.e., the abundance of the most common species, second most common species… rarest species).
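For readers who like to see this sort of thing in code, here is a rough sketch of the idea. It is not the code from the paper. It assumes the "slightly modified" log-series is an upper-truncated log-series, P(n) proportional to p^n / n for n = 1..N, with p chosen so that the mean abundance equals N / S, and it assumes each rank's predicted abundance can be read off the fitted distribution at evenly spaced quantiles. The function names are made up for illustration, and the sketch only handles the case where the fitted p is at most 1.

```python
import bisect


def mean_abundance(p, N):
    """Mean of the truncated log-series P(n) proportional to p**n / n, n = 1..N."""
    num = sum(p ** n for n in range(1, N + 1))
    den = sum(p ** n / n for n in range(1, N + 1))
    return num / den


def fit_p(S, N, iterations=100):
    """Find p by bisection so that the mean abundance equals N / S.

    Assumption of this sketch: the solution lies in (0, 1].
    """
    if not 1 <= S <= N:
        raise ValueError("need 1 <= S <= N")
    target = N / S
    if target > mean_abundance(1.0, N):
        raise ValueError("this sketch only handles fitted p <= 1")
    lo, hi = 1e-9, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # The mean increases with p, so move the bracket toward the target.
        if mean_abundance(mid, N) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def predict_rank_abundance(S, N):
    """Predicted abundance for each rank (most common first)."""
    p = fit_p(S, N)
    weights = [p ** n / n for n in range(1, N + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    predictions = []
    for rank in range(1, S + 1):
        # Quantile at the midpoint of this rank's slice of the distribution.
        q = 1 - (rank - 0.5) / S
        n = bisect.bisect_left(cdf, q) + 1  # smallest n with CDF(n) >= q
        predictions.append(min(n, N))
    return predictions


if __name__ == "__main__":
    # A hypothetical community with 10 species and 100 individuals.
    print(predict_rank_abundance(10, 100))
```

The only inputs are the richness S and the total abundance N, which is the whole point: nothing about the biology of the species goes in.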

Since we wanted to know how well this works in general (not how well it works for birds in Utah or trees in Panama) we put together a dataset of more than 15,000 communities. We did this by combining 6 major datasets that are either citizen science, big government efforts, or compilations from the literature. This compilation includes data on birds, trees, mammals, and butterflies. So, while we’re missing the microbes and aquatic species, I think that we can be pretty confident that we have an idea of the general pattern.

In general, we can do an excellent job of predicting the abundance of each rank of species (most abundant, second most abundant…) at each site using only information on the species richness and total abundance at the site. Here is a plot of the observed number of individuals in a given rank at a given site against the number predicted. The plot is for Breeding Bird Survey data, but the rest of the datasets produce similar results.

Observed-predicted plot for Breeding Bird Survey data showing a good ability of the model to predict the observed data.

Observed-predicted plot for nearly 3000 Breeding Bird Survey communities. Since there are over 100,000 points on this plot we’ve color coded them by the number of points in the vicinity of the focal point, so red areas have lots of points nearby and blue areas have very few points. The black line is the 1:1 line.

The model isn’t perfect of course (they never are and we highlight some of its failures in the paper), but it means that if we know the richness and total abundance of a site then we can capture over 90% of the variation in the form of the species-abundance distribution across ecosystems and taxonomic groups.
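One way to put a number on "variation captured" in an observed-predicted plot like the one above is a coefficient of determination measured around the 1:1 line rather than around a fitted regression line. The sketch below is just an illustration of that calculation. The log transformation of abundances is an assumption of this sketch, and the function name is made up.

```python
import math


def r2_one_to_one(observed, predicted):
    """R^2 around the 1:1 line, computed on log10-transformed abundances.

    Assumes all abundances are positive and the observed values are not
    all identical.
    """
    log_obs = [math.log10(x) for x in observed]
    log_pred = [math.log10(x) for x in predicted]
    mean_obs = sum(log_obs) / len(log_obs)
    # Residuals are distances from the 1:1 line, not from a fitted line.
    ss_res = sum((o - p) ** 2 for o, p in zip(log_obs, log_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives a value of 1, and unlike a regression R² this value can go negative when the predictions are worse than simply guessing the mean.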

This result is interesting for two reasons:

First, it suggests that the species-abundance distribution, on its own, doesn’t tell us much about the detailed biological processes structuring a community. Ecologists have known that it wasn’t fully sufficient for distinguishing between different models for a while (though we didn’t always act like it), but our results suggest that in fact there is very little additional information in the distribution beyond knowing the species richness and total abundance. As such, any model that yields reasonable richness and total abundance values will probably produce a reasonable species-abundance distribution.

Second, this means that we can potentially predict the full distribution of commonness and rarity even at locations we have never visited. This is possible because richness and total abundance can, at least sometimes, be well predicted using remotely sensed data. These predictions could then be combined with this model of the species-abundance distribution to make predictions for things like the number of rare species at a site. In general, we’re interested in figuring out how much ecological pattern and process can be effectively characterized and predicted at large spatial scales, and this research helps expand that ability.

So, that’s the end of our first Research Summary. I hope it’s a useful thing that folks get something out of. In addition to the science in this paper, I’m also really excited about the process that we used to accomplish this research and to make it as reproducible as possible. So, stay tuned for some follow up posts on big data in ecology, collaborative code development, and making ecological research more reproducible.

———————————————————————————————————————————————————————————————
*The paper will be Open Access once it is officially published but, for reasons that don’t make a lot of sense to me, it is behind a paywall until it comes out in print.

On the value of fundamental scientific research

Jeremy Fox over at the Oikos Blog has written an excellent piece explaining why fundamental, basic science, research is worth investing in, even when time and resources are limited. His central points include:

  • Fundamental research is where a lot of our methodological advances come from.
  • Fundamental research provides generally-applicable insights.
  • Current applied research often relies on past fundamental research.
  • Fundamental research often is relevant to the solution of many different problems, but in diffuse and indirect ways.
  • Fundamental research lets us address newly-relevant issues.
  • Fundamental research alerts us to relevant questions and possibilities we didn’t recognize as relevant.
  • Fundamental research suggests novel solutions to practical problems.
  • The only way to train fundamental researchers is to fund fundamental research.

I don’t have a lot to add to what Jeremy has already said, except that I strongly agree with the points that he has made and think that in an era where much of ecology has direct applications to things like global change we need to guard against the temptation to justify all of our research based on its applications.

When I think about the value of fundamental research I always recall a scene from an early season of The West Wing where a politician (SAM) and a scientist (MILLGATE) are discussing how to explain the importance of something akin to the Large Hadron Collider. It loses a little something as a script (compliments of Unofficial West Wing Transcript Archive), but nonetheless:

SAM
What is it?

MILLGATE
It’s a machine that reveals the origin of matter… By smashing protons together at very high speeds and at very high temperatures, we can recreate the Big Bang in a laboratory setting, creating the kinds of particles that only existed in the first trillionth of a second after the universe was created.

SAM
Okay, terrific. I understand that. What kind of practical applications does it have?

MILLGATE
None at all.

SAM
You’re not in any way a helpful person.

MILLGATE
Don’t have to be. I have tenure.

SAM
Doctor.

MILLGATE
There are no practical applications, Sam. Anybody who says different is lying.

ENLOW
If only we could only say what benefit this thing has, but no one’s been able to do that.

MILLGATE
That’s because great achievement has no road map. The X-ray’s pretty good. So is penicillin. Neither were discovered with a practical objective in mind. I mean, when the electron was discovered in 1897, it was useless. And now, we have an entire world run by electronics. Haydn and Mozart never studied the classics. They couldn’t. They invented them.

SAM
Discovery.

MILLGATE
What?

SAM
That’s the thing that you were… Discovery is what. That’s what this is used for. It’s for discovery.

The episode is “Dead Irish Writers” and I’d highly recommend watching the whole thing if you want to feel inspired about doing fundamental research.

Sometimes it’s important to ignore the details [Things you should read]

Joan Strassman has a very nice post about why it is sometimes useful to step back from the intricate details of biological systems in order to understand the general processes that are operating. Here’s a little taste of the general message:

In this talk, Jay said that MacArthur claimed the best ecologists had blurry vision so they could see the big patterns without being overly distracted by the contradictory details. This immediately made a huge amount of sense to me. Biology is so full of special cases, of details that don’t fit theories, that it is easy to despair of advancing with broad, general theories. But we need those theories, for they tell us where to look next, what data to collect, and even what theory to challenge. I am a details person, but love the big theories.

The whole post is definitely worth a read.

Why I will no longer review for your journal

I have, for a while, been frustrated and annoyed by the behavior of several of the large for-profit publishers. I understand that their motivations are different from my own, but I’ve always felt that an industry that relies entirely on both large amounts of federal funding (to pay scientists to do the research and write up the results) and a massive volunteer effort to conduct peer review (the scientists again) needed to strike a balance between the needs of the folks doing all of the work and the corporations’ need to maximize profits.

Despite my concerns about the impacts of increasingly closed journals, with increasingly high costs, on the dissemination of research and the ability of universities to support their core missions of teaching and research, I have continued to volunteer my time and effort as a reviewer to Elsevier and Wiley-Blackwell. I did this because I have continued to see valuable contributions made by these journals and I felt that this combined with the contribution that I was making to science by helping improve the science published in high profile places made supporting these journals worthwhile. I no longer believe this to be the case and from now on I will no longer be reviewing for any journal that is published by Elsevier, Springer, or Wiley-Blackwell (including society journals that publish through them).

Why have I changed my mind? Because of the pursuit/support by these companies of the Research Works Act. This act seeks to prevent funding agencies from requiring that the results of research that they funded be made publicly available. In other words it seeks to prevent the government (and the taxpayers that fund it), which pays for a very large fraction of the cost of any given paper through both funding the research and paying the salaries of reviewers and editors, from having any say in how that research is disseminated. I think that Mike Taylor in the Guardian said most clearly how I feel about this attempt to exert legislative control requiring us to support corporate profits over the dissemination of scientific research:

Academic publishers have become the enemies of science

This is the moment academic publishers gave up all pretence of being on the side of scientists. Their rhetoric has traditionally been of partnering with scientists, but the truth is that for some time now scientific publishers have been anti-science and anti-publication. The Research Works Act, introduced in the US Congress on 16 December, amounts to a declaration of war by the publishers.

You should read the entire article. It’s powerful. There are lots of other great articles about the RWA including Michael Eisen in the New York Times, a nice post by INNGE, and an interesting piece by Paul Krugman (via oikosjeremy). I’m also late to the party in declaring my peer review strike and less eloquent than many of my peers in explaining why (see great posts by Michael Taylor, Gavin Simpson, and Timothy Gowers). But I’m here now and I’m letting you know so that you can consider whether or not you also want to stop volunteering for companies that don’t have science’s best interests in mind.

If you’d like to read up on the publishers’ side of this argument (they have costs, they have a right to recoup them) you can see Springer’s official position or an Elsevier Exec’s exchange with Michael Eisen. My problem with all of these arguments is that there is nothing in any funding agency’s policy that requires publishers to publish work funded by that agency. This is not (as Springer has argued) an “unfunded mandate”, this is a stakeholder that has certain requirements related to the publication of research in which they have an interest. This is just like an author (in any non-academic publishing situation) negotiating with a publisher. If the publisher doesn’t like the terms that the author demands, then they don’t have to publish the book. Likewise, if a publisher doesn’t like the NIH policy then they should simply not agree to publish NIH funded research.

To be clear, I am not as extreme in my position as some. I still support and will review for independent society journals like Ecology and American Naturalist even though they aren’t Open Access and even though ESA has made some absurd comments in support of the same ideas that are in RWA. The important thing for me is that these journals have the best interests of science in mind, even if they are often frustratingly behind the times in how they think and operate.

And don’t worry, I’ve still got plenty of journal related work to keep me busy, thanks to my new position on the editorial board at PLoS ONE.

UPDATE: The links to the INNGE and Timothy Gowers post have now been fixed, and here are links to a couple of great posts by Casey Bergman that I somehow left out: one on how to turn down reviews while making a point and one on the not so positive response he received to one of these emails.

UPDATE 2: A great collection of posts on RWA. There are a lot of really unhappy scientists out there.

UPDATE 3: A formal Boycott of Elsevier. Almost 1000 scientists have signed on so far.

UPDATE 4: Wiley-Blackwell has now distanced itself from RWA and said that “We do not believe that legislative initiatives are the best way forward at this time and so have no plans to endorse RWA. Instead we believe that research funder-publisher partnerships will be more productive.” In addition, it was announced that a bill that would do the opposite of RWA has now been introduced. Hooray for collective action!

Am I teaching well given the available research on teaching

Figuring out how to teach well as a professor at a research university is largely a self-study affair. For me the keys to productive self-study are good information and self-reflection. Without good information you’re not learning the right things and without self-reflection you don’t know if you are actually succeeding at implementing what you’ve learned. There have been some nice posts recently on information and self-reflection about how we teach over at Oikos (based, indirectly, on a great piece on NPR) and Sociobiology (and a second piece) that are definitely worth a read. As part of a course I’m taking on how to teach programming I’m doing some reading about research on the best approaches to teaching and self-reflection on my own approaches in the classroom.

One of the things we’ve been reading is a great report by the US Department of Education’s Institute of Education Sciences on Organizing Instruction and Study to Improve Student Learning. The report synthesizes existing research on what to do in the classroom to facilitate meaningful long-term learning, and distills this information into seven recommendations and information on how strongly each recommendation is supported by available research.

Recommendations

  1. Space learning over time. Arrange to review key elements of course content after a delay of several weeks to several months after initial presentation. (moderate)
  2. Interleave worked example solutions with problem-solving exercises. Have students alternate between reading already worked solutions and trying to solve problems on their own. (moderate)
  3. Combine graphics with verbal descriptions. Combine graphical presentations (e.g., graphs, figures) that illustrate key processes and procedures with verbal descriptions. (moderate)
  4. Connect and integrate abstract and concrete representations of concepts. Connect and integrate abstract representations of a concept with concrete representations of the same concept. (moderate)
  5. Use quizzing to promote learning.
    1. Use pre-questions to introduce a new topic. (minimal)
    2. Use quizzes to re-expose students to key content (strong)
  6. Help students allocate study time efficiently.
    1. Teach students how to use delayed judgments of learning to identify content that needs further study. (minimal)
    2. Use tests and quizzes to identify content that needs to be learned (minimal)
  7. Ask deep explanatory questions. Use instructional prompts that encourage students to pose and answer “deep-level” questions on course material. These questions enable students to respond with explanations and supports deep understanding of taught material. (strong)

(Quoted directly from the original report via a Software Carpentry blog post)

This is a nice summary, but it’s definitely worth reading the whole report to explore the depth of the thought process and learn more about specific ideas for how to implement these recommendations.

How am I doing?

Recently I’ve been teaching two courses on programming and database management for biologists. Because I’m not a big believer in classroom lecture, for this type of material, a typical day in one of these courses involves: 1) either reading up on the material in a text book or viewing a Software Carpentry lecture before coming to class; 2) a brief 5-10 minute period of either re-presenting complex material or answering questions about the reading/viewing; and 3) 45 minutes of working on exercises (during which time I’m typically bouncing from student to student helping them figure out things that they don’t understand). So, how am I doing with respect to the above recommendations?

1. Space learning over time. I’m doing OK here, but not as well as I’d like. The nice thing about teaching introductory programming concepts is that they naturally build on one another. If we learned about if-then statements two weeks ago then I’m going to use them in the exercises about loops that we’re learning about this week. I also have my advanced class use version control throughout the semester for retrieving data and turning in exercises to force them to become very comfortable with the work-flow. However, I haven’t done a very good job of bringing concepts back, on their own, later in the semester. The exercise based approach to the course is perfect for this, I just need to write more problems and insert them into the problem-sets a few weeks after we cover the original material.

2. Interleave worked example solutions with problem-solving exercises. I think I’m doing a pretty good job here. Students see worked examples for each concept in either a text book or video lecture (viewed outside of class) and if I think they need more for a particular concept we’ll walk through a problem at the beginning of class. I often use the Online Python Tutor for this purpose which provides a really nice presentation of what is going on in the program. We then spend most of the class period working on problem-solving exercises. Since my classes meet three days a week I think this leads to a pretty decent interleaving.

3. Combine graphics with verbal descriptions. I do some graphical presentation and the Online Python Tutor gives some nice graphical representations of running programs, but I need to learn more about how to communicate programming concepts graphically. I suspect that some of the students that struggle the most in my Intro class would benefit from a clearly graphical presentation of what is happening in the program.

4. Connect and integrate abstract and concrete representations of concepts. I think I do this fairly well. The overall motivation for the course is to ground the programming material in the specific discipline that the students are interested in. So, we learn about the general concept and then apply it to concrete biological problems in the exercises.

5. Use quizzing to promote learning. I’m not convinced that pre-questions make a lot of sense for material like this. In more fact based classes they are helping to focus students’ attention on what is important, but I think the immediate engagement in problem-sets that focus on the important aspects works at least as well in my classroom. I do have one test in the course that occurs about half way through the Intro course after we’ve covered the core material. It is intended to provide the “delayed re-exposure” that has been shown to improve learning, but after reading this recommendation I’m starting to think that this would be better accomplished with a series of smaller quizzes.

6. Help students allocate study time efficiently. I spend a fair bit of time doing this when I help students who ask questions during the assignments. By looking at their code and talking to them it typically becomes clear where the “illusion of knowing” is creeping in and causing them problems and I think I do a fairly good job of breaking that cycle and helping them focus on what they still need to learn. I haven’t used quizzes for this yet, but I think they could be a valuable addition.

7. Ask deep explanatory questions. One of the main focuses in both of my courses is an individual project where the students work on a larger program to do something that is of interest to them. I do this with the hope that it can provide the kind of deep exposure that this recommendation envisions.

So, I guess I’m doing OK, but I need to work more on re-presentation of material both through bringing back old material in the exercises and potentially through the use of short quizzes throughout the semester. I also need to work on alternative ways to present material to help reach folks whose brains work differently.

If you are a current or future teacher I really recommend reading the full report. It’s a quick read and provides lots of good information and food for thought when figuring out how to help your students learn.

Thanks for listening in on my self-reflection. If you have thoughts about this stuff I’d love to hear about it in the comments.

NSF Pre-proposal guidelines/instructions

UPDATE: If you’re looking for the information for 2013, here’s an updated post.

Since I have now spent far too much time on multiple occasions trying to track down the instructions for the new pre-proposals for NSF DEB and IOS grants I’m going to post the link here under the assumption that other folks will be looking for this information as well (and also finding it difficult to track down).

http://www.nsf.gov/pubs/2011/nsf11573/nsf11573.htm#prep

Happy post-holiday grant writing to all.

UPDATE 1: Also note that the Biosketches are different for the pre-proposals (changes noted in bold-italics)

Biographical Sketches (2-page limit for each) should be included for each person listed on the Personnel page. It should include the individual’s expertise as related to the proposed research, professional preparation, professional appointments, five relevant publications, five additional publications, and up to five synergistic activities. Advisors, advisees, and collaborators should not be listed on this document, but in a separate table (see below).

UPDATE 2: Though it is not explicitly clear from the link above, Current & Pending Support should NOT be included in pre-proposals (thanks to Alan Tessier for clearing this up).

Stay Classy Wiley

I logged into one of my reviewer accounts at a Wiley journal this morning and was greeted by a redirect that took me to a page with the following message:

CONSENT
We appreciate your involvement with this publication, which is published by a John Wiley & Sons company. The publisher would like to contact you by email/post with details of publications and services that may be of interest to you, specific to your subject area, from companies in the John Wiley & Sons group (only) worldwide. Your information will never be passed to any third party companies and as part of any communications you will be given the opportunity to unsubscribe from receiving further contact. Please indicate whether you wish to receive this information by answering the CONSENT question below.

Asking someone who is already working for you for free if it’s OK to also try to sell them stuff while they’re doing it seems like a pretty good definition of classless to me.

Postdoc in Evolutionary Bioinformatics [Jobs]

There is an exciting postdoc opportunity for folks interested in quantitative approaches to studying evolution in Michael Gilchrist’s lab at the University of Tennessee. I knew Mike when we were both in New Mexico. He’s really sharp, a nice guy, and a very patient teacher. He taught me all about likelihood and numerical maximization and opened my mind to a whole new way of modeling biological systems. This will definitely be a great postdoc for the right person, especially since NIMBioS is at UTK as well. Here’s the ad:

Outstanding, motivated candidates are being sought for a post-doctoral position in the Gilchrist lab in the Department of Ecology & Evolutionary Biology at the University of Tennessee, Knoxville. The successful candidate will be supported by a three year NSF grant whose goal is to develop, integrate and test mathematical models of protein translation and sequence evolution using available genomic sequence and expression level datasets. Publications directly related to this work include Gilchrist. M.A. 2007, Molec. Bio. & Evol. (http://www.tinyurl/shahgilchrist11) and Shah, P. and M.A. Gilchrist 2011, PNAS (http://www.tinyurl/gilchrist07a).

The emphasis of the laboratory is focused on using biologically motivated models to analyze complex, heterogeneous datasets to answer biologically motivated questions. The research associated with this position draws upon a wide range of scientific disciplines including: cellular biology, evolutionary theory, statistical physics, protein folding, differential equations, and probability. Consequently, the ideal candidate would have a Ph.D. in either biology, mathematics, physics, computer science, engineering, or statistics with a background and interest in at least one of the other areas.

The researcher will collaborate closely with the PIs (Drs. Michael Gilchrist and Russell Zaretzki) on this project but potentially have time to collaborate on other research projects with the PIs. In addition, the researcher will have opportunities to interact with other faculty members in the Division of Biology as well as researchers at the National Institute for Mathematical and Biological Synthesis (http://www.nimbios.org).

Review of applications begins immediately and will continue until the position is filled. To apply, please submit curriculum vitae including three references, a brief statement of research background and interests, and 1-3 relevant manuscripts to mikeg[at]utk[dot]edu.

An excoriation of for-profit academic publishers

George Monbiot has just published a piece in The Guardian berating for-profit academic publishers that will surely be castigated by some as over-the-top hyperbole and praised by others as a trenchant criticism of the state of academic publishing*. Starting off with the, perhaps, ever so slightly, contentious title of Academic publishers make Murdoch look like a socialist, Monbiot proceeds to fire zingers like

Murdoch pays his journalists and editors, and his companies generate much of the content they use. But the academic publishers get their articles, their peer reviewing (vetting by other researchers) and even much of their editing for free. The material they publish was commissioned and funded not by them but by us, through government research grants and academic stipends. But to see it, we must pay again, and through the nose.

and backs up his position with a recent analysis by Deutsche Bank

The publishers claim that they have to charge these fees as a result of the costs of production and distribution, and that they add value (in Springer’s words) because they “develop journal brands and maintain and improve the digital infrastructure which has revolutionised scientific communication in the past 15 years”. But an analysis by Deutsche Bank reaches different conclusions. “We believe the publisher adds relatively little value to the publishing process … if the process really were as complex, costly and value-added as the publishers protest that it is, 40% margins wouldn’t be available.”

finally ending with a call to arms that even your, never shying away from a good fight, narrator would have toned down a bit**

The knowledge monopoly is as unwarranted and anachronistic as the corn laws. Let’s throw off these parasitic overlords and liberate the research that belongs to us.

Go read the whole thing. This is something that folks are going to be talking about, and I think it’s another good opportunity to ask ourselves whether the group that contributes the least to the overall scientific process should be the one that benefits the most financially. I take it as yet another sign that 2011 is the year that the war over academic publishing officially began.

————————————————————————————————————————————————————————–

*Yes, once I used the word “excoriation” in the title I got a little carried away with the big words.

**Though not for lack of agreement with the general sentiment; clearly.

Some meandering thoughts on the difference between EcologicalData.org and DataONE

In the comments of my post on the Ecological Data Wiki Jarrett Byrnes asked an excellent question:

Very cool. I’m curious, how do you think this will compare/contrast/fight with the Data One project – https://www.dataone.org/ – or is this a different beast altogether?

As I started to answer it I realized that my thoughts on the matter were better served by a full post, both because they are a bit lengthy and because I don’t actually know much about DataONE and would love to have some of their folks come by, correct my mistaken impressions, and just chat about this stuff in general.

To begin with I should say that I’m still trying to figure this out myself, both because I’m still figuring out exactly what DataONE is going to be, and because EcologicalData is still evolving. I think that both projects’ goals could be largely defined as “Organizing Ecology’s Data,” but that’s a pretty difficult task, involving a lot of components and a lot of different ways to tackle them. So, my general perspective is that the more folks we have trying the merrier. I suspect there will be plenty of room for multiple related projects, but I’d be just as happy (even happier probably) if we could eventually find a single centralized location for handling all of this. All I want is a solution to the challenge.

But, to get to the question at hand, here are the differences I see based on my current understanding of DataONE:

1. Approach. There are currently two major paradigms for organizing large amounts of information. The first is to figure out a way to tell computers how to do it for us (e.g., Google), the second is to crowdsource its development and curation (e.g., Wikipedia). DataONE is taking the computer-based approach. It’s heavy on metadata, ontologies, etc. The goal is to manage the complexities of ecological data by providing the computer with very detailed descriptions of the data that it can understand. We’re taking the human approach, keeping things simple and trying to leverage the collective knowledge and effort of the field. As part of this difference in approach I suspect that EcologicalData will be much more interactive and community driven (the goal is for the community to actually run the site, just like Wikipedia) whereas DataONE will tend to be more centralized and hierarchical. I honestly couldn’t tell you which will turn out better (perhaps the two approaches will each turn out to be better for different things) but I’m really glad that we’re trying both at the same time to figure out what will work and where their relative strengths might be.

2. Actually serving data. DataONE will do this; we won’t. This is part of the difference in approach. If the computer can handle all of the thinking with respect to the data then you want it to do that and just spit out what you want. Centralizing the distribution of heterogeneous data is a really complicated task and I’m excited the folks at DataONE are tackling the challenge.

a. One of the other challenges for serving data is that you have to get all of the folks who “own” the data to let you provide it. This is one of the reasons I came up with the Data Wiki idea. By serving as a portal it helps circumvent the challenges of getting all of the individual stakeholders to agree to participate.

b. We do provide a tool for data acquisition, the EcoData Retriever, that likewise focuses on circumventing the need to negotiate with data providers by allowing each individual investigator to automatically download the data from the source. But, this just sets up each dataset independently, whereas I’m presuming that DataONE will let you just run one big query of all the data (which I’m totally looking forward to by the way) [1].

3. Focus. The primary motivation behind the Data Wiki goes beyond identifying datasets and really focuses on how you should use them. Having worked with other folks’ data for a number of years I can say that the biggest challenge (for me anyway) is actually figuring out all of the details of when and how the dataset should be used. This isn’t just a question of reading metadata either. It’s a question of integrating thoughts and approaches from across the literature. What I would like to see develop on the Data Wiki pages are concise descriptions of how to go about using these datasets in the best way possible. This is a very difficult task to automate and one where I think a crowdsourced solution is likely the most effective. We haven’t done a great job of this yet, but Allen Hurlbert and I have some plans to develop a couple of good examples early in the fall to help demonstrate the idea.

4. We’re open for business. Ha ha, eat our dust DataONE. But seriously, we’ve taken a super simple approach which means we can get up and running quickly. DataONE is doing something much more complicated and so things may take some time to roll out. I’m hoping to get a better idea of what their timelines look like at ESA. I’m sure their tools will be well worth the wait.

5. Oh, and their budget is a little over $2,000,000/year, which is just slightly larger than our budget of around $5,000/year.

So, there is my lengthy and meandering response to Jarrett’s question. I’m looking forward to chatting with DataONE folks at ESA to find out more about what they are up to, and I’d love to have them stop by here to chat and clear up my presumably numerous misconceptions.

——————————————————————————————————————-

[1] Though we do have some ideas for managing something somewhat similar, so stay tuned for EcoData Retriever 2.0. Hopefully coming to an internet near you sometime this spring.
