Beta Release of Database Toolkit

The Ecological Database Toolkit

Large amounts of ecological and environmental data are becoming increasingly available due to initiatives sponsoring the collection of large-scale data and efforts to increase the publication of already collected datasets. As a result, ecology is entering an era where progress will be increasingly limited by the speed at which we can organize and analyze data. To help improve ecologists’ ability to quickly access and analyze data, we have been developing software that designs database structures for ecological datasets and then downloads the data, processes it, and installs it into several major database management systems (at the moment we support Microsoft Access, MySQL, PostgreSQL, and SQLite). The database toolkit can substantially lower the hurdles scientists face when using new databases, and it saves time and reduces import errors for more experienced users.

The database toolkit can download and install small datasets in seconds and large datasets in minutes. Imagine being able to download and import the newest version of the Breeding Bird Survey of North America (a database with 4 major tables and over 5 million records in the main table) in less than five minutes. Instead of spending an afternoon setting up the newest version of the dataset and checking your import for errors, you could spend that afternoon working on your research. This is possible right now, and we are working on making it possible for as many major public and semi-public ecological databases as we can. Automating this process reduces the time it takes a user to get most large datasets up and running by hours, and in some cases days. We hope that this will make it much more likely that scientists will use multiple datasets in their analyses, allowing them to gain more rapid insight into the generality of the pattern or process they are studying.
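To make the "design a structure, process, install" workflow concrete, here is a minimal sketch of what the import step could look like for SQLite. It is not the Database Toolkit's own code or API: the function names `infer_type` and `load_csv`, the table name, and the tiny sample data are all made up for illustration. The download step is left out so the example runs without a network connection.

```python
import csv
import io
import sqlite3


def infer_type(values):
    """Pick the narrowest SQLite type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def load_csv(conn, table, csv_text):
    """Create `table` from CSV text and insert its rows; return the row count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # "Design" the table: one column per CSV field, typed from its values.
    types = [infer_type([r[i] for r in data]) for i in range(len(header))]
    columns = ", ".join(f'"{name}" {t}' for name, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # "Process" the data: store empty cells as NULL rather than empty strings.
    cleaned = [[v if v != "" else None for v in r] for r in data]
    # "Install" the data with a parameterized bulk insert.
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', cleaned)
    conn.commit()
    return len(data)


# Hypothetical sample data, invented for this example only.
SAMPLE = """species_id,count,mean_mass_g
1,12,21.5
2,,30.0
3,7,
"""

conn = sqlite3.connect(":memory:")
n_rows = load_csv(conn, "surveys", SAMPLE)
```

A real importer would also need to handle each dataset's quirks (multiple tables, odd delimiters, missing-value codes) and the differences between database systems, which is where most of the time in a manual import tends to go.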

We need your help

We have done quite a bit of testing on this system, including building in automated tests based on manual imports of most of the currently available databases, but there are always bugs and imperfections in code that cannot be identified until the software is used in real-world situations. That’s why we’re looking for folks to come try out the Database Toolkit and let us know what works and what doesn’t, what they’d like to see added or taken away, and if or when the system fails to work properly. So if you’ve got a few minutes to have half a dozen ecological databases automatically installed on your computer for you, stop by the Database Toolkit page at EcologicalData.org, give it a try, and let us know what you think.

4 Comments on “Beta Release of Database Toolkit”

  1. Looks really cool!!! I’m looking forward to playing around with it a bit.

  2. Thanks Theo! The source code is up on the site as well if you feel like looking under the hood.

  3. Ethan,

    Thanks for this great contribution! Looks like it will be a great tool, I also look forward to testing it. I have spent hours or sometimes days writing scripts to download and process large datasets (most ambitious project was getting the hourly NOAA data from all weather stations since early 1900s, lots of data; I originally asked for a dump from them, but I suppose I didn’t have the right connections; but after writing a script and running wget for two days, I got it all). As the resource is fairly new, and documentation and a roadmap are likely still in early stages, I have two questions:

    Any plans on integrating spatial support? Many ecological datasets contain spatial information (assuming it was collected along with the data). Creating spatial ecological databases would be best, but adds a level of complexity. You are likely aware of PostGIS for PostgreSQL (of which I am a big fan), but there are others as well.

    Is there a list of current available databases, or potential ones?

  4. Thanks Brady! We’re glad you like the idea.

    I just added a page to EcologicalData that lists the currently available datasets. It’s a fairly short list at the moment since we’ve been focusing on the underlying infrastructure, but one of our next big pushes is to increase the number of supported datasets substantially. For someone with the background indicated by your comment it should also be fairly easy to add datasets yourself by checking out the developer documentation. We’d certainly be happy to have folks contributing scripts for new datasets. Also feel free to pass along requests in the comments here or as an issue over on EcologicalData.

    I’d love to add support for spatial data at some point. I’ve been meaning to implement PostGIS or something like it for my lab for quite a while and just haven’t gotten around to it. Once I’ve played with it a bit on my own (hopefully this spring) and get a feel for the additional complexities it would entail we’ll sit down and figure out whether to integrate it or not.
