Jabberwocky Ecology

Data Analyst position in ecology research group

The Weecology lab group run by Ethan White and Morgan Ernest at the University of Florida is seeking a Data Analyst to work collaboratively with faculty, graduate students, and postdocs to understand and model ecological systems. We’re looking for someone who enjoys tidying, managing, manipulating, visualizing, and analyzing data to help support scientific discovery.

The position will include:

  • Organizing, analyzing, and visualizing large amounts of ecological data, including spatial and remotely sensed data. Modifying existing analytical approaches and data protocols as needed.
  • Planning and executing the analysis of data related to newly forming questions from the group. Assisting in the statistical analysis of ecological data, as determined by the needs of the research group.
  • Providing assistance and guidance to members of the research group on existing research projects. Working collaboratively with undergraduates, graduate students and postdocs in the group and from related projects.
  • Learning new analytical tools and software as needed.

This is a staff position in the group and will be focused on data management and analysis. All members of this collaborative group are considered equal partners in the scientific process and this position will be actively involved in collaborations. Weecology believes in the importance of open science, so most work done as part of this position will involve writing open source code, use of open source software, and production and use of open data.

Weecology is a partnership between the White Lab, which studies ecology using quantitative and computational approaches and the Ernest Lab, which tends to be more field and community ecology oriented. The Weecology group supports and encourages members interested in a variety of career paths. Former weecologists are currently employed in the tech industry, with the National Ecological Observatory Network, as faculty at teaching-focused colleges, and as postdocs and faculty at research universities. We are also committed to supporting and training a diverse scientific workforce. Current and former group members encompass a variety of racial and ethnic backgrounds from the U.S. and other countries, members of the LGBTQ community, military veterans, people with chronic illnesses, and first-generation college students. More information about the Weecology group and respective labs is available on our website. You can also check us out on Twitter (@skmorgane, @ethanwhite, @weecology, GitHub, and our blog Jabberwocky Ecology.

The ideal candidate will have:

  • Experience working with data in R or Python, some exposure to version control (preferably Git and GitHub), and potentially some background with database management systems (e.g., PostgreSQL, SQLite, MySQL) and spatial data.
  • Research experience in ecology
  • Interest in open approaches to science
  • Experience collecting or working with ecological data

That said, don’t let the absence of any of these stop you from applying. If this sounds like a job you’d like to have please go ahead and put in an application.

We currently have funding for this position for 2.5 years. Minimum salary is $40,000/year (which goes a pretty long way in Gainesville), but there is significant flexibility in this number for highly qualified candidates. We are open to the possibility of someone working remotely. The position will remain open until filled, with initial review of applications beginning on May 5th. If you’re interested in applying you can do so through the official UF position page. If you have any questions or just want to let us know that you’re applying you can email Weecology’s project manager Glenda Yenni at glenda@weecology.org.

Postdoctoral research position in the Temporal Dynamics of Communities

The Weecology lab group run by Morgan Ernest and Ethan White at the University of Florida is seeking a post-doctoral researcher to study changes in ecological communities through time. This position will primarily involve broad-scale comparative analyses across communities using large time-series datasets and/or in-depth analyses of our own long-term dataset (the Portal Project). Experience with any of the following is useful, but not required: long-term data, macroecology, paleoecology, quantitative/theoretical ecology, and programming/data analysis in R or Python. The successful applicant will be expected to collaborate on lab projects on community dynamics and develop their own research projects in this area according to their interests.

Weecology is a partnership between the Ernest Lab, which tends to be more field and community ecology oriented and the White Lab, which tends to be more quantitative and computationally oriented. The Weecology group supports and encourages students interested in a variety of career paths. Former weecologists are currently employed in the tech industry, with the National Ecological Observatory Network, as faculty at teaching-focused colleges, and as postdocs and faculty at research universities. We are also committed to supporting and training a diverse scientific workforce. Current and former group members encompass a variety of racial and ethnic backgrounds from the U.S. and other countries, members of the LGBTQ community, military veterans, people with chronic illnesses, and first-generation college students. More information about the Weecology group and respective labs is available on our website. You can also check us out on Twitter (@skmorgane, @ethanwhite, @weecology), GitHub, and our blog Jabberwocky Ecology.

This 2-year postdoc has a flexible start date, but can start as early as June 1st 2017. Interested students should contact Dr. Morgan Ernest (skmorgane@ufl.edu) with their CV including a list of three references, a cover letter detailing their research interests/experiences, and one or more research samples (a PDF or link to a scientific product such as a published paper, preprint, software, data analysis code, etc). The position will remain open until filled, with initial review of applications beginning on April 24th.

Data Retriever 2.0: We handle the data so you can focus on the analysis

We are very exited to announce a major new release of the Data Retriever, our software for making it quick and easy to get clean, ready to analyze, versions of publicly available data.

The Data Retriever, automates the downloading, cleaning, and installing of ecological and environmental data into your choice of databases and flat file formats. Instead of hours tracking down the data on the web, downloading it, trying to import it, running into issues (e.g, non-standard nulls, problematic column names, encoding issues), fixing one problem, and then encountering the next, all you need to do is run a single command from the command line:

$ retriever install csv iris
$ retriever install sqlite breed-bird-survey -f bbs.sqlite

or from R:

>>> rdataretriever::install('postgres', 'wine-quality')
>>> portal_data <- rdataretriever::fetch('portal')

The Data Retriever uses information in Frictionless Data datapackage.json files to automatically handle all of the complexities of “simple” data for you. For more complicated complicated datasets, with dozens of components or major data structure issues, the Retriever uses Python scripts as plugins to handle the major data cleaning work and then automatically handles the rest.

To find out more about the Data Retriever checkout the websites, the full documentation, and the GitHub repositories for both the Data Retriever and the R Data Retriever package.

Expanded focus and name change

For those of you familiar with the EcoData Retriever, this is the same software with a new name. Challenges with the data end of the analysis pipeline occur across disciplines and our tools work just as well for non-ecological data, so we’ve started adding non-ecological data and changed our name to reflect that. We’d love to hear from anyone interested in leading a push to add data from another discipline or just interested in adding a single favorite dataset.

As part of this we’ve changed the name of the R package from ecoretriever to rdataretriever.

Major changes

The 2.0 release includes a number of major changes including:

  • Python 3 support (a single code base runs on both Python 2 and 3)
  • Adoption of the frictionless data datapackage.json standard (replacing our old YAML like metadata system), including a command line interface for creating and editing datapackage.json files
  • Add json and xml as available output formats
  • Major expansion of the documentation and hosting of the documentation at Read the Docs
  • Remove the graphical user interface (to allow us to focus that development time on wrappers for other languages)
  • Lots of work under the hood and major improvements in testing
  • Broaden scope to include non-ecological data

We are also in the process of releasing version 1.0 of the R package. This version adds the new features in the Data Retriever and also includes major stability improvements, in particular in RStudio and on Windows.

We also have a brand new website.

Upgrading to the new version (UPDATED)

To ensure the smoothest upgrade to the new version we recommend:

  1. Run retriever reset scripts from the command line
  2. Uninstall the old version of the EcoData Retriever
  3. Install the new version
  4. Run retriever update from the command line

Acknowledgments

Henry Senyondo is the lead developer for the Data Retriever and has done an amazing job over the past year developing new features and shoring up the fundamentals for the software. He lead the work on 2.0 start to finish.

Akash Goel was a Google Summer of Code student with the project last summer and was responsible for the majority of the work adding Python 3 support and switching the project over to the datapackage.json standard.

Dan McGlinn, the creator of the R package, has continued his excellent leadership of the development of this package. Shawn Taylor, a new contributor, was instrumental in solving the stability issues on Windows/RStudio.

In addition to these core folks our growing group of contributors to both projects have been invaluable for adding new functionality, fixing bugs, and testing new changes. We are super excited to have contributions from 30 different people and will keep working hard to make sure that everyone feels welcome and supported in contributing to the project.

The level of work done to get these releases out the door was only possible due to generous support of the Gordon and Betty Moore Foundation’s Data Driven Discovery Initiative. This support allowed my group to employ Henry as a full time software engineer to work on these and other projects. This kind of active support for the development and maintenance of research oriented software makes sustainable software development at universities possible.

Fork our course: A semester-long Data Carpentry course for biologists

This is post is co-authored by Zack Brym and Ethan White

Over the last year and a half we have been actively developing a semester-long Data Carpentry course designed to be easily customized and integrated into existing graduate and undergraduate curricula.

Data Carpentry for Biologists contains course materials for teaching scientists how to work more effectively with data. The course provides introductions to data management and relational databases, data manipulation and analysis, and data visualization. It covers the same general types of material as a two-day Data Carpentry workshop, but expands the materials and opportunities for practice into a full-length university course. The teaching material uses R and SQLite, with some corresponding materials for Python as well. To help students understand the direct applications to their interests, the examples and exercises focus on biological questions and working with real data. The course emphasizes using best practices to produce reusable and reproducible data analysis.

Active-learning Teaching Materials

Learning computing requires active practice by working through programming problems. Just diving in to computing is challenging for most scientists, so the course instruction is designed to combine short live-coding introductions to concepts followed immediately by the students working on a related exercise. Additional exercises are assigned later for practice. This follows the “I do”, “We do”, “You do” approach to teaching, which leverages the benefits of active-learning and flipped classrooms without leaving students who are less comfortable with the material feeling lost. The bulk of class time is spent working on assigned exercises with the instructor moving around the room helping guide students through things they don’t understand and engaging with students who are thinking about advanced applications of what they’ve learned.

This approach is the result of lots of reading about effective teaching methods and Ethan’s experience teaching this and related courses over the last six years at Utah State University and the University of Florida. It seems to work well for both students that get the material easily and those that find it more challenging. We’ve also tried to make these materials as useful as possible for self-guided students.

Open course development

Software Carpentry and Data Carpentry have shown how powerful collaborative lesson development can be and we’re interested in bringing that to the university classroom. We have designed the course materials to be modular and easy to modify, and the course website easy to clone and set up. All of the teaching materials and associated website files are openly available at the Data Carpentry for Biologists repository on GitHub under CC-BY and MIT licenses. The course materials are all written in Markdown and everything runs on Jekyll through GitHub Pages. Making your own version of the course should take less than an hour. We’ve developed documentation for how to create your own version of the course and how to contribute to development. Exercises and assignments are modular and changing exercises and assignments simply involves reordering items in a list. Adding a new exercise involves creating a new Markdown file and then adding its title to the list of exercises for an assignment.

Get Involved

If you teach, or want to teach, a course like this, we’d love to get you involved. Here are some useful links for getting started.

–   I want to teach the course.
–   I have some feedback.
–   I want to contribute to the project.

We want to be sure getting involved is as easy as possible. We’ve worked hard to provide documentation and help resources for students and instructors. Students can find all they need to know at our student start guide. Instructors have access to course content and site design documentation.

If your having trouble finding something or getting something to work, or simply have some feedback about the course please open a new issue at GitHub or send us an email.

Acknowledgements

Development of this course was generously support by  the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.

New release of the EcoData Retriever

EcoData Retriever logoWe are very exited to announce the newest release of the EcoData Retriever, our software for automating the downloading, cleaning, and installing of ecological and environmental data. Instead of hours or days trying to get complicated datasets like the Breeding Bird Survey ready for analysis, the Retriever lets you simply click a button or run a single command from R or the command line, and your computer does the rest.

bbs_install_animated

It’s been over a year since the last retriever release and there are lots of new features and improvements to be excited about.

  • We’ve added 21 new datasets, including major ecological and environmental datasets like eBird, Vertnet, and the Global Wood Density Database, and the PRISM climate data.
  • To support all of these datasets we’ve added support for additional data types including greater than memory archive files, and we’ve also improved the ability to control where downloaded files are stored and how they are clustered together.
  • We’ve significantly improved documentation and now have a new automatically built documentation site at Read The Docs.
  • We’ve also made a lot of under the hood improvements.

This is also the first release that has been overseen by Weecology’s new software engineer, Henry Senyondo. We’re excited to have Henry on the team, and now that he’s around development of both the EcoData Retriever and other lab software projects will be happening more quickly.

A big thanks to the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative for funding this development through Grant GBMF4563 and to the National Science Foundation for funding as part of a CAREER award to Ethan White.

UPDATE: Led by Dan McGlinn we also released a new version of the ecoretriever R interface for the Retriever last fall. This makes using the Retriever from R as simple as:

data <- ecoretriever::fetch("BBS")

White Lab PhD openings at the University of Florida

I’m looking for one or more graduate students to join my group next fall. In addition to the official add (below) I’d like to add a few extra thoughts. As Morgan Ernest noted in her recent ad, we have a relatively unique setup at Weecology in that we interact actively with members of the Ernest Lab. We share space, have joint lab meetings, and generally maintain a very close intellectual relationship. We do this with the goal of breaking down the barriers between the quantitative side of ecology and the field/lab side of ecology. Our goal is to train scientists who span these barriers in a way that allows them to tackle interesting and important questions.

I also believe it’s important to train students for multiple potential career paths. Members of my lab have gone on to faculty positions, postdocs, and jobs in both science non-profits and the software industry.

Scientists in my group regularly both write papers (e.g., these recent papers from dissertation chapters: Locey & White 2013, Xiao et al. 2014) and develop or contribute to software (e.g., EcoData Retriever, ecoretriever, rpartitions & pypartitions) even if they’ve never coded before they joined my lab.

My group generally works on problems at the population, community, and ecosystem levels of ecology. You can find out more about what we’ve been up to by checking out our website. If you’re interested in learning more about where the lab is headed I recommend reading my recently funded Moore Investigator in Data-Driven Discovery proposal.

PH.D STUDENT OPENINGS IN QUANTITATIVE, COMPUTATIONAL, AND MACRO- ECOLOGY

The White Lab at the University of Florida has openings for one or more PhD students in quantitative, computational, and/or macro- ecology to start fall 2015. The student(s) will be supported as graduate research assistants from a combination of NSF, Moore Foundation, and University of Florida sources depending on their research interests.

The White Lab uses computational, mathematical, and advanced statistical/machine learning methods to understand and make predictions/forecasts for ecological systems using large amounts of data. Background in quantitative and computational techniques is not necessary, only an interest in learning and applying them. Students are encouraged to develop their own research projects related to their interests.

The White Lab is currently at Utah State University, but is moving to the Department of Wildlife Ecology and Conservation at the University of Florida starting summer 2015.

Interested students should contact Dr. Ethan White (ethan@weecology.org) by Nov 15th, 2014 with their CV, GRE scores, and a brief statement of research interests.

UPDATE: Added a note that we work at population, community, and ecosystem levels.

EcoData Retriever now supports R and environmental data, and has more datasets

Retreiver Logo
We are very excited to announce the newest release of our EcoData Retriever software and the first release of a supporting R package, ecoretriever. If you’re not familiar with the EcoData Retriever you can read more here.

The biggest improvement to the Retriever in this set of releases is the ability to run it directly from R. Dan McGlinn did a great job leading the development of this package and we got ton of fantastic help from the folks at rOpenSci (most notably Scott Chamberlain, Gavin Simpson, and Karthik Ram). Now, once you install the main EcoData Retriever, you can run it from inside R by doing things like:

install.packages('ecoretriever')
library(ecoretriever)

# List the datasets available via the Retriever
ecoretriever::datasets()

# Install the Gentry dataset into csv files in your working directory
ecoretriever::install('Gentry', 'csv')

# Download the raw Gentry dataset files, without any processing,
# to the subdirectory named data
ecoretriever::download('Gentry', './data/')

# Install and load a dataset as a list
Gentry = ecoretriever::fetch('Gentry')
names(Gentry)
head(Gentry$counts)

The other big advance in this release is the ability to have the Retriever directly download files instead of processing them. This allows us to support data that doesn’t come in standard tabular forms. So, we can now include things like environmental data in GIS formats and phylogenetic data such as supertrees. We’ve used this new capability to allow the automatic downloading of the Bioclim data, one of the most widely used climate datasets in ecology, and the supertree for mammals from Fritz et al. 2009.

Finally, we’ve also add the very cool mammalian diet dataset from Dryad