We are excited to announce the first release of a new Julia package that lets you run our Data Retriever software through a native Julia interface.
For those of you not familiar with Julia, it is a young programming language, similar to R and Python, with a central focus on data analysis and designed from the ground up to be fast. Tim Poisot and his lab have been leaders in introducing Julia to the ecology community (thanks to them you can access GBIF data, analyze ecological networks, and more) and we’re pleased to start following his lead.
After installing the Python retriever package, getting your favorite dataset into Julia just involves opening Julia and running:
julia> Pkg.add("Retriever")
julia> using Retriever
julia> Retriever.install_csv("iris")
julia> iris_data = readcsv("iris_Iris.csv")
Like the Python and R versions of the retriever, the Julia version lets you install datasets into a number of different database management systems and file formats to meet your needs, including PostgreSQL, MySQL, SQLite, JSON, and XML. So if you need to install a large dataset and access it from a database you can do that:
julia> Pkg.add("SQLite")
julia> using SQLite
julia> Retriever.install_sqlite("breed-bird-survey", file="bbs.sqlite")
julia> db = SQLite.DB("bbs.sqlite")
julia> SQLite.query(db, "SELECT * FROM breed_bird_survey_counts LIMIT 10")
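If you would rather poke at a retriever-built SQLite file from Python, the same query step can be sketched with the standard library’s sqlite3 module. The table name matches the example above, but the rows below are made-up stand-ins for the real bbs.sqlite contents, just to show the pattern:

```python
import sqlite3

# Stand-in for a retriever-installed database; in practice you would
# connect to the real file, e.g. sqlite3.connect("bbs.sqlite").
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE breed_bird_survey_counts (species TEXT, count INTEGER)")
db.executemany(
    "INSERT INTO breed_bird_survey_counts VALUES (?, ?)",
    [("Spizella passerina", 3), ("Zenaida macroura", 5)],
)

# Same shape of query as the Julia example above.
rows = db.execute(
    "SELECT * FROM breed_bird_survey_counts ORDER BY count LIMIT 10"
).fetchall()
```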
We use the PyCall package to run the Python code from the main retriever package directly. Cross-language support like this is really useful for letting difficult-to-develop core code be reused easily across languages, and it’s great that this is a core feature of Julia.
This is our first Julia package and so there are sure to be lots of things to improve (starting with the documentation). If you use Julia, or are interested in experimenting with it, we’d love feedback, issues, and pull requests. We’re always enthusiastic to have new contributors and help everyone get started, especially if they’re just learning. For more information see:
Scaling up ecological patterns and processes is crucial to understanding the effects of environmental change on natural systems and human society. We are piloting a Data Science Challenge in which multiple groups attempt to use the same remote sensing data from low-flying airplanes to infer the location and type of trees in forests. This will allow forests to be studied in detail at much larger scales than is currently possible. This kind of collaborative data analysis challenge has proven highly effective in other fields for quickly improving methods for converting image data into useful information.
There are three sets of tasks: 1) identifying individual trees in remote sensing images; 2) aligning ground data with remote sensing data; and 3) classifying trees into species.
Teams (or individuals) can participate in all of them or just pick the tasks they are most interested in. Tasks 2 and 3 can be accomplished using just tabular data. Task 1 requires working directly with spatial data. Details of the different tasks and links to the data are available at the challenge website:
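As a rough illustration of the flavor of task 3, here is a toy nearest-centroid classifier on invented tabular features. Everything in it (the feature columns, species, and values) is hypothetical and unrelated to the actual challenge data; real submissions would work from the remote sensing features provided on the challenge website.

```python
# Toy sketch: classify trees into species with a nearest-centroid rule.
# Features are hypothetical (height in m, crown area in m^2).
train = {
    "pine": [(20.0, 15.0), (22.0, 18.0)],
    "oak": [(12.0, 40.0), (14.0, 45.0)],
}

def centroid(points):
    # Mean of each feature across a species' training points.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

centroids = {sp: centroid(pts) for sp, pts in train.items()}

def classify(x):
    # Assign the species whose centroid is closest (squared distance).
    return min(
        centroids,
        key=lambda sp: sum((a - b) ** 2 for a, b in zip(centroids[sp], x)),
    )

pred = classify((21.0, 16.0))  # a tree near the "pine" centroid
```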
We plan to write a general paper about the competition, the data, and the performance of the different methods used. Individual participants will be invited to write and publish associated short papers on the methods they used and the results they produced. We already have a journal that has agreed to publish all of these related contributions together as a collection (pending review, of course).
The challenge is already open and the deadline for submissions is December 15th. Once you sign up on the website you will receive an email with some additional details. If you have any questions feel free to respond to that email or check out the FAQ to see if they have already been answered.
This challenge is sponsored by the National Institute of Standards and Technology as part of its Data Science Evaluation series and is also partially supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through grant GBMF4563. It uses data from the National Ecological Observatory Network in addition to data collected by the organizers. It is being organized by the Data Science Research lab, the Weecology lab, and Stephanie Bohlman’s lab, all at the University of Florida.
We are excited to announce a new release of the Data Retriever, our software for making it quick and easy to get clean, ready-to-analyze data.
The Data Retriever automates the downloading, cleaning, and installing of data into your choice of databases and flat file formats. Instead of spending hours tracking down the data on the web, downloading it, trying to import it, running into issues, fixing one problem, and then encountering the next, all you need to do is run a single command from the command line, R, or (now!!) Python:
$ retriever install csv iris
> portal_data <- rdataretriever::fetch('portal')
In : import retriever as rt
In : rt.install_postgres('breed-bird-survey')
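The CSV backend writes ordinary flat files, so the installed data can be read back with nothing beyond Python’s standard library. The column names and the in-memory sample below are illustrative only; the real file name and layout depend on the dataset script (the iris example above ends up in a file like iris_Iris.csv).

```python
import csv
import io

# Stand-in for opening a retriever-installed CSV file; with a real file
# you would use open("iris_Iris.csv") instead of this in-memory sample.
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)
# DictReader maps the header row onto each data row.
rows = list(csv.DictReader(sample))
```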
- Python interface: While the retriever is written in Python the package previously only had a command line interface. Now you can access the full power of the retriever from directly inside Python. See the full tutorial for more details.
- Conda packaging: The conda package manager has become one of the two main ways to install Python packages. You can now install the retriever using
conda install retriever -c conda-forge
- Command line autocomplete: As the number of datasets and backends supported by the retriever grows, it can be difficult to remember specific names. Pressing Tab will now autocomplete retriever commands, backends, and dataset names. (Currently only available on macOS and Linux.)
- We also made some changes to the metadata script system so if you’ve previously installed the retriever you should update your scripts using:
retriever reset scripts
retriever update
Find out more
To find out more about the Data Retriever check out the:
Ongoing work on the Data Retriever led by Henry Senyondo is made possible by the generous support of the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative. This kind of active support for the development and maintenance of research-oriented software makes sustainable software development at universities possible. Shivam Negi developed the Python interface as part of his Google Summer of Code project. You can read more about his time in GSoC on his blog.
Twelve different folks contributed code to this release. A big thanks to Henry Senyondo, Ethan White, Shivam Negi, Andrew Zhang, Kapil Kumar, Kunal Pal, Amritanshu Jain, Kevin Amipara, David LeBauer, Goel Akash, and Parth-25m for making the retriever better.
Someone (**cough** **cough** Morgan) fell down on their job reblogging the Portal Project 40th anniversary posts to Jabberwocky. But that means this week is the week o’ Portal as we reblog the various posts from the past few weeks. First up, what happens when we go out in the summer to count plants?
Twice a year the Portal crew gets a little larger, and spends a few extra days, and we count plants on all 384 quadrats. Despite some of us being in our second decade of visiting the site, and everyone on the plant crew being intimately familiar with most of the species at the site, and that the rodent RA has been watching the plants grow and giving us monthly updates, we still never really know what we’re going to find once we get out there. The desert does what it wants.
The uncertainty seems especially high for the summer plant community. Some years we arrive to an ocean of grass, waving in the breeze. Those are the years we spend a lot of ‘quality time’ with each quadrat. Other years we arrive to a dustbowl. We walk around the site laying our PVC quadrat down and picking it back up…