We are excited to announce a new release of the Data Retriever, our software for making it quick and easy to get clean, ready-to-analyze data.
The Data Retriever automates the downloading, cleaning, and installing of data into your choice of databases and flat file formats. Instead of spending hours tracking down the data on the web, downloading it, trying to import it, running into issues, fixing one problem, and then encountering the next, all you need to do is run a single command from the command line, R, or (now!!) Python:
$ retriever install csv iris
> portal_data <- rdataretriever::fetch('portal')
In [1]: import retriever as rt
In [2]: rt.install_postgres('breed-bird-survey')
- Python interface: While the retriever is written in Python, the package previously only had a command line interface. Now you can access the full power of the retriever directly from inside Python. See the full tutorial for more details.
- Conda packaging: The conda package manager has become one of the two main ways to install Python packages. You can now install the retriever using:
conda install retriever -c conda-forge
- Command line autocomplete: As the number of datasets and backends supported by the retriever grows, it can be difficult to remember specific names. Using Tab will now autocomplete retriever commands, backends, and dataset names. (Currently only available on OSX and Linux.)
- We also made some changes to the metadata script system, so if you’ve previously installed the retriever, you should update your scripts using:
retriever reset scripts
retriever update
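To give a feel for what the new Python interface makes possible, here is a minimal sketch of installing several datasets in one go from a script. The helper `install_many` and the `main` function are hypothetical names made up for this illustration, and `rt.install_csv` is assumed to work like the `rt.install_postgres` call shown at the top of this post; check the tutorial for the exact functions and arguments.

```python
def install_many(dataset_names, installer):
    """Call installer(name) for each dataset and record what happened.

    `installer` is any callable that takes a dataset name, for example
    rt.install_csv (assumed here to mirror rt.install_postgres).
    """
    results = {}
    for name in dataset_names:
        try:
            installer(name)
            results[name] = "ok"
        except Exception as exc:
            # Keep going so one problem dataset does not block the rest.
            results[name] = f"failed: {exc}"
    return results


def main():
    # Imported inside main so the helper above works without the package.
    import retriever as rt

    # 'iris' and 'portal' are the dataset names used earlier in this post.
    for name, status in install_many(["iris", "portal"], rt.install_csv).items():
        print(name, status)
```

Calling `main()` from a script or an interactive session would attempt both installs and print one status line per dataset. Passing the installer in as an argument keeps the loop independent of the backend, so the same helper could be pointed at a different install function.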
Find out more
To find out more about the Data Retriever check out the:
Ongoing work on the Data Retriever, led by Henry Senyondo, is made possible by the generous support of the Gordon and Betty Moore Foundation’s Data Driven Discovery Initiative. This kind of active support for the development and maintenance of research-oriented software makes sustainable software development at universities possible. Shivam Negi developed the Python interface as part of his Google Summer of Code project. You can read more about his time in GSOC at his blog.
Twelve different folks contributed code to this release. A big thanks to Henry Senyondo, Ethan White, Shivam Negi, Andrew Zhang, Kapil Kumar, Kunal Pal, Amritanshu Jain, Kevin Amipara, David LeBauer, Amritanshu Jain, Goel Akash, and Parth-25m for making the retriever better.