Data Analyst position in ecology research group
The Weecology lab group run by Ethan White and Morgan Ernest at the University of Florida is seeking a Data Analyst to work collaboratively with faculty, graduate students, and postdocs to understand and model ecological systems. We’re looking for someone who enjoys tidying, managing, manipulating, visualizing, and analyzing data to help support scientific discovery.
The position will include:
- Organizing, analyzing, and visualizing large amounts of ecological data, including spatial and remotely sensed data. Modifying existing analytical approaches and data protocols as needed.
- Planning and executing the analysis of data related to newly forming questions from the group. Assisting in the statistical analysis of ecological data, as determined by the needs of the research group.
- Providing assistance and guidance to members of the research group on existing research projects. Working collaboratively with undergraduates, graduate students and postdocs in the group and from related projects.
- Learning new analytical tools and software as needed.
This is a staff position in the group and will be focused on data management and analysis. All members of this collaborative group are considered equal partners in the scientific process and this position will be actively involved in collaborations. Weecology believes in the importance of open science, so most work done as part of this position will involve writing open source code, use of open source software, and production and use of open data.
Weecology is a partnership between the White Lab, which studies ecology using quantitative and computational approaches, and the Ernest Lab, which tends to be more field and community ecology oriented. The Weecology group supports and encourages members interested in a variety of career paths. Former weecologists are currently employed in the tech industry, with the National Ecological Observatory Network, as faculty at teaching-focused colleges, and as postdocs and faculty at research universities. We are also committed to supporting and training a diverse scientific workforce. Current and former group members encompass a variety of racial and ethnic backgrounds from the U.S. and other countries, members of the LGBTQ community, military veterans, people with chronic illnesses, and first-generation college students. More information about the Weecology group and respective labs is available on our website. You can also check us out on Twitter (@skmorgane, @ethanwhite, @weecology), GitHub, and our blog Jabberwocky Ecology.
The ideal candidate will have:
- Experience working with data in R or Python, some exposure to version control (preferably Git and GitHub), and potentially some background with database management systems (e.g., PostgreSQL, SQLite, MySQL) and spatial data.
- Research experience in ecology
- Interest in open approaches to science
- Experience collecting or working with ecological data
That said, don’t let the absence of any of these stop you from applying. If this sounds like a job you’d like to have please go ahead and put in an application.
We currently have funding for this position for 2.5 years. Minimum salary is $40,000/year (which goes a pretty long way in Gainesville), but there is significant flexibility in this number for highly qualified candidates. We are open to the possibility of someone working remotely. The position will remain open until filled, with initial review of applications beginning on May 5th. If you’re interested in applying you can do so through the official UF position page. If you have any questions or just want to let us know that you’re applying you can email Weecology’s project manager Glenda Yenni at firstname.lastname@example.org.
White Lab PhD openings at the University of Florida
I’m looking for one or more graduate students to join my group next fall. In addition to the official ad (below) I’d like to add a few extra thoughts. As Morgan Ernest noted in her recent ad, we have a relatively unique setup at Weecology in that we interact actively with members of the Ernest Lab. We share space, have joint lab meetings, and generally maintain a very close intellectual relationship. We do this with the goal of breaking down the barriers between the quantitative side of ecology and the field/lab side of ecology. Our goal is to train scientists who span these barriers in a way that allows them to tackle interesting and important questions.
I also believe it’s important to train students for multiple potential career paths. Members of my lab have gone on to faculty positions, postdocs, and jobs in both science non-profits and the software industry.
Scientists in my group regularly both write papers (e.g., these recent papers from dissertation chapters: Locey & White 2013, Xiao et al. 2014) and develop or contribute to software (e.g., EcoData Retriever, ecoretriever, rpartitions & pypartitions) even if they’ve never coded before they joined my lab.
My group generally works on problems at the population, community, and ecosystem levels of ecology. You can find out more about what we’ve been up to by checking out our website. If you’re interested in learning more about where the lab is headed I recommend reading my recently funded Moore Investigator in Data-Driven Discovery proposal.
PH.D STUDENT OPENINGS IN QUANTITATIVE, COMPUTATIONAL, AND MACRO- ECOLOGY
The White Lab at the University of Florida has openings for one or more PhD students in quantitative, computational, and/or macro- ecology to start fall 2015. The student(s) will be supported as graduate research assistants from a combination of NSF, Moore Foundation, and University of Florida sources depending on their research interests.
The White Lab uses computational, mathematical, and advanced statistical/machine learning methods to understand and make predictions/forecasts for ecological systems using large amounts of data. Background in quantitative and computational techniques is not necessary, only an interest in learning and applying them. Students are encouraged to develop their own research projects related to their interests.
The White Lab is currently at Utah State University, but is moving to the Department of Wildlife Ecology and Conservation at the University of Florida starting summer 2015.
Interested students should contact Dr. Ethan White (email@example.com) by Nov 15th, 2014 with their CV, GRE scores, and a brief statement of research interests.
UPDATE: Added a note that we work at population, community, and ecosystem levels.
Four basic skill areas for a macroecologist [Guest post]
This is a guest post by Elita Baldridge (@elitabaldridge), a graduate student in Ethan White’s lab in the Ecology Center at Utah State University.
As a budding macroecologist, I have thought a lot about what skills I need to acquire during my Ph.D. This is my model of the four basic attributes for a macroecologist, although I think it is more generally applicable to many ecologists as well:
Data:
- Knowledge of SQL
- Dealing with proper database format and structure
- Finding data
- Appropriate treatments of data
- Understanding what good data are
Statistics:
- Monte Carlo methods
- Maximum likelihood methods
- Power analysis
Math:
- Higher level calculus
- Should be able to derive analytical solutions for problems
Programming:
- Should be able to write programs for analysis, not just simple statistics and simple graphs.
- Able to use version control
- Once you can program in one language, you should be able to program in other languages without much effort, but should be fluent in at least one language.
Achieve expertise in at least 2 out of the 4 basic areas, but be able to communicate with people who have skills in the other areas. However, if you are good at collaboration and come up with really good questions, you can make up for skill deficiencies by collaborating with others who possess those skills. Start with smaller collaborations with the people in your lab, then expand outside your lab or increase the number of collaborators as your collaboration skills improve.
Achieving proficiency in an area is best done by using it for a project that you are interested in. The more you struggle with something, the better you understand it eventually, so working on a project is a better way to learn than trying to learn by completing exercises.
The attribute should be generalizable to other problems: For example, if you need to learn maximum likelihood for your project, you should understand how to apply it to other questions. If you need to run an SQL query to get data from one database, you should understand how to write an SQL query to get data from a different database.
In graduate school:
Someone who wants to compile their own data or work with existing data sets needs to develop a good intuitive feel for data; even if they cannot write SQL code, they need to understand what good and bad databases look like and develop a good sense for questionable data, and how known issues with data could affect the appropriateness of data for a given question. The data skill is also useful if a student is collecting field data, because a little bit of thought before data collection goes a long way toward preventing problems later on.
A student who is getting a terminal master’s and is planning on using pre-existing data should probably focus on the data skill (because working with data is a highly marketable skill, and understanding data prevents major mistakes). If the data are not coming from a central database, like the BBS, where the quality of the data is known, additional time will be needed to compile the data, clean the data, fill holes in the data, and figure out whether the data can be used responsibly.
Master’s students who want to go on for a Ph.D. should decide what questions they are interested in and should try to pick a project that focuses on learning a skill that will give them a head start: more empirical (programming or stats), more theoretical (math), or more applied (math, e.g., for developing models; stats, e.g., applying and evaluating pre-existing models; or programming, e.g., making tools for people to use).
Ph.D. students need to figure out what types of questions they are interested in, and learn those skills that will allow them to answer those questions. Don’t learn a skill because it is trendy or you think it will help you get a job later if you don’t actually want to use that skill. Conversely, don’t shy away from learning a skill if it is essential for you to pursue the questions you are interested in.
Right now, as a Ph.D. student, I am specializing in data and programming. I speak enough math and stats that I can communicate with other scientists and learn the specific analytical techniques I need for a given project. For my interests (testing questions with large datasets), I think that by the time I am done with my Ph.D., I will have the skills I need to be fairly independent with my research.
Graduate student opening with Weecology
We’re looking for a new student to join our interdisciplinary research group. The opening is in Ethan’s lab, but the faculty, students, and postdocs in Weecology interact seamlessly among groups. If you’re interested in macroecology, community ecology, or just about anything with a computational/quantitative component to it, we’d love to hear from you. The formal ad is included below (and yes, we did include links to our blog, twitter, and our GitHub repositories in the ad). Please forward this to any students who you think might be a good fit, and let us know if you have any questions.
GRADUATE STUDENT OPENING
The White Lab at Utah State University has an opening for a graduate student with interests in Macroecology, Community Ecology, or Ecological Theory/Modeling. Active areas of research in the White lab include broad scale patterns related to biodiversity, abundance and body size, ecological dynamics, and the use of sensor networks for studying ecological systems. We use computational, mathematical, and advanced statistical methods in much of our work, so students with an interest in these kinds of methods are encouraged to apply. Background in these quantitative techniques is not necessary, only an interest in learning and applying them. While students interested in one of the general areas listed above are preferred, students are encouraged to develop their own research projects related to their interests. The White Lab is part of an interdisciplinary ecology research group (http://weecology.org) whose goal is to facilitate the broad training of ecologists in areas from field work to quantitative methods. Students with broad interests are jointly trained in an interdisciplinary setting. We are looking for students who want a supportive environment in which to pursue their own ideas. Graduate students are funded through a combination of research assistantships, teaching assistantships, and fellowships. Students interested in pursuing a PhD are preferred. Utah State University has an excellent graduate program in ecology with over 50 faculty and 80+ graduate students across campus affiliated with the USU Ecology Center (http://www.usu.edu/ecology/).
Additional information about the position and Utah State University is available at:
Interested students can find more information about our group by checking out:
Our websites: http://whitelab.weecology.org, http://weecology.org
Our code repositories: http://github.com/weecology
Our blog: http://jabberwocky.weecology.org
And Twitter: http://twitter.com/ethanwhite
Interested students should contact Dr. Ethan White (firstname.lastname@example.org) by December 1st, 2012 with their CV, GPA, GRE scores (if available), and a brief statement of research interests.
Postdoc in Evolutionary Bioinformatics [Jobs]
There is an exciting postdoc opportunity for folks interested in quantitative approaches to studying evolution in Michael Gilchrist’s lab at the University of Tennessee. I knew Mike when we were both in New Mexico. He’s really sharp, a nice guy, and a very patient teacher. He taught me all about likelihood and numerical maximization and opened my mind to a whole new way of modeling biological systems. This will definitely be a great postdoc for the right person, especially since NIMBioS is at UTK as well. Here’s the ad:
Outstanding, motivated candidates are being sought for a post-doctoral position in the Gilchrist lab in the Department of Ecology & Evolutionary Biology at the University of Tennessee, Knoxville. The successful candidate will be supported by a three year NSF grant whose goal is to develop, integrate and test mathematical models of protein translation and sequence evolution using available genomic sequence and expression level datasets. Publications directly related to this work include Gilchrist, M.A. 2007, Molec. Bio. & Evol. (http://www.tinyurl/shahgilchrist11) and Shah, P. and M.A. Gilchrist 2011, PNAS (http://www.tinyurl/gilchrist07a).
The emphasis of the laboratory is focused on using biologically motivated models to analyze complex, heterogeneous datasets to answer biologically motivated questions. The research associated with this position draws upon a wide range of scientific disciplines including: cellular biology, evolutionary theory, statistical physics, protein folding, differential equations, and probability. Consequently, the ideal candidate would have a Ph.D. in either biology, mathematics, physics, computer science, engineering, or statistics with a background and interest in at least one of the other areas.
The researcher will collaborate closely with the PIs (Drs. Michael Gilchrist and Russell Zaretzki) on this project but will potentially have time to collaborate on other research projects with the PIs. In addition, the researcher will have opportunities to interact with other faculty members in the Division of Biology as well as researchers at the National Institute for Mathematical and Biological Synthesis (http://www.nimbios.org).
Review of applications begins immediately and will continue until the position is filled. To apply, please submit curriculum vitae including three references, a brief statement of research background and interests, and 1-3 relevant manuscripts to mikeg[at]utk[dot]edu.
RStudio [Things you should use]
If you use R (and it seems like everybody does these days) then you should check out RStudio – an easy to install, cross-platform IDE for R. Basically it’s a seamless integration of all of the aspects of R (including scripts, the console, figures, help, etc.) into a single easy to use package. For those of you who are familiar with Matlab, it’s a very similar interface. It’s not a full blown IDE yet (no debugger; no lint) but what this actually means is that it’s simple and easy to use. If you use R I can’t imagine that you won’t love this new (and open source!) tool.
UPDATE: Check out another nice article on RStudio over at i’m a chordata! urochordata!
Thoughts on developing a digital presence
A while ago there was a bit of discussion around the academic blogosphere regarding the importance of developing a digital presence and what the best form of that presence might be. Recently, as I’ve been looking around at academics’ websites as part of the faculty, postdoc, and graduate student searches going on in my department/lab, I’ve been reminded of the importance of having a digital presence.
It seems pretty clear to me that the web is the primary source of information acquisition for most academics, at least up through the young associate professors. There are no doubt some senior folk who would still rather have a paper copy of a journal sent to them via snail mail and who rarely open their currently installed copy of Internet Explorer 6, but I would be very surprised if most folks who are evaluating graduate student, postdoctoral and faculty job candidates aren’t dropping the name of the applicant into their favorite search engines and seeing what comes up. They aren’t looking around for dirt, as in all those scary news stories that were meant to stop college students from posting drunken photos of themselves on social networking sites. They’re just procrastinating, or rather looking for more information to get a clearer picture of you as a scientist/academic. I also do a quick web search when I meet someone interesting at a conference, get a paper/grant to review with authors I haven’t heard of before, read an interesting study by someone I don’t know, etc. Many folks who apply to join my lab for graduate school find me through the web.
When folks go looking around for you on the web you want them to find something (not finding anything is the digital equivalent of “being a nobody”), and better yet you want them to find something that puts your best foot forward. But what should this be? Should you Tweet, Buzz, be LinkedIn, start a Blog, have a Wiki*, or maybe just get freaked out by all of this technology and move to the wilderness somewhere and never speak to anyone ever again?
I think the answer here is simple: start with a website. This is the simplest way to present yourself to the outside world and you can (and should) start one as soon as you begin graduate school. The website can be very simple. All you need is a homepage of some kind, a page providing more detailed descriptions of your research interests, a CV, a page listing your publications†, and a page with your contact information. Keep this updated and looking decent and you’ll have as good an online presence as most academics.
While putting together your own website might seem a little intimidating it’s actually very easy these days. The simplest approach is to use one of the really easy hosted solutions out there. These include things like Google Sites, which are specifically designed to let you make websites; or you can easily turn a hosted blogging system into a website (WordPress.com is often used for this). There are lots of other good options out there (let us know about your favorites in the comments). In addition many universities have some sort of system set up for letting you easily make websites, just ask around. Alternatively, you can get a static .html based template and then add your own content to it. Open Source Web Design is the best place I’ve found for templates. You can either open up the actual html files or you can use a WYSIWYG editor to replace the sample text with your own content. SeaMonkey is a good option for a WYSIWYG editor. Just ask your IT folks how to get these files up on the web when you’re done.
So, setting up a website is easy, but should you be doing other things as well, and if so, what? At the moment I would say that if you’re interested in trying out a new mode of academic communication then you should pick one that sounds like fun to you and give it a try; but this is by no means a necessity as an academic at the moment. If you do try to do some of these other things, then do them in moderation. It’s easy to get caught up in the rapid rewards of finishing a blog post or posting a tweet on Twitter, not to mention keeping up with others’ blogs and tweets, but this stuff can rapidly eat up your day and for the foreseeable future you won’t be getting a job based on your awesome stream of 140 character or less insights.
*Yep, that’s right, it’s a link to the Wikipedia page on Wikis.
†And links to copies of them if you are comfortable flouting the absurd copyright/licensing policies of many of the academic publishers (or if you only published in open access journals).
Postdoctoral position in macroecology, quantitative ecology, and ecoinformatics
We have a postdoc position available for someone interested in the general areas of macroecology, quantitative ecology, and ecoinformatics. Here’s the short ad with links to the full job description:
Ethan White’s lab at Utah State University is looking for a postdoc to collaborate on research studying approaches for unifying macroecological patterns (e.g., species abundance distributions and species-area relationships) and predicting variation in these patterns using ecological and environmental variables. The project aims to 1) evaluate the performance of models that link ecological patterns by using broad scale data on at least three major taxonomic groups (birds, plants, and mammals); and 2) combine models with ecological and environmental factors to explain continental scale variation in community structure. Models to be explored include maximum entropy models, neutral models, fractal based models, and statistical models. The postdoc will also be involved in an ecoinformatics initiative developing tools to facilitate the use of existing ecological data. There will be ample opportunity for independent and collaborative research in related areas of macroecology, community ecology, theoretical ecology, and ecoinformatics. The postdoc will benefit from interactions with researchers in Dr. White’s lab, the Weecology Interdisciplinary Research Group, and with Dr. John Harte’s lab at the University of California Berkeley. Applicants from a variety of backgrounds including ecology, mathematics, statistics, physics and computer science are encouraged to apply. The position is available for 1 year with the possibility for renewal depending on performance, and could begin as early as September 2010 and no later than May 2011. Applications will begin to be considered starting on September 1, 2010. Go to the USU job page to see the full advertisement and to apply.
If you’re interested in the position and are planning to be at ESA please leave a comment or drop me an email (email@example.com) and we can try to set up a time to talk while we’re in Pittsburgh. Questions about the position and expressions of interest are also welcome.
UPDATE: This position has been filled.
Rise of the neoFisherian statistical paradigm
I’ve been meaning to get around to posting about Stuart Hurlbert and Cecilia Lombardi’s recent paper (2009; Ann. Zool. Fennici 46: 311–349) on the use of p-values in drawing scientific conclusions… but thankfully Jarrett Byrnes over at i’m a chordata! urochordata! wrote such a great post about it that all I need to do is point you over to his place. Just so you know what you’re getting into, Hurlbert & Lombardi provide a convincing argument against the sanctity of the canonical alpha value of 0.05 and against the use of alpha values and ‘statistically significant’ in general. Instead they recommend (quoting Jarrett):
1) Report a p-value for a test.
2) Do not assign it significance, but rather refer to the level of support it gives for rejecting a null – strong, weak, moderate, practically non-existent. Make sure this statement of support is grounded in the design and power of the experiment. Suspend judgement on rejecting a null if the p value is high, as p-value testing is NOT the same as giving evidence FOR a null (something so many of us forget).
3) Use this in accumulation with other lines of evidence to draw a conclusion about a research hypothesis.
Go check out the full post. It’s well worth the read.
Frequency distributions for ecologists V: Don’t let the lack of a perfect tool prevent you from asking interesting questions
I had an interesting conversation with someone the other day that made me think I needed one last frequency distribution post, so that I don’t discourage people from moving forward with interesting questions.
As a quantitative ecologist I spend a fair amount of time trying to figure out the best way to do things. In other words, I often want to know what the best available method is for answering a particular question. When I think I’ve figured this out I (sometimes, if I have the energy) try to communicate the best methodology more broadly to encourage good practice and accurate answers to questions of interest to ecologists. In some cases finding the best approach is fairly easy. For example, likelihood based methods for fitting and comparing simple frequency distributions are often straightforward and can be easily looked up online. However, in many cases the methodological challenges are more substantial, or the question being asked is not general enough for the methods to have been worked out and clearly presented. This happens in the case of frequency distributions when one needs non-standard minimum and maximum values (a common case in ecological studies) or when one needs discrete analogs of traditionally continuous distributions. It’s not that these cases can’t be addressed, it’s just that you can’t look the solutions up on Wikipedia.
So, what is someone without a sufficient background to do (and, btw, that might be all of us if the problem is really hard or even… intractable)? First, I’d recommend trying to ask for help. Talk to a statistician at your university or a quantitative colleague and see if they can help you figure things out. I am always pleased to try to help out because I always learn something in the process. Then, if that fails, just do something. Morgan and I will probably write more about this later, but please, please, please don’t let the questions you ask as an ecologist be defined by the availability of an ideal statistical methodology that is easy to implement. In the context of the current series of posts, if you are trying to do something with a more complex frequency distribution and you can’t find a solution to your problem using likelihood then use something else. If it were me I’d go with either normalized logarithmic binning or something based on the CDF, as these methods can behave reasonably well. Sure, people like me may complain, but that’s fine. Just make clear that you are aware of the potential weaknesses and that you did what you did because you couldn’t figure out an appropriate alternative approach. That way you still get to make progress on the question of interest and you may motivate people to help work on developing better methods. Sure, you might not be presenting the “right” answer, but then I very much doubt that we ever are when studying ecological systems anyway.
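For readers who haven’t seen the two fallback approaches mentioned above, here is a rough sketch of what they might look like in Python. This is only an illustration under stated assumptions, not code from any of the papers discussed: the function names `empirical_cdf` and `normalized_log_bins` and the example values are made up for this sketch, and it assumes strictly positive data.

```python
import math


def empirical_cdf(data):
    """Return the sorted values and, for each, the fraction of observations <= it."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def normalized_log_bins(data, n_bins=10):
    """Count positive data in logarithmically spaced bins, then divide each
    count by (sample size * linear bin width) to get a density estimate."""
    lo, hi = min(data), max(data)
    if lo <= 0 or lo == hi:
        raise ValueError("data must be positive and not all identical")
    log_lo, log_hi = math.log(lo), math.log(hi)
    step = (log_hi - log_lo) / n_bins
    edges = [math.exp(log_lo + step * i) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in data:
        # The maximum value falls exactly on the last edge, so clamp it
        # into the final bin.
        idx = min(int((math.log(x) - log_lo) / step), n_bins - 1)
        counts[idx] += 1
    n = len(data)
    densities = [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    return edges, densities


# Hypothetical example values, purely for illustration.
sample = [1.2, 1.5, 2.0, 2.1, 3.3, 4.8, 7.5, 12.0, 30.0, 95.0]
xs, cdf = empirical_cdf(sample)
edges, densities = normalized_log_bins(sample, n_bins=5)
print(list(zip(xs, cdf)))
print(list(zip(edges[:-1], densities)))
```

Dividing by the linear bin width is the “normalized” part: without it, the wider bins at large values collect more observations simply because they are wider, which distorts the apparent shape of the distribution.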
Frequency distributions for ecologists
This is a table of contents of sorts for five posts on the visualization, fitting, and comparison of frequency distributions. The goal of these posts is to expose ecologists to the ideas and language related to good statistical practices for addressing frequency distribution data. The focus is on simple distributions and likelihood methods. The information provided here is far from comprehensive, but my aim is to give readers a good place to start when exploring this kind of data.
Frequency distributions for ecologists IV: comparing model performance
Likelihood, likelihood, likelihood (and maybe some other complicated approaches), but definitely not r^2 values from fitting regressions to binned data.
A bit more nitty gritty detail
In addition to causing issues with parameter estimation, binning based methods are also inappropriate when trying to determine which distribution provides the best fit to empirical data. As a result you won’t find any card carrying statistician recommending this approach. Basically, binning and fitting regressions ignores the very nature of this kind of data, generating bizarre error structures and making measures of model fit arbitrary and ungrounded in statistical theory. This isn’t something that is controversial in any way. It is not “hotly contested” or open to debate despite what you may read in the ecological literature (e.g., Reynolds & Rhodes 2009), and you can’t (well, at least you shouldn’t) choose to use binning based methods just because someone else did (e.g., Maestre & Escudero 2009)*.
So, to be rigorous you want to use a more appropriate framework, which again should be likelihood (or Bayes; or something more complicated that I know nothing about; but if you’re taking the time to read this article you should probably start with likelihood). To determine the likelihood of a model given the data you simply take the product of the probability density function (pdf) evaluated at each value of x (put each value of x into the equation for the pdf with the parameters set to the maximum likelihood estimates and then multiply all of the resulting values together). Having done this you can use a likelihood ratio test to compare two distributions (if you’re into p-values this is for you) or you can use model selection based on an information criterion like AIC. With the likelihood in hand the AIC is then just 2k-2*ln(likelihood) (where k is the number of parameters). In practice you’ll probably want to calculate the ln(likelihood) to start (otherwise the values get really small and you’ll run into precision issues) so you would typically take the sum of the log of the pdf instead of the product described above. Andy Edwards’ 2007 Nature paper does a nice job of talking about this in the context of Lévy flights. It’s worth keeping in mind that the details can have an important influence here, so you’ll want to be sure that your pdfs have appropriately defined minimum and maximum values and satisfy the other limitations on the parameters as well. This approach will yield valid AICs for comparing models. In contrast the AIC values in another recent Nature paper (which are based on binning the data, fitting regressions, and then estimating the likelihoods of those regressions) are not grounded in probability in the same way and in my opinion are not appropriate (at least without some Monte Carlo work to show that they at least perform well).
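To make the recipe above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code behind any of the papers mentioned: the data are synthetic (drawn from a lognormal with made-up parameters), the two candidate models (exponential and lognormal, both defined on x > 0) were chosen only because their maximum likelihood estimates have simple closed forms, and all of the function names are hypothetical.

```python
import math
import random


def exponential_logpdf(x, rate):
    """Log of the exponential pdf, rate * exp(-rate * x), for x >= 0."""
    return math.log(rate) - rate * x


def lognormal_logpdf(x, mu, sigma):
    """Log of the lognormal pdf for x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def log_likelihood(logpdf, data, *params):
    """Sum of the log pdf over the data (instead of the product of the pdf)."""
    return sum(logpdf(x, *params) for x in data)


def aic(log_lik, k):
    """AIC = 2k - 2*ln(likelihood), where k is the number of fitted parameters."""
    return 2 * k - 2 * log_lik


# Synthetic data, purely for illustration.
rng = random.Random(0)
data = [rng.lognormvariate(1.0, 0.5) for _ in range(500)]

# Maximum likelihood estimates (closed form for both of these models).
rate_hat = 1.0 / (sum(data) / len(data))
logs = [math.log(x) for x in data]
mu_hat = sum(logs) / len(logs)
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / len(logs))

aic_exp = aic(log_likelihood(exponential_logpdf, data, rate_hat), k=1)
aic_lnorm = aic(log_likelihood(lognormal_logpdf, data, mu_hat, sigma_hat), k=2)

# The model with the lower AIC is the better supported of the two.
print("exponential AIC:", aic_exp)
print("lognormal AIC:", aic_lnorm)
```

Note that the log-likelihoods are computed from the raw observations, with no binning anywhere. For distributions without closed-form estimates, or with non-standard minimum and maximum values, the parameter estimates would need to come from numerical maximization, but the AIC calculation stays the same.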
I’m not trying to give anyone a hard time about what they’ve done in the past. There really is a failure of education and discussion regarding how to deal with distributions in ecology. That said, now that the discussion of these issues has started to reach the broad ecological population we do need to be careful about unnecessarily and inappropriately fomenting a statistical controversy that doesn’t exist, so that we can move towards the use and refinement of the most rigorous methods available.
If you’re looking for a good introduction to this area I highly recommend The Ecological Detective by Hilborn & Mangel. If you’re looking for something with more advanced material and technical detail I like In All Likelihood by Pawitan. I’ve also heard good things about Benjamin Bolker’s new book, but I have not yet read it myself.
*NB: I haven’t conducted any Monte Carlo work on this myself like I have for parameter estimation, but I have read quite a bit of statistical literature in this area and if you do the same I think you will find that statisticians don’t even consider the possibility of binning and fitting regressions, because it is so obviously disconnected from the question at hand.