In a big step forward for allowing proper credit to be provided to all of the awesome folks collecting and publishing data, the journal Global Ecology & Biogeography has just announced that they will start supporting an unlimited set of references to datasets used in a paper.
A growing concern in the macroecological community has been that many papers whose data are used in meta-analyses or data-compilation papers have not been getting citation credit because most journals require these papers to only be listed in the supplemental material (which is not indexed by most indexing services). GEB is proud to support the inclusion of a second list of references within the main paper for all data papers used… To our knowledge, GEB is the first journal in the ecological field to do this. And we’ll be working with Wiley to further improve options in this area.
These references will be included immediately following the traditional references section in both the html and pdf versions of the paper. You can see an example in Olds et al. (2016).
What this means is that when you combine data from dozens or hundreds of studies to conduct a synthetic analysis, you can cite all of the sources in a way that provides citation credit to those who collected the data¹. It also means that scientists using large data compilations can cite the original data sources as well as the compilation itself².
This is important for encouraging the publication of data, since one of the common reasons that scientists don’t publish data is a lack of credit, and citation only in non-indexed supplementary materials sections is a common concern.
Facilitating proper citation of all data sources is something the community has been requesting and it’s great to see GEB taking the lead in this area. Since Wiley, the publisher of GEB, is the largest publisher of ecology journals, it should be straightforward to implement this new approach widely. If other journals follow GEB’s lead, we will enter a new era where citation of data can be as complete as possible, allowing proper credit to everyone who collects and publishes data.
¹ GEB will need to make sure that this section gets properly picked up by the indexers, and to tweak the presentation as necessary if it isn’t.
² Provided that the compilation offers a way to generate a citation list of all of its underlying sources.
For the past few years I’ve been involved in a collaboration to put together a broad-coverage life history database for mammals, reptiles, and birds. The project started because my collaborator, Nathan Myhrvold, and I both had projects we were interested in that involved comparing life history traits of reptiles, mammals, and birds, and only mammals had easily accessible life history databases with broad taxonomic coverage. So, we decided to work together to fix this. To save others the hassle of redoing what we were doing, we decided to make the dataset available to the scientific community. While this post started out as a standard “Hey, check out this new publication from our group” post (Here it is, by the way: Myhrvold, N.P., †E. Baldridge, B. Chan, D. Sivam, D.L. Freeman, S.K.M. Ernest. 2015. An Amniote Life-history Database to Perform Comparative Analyses with Birds, Mammals, and Reptiles. Ecology 96:3109), I’ve realized that there’s something more important that needs to be discussed: what is the future of trait databases?
Trait databases are all the rage these days, for good reason. Traits are interesting from evolutionary and ecological perspectives: How and why do species differ in traits? How do traits evolve? How quickly do traits change in response to a changing environment? What impacts do these differences have on community assembly and ecosystem function? Traits have the potential to link individual performance with local, regional, and even global processes. There’s lots of trait data out there, but most of it has been buried in papers, books, theses, gray literature, field guides, etc. This has led to an explosion of compendiums compiling trait data. Some of these are published as Data Papers (e.g., mammals: Jones et al. 2009; plankton: Kremer et al. 2014) or as online databases (e.g., AnAge, FishBase), which are open for everyone to use. Many of these open datasets are generated by a small number of scientists to address a particular question. Others are quasi-open/quasi-private resources generated by consortiums of scientists (e.g., TRY).
There are a variety of issues regarding these trait compendiums, not least of which is that they pull data from numerous sources: how do data generators get credit, and what type of credit is reasonable? This is a doozy that I don’t have an answer to. Instead, my focus today is on the eventual endgame of trait databases. No trait database currently being produced has all the trait data of interest for every species. This means we have a bunch of incomplete data products running around. So, every few years, a bigger (more complete, but still incomplete) trait dataset is produced for some group of species. Sometimes the bigger dataset replicates the effort of the smaller one, sometimes it incorporates the smaller compilation wholesale, and sometimes the two have little overlap in sources whatsoever. Data compilations vary in their ease of use and accessibility. Some databases are widely known; some are known only to a few insiders. I could keep going. Clearly this state of affairs is less than optimal for rapid progress in studying traits.
So what’s the end game here? What should we be doing? In my opinion, what we need is a centralized trait database where people can contribute trait data and where that data is easily accessible by anyone who wants to use it for research (not just to the contributing members of the database). It would also be nice if people who contribute significant amounts of data (no, I’m not going to define that here) could get specific credit for that contribution – maybe as a Data Paper or E-Publication. To encourage people to not just download data, add to it, and then sit on the expanded dataset, embargoes could be put in place to allow people to add their data to the dataset but have the data protected for a limited period of time to allow that researcher to get first crack at the publications using that entry. It’d be really nice if people who use the database could easily download all the references for the data they used so it can be easily incorporated into a literature cited section. The central database could get credit (let’s face it, it needs to be able to justify the funding that such an endeavor would require) by having people register papers published using data from the database. They could then keep track of numbers of pubs and citations to those pubs to help track the database’s impact.
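To make the reference-download idea concrete, here is a minimal Python sketch of how a central database could hand users a literature-cited list for exactly the records they pulled. The `TraitRecord` layout and the `literature_cited` helper are hypothetical (no existing database's API is implied), and the citation strings are placeholders, not real references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitRecord:
    """A hypothetical trait observation that carries its original source."""
    species: str
    trait: str
    value: float
    reference: str  # citation of the paper/book/thesis the value came from


def literature_cited(records):
    """Return the unique sources behind `records`, sorted for a reference list."""
    return sorted({record.reference for record in records})


# Placeholder records; two of them come from the same (placeholder) source.
used_records = [
    TraitRecord("Species A", "litter_size", 3.0, "Placeholder citation B"),
    TraitRecord("Species A", "litter_size", 8.0, "Placeholder citation A"),
    TraitRecord("Species B", "body_mass_g", 41.5, "Placeholder citation B"),
]

for citation in literature_cited(used_records):
    print(citation)
```

Because every value keeps a link to its source, de-duplicating and sorting is all it takes to produce a citation list; that same per-record link is what would let users of a compilation cite the original data collectors.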
Right about now, my Paleo brethren may be thinking “this sounds suspiciously familiar”. I’ve pretty much lifted this list right off of the Paleobiology Database website (https://paleobiodb.org/#/faq). While ecologists have been running an every-database-for-itself experiment on trait databases, paleobiologists have been experimenting with collaborative open databases for the fossil record. I’m an outsider, so I don’t really know how the database is perceived within the paleo community, but from the outside I have been a big fan of the database, the work that has emerged from its existence, and the community that surrounds it. Which is why I’ve wondered if ecology could do something similar.
But if we’re going to do this, I think we need to copy something else from the Paleobiology Database: a focus on individual records. Currently, many trait databases focus on a species-level value: what is the average number of offspring per litter? Seed mass? Average body size? This is a logical place to start building a database if many of the questions are focused on comparing central tendencies across species. But our understanding of traits and the questions we want to ask have evolved. Having any info is still better than no info, but often we need info on variability across individuals within a species, or we want to know how the trait might vary with changes in the environment. For this, we need record-level data. By this, I mean that instead of pooling observations to obtain an average for a species, we now often want to know that the average litter size for a species is 3 at location X but 8 at location Y. For some species, traits are especially sensitive to temperature or some other environmental variable, so knowing whether body size was measured at 28C or 32C can be important. This data could then be summarized in whatever way the user needed (species averages, region-specific averages, etc.). This, of course, is the hard part, because while we have an increasing number of trait compilations, they have either jettisoned the record information or retained little of it beyond perhaps the citation (I say this knowing I’m guilty of this). It also involves doing some form of georeferencing if we want the location info to be usable (as has been done for museum records). This means we would need to basically uncompile the compilations: find the original citations, extract as much info as we can from them, and then re-enter them as part of a more sophisticated database. This is an extraordinary amount of work that (to be clear) I am not volunteering for.
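Here is a minimal Python sketch of why record-level data is more flexible, assuming a hypothetical layout in which each observation keeps its species, location, and measurement temperature. The same rows can be collapsed to a single species average or split by location, and the species average hides the 3-versus-8 litter-size contrast from the paragraph above. Field names and values are illustrative only, not drawn from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Illustrative record-level rows: every observation keeps its own context.
records = [
    {"species": "Species A", "location": "X", "temp_c": 28, "litter_size": 3},
    {"species": "Species A", "location": "X", "temp_c": 28, "litter_size": 3},
    {"species": "Species A", "location": "Y", "temp_c": 32, "litter_size": 8},
]


def summarize(rows, trait, by):
    """Average `trait` across rows that share the same values for the `by` fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in by)
        groups[key].append(row[trait])
    return {key: mean(values) for key, values in groups.items()}


species_means = summarize(records, "litter_size", by=["species"])
location_means = summarize(records, "litter_size", by=["species", "location"])
print(species_means)
print(location_means)
```

Grouping by `temp_c` instead of `location` would give temperature-specific values the same way. The reverse is not possible: once only the species average is stored, none of the finer summaries can be recovered.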
There are undoubtedly some in the trait community who are about to explode because they’ve been thinking “but we’re doing what you are talking about!”. There are indeed already some bigger initiatives out there (AnAge, FishBase, TRY), but they are either not community-based (i.e., run by a closed group), taxon-centric, or a nightmare of open and closed policies that make extracting data needlessly burdensome, or some unfortunate combo of the above. The one that seems closest to the Paleobiology Database model is TraitBank at the Encyclopedia of Life. Its goal, however, is different from the record-based trait database that I outlined above. Its goal is to have a webpage (and trait data) for every species on the planet, so this still seems to be a species-average approach. As I mentioned before, some info is better than no info, so this alone would be a huge benefit to trait research, but it still carries the restrictions of species-average values. On the plus side, data in the database is available for everyone to use, and each data entry has its specific reference listed with it. But I don’t think it’s had broad buy-in from the trait community. TraitBank only lists 50 data sources and 327 “content partners” (websites/databases that have agreed to share their data via Encyclopedia of Life pages). Admittedly, these sources are some of the biggest data aggregations around, but it’s inconceivable that they cover the wide array of trait info for all of life. Without broad buy-in from the trait community, both using it for research and contributing their data to it, I don’t see this working in the way I’ve outlined above.
So where does this leave us? Well, things are currently in a muddle with respect to trait data, but there’s also tremendous opportunity for someone who can envision the type of database the field needs, sell broad swaths of the trait data community on its importance, and figure out how to build both the database and the community to support and use it. This may involve better community buy-in for TraitBank and/or some new initiative working on a record-level product that would allow finer-level questions to be asked. The question is how does this happen, and is there enough will in the trait community to give up on the current idiosyncratic, ad hoc approach and contribute to something with broad trait and taxonomic coverage and an open data policy?
Recently, over at the blog Ecological Rants the eminent ecologist Charles Krebs wrote a post about the ills of simplification in ecology. The post focuses specifically on how ecology has been ‘led astray’ by simplified models and lab studies. This has recently been picked up on Dynamic Ecology by Jeremy Fox who responded generally to the post but specifically to the affront to microcosms. I strongly recommend you check them out for yourself and not just rely on my version of events.
I went on record a long time ago (in blog years I think 2011 was a century ago) that I believe we need a multitude of approaches, so I don’t plan on wading into the microcosm debate. That we’re still having this debate exhausts me. Instead, I want to focus on a different angle in Krebs’s post. Here’s the specific section:
“If we assume equilibrial dynamics in our communities and ecosystems, we fly in violation of almost all long term studies of populations, communities, and ecosystems. The problem lies in the space and time vision of our science. Our studies are too short to show even a good representation of dynamics over a 100 year time scale, and the problems of landscape ecology highlight that what we see in patch A may be greatly influenced by whether patches B and C are close by or not. We see this darkly in a few small studies but are compelled to believe that such landscape effects are unusual or atypical. This may in fact be the case, but we need much more work to see if it is rare or common. And the broader issue is what use do we as ecologists have for ecological predictions that cannot be tested without data for the next 100 years?”
I agree with a lot of this paragraph, though my perspective on it is different. I agree that our focus for much of the past 60 years in community ecology has been on equilibrial dynamics at a specific spatial scale, with a limited understanding of the impact that context (i.e., which patches are near which other patches) can have on the local community. Does this make it difficult for us to predict what will happen in the dynamic world we actually live in? Yes. But unlike Krebs, I don’t see the past few decades of research as a waste. We’ve learned a great deal about the fundamentals of ecological systems (species interactions, food web structure, biodiversity, niche partitioning, colonization, extinction, etc. etc. etc.), all with the help of microcosms and simplified theory (and field studies and macroecology). We needed those decades of work to understand the basics of how communities are structured under idealized conditions.
Left: a child’s line drawing of SpongeBob’s Squidward. Right: Squidward.

Does the drawing capture the essence of Squidward? I’m biased, but I say yes. But how does a child get to the point of being able to create a reasonable facsimile of something without first learning how pencils work, how they respond to hand movement, and how to simplify an image but still make it recognizable to others? I think this is also true in ecology. How do we know how to reasonably abstract a complicated system down to its most important components without first understanding what the components are and how to convey them in simple, understandable ways?
Now, our challenge is to take what we have learned and apply it to the more complicated scenarios that are happening in nature (i.e. how does our Squidward change as he interacts with the dynamic setting of Bikini Bottom*). How do ecosystems change through time? What is the role of species interactions, context-dependence, and processes at different spatial and temporal scales in driving (or ameliorating) changes in food webs, niche partitioning, etc? These are pressing questions for our society as we try to predict how nature will respond to human perturbations, but these are also important for the basic development of our science. Some of this work will be done through detailed case studies out in the field, but some (hopefully) will be done with the help of theory, controlled experiments, and data-intensive approaches like macroecology to generate generalizations that help us know how to think and predict likely responses and scenarios.
The danger that I think Krebs is concerned about is that we become so attached to our clean, simplified view, our polished theories, that we refuse to engage with the more complicated scenarios. For example, if long-term studies suggest that the focus on equilibrial communities is misplaced, it would be to our detriment to continue to focus only on equilibrial communities in our theories and experiments. However, I don’t think this is happening (or if it were, I think momentum is shifting). Landscape ecology, metacommunity theory, and biogeography are all areas where people have been actively studying the very spatial issues Krebs bemoans us neglecting. I think he is more accurate about community ecology shying away from rigorously thinking about temporal dynamics, but I have a whole post on that planned, so I’ll spare you my rant. That we are starting to think about these more complex issues is what makes ecology exciting right now (and frustrating and really, really hard). We have a grasp (tenuous, maybe, but a grasp nonetheless) on the fundamental, general concepts that bridge across ecosystems and organisms. We have more data, better tools, and better theoretical constructs than at any time in the past. Now is the time to tackle these more complex questions, and doing so will require all the scientific approaches available to us: field ecology, macroecology, theory, and, yes, microcosms.
*Yes I have been forced to watch too much SpongeBob lately.
This is a guest post by Elita Baldridge.
This post is about the tools that I used to complete my PhD, and I am optimistic that these tools and coping mechanisms will also allow me to be a scientist who gets paid for doing science.
The tips & tricks:
Remote work: Working remotely accommodated the variability in my functioning levels, and allowed me to be as productive as possible without having to allocate most of my energy to getting to/working at a physical location or trying to conserve enough energy so that I could make it home on the bus (since I can’t drive anymore).
Ergonomics: Finding what triggered more discomfort and what allowed me to work for longer periods of time really helped make it possible for me to finish up.
- Laptop of Science & a desktop, running Synergy to run one mouse and keyboard for both computers.
- Monitor risers to prevent fatigue.
- Kneeling chair to avoid obnoxious pressure points on hips, back and arms.
- Wrist rests galore.
- Kinesis Freestyle 2 keyboards (one for desk work with the dual-machine setup, one for a reclining setup with just the Laptop of Science).
Travel: Travel is dreadful. It involves a lot of discomfort while traveling, plus a lot of discomfort for weeks after. The thing that I am traveling for had better provide enough benefits to me that it is actually worth it because it is truly, truly unpleasant (of the crying and vomiting from pain variety). Remote attendance is vastly preferred.
However, if I really, really must:
- Grabber 12+ hour Peel N’ Stick body warmers, which make it possible to function on a reasonably human level most of the time.
- Cane or forearm crutches
- Wheelchair service in airports/Redcap on trains. (Voice of Experience: When you are asked if you can get to places on your own, up stairs, etc., select “no” if the answer is “yes, but it will be exceptionally unpleasant and there may be crying, whimpering, or falling over”.)
- Rest day after travel/accommodations really close to wherever you are supposed to be.
- Electric blanket for hotel (as full body heat pad)
- Small travel blanket (for padding uncomfortable chair backs, etc.)
Version control: Using version control (I used GitHub) allowed for a more efficient workflow between me & dissertation collaborators (mostly Ethan, but also Xiao Xiao), plus I was insulated against the effects of cognitive dysfunction through commit messages, issues, and the ability to revert commits.
Kubi: A teleconferencing robot that allowed me to turn my (remote) head and look at people when they were speaking through whatever teleconferencing system we were using. I cannot say enough good things about how much this made me feel more like a part of whatever was going on.
Web conferencing: We tended to use browser-based options such as Google Hangouts or Firefox Hello for this. Skype is another option as well; I just had some difficulty getting it to behave well on my laptop.
Live-streaming: For my defense, I wanted to make the presentation a demonstration of making a talk accessible, and also of how easy this can be. Full details of the accommodations & accessibility statement that I used for my defense are available on the event announcement. I used Google Hangouts on Air to live-stream my defense, then close captioned the talk afterward with the editor available on YouTube. This was all straightforward and took very little time. Handouts were also made available in advance of the talk.
For the last 5 years I’ve been actively involved in training efforts through Software Carpentry and Data Carpentry to train researchers in best practices for software development and data analysis. These concepts are fundamental to the research we do in my group and to my commitment to open and reproducible research.
As one of the founding members of the Data Carpentry Steering Committee, I am excited to announce that Data Carpentry has received a grant from the Gordon and Betty Moore Foundation that will help support our work over the next two years.
For those of you who aren’t familiar with Data Carpentry, we are a non-profit organization whose goal is to help teach scientists the skills they need to manage and analyze the increasingly large amounts of data that are being generated across the sciences. We do this through a combination of two-day workshops at universities (if you’re interested in a workshop at your university, request one here) and online resources, including lesson material and forums. Data Carpentry is both similar to, and associated with, Software Carpentry, but with an emphasis on teaching material that is specific to particular scientific disciplines and focused on data management and analysis. We currently deliver courses for ecology/organismal biology and are in the process of developing material on genomics and geospatial data, the latter in collaboration with the awesome training group at NEON.
The support from the Moore Foundation will help us expand our efforts to cover new scientific domains, run far more workshops than we could have otherwise, and develop strategies for delivering this material in online workshops. I will also be leading the development of a semester-long Data Carpentry course designed to make it easy to integrate these crucial skills into university classrooms. Check out the full proposal for more details.
I look forward to continuing my work with Data Carpentry and am excited about the opportunity for us to continue to enable data-intensive science by providing scientists the computational and data-oriented training they need to work with the large quantities of data we now have access to.
As I mentioned in the first post, having a chronic illness means that there can be a lot of problems hiding behind the cheerful facade of graduate students you know. Here I describe some of the external challenges faced by people with chronic illness and disabilities that can be a much greater stumbling block than the condition itself.
Because I had such strong support from my family and my lab, I didn’t have to worry about the majority of these challenges. I didn’t have to worry about finances, losing health insurance, not being able to afford medical care, having to be on campus, or being able to work a regular schedule. Thus, I just had to deal with the physical challenges, and I could invest most of my energy into my research.
Finances/health insurance: Being able to afford medical care as well as all of the regular expenses of being a PhD student can be extremely difficult. Although many PhD programs offer subsidized insurance, the long process of diagnosis, combined with extensive medical tests, plus the expense of treatment, can leave massive amounts of medical debt. These issues can be magnified if a leave of absence is necessary, which typically eliminates both salary and insurance. Additionally, during a leave of absence any student loans come out of deferment, making a leave extremely challenging to take, and a chronic illness is unlikely to improve during a leave of absence given the added financial stress. My husband has health insurance that covers both of us, and we were able to move in with my parents, an arrangement that has actually been really great for everyone. This completely removed financial/health insurance worries from the picture, and there is always someone around for me just in case.
Advisor support: A good advisor is important for all graduate students, but it becomes especially important when dealing with a chronic illness. There are many stories of graduate students who have struggled far more than necessary due to unsupportive advisors or unsupportive university structures (e.g., 1, 2, 3). The Weecology lab is awesome, and just made accommodations happen. In a later post, I’ll talk about some of the technological tools that made completing my PhD possible, but my favorite was the Kubi (https://revolverobotics.com/), which let me turn my own “head” when I was remoting in. Ethan was also my advocate at the university level, making things happen so that I could work remotely to finish up.
Working: Chronic illness can result in many factors that make getting work done challenging, from difficulty getting to work due to an inability to drive, to chronic fatigue, medication effects, cognitive dysfunction, or being worn down by pain and discomfort. Even the physical environment of work can be challenging; for example, fluorescent lighting that triggers migraines. Getting to campus was taxing enough that by the time I got there, I was too worn out to get much done, and would have to leave early before I broke down. I also had to surrender my driver’s license, so I relied on public transportation (which caused a significant amount of “discomfort”) or needed to get a ride from my husband (which caused slightly less “discomfort”). In addition, I have flares, so some days I’m good to work, but on other days I am unable to function either physically or mentally. University campuses also tend to have some major accessibility problems. I was fortunate in that my office and the lab that I taught in were on the first floor, but when I returned for my defense I had to get to the second floor of a building with a broken elevator. Working remotely allowed me to set up my workspace so that I could work longer, and I didn’t have to use up all my available energy going to campus to work.
Medical: Many chronic conditions require frequent medical care, through monitoring or trying out treatment options. My condition is chronic, but not progressive, and so I didn’t actually need a lot of medical care after I got my diagnosis. We’d figured out a treatment that worked before then, and I had a great doctor (USU Student Health Center is awesome, particularly Dr. Price). Thus, I didn’t need to spend a lot of time at doctor’s appointments, each of which would write off an entire day for me, plus generally the next day to recover. Before my diagnosis, I was spending a lot of time at the doctor, experimenting with new medications to see if they would have an effect, being poked more times than a pincushion, and waiting to see specialists.
Medication: Medications for chronic illness can have a lot of unpleasant side effects, including nausea, headaches, appetite loss, insomnia, fatigue, immune suppression, etc. Some of us end up taking medication to treat the side effects of the medication we are on to treat our condition. Most of the treatment options for fibromyalgia work by altering levels of neurotransmitters, and thus tend to be difficult to adjust to, have a long adjustment period, and make working while adjusting impossible. In addition, the beneficial effects generally wear off over time, leaving just the negatives, which happened with the first medication I was on. Because I didn’t have months to spend trying to find another medication that might work, and I was getting work done without medication, I saved that until after I was done with my PhD. I’ve spent the last two months adjusting to a fibromyalgia treatment I haven’t tried before, am minimally functional, and it seems as if this treatment option is worse than the fibromyalgia on its own. Because fibromyalgia is variable, I still have another two months to go to determine whether the medication is ineffective or whether I am simply in a flare.
This is a guest post by Elita Baldridge.
I am working on organizing an Inclusive Ecology Section within the Ecological Society of America. This section will provide resources and support for all ecologists, regardless of race, sex, physical or mental ability or difference, gender identity or expression, sexual orientation, ethnicity, socio-economic status, culture or subculture, national origin, parental status, politics, religion, or age. The Inclusive Ecology section will be a space for ecologists to learn about issues and problems facing ecologists from under-represented groups, share support and solutions about problems facing ecologists from under-represented groups, work to make ecology a safe and inclusive space for all ecologists, and recognize those who have made significant contributions toward working to make ecology more inclusive to all ecologists.
The first stage in creating a new section is to collect 50 ESA member signatures on a petition. If you’d like to support the idea of this section, please sign the petition here:
To provide resources and support for all ecologists, regardless of race, sex, physical or mental ability or difference, gender identity or expression, sexual orientation, ethnicity, socio-economic status, culture or subculture, national origin, parental status, politics, religion, or age.
Share ideas on the development of proposed ESA Inclusive Ecology section here:
This is a guest post by Elita Baldridge.
Most people aren’t familiar with the challenges of working on a PhD with a disability or chronic illness, and yet there’s a good chance that someone you know is in this situation and isn’t talking about it. This is the first in a series of posts about my experiences completing a PhD with a chronic illness, and about the things that we can do to support our colleagues and students so that they have the greatest chance at success. Even with the best support in the world, success is not always possible, but it’s a lot less likely without support.
Introduction to the social model of disability
To give you a little background, I developed a chronic illness during graduate school, eventually being diagnosed with fibromyalgia. Developing a chronic illness gave me a crash course in the social model of disability; here’s the general drift.
A lack of support and accommodations is a major factor in people being unable to function effectively.
The biggest thing that keeps me from functioning effectively at this point (with accommodations) is my chronic illness itself. However, it is the most disabling factor for me only because I had the accommodations available to reduce the impact of my condition as much as possible.
This is the sanitized, short version of some of my symptoms. I don’t particularly like to share these sorts of things because I want to be seen as a good ecologist, not as an inspiring story of an ecologist who has triumphed over terrible odds. An ecologist is a person; a story is a thing. However, I think that it’s important to understand that while your colleague may be cheerful and smiling and upbeat, they may also be hiding a lot, and being kind and providing accommodations is a small thing that can mean the world under the circumstances.
Cognitive dysfunction: This manifests itself in many ways, but when this is severe, I can’t actually think well enough to read anything more complicated than fairy tales, let alone think well enough to do research.
“Discomfort”: Pain is supposed to be a meaningful signal that something is wrong with the part of the body that hurts. However, with fibromyalgia, things hurt without damage occurring. Pain from fibromyalgia tends to be unresponsive to a wide variety of medications, and one of the best ways of managing the pain is through an exercise regime and visualization. I tend to use the word “discomfort” rather than “pain”, because pain is supposed to be a useful signal of damage and fibromyalgia pain is not; calling it discomfort helps me try to ignore it more effectively.
The clothing thing: The majority of clothing does not work for me any more, because of the feeling of wearing an upset anthill.
The stinging nettle thing: Doing computer work while my hands felt like I had been crushing stinging nettles with them.
The bucket thing: Keeping a bucket by my desk while I was working, because the discomfort made me throw up. Also, avoiding eating before meetings on bad days, so I could make it through the meeting without throwing up into the bucket.
Mobility impairment: This depends on the day. It’s not a problem at a computer, so that’s fine, unless I’m on site somewhere and the access for people with mobility impairments is garbage (i.e., almost everywhere).
With sufficient support and accommodations, it is possible to do good science and get a PhD while living with a disability or chronic illness. Dealing with the illness itself is difficult enough, without also having to address barriers that are put in place by people and institutions not working to make things accessible. It’s not that difficult to make things more accessible, and making things more accessible for folks with chronic illness or disabilities also tends to be just good design that makes things more accessible to people without chronic illness or disabilities. Over the next couple of posts I’ll talk about things my lab and university have done to make things more accessible and therefore facilitate my PhD. I’ll also talk about practices that I’m now putting in place for things like seminars I give to make sure that they can reach as many scientists as possible, not just the able-bodied ones.
A few months ago Mick Watson wrote an awesome post about How to recruit a good bioinformatician. We’re in the process of hiring a scientific software engineer so I thought I’d use Mick’s post to illustrate why you should come work with us doing scientific software development and data-intensive research, and hopefully provide a concrete demonstration of the sort of things Mick suggests for appealing to talented computational folks.
Here are Mick’s original suggestions and why I think our position satisfies them.
1. Make sure they have something interesting to do
This is vital. Do you have a really cool research project? Do you have ideas, testable hypotheses, potential new discoveries? Is bioinformatics key to this process and do you recognise that only by integrating bioinformatics into your group will it be possible to realise your scientific vision, to answer those amazing questions?
Software and computational data analysis are core to everything our group does. Just check out our GitHub organization. We’re currently tackling challenging problems in: 1) automatically acquiring and combining heterogeneous data; 2) combining large numbers of datasets into single research projects; 3) using machine learning and other computationally intensive modeling approaches to make predictions for ecological systems; and 4) trying to help improve computational and predictive approaches in science more broadly.
2. Make sure they have a good environment to work in
Bioinformatics is unique, I think, in that you can start the day not knowing how to do something, and by the end of the day, be able to do that thing competently. Most bioinformaticians are collaborative and open and willing to help one another. This is fantastic. So a new bioinformatician will want to know: what other bioinformatics groups are around? Is there a journal club? Is there a monthly regional bioinformatics meeting? Are there peers I can talk to, to gain and give help and support?
Or will I be alone in the basement with the servers?
Many members of my group have strong computational and/or machine learning backgrounds (at least for a bunch of scientists). We are also part of the new Informatics Institute at the University of Florida, which is being funded in part through UF’s “Big Data” preeminence initiative. The Informatics Institute brings together faculty, students, and postdocs from across campus with interests in computational science, and the preeminence initiative is recruiting mid-career folks in this area to move to UF (including myself). I’m also a Moore Investigator in Data Driven Discovery and actively involved in the Software Carpentry and Data Carpentry communities, providing strong connections to researchers and developers in all three of these groups. In short, the challenge won’t be finding people to interact with, it will be finding time to interact with all of the different folks you want to talk to.
Speaking of servers, the other type of environment bioinformaticians need is access to good compute resources. Does your institution have HPC? Is there a cluster with enough grunt to get most tasks done? Is there a sys/admin who understands Linux?
Or were you hoping to give them the laptop your student just handed back after having used it during their 4-year PhD? The one with Windows 2000 on it?
The University of Florida has a brand new high performance computing platform – the HiPerGator. My lab has priority access to a large number of cores on this system. We also have experience working with, and resources to support, AWS and other cloud providers to address our resource needs.
4. Give them a development path
Bioinformaticians love opportunities to learn, both new technical skills and new scientific skills. They work best when they are embedded fully in the research process, are able to have input into study design, are involved throughout data generation and (of course) the data analysis. They want to be allowed to make the discoveries and write the papers. Is this going to be possible? Could you imagine, in your group, a bioinformatician writing a first author paper?
Technical and scientific development is strongly encouraged and supported for all members of our research group. You’ll have time and encouragement to learn new skills, support for travel to training/hackathons/conferences, and active engagement in both the scientific and software development aspects of the lab. Taking the lead on projects and writing first authored papers would be enthusiastically supported for anyone interested in doing so.
5. Pay them what they’re worth
This is perhaps the most controversial, but the laws of supply and demand are at play here. Whenever something is in short supply, the cost of that something goes up. Pay it. If you don’t, someone else will.
We’re doing our best. The position has a top starting salary of $70,000. Thanks to the low cost of living in Gainesville, that’s equivalent to about $120,000 in Silicon Valley. We can’t compete with starting salaries in industry, but at least we’re on par with starting salaries for faculty.
6. Drop your standards
Especially true in academia. Does the job description/pay grade demand a PhD? You know what? I don’t have a PhD, and I’m doing OK (group leader for 11 years, over 60 publications, several million in grants won). Take a chance. A PhD isn’t everything.
I don’t consider not requiring a PhD to be dropping my standards. We’re looking for the best person whether they have a PhD or not. If you’re good at computers and interested in science, I don’t know what more I could want.
7. Promote them
Got funds for an RA? Try and push it up to post-doc level and emphasize the possibility of being involved in research. Got funds for a post-doc? Try and push it up to a fellowship and offer semi-independence and a small research budget. Got money for a fellowship? Try and push it up to group leader level, and co-supervise a PhD student with them.
This position could easily have been budgeted as a postdoc, but I really wanted to promote the idea of more permanent software developer/engineer positions in academic science. This position is currently funded for 5 years as part of a Moore Foundation Investigator in Data Driven Discovery award. My goal is to make this a permanent position in my group by maintaining long-term funding beyond the 5 years. If I find someone good who wants to stick around, I want the salary and responsibility to grow over time (and annual increases in salary are budgeted for the next 5 years).
So, hopefully I’ve done a decent job of satisfying Mick’s requirements. If any of this sounds interesting to you, feel free to leave a comment on this post, drop me an email, chat with me on Twitter, or just go ahead and apply.
My research group is hiring a Scientific Software Engineer to help develop software that facilitates science, contribute to research in data-intensive ecology, and improve scientific research and computing through training and modeling competitions.
We are actively involved in data-intensive computational research, open source software development, and open approaches to science. The engineer will work as part of a collaborative group, including undergraduates, graduate students and postdocs, using large amounts of ecological and environmental data to understand natural systems. They will develop and maintain open source software designed for working with large amounts of heterogeneous data, collaborate on research projects making predictions for ecological systems, and help develop web infrastructure for scientists to share, evaluate and improve predictions. In doing so they will actively interact with, and contribute to, related efforts from other initiatives and projects in these areas (e.g., rOpenSci, Dat, Software Carpentry, Data Carpentry, DataONE, NCEAS).
Are you a software developer who’s interested in science? Great! Are you a scientist with strong software skills? Awesome! If you have some experience with Python or R, Git, database management systems, web development, spatial data, and/or PostgreSQL/PostGIS, we’d be excited, but what we’re really interested in is someone who is good with computers, interested in science, enjoys working on a variety of projects, likes learning new tools as needed, and works well in a diverse team.
The University of Florida is a great place to work in the computational, data-intensive, and informatics side of science. It has a major hiring initiative in “big data”, a new Informatics Institute that we are a part of, and a top-notch Research Computing Center (aka HPC). In addition, I am a Moore Foundation Investigator in Data-Driven Discovery and actively engaged in the computational and data-intensive science communities. This makes my lab a good place to work if you enjoy that sort of thing (check out our GitHub organization if you want to see what we’ve been up to recently). We also work hard to provide a positive and supportive environment that treats all members of the group as important contributors and actively values diversity.
This position has guaranteed support for the next five years. My goal is for this to be a long-term position in our research group and a model for similar positions in other research groups.
If you’ve made it this far you might be interested in a few more details of the projects this position might be involved in. These include:
- Developing, maintaining, and providing support for open source software for acquiring, cleaning, combining, and managing large numbers of heterogeneous datasets. This will include Python-based development and maintenance of the EcoData Retriever software and the development of new software to automatically combine multiple datasets for analysis.
- Working in collaborative teams to conduct scientific research, including the use of machine learning for making predictions and forecasts for ecological systems.
- Developing, maintaining, and providing support for an open source system for publicly sharing ecological predictions and forecasts and automatically evaluating those predictions as new data is released. This system will be designed to allow researchers to collaborate and compete to improve predictions by uploading predictions to be compared to test data and/or by uploading code to make predictions.
- Engaging with the broader community of projects involved in acquiring, cleaning, and combining heterogeneous datasets (e.g., rOpenSci, DataONE, dat), as well as those training scientists in the use of data and computation (e.g., Software Carpentry, Data Carpentry). This includes contributing to open source and participating in related conferences and hackathons.
To apply please visit the official University of Florida job ad. If you have any questions feel free to leave a comment on this post, drop me an email, chat with me on Twitter, or check this blog later in the week to find out why I think this will be a pretty rewarding job. You can also check out our websites to find out more about my lab and our interdisciplinary research group.
UPDATE: Here’s the post I promised on why this will hopefully be a rewarding job.