Category Archives: programming
Postdoc in Evolutionary Bioinformatics [Jobs]
There is an exciting postdoc opportunity for folks interested in quantitative approaches to studying evolution in Michael Gilchrist’s lab at the University of Tennessee. I knew Mike when we were both in New Mexico. He’s really sharp, a nice guy, and a very patient teacher. He taught me all about likelihood and numerical maximization and opened my mind to a whole new way of modeling biological systems. This will definitely be a great postdoc for the right person, especially since NIMBioS is at UTK as well. Here’s the ad:
Outstanding, motivated candidates are being sought for a post-doctoral position in the Gilchrist lab in the Department of Ecology & Evolutionary Biology at the University of Tennessee, Knoxville. The successful candidate will be supported by a three year NSF grant whose goal is to develop, integrate and test mathematical models of protein translation and sequence evolution using available genomic sequence and expression level datasets. Publications directly related to this work include Gilchrist. M.A. 2007, Molec. Bio. & Evol. (http://www.tinyurl/shahgilchrist11) and Shah, P. and M.A. Gilchrist 2011, PNAS (http://www.tinyurl/gilchrist07a).
The emphasis of the laboratory is focused on using biologically motivated models to analyze complex, heterogeneous datasets to answer biologically motivated questions. The research associated with this position draws upon a wide range of scientific disciplines including: cellular biology, evolutionary theory, statistical physics, protein folding, differential equations, and probability. Consequently, the ideal candidate would have a Ph.D. in either biology, mathematics, physics, computer science, engineering, or statistics with a background and interest in at least one of the other areas.
The researcher will collaborate closely with the PIs (Drs. Michael Gilchrist and Russell Zaretzki) on this project but potentiall have time to collaborate on other research projects with the PIs. In addition, the researcher will have opportunties to interact with other faculty members in the Division of Biology as well as researchers at the National Institute for Mathematical and Biological Synthesis (http://www.nimbios.org).
Review of applications begins immediately and will continue until the position is filled. To apply, please submit curriculum vitae including three references, a brief statement of research background and interests, and 1-3 relevant manuscripts to mikeg[at]utk[dot]edu.
Why computer labs should never be controlled by individual colleges/departments
Some time ago in academia we realized that it didn’t make sense for individual scientists or even entire departments to maintain their own high performance computing resources. Use of these resources by an individual is intensive, but sporadic, and maintenance of the resources is expensive [1] so the universities soon realized they were better off having centralized high performance computing centers so that computing resources were available when needed and the averaging effects of having large numbers of individuals using the same computers meant that the machines didn’t spend much time sitting idle. This was obviously a smart decision.
So, why haven’t universities been smart enough to centralize an even more valuable computational resource, their computer labs?
As any student of Software Carpentry will tell you, it is far more important to be able to program well than it is to have access to a really large high performance computing center. This means that the most important computational resource a university has is the classes that teach their students how to program, and the computer labs on which they rely.
At my university [2] all of the computer labs on campus are controlled by either individual departments or individual colleges. This means that if you want to teach a class in one of them you can’t request it as a room through the normal scheduling process, you have to ask the cognizant university fiefdom for permission. This wouldn’t be a huge issue, except that in my experience the answer is typically a resounding no. And it’s not a “no, where really sorry but the classroom is booked solid with our own classes,” it’s “no, that computer lab is ours, good luck” [3].
And this means that we end up wasting a lot of expensive university resources. For example, last year I taught in a computer lab “owned” by another college [4]. I taught in the second class slot of a four slot afternoon. In the slot before my class there was a class that used the room about four times during the semester (out of 48 class periods). There were no classes in the other two afternoon slots [5]. That means that classes were being taught in the lab only 27% of the time or 2% of the time if I hadn’t been granted an exception to use the lab [6].
Since computing skills are increasingly critical to many areas of science (and everything else for that matter) this territoriality with respect to computer labs means that they proliferate across campus. The departments/colleges of Computer Science, Engineering, Social Sciences, Natural Resources and Biology [7] all end up creating and maintaining their own computer labs, and those labs end up sitting empty (or being used by students to send email) most of the time. This is horrifyingly inefficient in an era where funds for higher education are increasingly hard to come by and where technology turns over at an ever increasing rate. Which [8] brings me to the title of this post. The solution to this problem is for universities to stop allowing computer labs to be controlled by individual colleges/departments in exactly the same way that most classrooms are not controlled by colleges/departments. Most universities have a central unit that schedules classrooms and classes are fit into the available spaces. There is of course a highly justified bias to putting classes in the buildings of the cognizant department, but large classes in particular may very well not be in the department’s building. It works this way because if it didn’t then the university would be wasting huge amounts of space having one or more lecture halls in every department, even if they were only needed a few hours a week. The same issue applies to computer labs, only they are also packed full of expensive electronics. So please universities, for the love of all that is good and right and simply fiscally sound in the world, start treating computer labs like what they are: really valuable and expensive classrooms.
—————————————————-
[1] Think of a single scientist who keeps 10 expensive computers, only uses them a total of 1-2 months per year, but when he does the 10 computers aren’t really enough so he has to wait a long time to finish the analysis.
[2] And I think the point I’m about to make is generally true; at least it has been at several other universities I’ve worked over the years.
[3] Or in some cases something more like “Frak you. You fraking biologists have no fraking right to teach anyone a fraking thing about fraking computers.” Needless to say, the individual in question wasn’t actually saying frak, but this is a family blog.
[4] As a result of a personal favor done for one administrator by another administrator.
[5] I know because I took advantage of this to hold my office hours in the computer lab following class.
[6] To be fair it should be noted that this and other computer labs are often used by students for doing homework (along with other less educationally oriented activities) when classes are not using the rooms, but in this case the classroom was a small part of a much larger lab and since I never witnessed the non-classroom portion of the lab being filled to capacity, the argument stands.
[7] etc., etc., etc.
[8] finally…
A GitHub of Science? [Things you should read]
There is an excellent post on open science, prestige economies, and the social web over at Marciovm’s posterous*. For those of you who aren’t insanely nerdy** GitHub is… well… let’s just call it a very impressive collaborative tool for developing and sharing software***. But don’t worry, you don’t need to spend your days tied to a computer or have any interest in writing your own software to enjoy gems like:
Evangelists for Open Science should focus on promoting new, post-publication prestige metrics that will properly incentivize scientists to focus on the utility of their work, which will allow them to start worrying less about publishing in the right journals.
Thanks to Carl Boettiger for pointing me to the post. It’s definitely worth reading in its entirety.
_______________________________________________________
*A blog I’d never heard of before, but I subscribed to it’s RSS feed before I’d even finished the entire post.
**As far as biologists go. And, yes, when I say “insanely nerdy” I do mean it as a complement.
***For those interested in slightly more detail it’s a social application wrapped around the popular distributed version control system named Git. Kind of like Sourceforge on steroids.
Postdocs Galore [Jobs]
Advertisements for three exciting postdoctoral positions came out in the last week.
Interface between ecology, evolution and mathematics
The first is with Hélène Morlon’s group in Paris. Hélène and I were postdocs in Jessica Green’s lab at the same time. She is both very smart and extremely nice, oh, and did I mention, her lab is in PARIS. Here’s the ad. If it’s a good fit then you couldn’t go wrong with this postdoc.
A postdoctoral position is available in my new lab at the Ecole Polytechnique and/or at the Museum of Natural History in Paris to work at the interface between ecology-evolution and mathematics. Candidates with a background in biology and a strong interest in modeling, or with a theoretical background and a strong interest in biology, are encouraged to apply. More information is available here. Potential candidates should feel free to contact me. The deadline for application is May 8th.
The other two postdocs are associated with Tim Keitt’s lab (which I consider to be one of the top quantitative ecology groups out there).
Mechanistic niche modeling and climate change impacts
A postdoctoral position is anticipated as part of a collaborative project to develop and evaluate mechanistic niche models that incorporate geographic variation in physiological traits. The post doc will be based in Michael Angilletta’s laboratory at Arizona State University, but will interact with members of Lauren Buckley’s lab at the University of North Carolina in Chapel Hill and Tim Keitt’s lab at the University of Texas in Austin. The post doc will be expected to engage in modeling activities and coordinate lab studies of thermal physiology. Experience with mathematical modeling in C++, MATLAB, Python or R is beneficial and familiarity with environmental data and biophysical ecology is beneficial. More here.
Ecological forecasting or statistical landscape genetics
The Keitt Lab at the University of Texas at Austin seeks a postdoctoral investigator to join an interdisciplinary NSF-funded project linking ecophysiology, genomics and climate change. The position requires excellent modeling skills and the ability to engage in multidisciplinary research. Research areas of interest include either ecological forecasting or statistical landscape genomics. More here.
So, if you’re looking for a job go check out these great opportunities.
Learning to program like a professional using Software Carpentry
An increasingly large number of folks doing research in ecology and other biological disciplines spend a substantial portion of their time writing computer programs to analyze data and simulate the outcomes of biological models. However, most ecologists have little formal training in software development¹. A recent survey suggests that we are not only; with 96% of scientists reporting that they are mostly self-taught when it comes to writing code. This makes sense because there are only so many hours in the day, and scientists are typically more interested in answering important questions in their field than in sitting through a bachelors degree worth of computer science classes. But, it also means that we spend longer than necessary writing our software, it contains more bugs, and it is less useful to other scientists than it could be².
Software Carpentry to the Rescue
Fortunately you don’t need to go back college and get another degree to substantially improve your knowledge and abilities when it comes to scientific programming, because with a few weeks of hard work Software Carpentry will whip you into shape. Software Carpentry was started back in 1997 to teach scientists “the concepts, skills, and tools they need to use and build software more productively” and it does a great job. The newest version of the course is composed of a combination of video lectures and exercises, and provides quick and to the point information on such critical things as:
along with lots of treatment of best practices for writing code that is clear and easy to read both for other people and for yourself a year from now when you sit down and try to figure out exactly what you did³.
The great thing about Software Carpentry is that it skips over all of the theory and detail that you’d get when taking the relevant courses in computer science and gets straight to crux - how to use the available tools most effectively to conduct scientific research. This means that in about 40 hours of lecture and 100-200 hours of practice you can be a much, much, better programmer who rights code more quickly, with fewer bugs, that be easily reused. I think of it as boot camp for scientific software development. You won’t be an expert marksman or a black belt in Jiu-Jitsu when you’re finished, but you will know how to fire a gun and throw a punch.
I can say without hesitation that taking this course is one of the most important things I’ve done in terms of tool development in my entire scientific career. If you are going to write more than 100 lines of code per year for your research then you need to either take this course or find someone to offer something equivalent at your university. Watch the lectures, do the exercises, and it will save you time and energy on programming; giving you more of both to dedicate to asking and answering important scientific questions.
______________________________________________________
¹I took 3 computer science courses in college and I get the impression that that is about 2-3 more courses than most ecologists have taken.
²I don’t know of any data on this, but my impression is that over 90% of code written by ecologists is written by a single individual and never read or used by anyone else. This is in part because we have no culture of writing code in such a way that other people can understand what we’ve done and therefore modify it for their own use.
³I know that I’ve decided that it was easier to “just start from scratch” rather than reusing my own code on more than one occasion. That won’t be happening to me again thanks to Software Carpentry
RStudio [Things you should use]
If you use R (and it seems like everybody does these days) then you should check out RStudio – an easy to install, cross-platform IDE for R. Basically it’s a seamless integration of all of the aspects of R (including scripts, the console, figures, help, etc.) into a single easy to use package. For those of you are familiar with Matlab, it’s a very similar interface. It’s not a full blown IDE yet (no debugger; no lint) but what this actually means is that it’s simple and easy to use. If you use R I can’t imagine that you won’t love this new (and open source!) tool.
UPDATE: Check out another nice article on RStudio over at i’m a chordata! urochordata!
Beta Release of Database Toolkit
The Ecological Database Toolkit
Large amounts of ecological and environmental data are becoming increasingly available due to initiatives sponsoring the collection of large-scale data and efforts to increase the publication of already collected datasets. As a result, ecology is entering an era where progress will be increasingly limited by the speed at which we can organize and analyze data. To help improve ecologists’ ability to quickly access and analyze data we have been developing software that designs database structures for ecological datasets and then downloads the data, processes it, and installs it into several major database management systems (at the moment we support Microsoft Access, MySQL, PostgreSQL, and SQLite). The database toolkit system can substantially reduce hurdles to scientists using new databases, and save time and reduce import errors for more experienced users.
The database toolkit can download and install small datasets in seconds and large datasets in minutes. Imagine being able to download and import the newest version of the Breeding Bird Survey of North America (a database with 4 major tables and over 5 million records in the main table) in less than five minutes. Instead of spending an afternoon setting up the newest version of the dataset and checking your import for errors you could spend that afternoon working on your research. This is possible right now and we are working on making this possible for as many major public/semi-public ecological databases as possible. The automation of this process reduces the time for a user to get most large datasets up and running by hours, and in some cases days. We hope that this will make it much more likely that scientists will use multiple datasets in their analyses; allowing them to gain more rapid insight into the generality of the pattern/process they are studying.
We need your help
We have done quite a bit of testing on this system including building in automated tests based on manual imports of most of the currently available databases, but there are always bugs and imperfections in code that cannot be identified until the software is used in real world situations. That’s why we’re looking for folks to come try out the Database Toolkit and let us know what works and what doesn’t, what they’d like to see added or taken away, and if/when the system fails to work properly. So if you’ve got a few minutes to have half a dozen ecological databases automatically installed on your computer for you stop by the Database Toolkit page at EcologicalData.org, give it a try, and let us know what you think.
Postdoctoral position in macroecology, quantitative ecology, and ecoinformatics
We have a postdoc position available for someone interested in the general areas of macroecology, quantitative ecology, and ecoinformatics. Here’s the short ad with links to the full job description:
Ethan White’s lab at Utah State University is looking for a postdoc to collaborate on research studying approaches for unifying macroecological patterns (e.g., species abundance distributions and species-area relationships) and predicting variation in these patterns using ecological and environmental variables. The project aims to 1) evaluate the performance of models that link ecological patterns by using broad scale data on at least three major taxonomic groups (birds, plants, and mammals); and 2) combine models with ecological and environmental factors to explain continental scale variation in community structure. Models to be explored include maximum entropy models, neutral models, fractal based models, and statistical models. The postdoc will also be involved in an ecoinformatics initiative developing tools to facilitate the use of existing ecological data. There will be ample opportunity for independent and collaborative research in related areas of macroecology, community ecology, theoretical ecology, and ecoinformatics. The postdoc will benefit from interactions with researchers in Dr. White’s lab, the Weecology Interdisciplinary Research Group, and with Dr. John Harte’s lab at the University of California Berkeley. Applicants from a variety of backgrounds including ecology, mathematics, statistics, physics and computer science are encouraged to apply. The position is available for 1 year with the possibility for renewal depending on performance, and could begin as early as September 2010 and no later than May 2011. Applications will begin to be considered starting on September 1, 2010. Go to the USU job page to see the full advertisement and to apply.
If you’re interested in the position and are planning to be at ESA please leave a comment or drop me an email (ethan.white@usu.edu) and we can try to set up a time to talk while we’re in Pittsburgh. Questions about the position and expressions of interest are also welcome.
UPDATE: This position has been filled.
Blogrolling Scientific Programming Blogs
I’ve recently started reading two scientific programming blogs that I think are well worth paying attention to, so I’m blogrolling them and offering a brief introduction here.
Serendipity is Steve Easterbrook’s blog about the interface between software engineering and climate science. Steve has a realistic and balanced viewpoint regarding the reality of programming in scientific disciplines. The blog is well written, insightful, etc., but I think the thing that really won me over were his sharp witted responses to the periodically asinine comments he receives. For example:
I’d care a lot less about seeing all the source and data if I could just ignore climate scientists and shop elsewhere. But since I’m expected to hand over $$$ and change my lifestyle because of this research, your arguments ring hollow…
[You can shop elsewhere - there are thousands of climate scientists across the world. If you don't like the CRU folks, go to any one of a large number of climate science labs elsewhere (start here: http://www.realclimate.org/index.php/data-sources/). An analogy: Imagine your doctor told you that you have to change your eating habits, or your heart is unlikely to last out the year. You would go and get a second opinion from another doctor. And maybe a third. But when every qualified doctor tells you the same thing, do you finally accept their advice, or do you go around claiming that all doctors are corrupt? - Steve]
Software Carpentry is the sister blog to an excellent online (and occasionally in person) course on basic software development for scientists. I strongly recommend the course to anyone who is interested in getting more serious about their programming and the blog is a nice complement pointing readers to other resources and discussions related to scientific programming.
