Last week Zack Brym and I formally announced a semester long Data Carpentry course that we’ve have been building over the last year. One of the things I’m most excited about in this effort is our attempt to support collaborative lesson development for university/college coursework.
I’ve experience first hand the potential for this sort of collaborative lesson development though the development of workshop lessons in Software Carpentry and Data Carpentry. Many of the workshop lessons developed by these two organizations now have 100+ contributors. As far as I’m aware, Software Carpentry was the first demonstration that large-scale open collaboration on lessons could work (but I’d love to hear of earlier examples if folks are aware of them) and it has resulted in what is widely regarded as really high quality lesson material. Having seen this work so effectively for workshops, I’m interested in seeing how well it can work for full length courses.
Most college and university courses that I’m aware of start in one of three ways: 1) someone sites down and develops a course completely from scratch; 2) the course directly follows a text book; or 3) a new professor inherits a course from the person who taught it previously and adapts it.
Developing a course from scratch, even one following a text book fairly closely, is a huge time commitment. In contrast, with collaboratively developed courses new faculty, or faculty teaching new courses, wouldn’t need to start from scratch. They would be able to pick up an existing course to adapt and improve. I can’t even begin to describe how much easier this would have made my first few years as a faculty member. More generally, if we are teaching similar courses across dozens or hundreds of universities, it is much more efficient to share the effort of building and improving those courses than to have each person who teaches them do so independently.
In addition to the time and energy, there are often a lot of things that don’t work well the first time you teach a course and it typically takes a few rounds of teaching it to figure what works best. One of the challenges of developing lessons in isolation is that you only teach a class every 1 or 2 years. This makes it hard and slow to figure out what needs work. In contrast, a collaboratively developed course might be taught dozens or hundreds of times each year, allowing the course to be improved much more rapidly through large scale sampling and discussion of what works and what doesn’t. In addition to having more information, the fact that faculty are spending less time developing courses from scratch should leave them with more time for improving the materials. In combination this results in the potential for higher quality courses across institutions.
By involving large numbers of lesson developers, collaborative development also has the potential to help make courses more accurate, more up-to-date, and more approachable by novices. More lesson developers means a greater chance of having an expert on any particular topic involved, thus making the material more accurate and reducing the amount of bad practice/knowledge that gets taught. New faculty with more recent training on the development team can help keep both the material and the pedagogical practices up-to-date (this is hard when the same person teaches the same course for 20 years). More lesson developers also increases the likelihood that someone who isn’t an expert in any given piece of material is also involved, which should help make sure that the lesson avoids issues with expert blindness, thus making the material more accessible to students.
Collaborative college/university lesson development will not be without challenges. The skills required for collaborative lesson development in the style of Software and Data Carpentry require proficiency with computational approaches not familiar to many academics. The necessary skills include things like version control, developing materials in markdown, and working with static site generators like Jekyll. This means this approach is currently most accessible for those with some computational training and may initially work best for computing focused courses. In addition, organizing open collaborations takes time and energy, as does collectively deciding on how to design and update classes. Universities and colleges are not typically good at valuing time invested in non-traditional efforts and that would need to change to help support those managing development of courses with large numbers of faculty involved. More substantial may be the fact that faculty are not used to collaborating with other people on course development and are therefore not used to compromising and negotiating what should go into a course. This can be compensated for to some degree by making courses easy to modify and customize, as we’ve tried to do with the Data Carpentry Semester course, but ultimately there will still need to be a shift from prioritizing the personal desires of the faculty member to the best interests of the course more broadly. This approach will likely work best where there are a number of places that all want to teach the same general material.
Is it time? When I built my first version of Programming for Biologists back in 2010 I was really excited about the potential for collaborative open course development. I built the course using Drupal, emailed a bunch of my friends who were teaching similar courses and said “Hey, we should work together on this stuff”, and stuck some welcoming language on the homepage. Nothing happened. A few years later I was on sabbatical at the University of North Carolina and got the opportunity to talk a fair bit with Elliot Hauser who was part of a team trying to encourage this through a start-up called Coursefork. I was somewhat skeptical that this approach would work broadly at the time, but I thought it was really awesome that they were trying. They ended up pivoting to focus on helping computing education through a somewhat different route and became trinket. A couple of years I converted my course to Jekyll on GitHub and told a lot of people about it. There was much excited. Still nothing happened. So why might this work now? I think there are three things that increase the possibility of this becoming a bigger deal going forward. First, open source software development is becoming more frequent in academia. It still isn’t rewarded anywhere close to sufficiently, but the ethos of using and contributing to collaboratively developed tools is growing. Second, the technical tools that make this kind of collaboration easier are becoming more widely used and easier to learn through training efforts like Software Carpentry. Third, more and more people are actively developing university courses using these tools and making them available under open source licenses. Two of my favorites are Jenny Bryan’s Stat 545 and Karl Broman’s Tools for Reproducible Research. Our development of the Data Carpentry semester course has already benefited from using openly available materials like these and feedback from members of the computational teaching community. I guess we’ll see what happens next.
This post benefited from a number of comments and suggestions by Zack Brym, who has also played a central and absolutely essential role in the development of the Data Carpentry semester long course. The post also benefited from several conversations with Tracy Teal, the Executive Director of Data Carpentry about the potential value of these approaches for college courses
Over the last year and a half we have been actively developing a semester-long Data Carpentry course designed to be easily customized and integrated into existing graduate and undergraduate curricula.
Data Carpentry for Biologists contains course materials for teaching scientists how to work more effectively with data. The course provides introductions to data management and relational databases, data manipulation and analysis, and data visualization. It covers the same general types of material as a two-day Data Carpentry workshop, but expands the materials and opportunities for practice into a full-length university course. The teaching material uses R and SQLite, with some corresponding materials for Python as well. To help students understand the direct applications to their interests, the examples and exercises focus on biological questions and working with real data. The course emphasizes using best practices to produce reusable and reproducible data analysis.
Active-learning Teaching Materials
Learning computing requires active practice by working through programming problems. Just diving in to computing is challenging for most scientists, so the course instruction is designed to combine short live-coding introductions to concepts followed immediately by the students working on a related exercise. Additional exercises are assigned later for practice. This follows the “I do”, “We do”, “You do” approach to teaching, which leverages the benefits of active-learning and flipped classrooms without leaving students who are less comfortable with the material feeling lost. The bulk of class time is spent working on assigned exercises with the instructor moving around the room helping guide students through things they don’t understand and engaging with students who are thinking about advanced applications of what they’ve learned.
This approach is the result of lots of reading about effective teaching methods and Ethan’s experience teaching this and related courses over the last six years at Utah State University and the University of Florida. It seems to work well for both students that get the material easily and those that find it more challenging. We’ve also tried to make these materials as useful as possible for self-guided students.
Open course development
Software Carpentry and Data Carpentry have shown how powerful collaborative lesson development can be and we’re interested in bringing that to the university classroom. We have designed the course materials to be modular and easy to modify, and the course website easy to clone and set up. All of the teaching materials and associated website files are openly available at the Data Carpentry for Biologists repository on GitHub under CC-BY and MIT licenses. The course materials are all written in Markdown and everything runs on Jekyll through GitHub Pages. Making your own version of the course should take less than an hour. We’ve developed documentation for how to create your own version of the course and how to contribute to development. Exercises and assignments are modular and changing exercises and assignments simply involves reordering items in a list. Adding a new exercise involves creating a new Markdown file and then adding its title to the list of exercises for an assignment.
If you teach, or want to teach, a course like this, we’d love to get you involved. Here are some useful links for getting started.
We want to be sure getting involved is as easy as possible. We’ve worked hard to provide documentation and help resources for students and instructors. Students can find all they need to know at our student start guide. Instructors have access to course content and site design documentation.
Development of this course was generously support by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.
I’m looking for one or more graduate students to join my group next fall. In addition to the official add (below) I’d like to add a few extra thoughts. As Morgan Ernest noted in her recent ad, we have a relatively unique setup at Weecology in that we interact actively with members of the Ernest Lab. We share space, have joint lab meetings, and generally maintain a very close intellectual relationship. We do this with the goal of breaking down the barriers between the quantitative side of ecology and the field/lab side of ecology. Our goal is to train scientists who span these barriers in a way that allows them to tackle interesting and important questions.
I also believe it’s important to train students for multiple potential career paths. Members of my lab have gone on to faculty positions, postdocs, and jobs in both science non-profits and the software industry.
Scientists in my group regularly both write papers (e.g., these recent papers from dissertation chapters: Locey & White 2013, Xiao et al. 2014) and develop or contribute to software (e.g., EcoData Retriever, ecoretriever, rpartitions & pypartitions) even if they’ve never coded before they joined my lab.
My group generally works on problems at the population, community, and ecosystem levels of ecology. You can find out more about what we’ve been up to by checking out our website. If you’re interested in learning more about where the lab is headed I recommend reading my recently funded Moore Investigator in Data-Driven Discovery proposal.
PH.D STUDENT OPENINGS IN QUANTITATIVE, COMPUTATIONAL, AND MACRO- ECOLOGY
The White Lab at the University of Florida has openings for one or more PhD students in quantitative, computational, and/or macro- ecology to start fall 2015. The student(s) will be supported as graduate research assistants from a combination of NSF, Moore Foundation, and University of Florida sources depending on their research interests.
The White Lab uses computational, mathematical, and advanced statistical/machine learning methods to understand and make predictions/forecasts for ecological systems using large amounts of data. Background in quantitative and computational techniques is not necessary, only an interest in learning and applying them. Students are encouraged to develop their own research projects related to their interests.
The White Lab is currently at Utah State University, but is moving to the Department of Wildlife Ecology and Conservation at the University of Florida starting summer 2015.
Interested students should contact Dr. Ethan White (email@example.com) by Nov 15th, 2014 with their CV, GRE scores, and a brief statement of research interests.
UPDATE: Added a note that we work at population, community, and ecosystem levels.
As a budding macroecologist, I have thought a lot about what skills I need to acquire during my Ph.D. This is my model of the four basic attributes for a macroecologist, although I think it is more generally applicable to many ecologists as well:
- Knowledge of SQL
- Dealing with proper database format and structure
- Finding data
- Appropriate treatments of data
- Understanding what good data are
- Monte Carlo methods
- Maximum likelihood methods
- Power analysis
- Higher level calculus
- Should be able to derive analytical solutions for problems
- Should be able to write programs for analysis, not just simple statistics and simple graphs.
- Able to use version control
- Once you can program in one language, you should be able to program in other languages without much effort, but should be fluent in at least one language.
Achieve expertise in at least 2 out of the 4 basic areas, but be able to communicate with people who have skills in the other areas. However, if you are good at collaboration and come up with really good questions, you can make up for skill deficiencies by collaborating with others who possess those skills. Start with smaller collaborations with the people in your lab, then expand outside your lab or increase the number of collaborators as your collaboration skills improve.
Achieving proficiency in an area is best done by using it for a project that you are interested in. The more you struggle with something, the better you understand it eventually, so working on a project is a better way to learn than trying to learn by completing exercises.
The attribute should be generalizable to other problems: For example, if you need to learn maximum likelihood for your project, you should understand how to apply it to other questions. If you need to run an SQL query to get data from one database, you should understand how to write an SQL query to get data from a different database.
In graduate school:
Someone who wants to compile their own data or work with existing data sets needs to develop a good intuitive feel for data; even if they cannot write SQL code, they need to understand what good and bad databases look like and develop a good sense for questionable data, and how known issues with data could affect the appropriateness of data for a given question. The data skill is also useful if a student is collecting field data, because a little bit of thought before data collection goes a long way toward preventing problems later on.
A student who is getting a terminal master’s and is planning on using pre-existing data should probably be focusing on the data skill (because data is a highly marketable skill, and understanding data prevents major mistakes). If the data are not coming from a central database, like the BBS, where the quality of the data is known, additional time will have to be added for time to compile data, time to clean the data, and time to figure out if the data can be used responsibly, and time to fill holes in the data.
Master’s students who want to go on for a Ph.D. should decide what questions they are interested in and should try to pick a project that focuses on learning a good skill that will give them a headstart- more empirical (programming or stats), more theoretical (math), more applied (math (e.g., for developing models), stats(e.g., applying pre-existing models and evaluating models, etc.), or programming (e.g. making tools for people to use)).
Ph.D. students need to figure out what types of questions they are interested in, and learn those skills that will allow them to answer those questions. Don’t learn a skill because it is trendy or you think it will help you get a job later if you don’t actually want to use that skill. Conversely, don’t shy away from learning a skill if it is essential for you to pursue the questions you are interested in.
Right now, as a Ph.D. student, I am specializing in data and programming. I speak enough math and stats that I can communicate with other scientists and learn the specific analytical techniques I need for a given project. For my interests (testing questions with large datasets), I think that by the time I am done with my Ph.D., I will have the skills I need to be fairly independent with my research.
Slides and script from Ethan White’s Ignite talk on Big Data in Ecology from Sandra Chung and Jacquelyn Gill‘s excellent ESA 2013 session on Sharing Makes Science Better. Slides are also archived on figshare.
1. I’m here to talk to you about the use of big data in ecology and to help motivate a lot of the great tools and approaches that other folks will talk about later in the session.
2. The definition of big is of course relative, and so when we talk about big data in ecology we typically mean big relative to our standard approaches based on observations and experiments conducted by single investigators or small teams.
3. And for those of you who prefer a more precise definition, my friend Michael Weiser defines big data and ecoinformatics as involving anything that can’t be successfully opened in Microsoft Excel.
4. Data can be of unusually large size in two ways. It can be inherently large, like citizen science efforts such as Breeding Bird Survey, where large amounts of data are collected in a consistent manner.
5. Or it can be large because it’s composed of a large number of small datasets that are compiled from sources like Dryad, figshare, and Ecological Archives to form useful compilation datasets for analysis.
6. We have increasing amounts of both kinds of data in ecology as a result of both major data collection efforts and an increased emphasis on sharing data.
7-8. But what does this kind of data buy us. First, big data allows us to work at scales beyond those at which traditional approaches are typically feasible. This is critical because many of the most pressing issues in ecology including climate change, biodiversity, and invasive species operate at broad spatial and long temporal scales.
9-10. Second, big data allows us to answer questions in general ways, so that we get the answer today instead of waiting a decade to gradually compile enough results to reach concensus. We can do this by testing theories using large amounts of data from across ecosystems and taxonomic groups, so that we know that our results are general, and not specific to a single system (e.g., White et al. 2012).
11. This is the promise of big data in ecology, but realizing this potential is difficult because working with either truly big data or data compilations is inherently challenging, and we still lack sufficient data to answer many important questions.
12. This means that if we are going to take full advantage of big data in ecology we need 3 things. Training in computational methods for ecologists, tools to make it easier to work with existing data, and more data.
13. We need to train ecologists in the computational tools needed for working with big data, and there are an increasing number of efforts to do this including Software Carpentry (which I’m actively involved in) as well as training initiatives at many of the data and synthesis centers.
14. We need systems for storing, distributing, and searching data like DataONE, Dryad, NEON‘s data portal, as well as the standardized metadata and associated tools that make finding data to answer a particular research question easier.
15. We need crowd-sourced systems like the Ecological Data Wiki to allow us to work together on improving insufficient metadata and understanding what kinds of analyses are appropriate for different datasets and how to conduct them rigorously.
16. We need tools for quickly and easily accessing data like rOpenSci and the EcoData Retriever so that we can spend our time thinking and analyzing data rather than figuring out how to access it and restructure it.
17. We also need systems that help turn small data into big data compilations, whether it be through centralized standardized databases like GBIF or tools that pull data together from disparate sources like Map of Life.
18. And finally we we need to continue to share more and more data and share it in useful ways. With the good formats, standardized metadata, and open licenses that make it easy to work with.
19. And so, what I would like to leave you with is that we live in an exciting time in ecology thanks to the generation of large amounts of data by citizen science projects, exciting federal efforts like NEON, and a shift in scientific culture towards sharing data openly.
20. If we can train ecologists to work with and combine existing tools in interesting ways, it will let us combine datasets spanning the surface of the globe and diversity of life to make meaningful predictions about ecological systems.
Figuring out how to teach well as a professor at a research university is largely a self-study affair. For me the keys to productive self-study are good information and self-reflection. Without good information you’re not learning the right things and without self-reflection you don’t know if you are actually succeeding at implementing what you’ve learned. There have been some nice posts recently on information and self-reflection about how we teach over at Oikos (based on, indirectly, on a great piece on NPR) and Sociobiology (and a second piece) that are definitely worth a read. As part of a course I’m taking on how to teach programming I’m doing some reading about research on the best approaches to teaching and self-reflection on my own approaches in the classroom.
One of the things we’ve been reading is a great report by the US Department of Education’s Institute of Education Sciences on Organizing Instruction and Study to Improve Student Learning. The report synthesizes existing research on what to do in the classroom to facilitate meaningful long-term learning, and distills this information into seven recommendations and information on how strongly each recommendation is supported by available research.
- Space learning over time. Arrange to review key elements of course content after a delay of several weeks to several months after initial presentation. (moderate)
- Interleave worked example solutions with problem-solving exercises. Have students alternate between reading already worked solutions and trying to solve problems on their own. (moderate)
- Combine graphics with verbal descriptions. Combine graphical presentations (e.g., graphs, figures) that illustrate key processes and procedures with verbal descriptions. (moderate)
- Connect and integrate abstract and concrete representations of concepts. Connect and integrate abstract representations of a concept with concrete representations of the same concept. (moderate)
- Use quizzing to promote learning.
- Use pre-questions to introduce a new topic. (minimal)
- Use quizzes to re-expose students to key content (strong)
- Help students allocate study time efficiently.
- Teach students how to use delayed judgments of learning to identify content that needs further study. (minimal)
- Use tests and quizzes to identify content that needs to be learned (minimal)
- Ask deep explanatory questions. Use instructional prompts that encourage students to pose and answer “deep-level” questions on course material. These questions enable students to respond with explanations and supports deep understanding of taught material. (strong)
This is a nice summary, but it’s definitely worth reading the whole report to explore the depth of the thought process and learn more about specific ideas for how to implement these recommendations.
How am I doing?
Recently I’ve been teaching two courses on programming and database management for biologists. Because I’m not a big believer in classroom lecture, for this type of material, a typical day in one of these courses involves: 1) either reading up on the material in a text book or viewing a Software Carpentry lecture before coming to class; 2) a brief 5-10 minute period of either re-presenting complex material or answering questions about the reading/viewing; and 3) 45 minutes of working on exercises (during which time I’m typically bouncing from student to student helping them figure out things that they don’t understand). So, how am I doing with respect the the above recommendations?
1. Space learning over time. I’m doing OK here, but not as well as I’d like. The nice thing about teaching introductory programming concepts is that they naturally build on one another. If we learned about if-then statements two weeks ago then I’m going to use them in the exercises about loops that we’re learning about this week. I also have my advanced class use version control throughout the semester for retrieving data and turning in exercises to force them to become very comfortable with the work-flow. However, I haven’t done a very good job of bringing concepts back, on their own, later in the semester. The exercise based approach to the course is perfect for this, I just need to write more problems and insert them into the problem-sets a few weeks after we cover the original material.
2. Interleave worked example solutions with problem-solving exercises. I think I’m doing a pretty good job here. Student’s see worked examples for each concept in either a text book or video lecture (viewed outside of class) and if I think they need more for a particular concept we’ll walk through a problem at the beginning of class. I often use the Online Python Tutor for this purpose which provides a really nice presentation of what is going on in the program. We then spend most of the class period working on problem-solving exercises. Since my classes meets three days a week I think this leads to a pretty decent interleaving.
3. Combine graphics with verbal descriptions. I do some graphical presentation and the Online Python Tutor gives some nice graphical representations of running programs, but I need to learn more about how to communicate programming concepts graphically. I suspect that some of the students that struggle the most in my Intro class would benefit from a clearly graphical presentation of what is going happening in the program.
4. Connect and integrate abstract and concrete representations of concepts. I think I do this fairly well. The overall motivation for the course is to ground the programming material in the specific discipline that the students are interested in. So, we learn about the general concept and then apply it to concrete biological problems in the exercises.
5. Use quizzing to promote learning. I’m not convinced that pre-questions make a lot of sense for material like this. In more fact based classes they are helping to focus students’ attention on what is important, but I think the immediate engagement in problem-sets that focus on the important aspects works at least as well in my classroom. I do have one test in the course that occurs about half way through the Intro course after we’ve covered the core material. It is intended to provide the “delayed re-exposure” that has been shown to improve learning, but after reading this recommendation I’m starting to think that this would be better accomplished with a series of smaller quizzes.
6. Help students allocate study time efficiently. I spend a fair bit of time doing this when I help students who ask questions during the assignments. By looking at their code and talking to them it typically becomes clear where the “illusion of knowing” is creeping in and causing them problems and I think I do a fairly good job of breaking that cycle and helping them focus on what they still need to learn. I haven’t used quizzes for this yet, but I think they could be a valuable addition.
7. Ask deep explanatory questions. One of the main focuses in both of my courses is an individual project where the students work on a larger program to do something that is of interest to them. I do this with the hope that it can provide the kind of deep exposure that this recommendation envisions.
So, I guess I’m doing OK, but I need to work more on representation of material both through bringing back old material in the exercises and potentially through the use of short quizzes throughout the semester. I also need to work on alternative ways to present material to help reach folks whose brains work differently.
If you are a current or future teacher I really recommend reading the full report. It’s a quick read and provides lots of good information and food for thought when figuring out how to help your students learn.
Thanks for listening in on my self-reflection. If you have thoughts about this stuff I’d love to hear about it in the comments.
Some time ago in academia we realized that it didn’t make sense for individual scientists or even entire departments to maintain their own high performance computing resources. Use of these resources by an individual is intensive, but sporadic, and maintenance of the resources is expensive  so the universities soon realized they were better off having centralized high performance computing centers so that computing resources were available when needed and the averaging effects of having large numbers of individuals using the same computers meant that the machines didn’t spend much time sitting idle. This was obviously a smart decision.
So, why haven’t universities been smart enough to centralize an even more valuable computational resource, their computer labs?
As any student of Software Carpentry will tell you, it is far more important to be able to program well than it is to have access to a really large high performance computing center. This means that the most important computational resource a university has is the classes that teach their students how to program, and the computer labs on which they rely.
At my university  all of the computer labs on campus are controlled by either individual departments or individual colleges. This means that if you want to teach a class in one of them you can’t request it as a room through the normal scheduling process, you have to ask the cognizant university fiefdom for permission. This wouldn’t be a huge issue, except that in my experience the answer is typically a resounding no. And it’s not a “no, where really sorry but the classroom is booked solid with our own classes,” it’s “no, that computer lab is ours, good luck” .
And this means that we end up wasting a lot of expensive university resources. For example, last year I taught in a computer lab “owned” by another college . I taught in the second class slot of a four slot afternoon. In the slot before my class there was a class that used the room about four times during the semester (out of 48 class periods). There were no classes in the other two afternoon slots . That means that classes were being taught in the lab only 27% of the time or 2% of the time if I hadn’t been granted an exception to use the lab .
Since computing skills are increasingly critical to many areas of science (and everything else for that matter) this territoriality with respect to computer labs means that they proliferate across campus. The departments/colleges of Computer Science, Engineering, Social Sciences, Natural Resources and Biology  all end up creating and maintaining their own computer labs, and those labs end up sitting empty (or being used by students to send email) most of the time. This is horrifyingly inefficient in an era where funds for higher education are increasingly hard to come by and where technology turns over at an ever increasing rate. Which  brings me to the title of this post. The solution to this problem is for universities to stop allowing computer labs to be controlled by individual colleges/departments in exactly the same way that most classrooms are not controlled by colleges/departments. Most universities have a central unit that schedules classrooms and classes are fit into the available spaces. There is of course a highly justified bias to putting classes in the buildings of the cognizant department, but large classes in particular may very well not be in the department’s building. It works this way because if it didn’t then the university would be wasting huge amounts of space having one or more lecture halls in every department, even if they were only needed a few hours a week. The same issue applies to computer labs, only they are also packed full of expensive electronics. So please universities, for the love of all that is good and right and simply fiscally sound in the world, start treating computer labs like what they are: really valuable and expensive classrooms.
 Think of a single scientist who keeps 10 expensive computers, only uses them a total of 1-2 months per year, but when he does the 10 computers aren’t really enough so he has to wait a long time to finish the analysis.
 And I think the point I’m about to make is generally true; at least it has been at several other universities I’ve worked over the years.
 Or in some cases something more like “Frak you. You fraking biologists have no fraking right to teach anyone a fraking thing about fraking computers.” Needless to say, the individual in question wasn’t actually saying frak, but this is a family blog.
 As a result of a personal favor done for one administrator by another administrator.
 I know because I took advantage of this to hold my office hours in the computer lab following class.
 To be fair it should be noted that this and other computer labs are often used by students for doing homework (along with other less educationally oriented activities) when classes are not using the rooms, but in this case the classroom was a small part of a much larger lab and since I never witnessed the non-classroom portion of the lab being filled to capacity, the argument stands.
 etc., etc., etc.
An increasingly large number of folks doing research in ecology and other biological disciplines spend a substantial portion of their time writing computer programs to analyze data and simulate the outcomes of biological models. However, most ecologists have little formal training in software development¹. A recent survey suggests that we are not only; with 96% of scientists reporting that they are mostly self-taught when it comes to writing code. This makes sense because there are only so many hours in the day, and scientists are typically more interested in answering important questions in their field than in sitting through a bachelors degree worth of computer science classes. But, it also means that we spend longer than necessary writing our software, it contains more bugs, and it is less useful to other scientists than it could be².
Software Carpentry to the Rescue
Fortunately you don’t need to go back college and get another degree to substantially improve your knowledge and abilities when it comes to scientific programming, because with a few weeks of hard work Software Carpentry will whip you into shape. Software Carpentry was started back in 1997 to teach scientists “the concepts, skills, and tools they need to use and build software more productively” and it does a great job. The newest version of the course is composed of a combination of video lectures and exercises, and provides quick and to the point information on such critical things as:
along with lots of treatment of best practices for writing code that is clear and easy to read both for other people and for yourself a year from now when you sit down and try to figure out exactly what you did³.
The great thing about Software Carpentry is that it skips over all of the theory and detail that you’d get when taking the relevant courses in computer science and gets straight to crux – how to use the available tools most effectively to conduct scientific research. This means that in about 40 hours of lecture and 100-200 hours of practice you can be a much, much, better programmer who rights code more quickly, with fewer bugs, that be easily reused. I think of it as boot camp for scientific software development. You won’t be an expert marksman or a black belt in Jiu-Jitsu when you’re finished, but you will know how to fire a gun and throw a punch.
I can say without hesitation that taking this course is one of the most important things I’ve done in terms of tool development in my entire scientific career. If you are going to write more than 100 lines of code per year for your research then you need to either take this course or find someone to offer something equivalent at your university. Watch the lectures, do the exercises, and it will save you time and energy on programming; giving you more of both to dedicate to asking and answering important scientific questions.
¹I took 3 computer science courses in college and I get the impression that that is about 2-3 more courses than most ecologists have taken.
²I don’t know of any data on this, but my impression is that over 90% of code written by ecologists is written by a single individual and never read or used by anyone else. This is in part because we have no culture of writing code in such a way that other people can understand what we’ve done and therefore modify it for their own use.
³I know that I’ve decided that it was easier to “just start from scratch” rather than reusing my own code on more than one occasion. That won’t be happening to me again thanks to Software Carpentry
As I’ve mentioned before I’m not a big fan of the configuration of most comprehensive exams, but my post on the matter keeps languishing on my out of control To Do list. So, I was really pleased when a friend of mine passed along something that a student had sent him*. The piece is actually about a portion of thesis defenses, but I think it applies most appropriately to comprehensive exams (just substitute writtens for thesis, and add the fact that the guy who picks the snakes is hard of hearing). Regardless, it is short, hilarious and just the sort of thing stressed out students, postdocs, and faculty need to get a little chuckle as they finish up the semester. Go read it.
Anyone who has been around the halls of academia for a while has heard some well meaning soul talk about how we produce too many PhD students for the number of faculty positions, that this is unfair, and that therefore we should take fewer students. The most recent version of this idea on the web goes so far as calling the academic enterprise a Ponzi scheme. I’ve never personally found this argument very convincing. No other area of employment has a degree the guarantees its recipients their preferred job and I think that thinning the pool of potential talent from the scientific fields before it’s really possible to tell who the important thinkers of the next generation might be is bad for science (and all of the things that benefit from it). I’ve never taken the time to really expand on these thoughts, but thankfully James Keirstead over at Academic Productivity has an interesting post up responding to the ideas in the first link. Go check it out.
Some days I really wonder whether the bureaucratic infrastructure at institutions of higher education has any idea whatsoever that their job is to support the research and teaching missions of the university.
As a graduate student, explaining what your day to day life is like to your non-academic friends can sometimes be a little difficult. In this enjoyable piece from The Science Creative Quarterly Daven Tai takes a unique approach to this challenge:
Working to get your PhD is like training to become a Jedi Knight,” I started. “You follow a Master; you live a life of sacrifice; you must develop rational thought and patience…
If you’re looking for five minutes of academically oriented fun go check out the whole article.