I had an interesting conversation with someone the other day that convinced me I needed one last frequency distribution post, so that methodological worries don’t stop people from moving forward with addressing interesting questions.
As a quantitative ecologist I spend a fair amount of time trying to figure out the best way to do things. In other words, I often want to know the best available method for answering a particular question. When I think I’ve figured this out I (sometimes, if I have the energy) try to communicate the best methodology more broadly to encourage good practice and accurate answers to questions of interest to ecologists. In some cases finding the best approach is fairly easy. For example, likelihood-based methods for fitting and comparing simple frequency distributions are often straightforward and can easily be looked up online. However, in many cases the methodological challenges are more substantial, or the question being asked is not general enough for the methods to have been worked out and clearly presented. This happens with frequency distributions when one needs non-standard minimum and maximum values (a common situation in ecological studies) or when one needs discrete analogs of traditionally continuous distributions. It’s not that these cases can’t be addressed; it’s just that you can’t look the solutions up on Wikipedia.
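To make the non-standard-bounds case concrete, here is a minimal sketch of fitting a distribution truncated to a custom range by maximum likelihood. The choice of an exponential distribution, the range, and the function names are all illustrative assumptions, not anything from the post itself; the point is just that truncation changes the normalizing constant, which the likelihood has to account for.

```python
# Sketch: maximum-likelihood fit of an exponential distribution
# truncated to a non-standard range [xmin, xmax]. The distribution
# and all names here are illustrative, not from the post.
import numpy as np
from scipy.optimize import minimize_scalar

def trunc_exp_negloglik(lam, x, xmin, xmax):
    """Negative log-likelihood of an exponential truncated to [xmin, xmax].

    pdf: lam * exp(-lam * (x - xmin)) / (1 - exp(-lam * (xmax - xmin)))
    The denominator is the truncation's normalizing constant.
    """
    norm = 1 - np.exp(-lam * (xmax - xmin))
    return -np.sum(np.log(lam) - lam * (x - xmin) - np.log(norm))

rng = np.random.default_rng(0)
xmin, xmax, true_lam = 1.0, 10.0, 0.5

# Rejection-sample from the truncated exponential for test data.
draws = xmin + rng.exponential(1 / true_lam, size=20000)
x = draws[draws <= xmax][:5000]

# One-parameter problem, so a bounded scalar minimizer suffices.
fit = minimize_scalar(trunc_exp_negloglik, bounds=(1e-6, 10),
                      args=(x, xmin, xmax), method="bounded")
print(fit.x)  # estimate should land near true_lam = 0.5
```

The same recipe (write down the truncated density, minimize the negative log-likelihood numerically) carries over to other distributions where no closed-form estimator exists.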
So, what is someone without a sufficient background to do (and, btw, that might be all of us if the problem is really hard or even… intractable)? First, I’d recommend asking for help. Talk to a statistician at your university or a quantitative colleague and see if they can help you figure things out. I am always pleased to try to help out, because I always learn something in the process. Then, if that fails, just do something. Morgan and I will probably write more about this later, but please, please, please don’t let the questions you ask as an ecologist be defined by the availability of an ideal statistical methodology that is easy to implement. In the context of the current series of posts, if you are trying to do something with a more complex frequency distribution and you can’t find a likelihood-based solution to your problem, then use something else. If it were me I’d go with either normalized logarithmic binning or something based on the CDF, as these methods can behave reasonably well. Sure, people like me may complain, but that’s fine. Just make clear that you are aware of the potential weaknesses and that you did what you did because you couldn’t figure out an appropriate alternative approach. That way you still get to make progress on the question of interest, and you may motivate people to help work on developing better methods. Sure, you might not be presenting the “right” answer, but then I very much doubt that we ever are when studying ecological systems anyway.
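For anyone reaching for the normalized logarithmic binning fallback mentioned above, a minimal sketch looks like the following. The bin count and the heavy-tailed test data are illustrative choices of mine, not prescriptions from the post; the essential step is dividing each count by its linear bin width so that wide bins don’t inflate the tail.

```python
# Sketch of normalized logarithmic binning: bin the data into
# logarithmically spaced bins, then divide each count by its
# (linear) bin width so the result is a proper density estimate.
import numpy as np

def log_binned_density(x, n_bins=20):
    """Return geometric bin centers and a density estimate from log-spaced bins."""
    edges = np.logspace(np.log10(x.min()), np.log10(x.max()), n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)                    # linear widths of the log bins
    density = counts / (widths * x.size)       # normalize counts to a density
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric mean bin centers
    return centers, density

rng = np.random.default_rng(1)
x = rng.pareto(2.0, size=10000) + 1            # heavy-tailed test data
centers, density = log_binned_density(x)
```

Plotting `density` against `centers` on log-log axes then gives a reasonable picture of the distribution’s shape, which can be compared against candidate forms when a full likelihood treatment is out of reach.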
As an ecology undergrad, I’m curious as to what you think “ecology” (or more broadly, science) students should be taught in university. Currently, at the University of Ottawa, the only statistics requirement for an Hons. BSc in Biology is a “statistics for life sciences” course, which is approximately equivalent to an AP Statistics or first-year statistics course.
Would you advocate stronger statistical training at the expense of, say, calculus, physics, chemistry, or some other field (assuming that the number of required credits stays constant)?
Again, as an undergrad with limited statistical knowledge, I’m unsure how much of a limiting factor proper statistical training is for publishing ecologists – thoughts?
Hi Andrew – That’s a great question that I’m afraid doesn’t have a simple answer. If I did offer a simple answer it would be that more statistical background is almost always of benefit to ecologists, scientists more broadly, and people in general regardless of their career paths. The more complex answer begins with the fact that one of the things that I love about ecology is that it is such a multidisciplinary field. This means that you never run out of things to learn, but makes decisions about undergraduate curricula difficult. Basically, the best set of undergraduate course work (and graduate course work for that matter) really depends on what kind of ecology you are interested in doing. For example, I’ve never used any chemistry in my research, so I would have been better off if two or three of my semesters of chemistry had been replaced with more mathematics, statistics, or computer science. But, someone whose research focuses on biogeochemistry will certainly need those same chemistry courses just to be ready to learn the more advanced material they need in graduate school.
I guess what I’d like to see in ecology programs (and biology programs in general) is more flexibility with respect to the non-biology coursework. The increasingly interdisciplinary nature of ecological research variously calls for background in mathematics, statistics, computer science, geology, chemistry, physics, and even areas of social science and politics. Which additional courses students take could be determined by the kind of ecology they are interested in – or, perhaps more likely, the kinds of supplemental coursework they enjoy could guide their decisions about specific areas of research interest (this is how it worked for me and still does to some degree). In the first couple of years students could sample one course each from some of these areas, then decide on a secondary field to focus on during their junior and senior years. You can probably do something like this yourself with some of your uncommitted hours (if you have any).
I know this isn’t a very specific answer, but hopefully it is of some use. If you have any more specific questions please let me know and I’ll do what I can to answer them.
Oh, and one other thing. If you’re getting even one statistics course as an undergraduate, that’s better than a lot of programs. The problem is that we typically teach that one course the wrong way (IMHO). You’ll probably be taught that there are a bunch of separate statistical tests – one for each type of situation. The truth is actually closer to the opposite – one underlying set of principles and approaches that apply to everything. If we introduced people to these ideas during their early statistics course, I think we’d find there was a lot more interest in, and a deeper understanding of, statistics.
About your second post – really? I have only ever learned how different tests are to be used in specific circumstances. It seems that in statistics courses at the undergrad level, proofs are seldom given, so we students learn our equations for distributions and tests but never the reasons behind them. This way, we can easily memorize them for tests and assignments, but when it comes to really understanding why we get the results we get, well… I doubt many undergrads would be able to explain that.
Do you mean that it’s the underlying statistical methods and proofs that should be taught, and that specific tests should be more secondary material?
Yeah, that’s basically the idea. Most of the tests you are being taught are just the simple analytical solutions that result from applying the same basic set of concepts from probability in a particular context. I’d like to see us start from the ground up by teaching probability and likelihood. The problem, of course, is that building from the ground up you can probably only introduce a few specific approaches in one semester; but if you understand that the linear regression and ANOVA you’ve learned to perform are basically the same thing, I think you’re much better off than if you are able to perform three different kinds of ANOVA by hand (since in reality you just get your stats package to do it anyway). Here’s a link to a chapter from Benjamin Bolker’s new book that describes the relationship between likelihood and the tests you’re being taught. If you’re interested in an accessible introduction to these ideas, I’d strongly recommend The Ecological Detective.
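The “same underlying model” point above can be checked in a few lines. A two-sample t-test and a simple regression on a 0/1 group indicator are the same linear model in disguise, so they produce identical p-values. The simulated data here are purely illustrative.

```python
# Demonstration: a two-sample t-test and a regression on a 0/1
# group dummy are the same underlying model, so they give
# identical test statistics and p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(10.0, 2.0, size=30)   # group A measurements
b = rng.normal(12.0, 2.0, size=30)   # group B measurements

# "Separate test" view: the classic two-sample t-test
t_res = stats.ttest_ind(a, b)

# "Same model" view: regress the response on a 0/1 group indicator
y = np.concatenate([a, b])
g = np.concatenate([np.zeros(30), np.ones(30)])
reg = stats.linregress(g, y)

# The slope's t-test and the two-sample t-test agree exactly.
print(np.isclose(t_res.pvalue, reg.pvalue))  # prints True
```

The same collapse happens for one-way ANOVA with more groups: it is regression on group indicators, evaluated with the same likelihood machinery.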