I had an interesting conversation with someone the other day that made me think I needed one last frequency distribution post in order to avoid causing some people to not move forward with addressing interesting questions.
As a quantitative ecologist I spent a fair amount of time trying to figure out the best way to do things. In other words, I often want to know what the best method is available for answering a particular question. When I think I’ve figured this out I (sometimes, if I have the energy) try to communicate the best methodology more broadly to encourage good practice and accurate answers to questions of interest to ecologists. In some cases finding the best approach is fairly easy. For example, likelihood based methods for fitting and comparing simple frequency distributions are often straightforward and can be easily looked up online. However, in many cases the methodological challenges are more substantial, or the question being asked is not general enough that the methods have been worked out and clearly presented. This happens in the case of frequency distributions when one needs non-standard minimum and maximum values (a common case in ecological studies) or when one needs discrete analogs of traditionally continuous distributions. It’s not that these cases can’t be addressed, it’s just that you can’t look the solutions up on Wikipedia.
So, what is someone without a sufficient background to do (and, btw, that might be all of us if the problem is really hard or even… intractable). First, I’d recommend trying to ask for help. Talk to a statistician at your university or a quantitative colleague and see if they can help you figure things out. I am always pleased to try to help out because I always learn something in the process. Then, if that fails, just do something. Morgan and I will probably write more about this later, but please, please, please don’t let the questions you ask as an ecologists be defined by the availability of an ideal statistical methodology that is easy to implement. In the context of the current series of posts, if you are trying to do something with a more complex frequency distribution and you can’t find a solution to your problem using likelihood then use something else. If it was me I’d go with either normalized logarithmic binning or something based on the CDF as these methods can behave reasonably well. Sure, people like me may complain, but that’s fine. Just make clear that you are aware of the potential weaknesses and that you did what you did because you couldn’t figure out an appropriate alternative approach. That way you still get to make progress on the question of interest and you may motivate people to help work on developing better methods. Sure, you might be the presenting the “right” answer, but then I very much doubt that we ever are when studying ecological systems anyway.