Frequency distributions for ecologists III: Fitting model parameters


Don’t bin your data and fit a regression. Don’t use the CDF and fit a regression. Use maximum likelihood or other statistically grounded approaches that can typically be looked up on Wikipedia.

A bit more detail

OK, so you’ve visualized your data and after playing around a bit you have an idea of what the basic functional form of the model is. Now you want to estimate the parameters. So, for example, you’ve log-transformed the axes and your distribution is approximately a straight line, so you think it’s a power-law and you want to estimate the exponent of the power-law (more about figuring out if this is actually the best model for your data in the next and final installment).

Ecologists typically fall back on what they know from an introductory statistics class or two to solve this problem: regression. They bin the data, count how many points fall in each bin of x values, and use those counts for the y data and the bin centers for the x data. They then fit a regression to these points to estimate parameters. You can’t blame us as a group for this approach because we typically don’t receive training in the proper methods and we’re just doing our best. However, this approach is not valid and can yield very poor parameter estimates in some cases.
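To make the binning recipe concrete, here is a minimal Python sketch of it, written only to show what the approach does, not to recommend it. Everything in it is an assumption for illustration: the function names (`sample_power_law`, `binned_regression_exponent`), the use of equal-width linear bins, the simulated data, and the choice to drop empty bins because their log count is undefined.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def binned_regression_exponent(data, n_bins):
    """Estimate the exponent by regressing log(count) on log(bin center)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        idx = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    # Empty bins are dropped because log(0) is undefined.
    points = [(math.log(lo + (i + 0.5) * width), math.log(c))
              for i, c in enumerate(counts) if c > 0]
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return -sxy / sxx  # the fitted slope is -alpha, so negate it


rng = random.Random(0)
data = sample_power_law(1000, 2.5, 1.0, rng)  # true exponent is 2.5
for n_bins in (10, 50):
    print(n_bins, binned_regression_exponent(data, n_bins))
```

Running it with two different bin counts on the same data lets you see how much the answer depends on an arbitrary binning choice, which is one symptom of the problem described above.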

The best approach actually varies a bit depending on the specific form of the function, but generally what you want to do is maximum likelihood estimation (MLE). This approach finds the values of the free parameters under which the observed data are most probable. Technically what you really want is the minimum variance unbiased estimator (MVUE), which will often be a slightly modified form of the maximum likelihood estimate. The bad news is that most ecologists won’t want to derive the MLE themselves. The good news is that equations for calculating the MLE (or the MVUE) are readily available, and once you have the equation this approach is actually far less work than binning and fitting lines to data. For most common distributions you can just google the name of the distribution. The first link will typically be the Wikipedia entry. Just go to the Wikipedia page, scroll down to Parameter estimation and use the equation provided (and yes, Wikipedia is quite reliable for this kind of information). Even if the best approach isn’t maximum likelihood, this will typically give you the best approach (or at least a very good one). If you’re doing something more complicated, there is an excellent series of reference books by Norman Johnson and colleagues that covers parameter estimation for most characterized distributions in detail. Some of these solutions will require the use of numerical methods. This is fairly straightforward in R and Matlab, but if that’s a bit much you should be able to find a statistician or quantitatively inclined ecological colleague to help (they’ll be impressed that you got as far as you did).
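As an illustration of how little work the equation-based route is, here is a small Python sketch for the continuous power-law case. It assumes the lower bound `x_min` is known and uses the standard closed-form estimate, alpha_hat = 1 + n / sum(ln(x_i / x_min)), the form given in the power-law papers listed under recommended reading. The function names, the simulated data, and the golden-section search used as a stand-in for "numerical methods" are my own choices for the example, not something prescribed by those references.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def power_law_mle(data, x_min):
    """Closed-form MLE: alpha_hat = 1 + n / sum(ln(x_i / x_min))."""
    return 1.0 + len(data) / sum(math.log(x / x_min) for x in data)


def numerical_mle(data, x_min, lo=1.01, hi=10.0, tol=1e-6):
    """Maximize the power-law log-likelihood by golden-section search on [lo, hi]."""
    n = len(data)
    sum_log = sum(math.log(x / x_min) for x in data)

    def log_likelihood(alpha):
        # Log of the product of p(x) = ((alpha - 1) / x_min) * (x / x_min)^(-alpha)
        return n * math.log(alpha - 1.0) - n * math.log(x_min) - alpha * sum_log

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c) > log_likelihood(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


rng = random.Random(0)
data = sample_power_law(20000, 2.5, 1.0, rng)  # true exponent is 2.5
print("closed form:", power_law_mle(data, 1.0))
print("numerical:  ", numerical_mle(data, 1.0))
```

The closed-form version is a single line once the data are loaded. The numerical version is there only to show what you would do for a distribution with no closed-form solution; in practice an optimizer from R, Matlab, or a Python library would replace the hand-written search.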

Recommended reading (focused on power-laws because that’s what I’m most familiar with)

Newman et al. 2005, Edwards et al. 2007, White et al. 2008, Clauset et al. 2009
