This is the story behind “Comparing process-based and constraint-based approaches for modeling macroecological patterns” by my former PhD student Xiao Xiao, James O’Dwyer, and myself.
I was on sabbatical in the fall of 2013 and was doing a lot of reading, and I reread “An integrative framework for stochastic, size-structured community assembly” by James O’Dwyer, Jessica Green, and colleagues. A couple of months earlier Xiao Xiao, Dan McGlinn, & I had submitted a paper on “A strong test of the Maximum Entropy Theory of Ecology“, where we had tested John Harte and colleague’s new maximum entropy based model by looking at four different predictions of the model simultaneously. In rereading O’Dwyer et al. I realized that their size-structured neutral theory would probably be able to predict a similar set of ecological distributions to those predicted by the maximum entropy model. We’d already conducted the first three levels in McGill’s hierarchy of model testing (see McGill 2003 and McGill et al. 2006) for Harte et al.’s maximum entropy model (checking the general form of the predictions, comparing to null hypotheses, and testing multiple complex predictions) and this would let us complete the last level by comparing the fit to realistic alternative models.
Getting to work
The math in O’Dwyer et al. is pretty advanced and I knew James through shared interests in ecological theory, so I emailed him and Xiao to see if it might it be mathematically tenable to use James’ model to make the same predictions we’d been testing and, if so, if he and Xiao were interested in working together on trying to do this.
What resulted was a very interdisciplinary collaboration, combining shared expertise in mathematical modeling, computing, analysis of large ecological datasets, and knowledge of the foundations of multiple models/theories. It was regular for two of the three people to have a detailed conversation that the third collaborator didn’t follow the details of but always felt comfortable interjecting to make sure that the big picture goals of the project stayed on track. In particular, I remember a \~100 message long email exchange where James and Xiao were working on getting the two theories to make identical predictions. They were on-boarding each other with the details of the two theories and then exchanging ideas in math that I wasn’t even trying to keep up with. I’d occasionally jump in to provide some relevant empirical details and information on other related theory/ideas to help keep things moving in the right direction, but generally just got to watch in awe as two folks with amazing theory skills did their thing. Xiao was constantly running and sharing new analyses which really helped make all of our interactions cohesive by grounding them in graphs and real values.
Reviews, revisions, and the speed of scientific dialog
During the review process John Harte pointed out that there was a second generation model from the maximum entropy theory that was expected to improve the areas where the version we were analyzing was performing poorly. We’d known about this work for a couple of years since we’d been actively sharing ideas and results with the Harte Lab throughout this research. We knew that this paper was already in review, but it didn’t seem like we could reasonably analyze work that they hadn’t made publicly available yet. So, we’d acknowledged in our paper that new models based on this general theory could improve it’s performance and planned to potentially come back later and analyze the new model in a second paper.
Aside: this is a perfect example of the advantages of preprints for facilitating a rapid scientific dialog. If this second generation paper had been posted as a preprint at the time it was initially submitted for review we would have been able to cite and analyze the new theory from earlier on in the process of working on our paper. In fact, we probably wouldn’t have had any choice because good reviewers would have pointed us to the preprint and told us that we needed to address it.
Without a preprint and with the paper still in review we could have easily told the editor that we couldn’t address the new model yet, and in fact the editor explicitly gave us that option. This would have made for a quick and easy acceptance since all the other comments involved only writing, but it arguably wasn’t in the best interests of moving science forward quickly. The new model would either be published first, or shortly after our paper, which would mean that the answer to the overarching question would have been very much up in the air. So, it would be better to add the new model to our analyses, but it would take a lot more work to do so. We would have to implement a new model from scratch, integrate it into our code base, and then rerun all of our fairly time-consuming analyses. Xiao was a newly minted PhD and James was an untenured assistant professor, so the best career strategy for them would have been to just get the paper in as is. This was particularly true for Xiao who was going to have to do the majority of the work getting the new model implemented, so we left the decision in her hands and made it clear that everyone was happy with either choice. She decided that the extra work was worth it to better answer the core question now and not only added the 2nd generation maximum entropy model, but also a more advanced version of the size-structured neutral theory model that she and James had been working on. This also broadened the scope of inference for the paper because we had now evaluated two models from each theory instead of just a single model.