This presentation by David Stockwell comes highly recommended by me. He mentions something very interesting to those of us who like to test hypotheses like climate models: a metric for assessing the performance of a model against a benchmark, in this case the mean. The statistical measure of skill is called Nash-Sutcliffe Efficiency, and it originates from hydrological modeling of river discharge. The idea is to minimize the ratio of the model's sum of squared errors to the sum of squared errors of the simple average; the efficiency is one minus this ratio, so driving the ratio toward zero pushes the score toward 1. But the climate models' drought predictions are, in David's example of Australia, worse than the long-term mean. Interestingly, in Richard Lindzen's Heartland presentation, he noted that one argument for models is that they are our "only tool":
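The score described above can be sketched in a few lines of Python (the function name and the toy numbers in the test are mine, not from David's presentation):

```python
import numpy as np

def nse(observed, modeled):
    """Nash-Sutcliffe Efficiency: 1 - SSE(model) / SSE(mean).

    A score of 1 is a perfect fit, 0 means the model does no better
    than the long-term mean of the observations, and a negative score
    means the mean outperforms the model.
    """
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    sse_model = np.sum((observed - modeled) ** 2)   # model's squared errors
    sse_mean = np.sum((observed - observed.mean()) ** 2)  # mean's squared errors
    return 1.0 - sse_model / sse_mean
```

A model scoring below zero is exactly the "worse than the long-term mean" situation in David's Australian drought example.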
The notion that models are our only tool, even if it were true, depends on models being objective and not arbitrarily adjusted. However, models are hardly our only tool. Models do show why they get the results they get. The reasons involve physical processes that can be independently assessed by both observations and basic theory.
But David shows that one could perhaps go beyond this. Not only are models not our only tool; they are, in some cases, actually worse than nothing. Why models produce incorrect results for drought trends in Australia is something we might be able to investigate, and it might prove worthwhile. But for the purposes of telling us about the future, models that perform this poorly against records that predate them can only mislead us, and dangerously so.
I would also like to add a thought of my own: the NSE could be generalized to use any "model" as the benchmark, rather than just the average. This would allow us to ask whether climate models outperform benchmarks slightly more complex than the mean, such as a linear trend. Of course, the mean is the standard to test against, but there are interesting possibilities in testing models' performance against the observations relative to each other. For one thing, it might be possible to see whether models' performance with respect to the observations has improved over time.
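As a sketch of that generalization (the function name and the example series are hypothetical), the mean in the denominator is simply replaced by any benchmark prediction, for instance a fitted linear trend:

```python
import numpy as np

def generalized_nse(observed, modeled, benchmark):
    """NSE with an arbitrary benchmark in place of the mean:
    1 - SSE(model) / SSE(benchmark).

    Positive scores mean the model beats the benchmark;
    negative scores mean the benchmark wins.
    """
    observed = np.asarray(observed, dtype=float)
    sse_model = np.sum((observed - np.asarray(modeled, dtype=float)) ** 2)
    sse_bench = np.sum((observed - np.asarray(benchmark, dtype=float)) ** 2)
    return 1.0 - sse_model / sse_bench

# Hypothetical example: score a made-up model against a linear-trend benchmark.
t = np.arange(12)
obs = 0.3 * t + np.sin(t)                     # made-up "observations"
model_pred = 0.25 * t + 0.5                   # made-up model output
trend = np.polyval(np.polyfit(t, obs, 1), t)  # least-squares linear trend
skill_vs_trend = generalized_nse(obs, model_pred, trend)
```

Passing the observed mean as the benchmark recovers the ordinary NSE, so nothing is lost by the generalization.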