## Probability of Setting Versus Holding Records

There’s been a lot of talk of records lately, and while I’m thinking about doing a full-blown analysis myself, for now I thought I’d just comment on the probability that a record is set at a given time in a record of something. Note that this is a somewhat different question from the probability that a particular time is the record holder. Let’s suppose that the change in some variable relative to its previous value is a continuous random variable (so that ties have probability 0) with a symmetric distribution and a mean of zero. In other words, let’s assume that our variable changes randomly, and over a sufficiently long time should show no change. For simplicity, let’s assume this also means that any point is equally likely to be higher or lower than any other point in the time series, whether that is the point preceding it or any other point. So, when the record starts, every possible record has a probability of 1 of being “set”, since there is no previous value to compare to. The next point has a probability of .5 of being lower than the previous point and .5 of being higher, and the same is true of the point after that relative to it. In order for a point to set a record, however, it must be lower or higher than every preceding point, not merely the previous one. So say the variable is the mean temperature of a particular day of the year.
The expected number of records set the first year is N warm “records” and N cold “records”, where N is the number of locations. (We could also track high and low records for the daily minimum and maximum separately, and do every day of the year, which would multiply the count by two and by the number of days in a year. For simplicity, let’s not concern ourselves with the fact that a leap day is periodically added, or with the way the temperature of a particular calendar day drifts back and forth as the calendar realigns.) The next year the expected number of new warm records is N/2 and the expected number of new cold records is N/2, because half of the locations should have that day colder, and half warmer, than the same day the previous year.

Note that spatial correlation may lead to the stations setting records being clustered together. If the “clusters” are about the size of the area containing our N locations, the counts in any one year can be far from these expected values. As long as the geographic region can easily contain a large number of clusters at a given time, the “cool” clusters should be as common and as large as the “warm” ones; if clusters can encompass large portions of the Earth’s surface, however, differences from the expected behavior may occur.

Regardless, given our assumptions about the behavior of our hypothetical temperature variable, the expected number of records continues to decay. In year three a point sets a warm record only if it is the largest of three values, and since no year is special, each of the three is equally likely to be the largest. So the expectation is N/3 each in year three, then N/4 each, and so on: N/n in year n. (The decay is slower than repeated halving because the comparisons are not independent: a point that beats one earlier year is more likely to beat the others too.) If you have 10000 locations with records starting at the same time, the expected number of new warm records would be 1000 in year 10 and 100 in year 100, and likewise for cold records. The main point here is that the probability that a record gets set decays rapidly at first, and then decreases quite slowly as it gets close to zero.
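
The decay in new records can be checked with a quick simulation. This is only a sketch under an assumption of mine: each location’s yearly values are independent draws from the same continuous distribution (a standard normal here, chosen arbitrarily), with no spatial correlation between locations. Under that assumption the chance that year n sets a record high is 1/n, since each of the n values is equally likely to be the largest. The function name and parameters are made up for illustration.

```python
import random


def record_high_counts(n_locations, n_years, seed=0):
    """Count new record highs set in each year, summed over locations.

    Assumption: values are independent standard-normal draws, so there is
    no trend and no correlation between locations or between years.
    """
    rng = random.Random(seed)
    counts = [0] * n_years
    for _ in range(n_locations):
        best = float("-inf")  # no previous value, so year 1 always sets a record
        for year in range(n_years):
            value = rng.gauss(0.0, 1.0)
            if value > best:
                best = value
                counts[year] += 1
    return counts


n_locations = 10_000
counts = record_high_counts(n_locations, n_years=20)
for year, observed in enumerate(counts, start=1):
    expected = n_locations / year  # N/n under the independence assumption
    print(f"year {year:2d}: observed {observed:5d}, expected {expected:7.1f}")
```

Record lows behave the same way by symmetry. Real station data are correlated in space and time, so the scatter around N/n would be larger than this sketch suggests.
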
So it would not be surprising to find the rate at which new records are set going down over time. However, a time series showing the number of current record holders is an entirely different story. The first year, for instance, automatically sets both the warmest and the coldest record for that date. But since the next year will be either warmer or colder (we have stipulated that a tie is impossible), the first year keeps only half of the records it set, and that is the maximum it can hold from then on, not the expected number. The number of records that may be held is of course 2N in total, spread out over all the years. The probability that, after a long time, the first year still holds one of its records will be exceedingly small.

After year two, half of the warm records are held by the first year and half by the second. The third year has a one-in-three chance of breaking a standing record, and half of the records it breaks belonged to the first year and half to the second, so the shares become 1/3, 1/3, and 1/3. As the record gets longer, the probability that any given year holds a record gets lower and lower: after n years, each year holds a given record with probability 1/n, so the distribution is flat. The early years set their records easily but gain no lasting advantage from it. For a very long dataset, the probability that a record is held by a particular year is low, and no different from that of the year before it.

Now, what all this means is that graphs showing the number of new records decreasing with time, with the rate of that decrease becoming smaller with time, are entirely consistent with a static climate, and would not require the climate to be changing in any particular way. Another point is that, since the climate is of course definitely not completely static, it is not obvious what impact a change in climate would have on records.
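
To see who ends up holding the record, as opposed to who set one along the way, here is a small sketch. It assumes, as my own simplification, that each location’s yearly values are independent draws from one continuous distribution. In that case every year is equally likely to hold the record high, so after n years each year’s share should be close to 1/n. The function name is hypothetical.

```python
import random
from collections import Counter


def record_holder_shares(n_locations, n_years, seed=0):
    """Fraction of locations whose standing record high belongs to each year.

    Assumption: independent standard-normal values, no trend, no ties.
    """
    rng = random.Random(seed)
    holders = Counter()
    for _ in range(n_locations):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        holders[values.index(max(values))] += 1  # year index of the standing record
    return [holders[year] / n_locations for year in range(n_years)]


shares = record_holder_shares(n_locations=30_000, n_years=3)
for year, share in enumerate(shares, start=1):
    print(f"year {year}: holds {share:.3f} of record highs")
```

Raising `n_years` flattens every share toward 1/n, which is why a plot of current record holders looks so different from a plot of records set.
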
Presumably a climate that gets warmer would see an increase in high temperature records (given the character of twentieth century changes, and of changes in the last thirty or so years, those would generally be exceptionally warm winter nights more so than exceptionally warm summer days) but also a decrease in the number of cold temperature records, relative to the naive statistical expectation. Since those two effects are in opposite directions, it is conceivable that they would cancel one another. The total number of records therefore wouldn’t necessarily indicate a change in climate or lack thereof: it could stay at the expected value even while the mix of warm and cold records shifted. Some would suggest that the ratio of one set of records to the other would be more appropriate. This, however, could be highly misleading. Even if the warm records decreased relative to expectations, the ratio of warm to cold would go up if the cold records decreased faster. In other words, a ratio doesn’t tell you whether the change was in the numerator or the denominator. Moreover, as the denominator approaches zero, the ratio becomes huge as long as the numerator is larger, even if the numerator is itself very small, exaggerating what is going on and obscuring important details. Finally, a change in the number of records is not necessarily an indication of a change in how extreme the climate is: if high temperature records are being set for nights in the dead of winter, but not so much in summer, that represents a decrease in the variance of the climate, which is becoming on the whole less extreme. As far as I can tell, that’s exactly what has been happening, and it is not a particularly alarming way for climate to change at all.
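
The trouble with the ratio can be shown with a little arithmetic. The counts below are numbers I made up purely for illustration, not data: four quite different situations produce the same warm-to-cold ratio, and a fixed, modest warm count produces an ever larger ratio as the cold count shrinks.

```python
def warm_to_cold_ratio(warm, cold):
    """Ratio of warm records to cold records for one period."""
    return warm / cold


# Hypothetical counts (warm, cold), with 100 of each expected in a static climate.
scenarios = {
    "warm doubled, cold as expected": (200, 100),
    "warm as expected, cold halved": (100, 50),
    "both below expectation, cold falling faster": (80, 40),
    "both nearly gone": (4, 2),
}
same_ratio = {name: warm_to_cold_ratio(w, c) for name, (w, c) in scenarios.items()}
for name, ratio in same_ratio.items():
    print(f"{name}: ratio {ratio:.1f}")

# Hold the warm count fixed and let the (hypothetical) cold count shrink.
warm = 60
blow_up = [warm_to_cold_ratio(warm, cold) for cold in (50, 10, 2, 1)]
print(blow_up)
```

Every scenario in the first group gives a ratio of 2, so the ratio alone cannot distinguish more warm records from fewer cold ones.
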
As a last point, has anyone really considered how large the difference between new records and old records (which may have been standing, against difficult odds, for upwards of a hundred years) generally is? In the real world the variable is generally rounded to whole degrees rather than purely continuous (which in turn allows for the possibility of ties), so a broken record may differ from the new record by only a single degree. Even if the entire difference between the two could be attributed to anthropogenic changes (highly unlikely, given that we are talking about pairs of data points), that would represent a small trend.

So, basically, when I see people getting all worked up over records getting set or broken, and how scary that is, I have to admit to being mildly amused. To paraphrase Inigo Montoya, “You keep using that statistic. I do not think it means what you think it means.”