It’s become part of my daily ritual to check the updated extended forecast in hopes of seeing a bona fide warm spell on the horizon. And each day my hopes are dashed: 30s, some 40s, and now finally some low 50s, but no real birds-a-chirpin’ spring weather. As a son of the Lake Superior shoreline, I’m somewhat ashamed to admit my impatience with the relatively mild New England winter, but this one just feels relentless.

So I thought I’d try to put some numbers behind my suspicions and dig into some real historic data. I pulled NOAA daily temperature data for Amherst (where I live); luckily there’s a (mostly) complete record going back to the late 19th century.

The tidied data looks a little something like this (last 6 rows of the dataset):

```
            Date jday year highT
42810 2015-03-18   77 2015 48.92
42811 2015-03-19   78 2015 30.02
42812 2015-03-20   79 2015 35.06
42813 2015-03-21   80 2015 30.02
42814 2015-03-22   81 2015 44.06
42815 2015-03-23   82 2015 30.02
```

A note on tidying: I dealt with missing values as follows. Any year missing 10 or more days’ data from the first 82 days of the year was omitted from the dataset; other missing values were interpolated linearly from adjacent days’ data.
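The missing-value rule above is simple to express in code. Here is a minimal Python sketch (the original analysis was done in R, and this is not that code), assuming each year arrives as a list of its first 82 daily highs with `None` marking a missing day. The name `tidy_year` is hypothetical, and copying the nearest observed value into a gap at the very start or end of the window, where there is no neighbor on one side to interpolate from, is my own assumption.

```python
from typing import Optional


def tidy_year(highs: list[Optional[float]], max_missing: int = 10) -> Optional[list[float]]:
    """Apply the missing-value rule to one year's first-82-days series.

    Returns None (drop the year) if `max_missing` or more days are missing;
    otherwise fills each gap by linear interpolation between the nearest
    observed days on either side.
    """
    if sum(v is None for v in highs) >= max_missing:
        return None
    known = [i for i, v in enumerate(highs) if v is not None]
    if not known:
        return None
    filled = list(highs)
    for i, v in enumerate(highs):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            # Leading gap (assumption): copy the first observed value.
            filled[i] = highs[right]
        elif right is None:
            # Trailing gap (assumption): copy the last observed value.
            filled[i] = highs[left]
        else:
            # Interior gap: straight line between the two neighbors.
            w = (i - left) / (right - left)
            filled[i] = highs[left] + w * (highs[right] - highs[left])
    return filled
```

For example, `tidy_year([30.0, None, 40.0])` fills the gap with 35.0, while a year containing 10 `None` entries returns `None` and would be dropped.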


  • jday is the Julian day of the year (the day number, counting from 1 on January 1 up to 365, or 366 in leap years). The data I got is current through March 23, 2015; that’s Julian day 82.
  • highT is the daily maximum temperature, which I’ve converted to Fahrenheit. Other measurements came in the raw data, but I decided to look only at highs.
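The two derived columns can be sketched in Python as follows. This assumes the raw highs came in degrees Celsius (some NOAA products report tenths of a degree, in which case divide by 10 first); `julian_day` and `c_to_f` are hypothetical helper names.

```python
from datetime import date


def julian_day(d: date) -> int:
    """Day of the year: 1 on January 1, up to 365 (366 in leap years)."""
    return d.timetuple().tm_yday


def c_to_f(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32
```

`julian_day(date(2015, 3, 23))` gives 82, matching the last row of the data, and `c_to_f(9.4)` gives 48.92 up to floating-point rounding. In a leap year, day 82 falls on March 22 instead, so "the first 82 days" ends one calendar day earlier.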


I wanted to know how cold 2015 has been, compared to the average for the first 82 days of a year. So how about the average daily high, compared with the historic record? First a plot:


Well, that puts my mind at ease: it’s definitely been colder than average. Notice that this year is even more of an outlier in the context of a pretty clear increasing temperature trend. Don’t see it? Try now:


Not only is 2015 an outlier, but it’s an influential outlier–if we remove that point from the plot, the linear regression line shifts noticeably:

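To put a number on how much a single year moves the trend, here is a minimal Python sketch that computes each year’s mean daily high and fits an ordinary least-squares line of mean high on year, once with all years and once with a chosen year left out. That the trend lines in the plots are OLS fits is an assumption on my part, and the function names are hypothetical; the input is a dict mapping each year to its tidied list of daily highs.

```python
from statistics import mean


def yearly_means(highs_by_year: dict[int, list[float]]) -> dict[int, float]:
    """Average daily high over the tidied window, one value per year."""
    return {year: mean(highs) for year, highs in highs_by_year.items()}


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_with_and_without(means: dict[int, float], drop_year: int) -> tuple[float, float]:
    """Trend in degrees per year using all years, then with `drop_year` left out."""
    years = sorted(means)
    full = ols_slope(years, [means[y] for y in years])
    kept = [y for y in years if y != drop_year]
    reduced = ols_slope(kept, [means[y] for y in kept])
    return full, reduced
```

On toy, already-averaged values such as `{2000: 1.0, 2001: 2.0, 2002: 3.0, 2003: 0.0}`, dropping 2003 moves the slope from -0.2 to 1.0 per year: a miniature influential outlier. With real data the call would be `slope_with_and_without(yearly_means(highs_by_year), 2015)`.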

Here’s a similar picture, but looking instead at medians:


Here the difference is even more striking. The typical day in the first 82 days of 2015 has been colder than in just about any year in the historic record.
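A Python sketch of the median comparison, with hypothetical names: take each year’s median daily high, then count how many years had a lower one. With 82 days (an even count) the median is the average of the 41st and 42nd coldest highs, which `statistics.median` handles automatically.

```python
from statistics import median


def yearly_medians(highs_by_year: dict[int, list[float]]) -> dict[int, float]:
    """Median daily high over the tidied window, one value per year."""
    return {year: median(highs) for year, highs in highs_by_year.items()}


def rank_from_coldest(stat_by_year: dict[int, float], year: int) -> int:
    """Rank of `year` among all years for one statistic: 1 = lowest on record.

    Years with exactly the same value share the lower rank.
    """
    target = stat_by_year[year]
    return 1 + sum(value < target for value in stat_by_year.values())
```

With toy data `{2012: [50, 60, 70], 2013: [40, 45, 50], 2015: [30, 35, 60]}`, the medians are 60, 45 and 35, so 2015 ranks 1st (coldest). The same `rank_from_coldest` helper applies to yearly means or maxima.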

Now the year-to-date high: even in an otherwise chilly season, a 60-degree day can really lift the spirits. (And this year we’ve had nothing over 55.)


Again, clearly colder than average, especially given the upward trend, but not as drastically. Also notice that the year-to-date high is not symmetrically distributed: it has a heavy upper tail, with some values in the 70s. (1921 even had an 80-degree day!)
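One way to quantify that lopsidedness is the sample skewness of the yearly maxima, which is positive when the upper tail is longer. The Python sketch below uses the simple moment coefficient g1 = m3 / m2^1.5; choosing that particular skewness measure is my assumption, not something from the original analysis, and the names are hypothetical.

```python
from statistics import fmean


def ytd_max(highs_by_year: dict[int, list[float]]) -> dict[int, float]:
    """Warmest daily high of the tidied window, one value per year."""
    return {year: max(highs) for year, highs in highs_by_year.items()}


def sample_skewness(xs: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5.

    Positive values mean a longer upper tail (e.g. a few unusually warm years).
    """
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5
```

`sample_skewness(list(ytd_max(highs_by_year).values()))` would be near zero for a symmetric spread of yearly maxima and clearly positive for one with a heavy upper tail.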

Getting meta with ranks and order stats

The maximum and the median are both order statistics–values corresponding to a given rank of the data. The maximum is the value of the highest-ranked point, and the median is the value of the middlest-ranked point. Now I want to look at the whole range of order stats, from the first (i.e. minimum) to the last (maximum). For 2015, these look thusly:

This is just taking all of the daily high temperatures so far this year and lining them up from coldest to warmest.

Let’s compare that (the 2015 order statistics) to the median order statistics over all years.

We see that only the minimum is (marginally) higher than the median year-to-date minimum for this time of year. Every other order statistic is colder than its all-years median, by over 5 degrees in some cases. Note that here I’m comparing the temperature order statistics over the days in a single year to an order statistic (the median) of such order statistics taken over all years.
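Both steps can be sketched in a few lines of Python: the order statistics of one year are just its sorted highs, and the comparison curve takes, for each rank k, the median of the k-th order statistic across all years. This assumes every tidied year has the same number of days (82 here), and the names are hypothetical.

```python
from statistics import median


def order_statistics(highs: list[float]) -> list[float]:
    """Daily highs from coldest (index 0, the minimum) to warmest (index -1, the maximum)."""
    return sorted(highs)


def median_order_statistics(highs_by_year: dict[int, list[float]]) -> list[float]:
    """For each rank k, the median over all years of that year's k-th order statistic."""
    per_year = [order_statistics(highs) for highs in highs_by_year.values()]
    n_days = len(per_year[0])
    assert all(len(stats) == n_days for stats in per_year), "years must have equal length"
    return [median(stats[k] for stats in per_year) for k in range(n_days)]


def departures(highs_by_year: dict[int, list[float]], year: int) -> list[float]:
    """How far each of `year`'s order statistics sits above (+) or below (-) the all-years median."""
    typical = median_order_statistics(highs_by_year)
    return [own - med for own, med in zip(order_statistics(highs_by_year[year]), typical)]
```

A negative departure at rank k means that year’s k-th coldest day was colder than usual for that rank.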

Now let’s go a step further.

I’m going to take actual temperature values out of the picture entirely, and only look at ranks of order statistics for the year 2015. For each of the 2015 order statistics I’m going to ask the question: “Of all years’ order stats, where does this one line up?”

This is really interesting. It shows that just about all of the year-to-date order statistics, from the 3rd-lowest daily high temperature all the way up to the very highest, were well below average. And many of them, from around the 40th to the 70th, were among the lowest values of those order statistics in any year on record. The uptick at the left side of the plot says that the coldest days this year weren’t extremely cold from a historical standpoint, but the consistent chill of the more typical days this year (those around the middle of the x-axis) is virtually without precedent. That is what’s been wearing on me.
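The rank question above can be sketched in Python like this: sort each year’s highs, then for every position k count how many years had a lower k-th order statistic than the target year. The rescaling of ranks onto a 0 to 100 percentile scale (0 = coldest on record, 100 = warmest) is one common convention and an assumption here; the original plots may define percentiles slightly differently. Function names are hypothetical.

```python
def order_stat_ranks(highs_by_year: dict[int, list[float]], year: int) -> list[int]:
    """Rank of each of `year`'s order statistics among all years (1 = coldest).

    Ties share the lower rank. Assumes every year has the same number of days.
    """
    stats = {yr: sorted(highs) for yr, highs in highs_by_year.items()}
    target = stats[year]
    return [
        1 + sum(other[k] < value for other in stats.values())
        for k, value in enumerate(target)
    ]


def ranks_to_percentiles(ranks: list[int], n_years: int) -> list[float]:
    """Map rank 1..n_years onto 0..100 (0 = coldest year, 100 = warmest)."""
    return [100 * (r - 1) / (n_years - 1) for r in ranks]
```

A curve that sits near 0 through the middle ranks means the typical days of that year were about as cold as in any year on record.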

Perhaps it would be better to show the y-axis in terms of percentiles. Here’s that:

For the sake of comparison, let’s look at the same plot for a range of years.

Here’s last year:

Not a warm year by any stretch, and the values around the median were still much colder than average, but at least the warm days were warmer.

2013 was exceptionally average:

And 2012 was exceptionally warm:

Curiously, its shape roughly mirrors this year’s: the cold days were closer to average than the rest, which were very warm indeed.

But climate change!

Before you start using the words “global warming” and “hoax” in the same sentence (OMG, like I just did!), I should point out that this year’s anomalous frigidity has been very spatially confined. According to, this winter was the country’s 19th warmest despite what we felt out here. And if you want to be spatially selective and get the opposite extreme, look at what’s going on out west. And let’s not forget that globally, last year was the warmest on record.

To me, the fact that we can still set cold records at the local scale speaks to the spatial and temporal variability of both climate and climate change. We’ve got enough of a signal to talk about a warming climate with certainty, but the noise around that signal is large, palpable, and sometimes uncomfortably chilly.

As always, I did all of this analysis in R. You can find the code on GitHub.