Wednesday, August 5, 2015

NOAA's data is difficult.

When it comes to things like climate change, I'm pretty much a pragmatist. Following the black body model, we are removing energy integrators (i.e., green space) and replacing them with concrete, while also releasing energy that was stored. Logic dictates that the entropy of the system will increase, and with it the temperature, because the earth is not expanding much.

I started looking at weather data from NOAA when another physics friend said, "wouldn't it be nice if we could repair missing data like is done with the weather?" In my field, data is what it is, and missing data is missing data. If you publish something and you extrapolate a line, you make sure that you mention how it was calculated. Usually, if we find data missing, we just take the data again. What I decided to do seemed simple:
1) Take the data from NOAA that is modified
2) Take the data from NOAA that is raw
3) Look at the difference and see if I could back extract the variables for the "Pairwise Homogenization Algorithm"
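Step 3 boils down to a month-by-month subtraction. Here is a minimal sketch of that step, assuming both datasets have already been parsed into dictionaries keyed by (station, year), each holding twelve monthly values with None for missing months. The function name, the dictionary layout, and the "adjusted" numbers below are hypothetical and only illustrate the idea; they are not NOAA values.

```python
def monthly_adjustments(raw, adjusted):
    """Return {(station, year): [adjusted - raw for each month, or None]}.

    A month is None in the result when either dataset is missing it,
    so no value is invented for missing data.
    """
    diffs = {}
    # Only station-years present in both datasets can be compared.
    for key in raw.keys() & adjusted.keys():
        diffs[key] = [
            None if r is None or a is None else a - r
            for r, a in zip(raw[key], adjusted[key])
        ]
    return diffs


# Hypothetical values, in the same integer units as the files.
raw_example = {
    ("USH00011084", 1927): [1209, 1821, 1651, 2183, 2467, 2707,
                            2730, 2594, 2579, 2081, 1907, 871],
}
adjusted_example = {
    ("USH00011084", 1927): [1190, 1800, None, 2170, 2467, 2700,
                            2725, 2590, 2570, 2075, 1900, 860],
}
example_diffs = monthly_adjustments(raw_example, adjusted_example)
```

Working in the integer units of the files avoids floating-point noise in the differences; any conversion to degrees can happen afterwards.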

...and then you look at NOAA's data and you start to believe that there is a conspiracy. Even if the data is of good quality, the released datasets are terribly formatted or seem to be simply incorrect. Here's an example:
USH00011084 1897   734  3  1292  3  1972  3  1786  3  2084  3  2761  3  2753  3  2547  3  2406  3  1878  3 -9999    -9999   
USH00011084 1900 -9999    -9999     1337a 3  1936  3  2378  3  2589  3  2770  3  2872  3  2700  3  2320  3  1486 3  1100  3
USH00011084 1926 -9999     1245     1251a    1781     2240     2654     2712     2763c    2770     2110     1256a    1421   
USH00011084 1927  1209     1821     1651     2183     2467     2707     2730     2594a    2579     2081     1907      871f 3
USH00011084 1928   800b    1135     1614     1711     2218     2596     2829     2817    -9999    -9999    -9999    -9999 

Here's an excerpt from the tavg dataset, where one would expect the average to be a positive number:
USH00017157 1940  -287      502     1059     1502     1822     2368     2447     2616     2161     1695     1035      897
In the line above, -287 represents an average of -2.87 degrees at the USH00017157 weather station in 1940. How can something defined as (tmax-tmin)/2 be negative? To NOAA's credit, they have good documentation for this formatting disaster in readme.txt.
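To make the layout concrete, here is a rough sketch of a parser for one station-year line. The column positions are an assumption on my part (an 11-character station ID, a 4-digit year in columns 13-16, then twelve 9-character fields made of a 6-character value and 3 flag characters) and should be checked against readme.txt before trusting them. The function name parse_ushcn_line is hypothetical. The only things taken from the text above are that -9999 marks a missing month and that values are in hundredths of a degree.

```python
MISSING = -9999          # sentinel used in the files for a missing month
FIELD_WIDTH = 9          # assumed: 6-character value + 3 flag characters
FIRST_FIELD = 16         # assumed: values start right after the year


def parse_ushcn_line(line):
    """Parse one fixed-width record into (station, year, months).

    months is a list of 12 (value, flags) tuples, where value is in
    degrees (or None when missing) and flags is the raw 3-character string.
    """
    # Pad the line so trailing flag columns may be absent.
    line = line.rstrip("\n").ljust(FIRST_FIELD + 12 * FIELD_WIDTH)
    station = line[0:11]
    year = int(line[12:16])
    months = []
    for m in range(12):
        start = FIRST_FIELD + FIELD_WIDTH * m
        raw = int(line[start:start + 6])
        flags = line[start + 6:start + 9]
        value = None if raw == MISSING else raw / 100.0
        months.append((value, flags))
    return station, year, months


sample = ("USH00017157 1940  -287      502     1059     1502     1822"
          "     2368     2447     2616     2161     1695     1035      897")
station, year, months = parse_ushcn_line(sample)
print(station, year, months[0][0])
```

Keeping the flags as raw strings rather than interpreting them is deliberate: their meaning is defined in readme.txt, and this sketch does not try to guess it.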

Before I do anything else, I'm going to ask NOAA about this data.
