I started looking at weather data from NOAA when another physics friend said "wouldn't it be nice if we could repair missing data like is done with the weather?". In my field, data is what it is, and missing data is missing data. If you publish something and you extrapolate a line, you make sure that you mention how it was calculated. Usually, if we find data missing, we just take the data again. What I decided to do seemed simple:
1) Take the data from NOAA that is modified
2) Take the data from NOAA that is raw
3) Look at the difference between the two and see if I could back-extract the variables for the "Pairwise Homogenization Algorithm"
...and then you look at NOAA's data and you start to believe that there is a conspiracy. Even if the underlying data is of good quality, the released datasets are terribly formatted, or seem to be just plain incorrect. Here's an example:
USH00011084 1897 734 3 1292 3 1972 3 1786 3 2084 3 2761 3 2753 3 2547 3 2406 3 1878 3 -9999 -9999
USH00011084 1900 -9999 -9999 1337a 3 1936 3 2378 3 2589 3 2770 3 2872 3 2700 3 2320 3 1486 3 1100 3
USH00011084 1926 -9999 1245 1251a 1781 2240 2654 2712 2763c 2770 2110 1256a 1421
USH00011084 1927 1209 1821 1651 2183 2467 2707 2730 2594a 2579 2081 1907 871f 3
USH00011084 1928 800b 1135 1614 1711 2218 2596 2829 2817 -9999 -9999 -9999 -9999
Here's an excerpt from the tavg dataset, where one would expect the average to be a positive number:
USH00017157 1940 -287 502 1059 1502 1822 2368 2447 2616 2161 1695 1035 897
In the line above, -287 represents a -2.87 degree average at the USH00017157 weather station in 1940. How can something defined as (tmax-tmin)/2 be negative? To NOAA's credit, they have good documentation for this formatting disaster in readme.txt.
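To make the excerpts above easier to poke at, here is a minimal Python sketch of how one of these lines could be read. It is not NOAA's parser and it does not follow the column layout in readme.txt; it only handles the whitespace-separated form pasted in this post. The names `Record`, `parse_record`, `MISSING`, and `VALUE_RE` are my own. The assumptions: each line holds a station ID, a year, and twelve values; -9999 marks a missing value; values are hundredths of a degree (so -287 reads as -2.87); a letter stuck to a value is a flag; and a stand-alone single character such as "3" is also a flag.

```python
import re
from typing import List, NamedTuple, Optional

MISSING = -9999  # assumption: sentinel for a missing value in the excerpts

# A value is a signed integer, optionally followed by one flag letter (e.g. "1337a").
VALUE_RE = re.compile(r"^(-?\d+)([A-Za-z]?)$")


class Record(NamedTuple):
    station: str
    year: int
    values: List[Optional[float]]  # twelve entries, None where missing


def parse_record(line: str) -> Record:
    """Parse one whitespace-separated line as pasted in this post."""
    tokens = line.split()
    station, year = tokens[0], int(tokens[1])
    values: List[Optional[float]] = []
    for tok in tokens[2:]:
        if len(tok) == 1:
            # Assumption: a lone character (the stray "3") is a flag, not data.
            # A genuine single-digit value would be skipped by mistake.
            continue
        match = VALUE_RE.match(tok)
        if match is None:
            raise ValueError(f"unrecognized token {tok!r}")
        raw = int(match.group(1))  # the attached flag letter is dropped
        # Assumption: values are stored in hundredths of a degree.
        values.append(None if raw == MISSING else raw / 100.0)
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    return Record(station, year, values)
```

Calling `parse_record` on the tavg line gives a record whose first value is -2.87, and on the 1928 line the last four values come back as `None`. The flags are thrown away here, which is fine for eyeballing numbers but not for anything that needs to know how a value was produced.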
Before I do anything else, I'm going to ask NOAA about this data.