Sunday, August 23, 2015

When in doubt, ask a mathematician.

As I have said in my previous posts, NOAA probably isn't part of a conspiracy, but they sure try to make you think they are. I have some major issues with the "Pairwise Homogenization Algorithm": it creates residuals that I do not believe are representative of actual changes. I'm going to table this until I can spend time with my math buddy.

Friday, August 21, 2015

NOAA gives you data that you cannot use.

NOAA releases data and example tools to process it, but they do not release the data in a form those tools can actually use. Check this note from the "Pairwise Homogenization Algorithm" software readme:
# A New Dataset must have at least a Station List and a Data Files. At a minimum, the
# Station List defines the Station ID/Latitude/Longitude. The Data Files (one for
# each station/element - see data/benchmark/world1/monthly/raw) defines all 
# of the Monthly Temperatures in Annual Records. All files are in the GHCNMv3 format.
Now, the data that is actually released is yearly, not monthly, so you cannot use the software they supply to recreate their own data.
I have the Pairwise Homogenization Algorithm set up in MATLAB, and I can make it do whatever I want to the data, including making it give me totally erroneous results.
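To be clear about what I mean by that, here is a toy MATLAB/Octave sketch of the pairwise-difference idea, not NOAA's actual code: two made-up series, one with an artificial step, and a crude detector for where the difference series shifts.

% Toy illustration of the pairwise-difference idea (NOT NOAA's PHA).
% Two made-up annual series; station B gets an artificial 0.5 C step in 1985.
yrs   = (1950:2014)';
stn_a = 0.01*(yrs - 1950) + 0.1*randn(size(yrs));
stn_b = stn_a + 0.1*randn(size(yrs));
stn_b(yrs >= 1985) = stn_b(yrs >= 1985) + 0.5;

d = stn_a - stn_b;                              % difference series between the pair
% Crude changepoint guess: largest excursion of the cumulative sum.
[~, k] = max(abs(cumsum(d - mean(d))));
shift  = mean(d(k+1:end)) - mean(d(1:k));
fprintf('suspected break near %d, size %.2f C\n', yrs(k), shift);

Everything after the break detection (which station gets blamed, how big the correction is) comes down to weights and choices, which is exactly the part I cannot reproduce without the inputs NOAA used.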

I actually emailed NOAA, and they didn't get back to me; apparently I'm asking questions they have already decided everyone "in the know" should know. It's always nice to get an official answer.
--mail--
To whom it may concern,

I am looking for sources for the weights for the Pairwise Homogenization Algorithm that is used in the NOAA data reported in ushcn.tavg.latest.FLs.52j.tar.gz.   I have MATLAB scripts setup to analyze the data, but I’ve been having trouble.  I believe that I cannot reproduce the data because I am missing the monthly entries, and the exported data is yearly.   Any guidance or references would be appreciated. 

Also, there are about 20% fewer weather stations in 2015 than in 1990.  Do you have a reference for this as well?  I'm just curious why there are so many fewer stations.
--end mail--
After reading a bunch of papers, I've decided that techniques such as infilling are pretty much magic, and should be disregarded whenever you can have a meaningful dataset without them.
Here are all of the reasons that you might need to tune a dataset:
Changes in type of equipment.
Changes in the surroundings of the thermometer (you get an urban heat island, for example).
A station has consistently given bad data.
A station has been moved.

After reading through the papers that I could find, I believe that what NOAA has done is make a self-correlating time series. You can use the Pairwise Homogenization Algorithm to create a series that introduces the same anomalies in time that you are trying to avoid. What I really want to know is why NOAA does not release a useful monthly dataset. This is the sort of thing that makes me shake my head at the soft sciences. They might be right or wrong, but they are definitely sloppy by the standards of engineering.

Where does this leave me? I'm going to look at the actual data, remove anything that is not complete, and then plot it. Even if there are changes due to urbanization, equipment changes, and bad data, it is real data. If there is heating due to urbanization, that is just part of the heat, and it should show up as an increased warming trend over the years.
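Here is roughly what I have in mind as a MATLAB/Octave sketch. The CSV layout is my assumption (one row per station-year: a numeric station index, the year, then twelve monthly values in hundredths of a degree C, with -9999 for missing), so the column indices will need adjusting to whatever temperature.sh actually writes.

% Sketch of the plan: keep only complete station-years, then plot the yearly mean.
% Assumed columns: 1 = numeric station index, 2 = year, 3..14 = monthly values
% in hundredths of a degree C, with -9999 marking a missing month.
raw  = csvread('./temperature/data/extracted/tavg.csv');   % hypothetical file name
yr   = raw(:, 2);
mon  = raw(:, 3:14);
keep = all(mon ~= -9999, 2);                 % complete records only
ann  = mean(mon(keep, :), 2) / 100;          % annual mean per station-year, degrees C

[u, ~, g] = unique(yr(keep));
avg = accumarray(g, ann, [], @mean);         % average across stations for each year
plot(u, avg, '-o');
xlabel('year'); ylabel('mean tavg (degrees C)');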

The new temperature.sh file is in SVN as revision 6.

Tuesday, August 18, 2015

So, where did the weather stations go and does it matter?

In an attempt to determine whether it actually matters what happened to the weather stations, I looked at the data for the most recent year (2015) and tried to match up the station locations against previous years. The stations reporting in 2015 are largely present in earlier years as well. I plotted the ratio of available stations per year compared to 2015.
What this means is that the weather stations in 2015 are largely available in previous years, so they can be used to make a real comparison. I would still love to know why so many stations were removed between 1990 and 2015.
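The overlap calculation itself is just this (a sketch; ids_by_year is a placeholder for however the per-year station ID lists get collected from the extracted files):

% Fraction of the 2015 stations that also report in each earlier year.
% ids_by_year is a placeholder: a containers.Map from year to a cell array
% of station ID strings, built however you like from the extracted data.
ids_2015 = ids_by_year(2015);
years    = 1900:2015;
ratio    = zeros(size(years));
for i = 1:numel(years)
    ratio(i) = numel(intersect(ids_2015, ids_by_year(years(i)))) / numel(ids_2015);
end
plot(years, ratio);
xlabel('year'); ylabel('fraction of 2015 stations present');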

I also found a free SVN site to post my scripts: https://subversion.assembla.com/svn/noaa. This post can be recreated by checking out revision 5 and running "runme.m" in MATLAB or Octave after getting the data via temperature.sh.

Monday, August 17, 2015

NOAA's data. 20% fewer data points?

I'm still plugging away at NOAA's data. I have not yet gotten to a point where I'm happy with the data, but that is mostly because it's a terrible export. It is obviously from a database, but the format is indefensible. My respect for data handling in the soft sciences has waned considerably.

I have come across something that is odd, but not necessarily wrong. The number of weather stations trends up and then back down. I'm not sure what to think about this yet, but there are about 20% fewer stations today than in 1990.
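If you want to reproduce the count, something along these lines works on the raw data (a sketch; the file path and the column positions for the station ID and year are my guesses from the file excerpts, so verify them against readme.txt first):

% Count of distinct stations reporting in each year, read from the raw data.
% Assumed fixed columns: station ID in 1-11, year in 13-16.
fid   = fopen('./temperature/downloads/ushcn.tavg.latest.FLs.52j.txt');  % hypothetical concatenated file
ids   = {};
years = [];
line  = fgetl(fid);
while ischar(line)
    if numel(line) >= 16
        ids{end+1}   = line(1:11);
        years(end+1) = str2double(line(13:16));
    end
    line = fgetl(fid);
end
fclose(fid);

yy = unique(years);
n  = arrayfun(@(y) numel(unique(ids(years == y))), yy);
plot(yy, n);
xlabel('year'); ylabel('stations reporting');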

Please excuse the huge graphs, but it's clearer this way. Why would one remove stations if the data is valid?

Thursday, August 6, 2015

NOAA's data, get it with this script.

NOAA's weather data, as I previously mentioned, is a bit difficult. It seems to have no coherent format, and it is obviously a dump from a database. In order to make it just a bit better, I used a bash script to download and format the data:

temperature.sh

The script downloads the raw files to ./temperature/downloads and then tries to create CSV files in ./temperature/data/extracted.

I've just started looking at the data, but wow... soft sciences.
The most jarring thing is that the data for a weather station is given per year. I'm sure that the internal database has monthly information, because each yearly value is given along with the number of missing months. For example, the entry 823b means that the temperature was 8.23 degrees C with 2 months missing; the trailing letter denotes how many months are missing.
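Assuming my reading above is right (value in hundredths of a degree C, trailing letter counting the missing months, a = 1, b = 2, and so on), decoding an entry looks like this in MATLAB/Octave; check readme.txt before trusting it:

% Decode an entry like '823b', assuming: value in hundredths of a degree C,
% trailing letter = number of missing months (a = 1, b = 2, ...).
entry = '823b';
if isletter(entry(end))
    missing = double(lower(entry(end))) - double('a') + 1;
    val     = str2double(entry(1:end-1)) / 100;
else
    missing = 0;
    val     = str2double(entry) / 100;
end
fprintf('%.2f degrees C with %d missing month(s)\n', val, missing);   % 8.23 degrees C with 2 missing month(s)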

I will revisit this data when I feel that I can handle drudging through it.

I heard back from NOAA

I had a typo:
tavg = (tmax + tmin)/2
Claude
Thanks Claude! I found that I transcribed the average equation incorrectly. It happens to the best of us. Now on to look at the data. It seems that some weather stations are always below freezing. I guess you get that in Alaska.
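A quick sanity check on the corrected formula, with made-up numbers:

% Sanity check with made-up numbers: a station that never gets above freezing
% can easily have a negative average.
tmax = -5.0;                 % degrees C
tmin = -20.0;                % degrees C
tavg = (tmax + tmin) / 2     % prints tavg = -12.5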

Wednesday, August 5, 2015

NOAA's data is difficult.

When it comes to things like climate change, I'm pretty much a pragmatist. Following the black-body model: we are removing energy integrators (i.e., green space) and replacing them with concrete, while also releasing energy that was stored. Logic dictates that the entropy of the system will increase, and with it the temperature, because the earth is not expanding much.

I started looking at weather data from NOAA when another physics friend said, "Wouldn't it be nice if we could repair missing data like they do with the weather?" In my field, data is what it is, and missing data is missing data. If you publish something and extrapolate a line, you make sure to mention how it was calculated. Usually, if we find data missing, we just take the data again. What I decided to do seemed simple:
1) Take the data from NOAA that is modified
2) Take the data from NOAA that is raw
3) Look at the difference and see if I could back-extract the variables for the "Pairwise Homogenization Algorithm" (a rough sketch of this step is below)
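The comparison in step 3 is trivial once the files are parsed; this is only a sketch, and raw_monthly, adj_monthly, and years are placeholder names for arrays I still have to build from the raw and FLs.52j files.

% Step 3, sketched for one station. raw_monthly and adj_monthly are placeholder
% [nYears x 12] matrices in degrees C (NaN = missing); years is the matching vector.
resid = adj_monthly - raw_monthly;                      % the adjustment that was applied
ok    = ~isnan(resid);
resid(~ok) = 0;
resid_by_year = sum(resid, 2) ./ max(sum(ok, 2), 1);    % annual mean adjustment
plot(years, resid_by_year);
xlabel('year'); ylabel('adjusted - raw (degrees C)');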

...and then you look at NOAA's data and you start to believe that there is a conspiracy. Even if the data itself is good, the released datasets are terribly formatted, or seem to be just plain incorrect. Here's an example:
USH00011084 1897   734  3  1292  3  1972  3  1786  3  2084  3  2761  3  2753  3  2547  3  2406  3  1878  3 -9999    -9999   
USH00011084 1900 -9999    -9999     1337a 3  1936  3  2378  3  2589  3  2770  3  2872  3  2700  3  2320  3  1486 3  1100  3
USH00011084 1926 -9999     1245     1251a    1781     2240     2654     2712     2763c    2770     2110     1256a    1421   
USH00011084 1927  1209     1821     1651     2183     2467     2707     2730     2594a    2579     2081     1907      871f 3
USH00011084 1928   800b    1135     1614     1711     2218     2596     2829     2817    -9999    -9999    -9999    -9999 

Here's an excerpt from the tavg dataset, where one would expect the average to be a positive number:
USH00017157 1940  -287      502     1059     1502     1822     2368     2447     2616     2161     1695     1035      897
In the line above, -287 represents a -2.87 degree C average at the USH00017157 weather station in 1940. How can something defined as (tmax-tmin)/2 be negative? To NOAA's credit, they have good documentation for this formatting disaster in readme.txt.
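For what it's worth, this is how I read one of those rows in MATLAB/Octave. The fixed widths (ID in columns 1-11, year in 13-16, then twelve 9-character fields of 6 value characters plus 3 flag characters) are my guess from the excerpts and the GHCN-style layout, so verify them against readme.txt before relying on this.

% Parse one tavg row, assuming: ID in columns 1-11, year in 13-16, then twelve
% 9-character fields (6 value characters in hundredths of a degree C + 3 flag
% characters), with -9999 meaning missing. Verify the widths against readme.txt.
line = 'USH00017157 1940  -287      502     1059     1502     1822     2368     2447     2616     2161     1695     1035      897';
id   = strtrim(line(1:11));
year = str2double(line(13:16));
vals = nan(1, 12);
for m = 1:12
    f = 17 + 9*(m - 1);                                 % start column of month m's value
    vals(m) = str2double(line(f:min(f + 5, numel(line))));
end
vals(vals == -9999) = NaN;
tavg_c = vals / 100;                                    % e.g. -287 comes out as -2.87 degrees C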

Before I do anything else, I'm going to ask NOAA about this data.

Saturday, August 1, 2015

VREF at 1.25V?

The MAX669 datasheet says that it has a bandgap reference that puts out 1.25V. I cannot imagine how that number was actually reached, since I'd expect the bandgap output to be 1.262V. I'm sure it's a great part, but I'm reluctant to use it just because I believe a part of the datasheet to be incorrect. Most datasheets are just marketing data, but there's really not much room for fudging a bandgap reference output voltage.