Sunday, August 23, 2015
When in doubt, ask a mathematician.
As I have said in my previous posts, NOAA probably isn't part of a conspiracy, but they sure try to make you think they are. I have some major issues with the "Pairwise Homogenization Algorithm" as it creates residuals, which I do not believe are representative of actual changes. I'm going to table this until I can spend time with my math buddy.
Friday, August 21, 2015
NOAA gives you data that you cannot use.
NOAA releases data and example tools to process it, but they do not release data in a form those tools can actually use. Check this note from the "Pairwise Homogenization Algorithm" software readme:
# A New Dataset must have at least a Station List and a Data Files. At a minimum, the
# Station List defines the Station ID/Latitude/Longitude. The Data Files (one for
# each station/element - see data/benchmark/world1/monthly/raw) defines all
# of the Monthly Temperatures in Annual Records. All files are in the GHCNMv3 format.

Now, the data that is released is in the form of years, so you cannot use the software they supply to recreate their data.
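To do anything with the released files, I first flatten the annual records back into a monthly series. A minimal MATLAB/Octave sketch, assuming a record has already been read into a [year, m1 ... m12] row; the layout and variable names here are mine, not NOAA's:

% Minimal sketch (assumptions are mine): the released files carry twelve
% monthly values per annual record, in hundredths of a degree C, with
% -9999 marking a missing month. Flatten that into one monthly series.
annual = [1940, -287, 502, 1059, 1502, 1822, 2368, 2447, 2616, 2161, 1695, 1035, 897];
vals = annual(:, 2:13)';          % 12 x nYears block of monthly values
vals(vals == -9999) = NaN;        % mark missing months
temps = vals(:) / 100;            % degrees C, in month order
plot(1:numel(temps), temps, '.-');
xlabel('months since first January'); ylabel('tavg (C)');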
I have the Pairwise Homogenization Algorithm set up in MATLAB, and I can make it do whatever I want to the data, including making it give me totally erroneous data.
I actually emailed NOAA, and they didn't get back to me because I'm asking questions that they have already decided everyone in "the know" should know. It's always nice to get an official answer.
--mail--
To whom it may concern,

I am looking for sources for the weights for the Pairwise Homogenization Algorithm that is used in the NOAA data reported in ushcn.tavg.latest.FLs.52j.tar.gz. I have MATLAB scripts set up to analyze the data, but I've been having trouble. I believe that I cannot reproduce the data because I am missing the monthly entries, and the exported data is yearly. Any guidance or references would be appreciated.

Also, there are about 20% fewer weather stations in 2015 than in 1990. Do you have a reference for this as well? I'm just curious why there are so many fewer stations.
--end mail--

After reading a bunch of papers, I've decided that things such as infilling are pretty much magic that should be disregarded when you can actually have a meaningful dataset.
Here are all of the reasons that you might need to tune a dataset:
Changes in type of equipment.
Changes in the thermometer's surroundings (an urban heat island develops, for example).
A station has consistently given bad data.
A station has been moved.
After reading through the papers that I could find, I believe that what NOAA has done is build a self-correlating time series: you can use the Pairwise Homogenization Algorithm to create a series containing the very anomalies you are trying to avoid. What I really want to know is why NOAA does not release a useful monthly dataset. This is the sort of thing that makes me shake my head at the soft sciences. They might be right, or wrong, but they are definitely sloppy by the standards of engineering.
Where does this leave me? I'm going to look at the actual data, remove anything that is not complete, and then plot it. Even if there are changes due to urbanization, equipment changes, and bad data, it is real data. If there is heating due to urbanization, that's just part of the heat, and it should show up as an increased warming trend over the years. A sketch of that plan is below.
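A minimal sketch, assuming the extracted records sit in a matrix named data of [stationIndex, year, m1 ... m12] rows (my layout, not anything NOAA defines):

% Keep only station-years with all twelve months reported, then plot the
% yearly mean. -9999 marks a missing month in the released files.
vals = data(:, 3:14);
complete = all(vals ~= -9999, 2);   % true where no month is missing
d = data(complete, :);
yrs = unique(d(:, 2));
avg = arrayfun(@(y) mean(mean(d(d(:,2) == y, 3:14))) / 100, yrs);
plot(yrs, avg);
xlabel('year'); ylabel('mean tavg of complete records (C)');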
The new temperature.sh file is in SVN as revision 6.
Tuesday, August 18, 2015
So, where did the weather stations go and does it matter?
In an attempt to determine whether it actually matters what happened to the weather stations, I looked at the data for the most recent year (2015) and tried to match the station locations against previous years. The stations reporting in 2015 are largely present in previous years as well. I plotted the ratio of available stations per year compared to 2015.
What this means is that the weather stations in 2015 are largely available in previous years, which means that they can be used to make a real comparison. I would still love to know why so many stations were removed between 1990 and 2015.
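The overlap calculation itself is simple. A sketch of it, assuming station IDs and years have already been loaded from the extracted files into ids (a cell array of strings) and years (a numeric vector); the variable names are mine:

% For each year, what fraction of the stations reporting in 2015 also
% report in that year?
yrs = unique(years);
base = unique(ids(years == 2015));   % station IDs present in 2015
ratio = zeros(size(yrs));
for k = 1:numel(yrs)
  present = unique(ids(years == yrs(k)));
  ratio(k) = numel(intersect(base, present)) / numel(base);
end
plot(yrs, ratio);
xlabel('year'); ylabel('fraction of 2015 stations present');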
I also found a free SVN site to post my scripts: https://subversion.assembla.com/svn/noaa This post can be recreated by checking out revision 5 and running "runme.m" via MATLAB or OCTAVE after getting the data via temperature.sh
Monday, August 17, 2015
NOAA's data. 20% fewer data points?
I'm still plugging away at NOAA's data. I have not yet reached a point where I'm happy with the data, but that is mostly because it's a terrible export. It is obviously from a database, but the format is indefensible. My respect for data in the soft sciences has waned considerably.
I have come across something that is odd, but not necessarily wrong. The number of weather stations trends up, and then down. I'm not sure what to think about this yet, but there are about 20% fewer stations today than in 1990.
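The count itself is easy to reproduce. A minimal sketch, again assuming ids and years loaded from the extracted files (placeholder names of my own):

% Count the distinct station IDs reporting in each year.
yrs = unique(years);
counts = arrayfun(@(y) numel(unique(ids(years == y))), yrs);
plot(yrs, counts);
xlabel('year'); ylabel('stations reporting');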
Please excuse the huge graphs, but it's clearer this way. Why would one remove stations if the data is valid?
Thursday, August 6, 2015
NOAA's data, get it with this script.
NOAA's weather data, as I previously mentioned, is a bit difficult. It has no coherent format, and it is obviously a dump from a database. To try to make it just a bit better, I used a bash script to download and format the data:
temperature.sh
The script downloads the raw files to: ./temperature/downloads, and then tries to create CSV files in ./temperature/data/extracted
I've just started looking at the data, but wow... soft sciences.
The most jarring thing is that data for a weather station is given by year. I'm sure that the internal database has monthly information, because each yearly value carries an offset for missing months. So the value 823b means that the temperature was 8.23 degrees C with 2 months missing; the trailing letter denotes how many months are missing (a = 1, b = 2, and so on).
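Decoding one of those values is straightforward. A sketch of my reading of the format (save as decode_value.m; this is my interpretation, not NOAA's code):

function [tempC, nMissing] = decode_value(tok)
  % Decode a value like '823b': the digits are hundredths of a degree C,
  % and a trailing letter counts missing months (a = 1, b = 2, ...).
  % (-9999 sentinel values should be filtered out before calling this.)
  if isletter(tok(end))
    nMissing = double(lower(tok(end))) - double('a') + 1;  % 'b' -> 2
    tok = tok(1:end-1);
  else
    nMissing = 0;
  end
  tempC = str2double(tok) / 100;   % '823' -> 8.23
end

For example, decode_value('823b') returns 8.23 and 2.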
I will revisit this data when I feel that I can handle drudging through it.
I heard back from NOAA
I had a typo:

tavg = (tmax + tmin)/2
Claude

Thanks Claude! I found that I had transcribed the averaging equation incorrectly. It happens to the best of us. With the correct formula, a negative average just means both tmax and tmin were below zero: for example, tmax = -1.00 and tmin = -4.74 give tavg = -2.87. Now on to look at the data. It seems that some weather stations are always below freezing. I guess you get that in Alaska.
Wednesday, August 5, 2015
NOAA's data is difficult.
When it comes to things like climate change, I'm pretty much a pragmatist. Following the black-body model: we are removing energy integrators (i.e., green space) and replacing them with concrete, while also releasing energy that was stored, so logic dictates that the entropy of the system will increase, and thereby the temperature will increase, because the earth is not expanding much.
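One way to formalize the black-body part of that argument (the textbook zero-dimensional balance, not anything specific to NOAA's data): absorbed solar power equals radiated power, so lowering the effective albedo $\alpha$ (concrete instead of green space) raises the equilibrium temperature:

$$\frac{(1-\alpha)S}{4} = \sigma T^4 \quad\Longrightarrow\quad T = \left(\frac{(1-\alpha)S}{4\sigma}\right)^{1/4}$$

where $S$ is the solar constant and $\sigma$ is the Stefan-Boltzmann constant.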
I started looking at weather data from NOAA when another physics friend said, "Wouldn't it be nice if we could repair missing data like is done with the weather?" In my field, data is what it is, and missing data is missing data. If you publish something and you extrapolate a line, you make sure that you mention how it was calculated. Usually, if we find data missing, we just take the data again. What I decided to do seemed simple:
1) Take the data from NOAA that is modified
2) Take the data from NOAA that is raw
3) Look at the difference and see if I could back extract the variables for the "Pairwise Homogenization Algorithm"
...and then you look at NOAA's data and you start to believe that there is a conspiracy. Even if the underlying data is good, the released datasets are terribly formatted or seem to be just incorrect. Here's an example:
USH00011084 1897 734 3 1292 3 1972 3 1786 3 2084 3 2761 3 2753 3 2547 3 2406 3 1878 3 -9999 -9999
USH00011084 1900 -9999 -9999 1337a 3 1936 3 2378 3 2589 3 2770 3 2872 3 2700 3 2320 3 1486 3 1100 3
USH00011084 1926 -9999 1245 1251a 1781 2240 2654 2712 2763c 2770 2110 1256a 1421
USH00011084 1927 1209 1821 1651 2183 2467 2707 2730 2594a 2579 2081 1907 871f 3
USH00011084 1928 800b 1135 1614 1711 2218 2596 2829 2817 -9999 -9999 -9999 -9999
Here's an excerpt from the tavg dataset, where one would expect the average to be a positive number:
USH00017157 1940 -287 502 1059 1502 1822 2368 2447 2616 2161 1695 1035 897

In the line above, -287 represents a -2.87 degree average at the USH00017157 weather station in 1940. How can something defined as (tmax-tmin)/2 be negative? To NOAA's credit, they have good documentation for this formatting disaster in readme.txt.
Before I do anything else, I'm going to ask NOAA about this data.
Saturday, August 1, 2015
VREF at 1.25v?
The MAX669 datasheet says that it has a bandgap reference that puts out 1.25V. I cannot imagine how that number was actually reached, when I'd expect the bandgap output to be 1.262V. I'm sure it's a great part, but I'm reluctant to use it just because I believe a part of the datasheet to be incorrect. Most datasheets are just marketing data, but there's really not much room for fudging a bandgap reference's output voltage.