Homogenize This

Post by **MSimon** » Thu Jan 21, 2010 3:38 am

http://wattsupwiththat.com/2009/10/13/h ... ture-data/

Take this test yourself to see what bad shape the global database is in. Follow these directions, using the window into the NOAA GHCN data provided by NASA GISS.

Point to any location on the world map. You will see a list of stations and approximate populations. Locations with fewer than 10,000 people are assumed to be rural (even though Oke has shown that even a town of 1,000 can have an urban warming of 2.2°C).

You will see that the stations vary widely in the range of years for which they have data.

Try to find a few stations with data that extend to 2009. To see how complete a station's data set is, click "Download monthly data as text" at the bottom left of the graph.

For many, many stations, you will see that the monthly table has many missing months, mostly after 1990 (designated by 999.9).

Post by **Josh Cryer** » Thu Jan 21, 2010 4:10 am

Joseph D’Aleo cannot be trusted. He makes up allegations and doesn't retract them.

In all seriousness, GHCN doesn't fill in missing data. Gaps are to be expected. I'm sure D'Aleo has nothing to add to the methodology, so his comments are irrelevant.

Post by **MSimon** » Thu Jan 21, 2010 5:09 am

Josh Cryer wrote: Joseph D’Aleo cannot be trusted. He makes up allegations and doesn't retract them. :wink:

In all seriousness, GHCN doesn't fill in missing data. Gaps are to be expected. I'm sure D'Aleo has nothing to add to the methodology, so his comments are irrelevant.

I believe the map is from an official government source.

And exactly what has D'Aleo said that you think needs retraction? Are you confusing it with something I said that DID need retraction? I retracted it.

Post by **Josh Cryer** » Thu Jan 21, 2010 5:55 am

MSimon wrote: And exactly what has D'Aleo said that you think needs retraction?

That raw data was manipulated. A very strong allegation that people would lose careers over.

Post by **MSimon** » Thu Jan 21, 2010 7:09 am

Josh Cryer wrote:
MSimon wrote: And exactly what has D'Aleo said that you think needs retraction?
That raw data was manipulated. A very strong allegation that people would lose careers over.

Early days yet. A number of people I trust have looked at the data and think it has been manipulated.

You are qualified to refute the charge. Gather your troops and do a study.

You have already found things you are not happy with, so I'm not yet convinced the charge of manipulation is in error. We already know of one example (the manipulations done to create the hockey stick), so I'd say it is up to you to prove your case.

If you do I will change my mind. And announce it.

Then we can look at the adjustments and homogenization. And then the computer code, which Ian (Harry?) said in the e-mails was rather dodgy. He was not happy with the data either. And one of the cabal said in the e-mails he was not giving the data to McIntyre just so he could show errors. An indication that he knew of errors. Or even manipulation.

But I'm not going to take the word of the Hide The Decline cabal for any past or future work.

Post by **Josh Cryer** » Thu Jan 21, 2010 7:47 am

MSimon,

MSimon wrote: Early days yet. A number of people I trust have looked at the data and think it has been manipulated.

The raw data has not been manipulated. D'Aleo and every site that talks about raw data manipulation need to retract it; otherwise they are spreading lies and disinformation.

The adjustments to that raw data are based on peer-reviewed suggestions. I will eventually get to reproducing those. If I have issues with them, I will certainly make note of it. But that's a ways down the line (software takes quite some time to write).

Note that I am not looking at any code whatsoever to reproduce the data sets. I am doing essentially a black-box analysis of the data. I want to read the data and read the papers so as not to be biased by the source code. This makes the job far harder, since it means relying on my own interpretation of how a given thing should be done. For example, my software will have options to round .5 to even, up, or away from zero.
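Offering those tie-breaking rules as options is straightforward with Python's standard decimal module. A sketch, assuming values are rounded to tenths of a degree; the function name and mode labels are hypothetical. Note that "up" and "away from zero" only differ for negative values, which matters for temperatures below zero:

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP

TENTH = Decimal("0.1")


def round_to_tenth(value: str, mode: str) -> Decimal:
    """Round a decimal string to the nearest 0.1 using the named tie-breaking rule.

    mode: "even" -> ties go to the even last digit (0.25 -> 0.2, 0.35 -> 0.4)
          "up"   -> ties go toward +infinity    (-0.25 -> -0.2)
          "away" -> ties go away from zero      (-0.25 -> -0.3)
    """
    x = Decimal(value)  # parse from a string so no binary floating-point error creeps in
    if mode == "even":
        return x.quantize(TENTH, rounding=ROUND_HALF_EVEN)
    if mode == "away":
        # In the decimal module, ROUND_HALF_UP means "ties away from zero".
        return x.quantize(TENTH, rounding=ROUND_HALF_UP)
    if mode == "up":
        # Ties toward +infinity: add half a unit, then round down (toward -infinity).
        return (x + TENTH / 2).quantize(TENTH, rounding=ROUND_FLOOR)
    raise ValueError(f"unknown rounding mode: {mode}")


if __name__ == "__main__":
    for v in ["0.25", "0.35", "-0.25"]:
        print(v, [str(round_to_tenth(v, m)) for m in ("even", "up", "away")])
```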

MSimon wrote: You are qualified to refute the charge. Gather your troops and do a study.

In due time, but first I must have references showing how the raw data is separate from climate science (because it comes from meteorologists as opposed to climatologists), so that any further analysis is backed by the understanding that neither I nor anyone else is cooking the data.

Of course, it may turn out that I blow the cover off the whole thing, but I find that unlikely.

MSimon wrote: You have already found things you are not happy with. So I'm not yet convinced the charge of manipulation is in error.

The charge was not manipulation of data; the charge was manipulation of *raw* data. That is a major allegation. They're obviously working with the data, but they are doing so in the course of analysis. The question for you, then, should be whether the operations they perform on the data are suitable. GHCN claims that homogeneity adjustments make little difference to the data globally, but that their effects show up more at local levels. We'll test that, too.
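That test can be sketched as a comparison of linear trends: fit an ordinary least-squares line to the raw series and to the adjusted series, then take the difference. A minimal Python sketch, assuming annual values in plain lists with None for missing years; the function names are hypothetical and this is not GHCN's own procedure:

```python
from typing import Optional, Sequence


def ols_trend_per_decade(years: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope, scaled from per-year to per-decade."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points for a trend")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 10.0 * sxy / sxx


def adjustment_trend_effect(years: Sequence[int],
                            raw: Sequence[Optional[float]],
                            adjusted: Sequence[Optional[float]]) -> float:
    """Trend of the adjusted series minus trend of the raw series (degrees per decade).

    Only years present in BOTH series are used, so the comparison is like for like.
    """
    common = [(y, r, a) for y, r, a in zip(years, raw, adjusted)
              if r is not None and a is not None]
    ys = [y for y, _, _ in common]
    raw_trend = ols_trend_per_decade(ys, [r for _, r, _ in common])
    adj_trend = ols_trend_per_decade(ys, [a for _, _, a in common])
    return adj_trend - raw_trend
```

Run per station for the local picture and on a network average for the global one; if the GHCN claim holds, the global difference should sit near zero while individual stations scatter more widely.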

There's so much that can be tested, but no one seems to be doing it. You wanted an open-source method; you got one. But I guarantee you that I will look at the data critically. The rounding thing only proves that. Rounding half up stood out to me because, as I'm writing the software to reproduce USHCN/GHCN from NCDC data, it is a tactic I would never use. I was taught to round to even, or to add a random number and round that. We'll see if it makes a difference; I'm not a statistician, so I don't know. This is where my software engineering skills come in, because I just do what I know is right, even if I'm not sure why it is right. I admit I hate statistics.
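Whether the rounding rule makes a difference can be checked directly: store readings as integer hundredths of a degree (a hypothetical representation chosen so the arithmetic stays exact), round each to tenths under each rule, and average the rounding error. A Python sketch of the general bias argument, not a reproduction of what NCDC actually does; all function names are hypothetical:

```python
import random

# Readings are integer hundredths of a degree (1.25 -> 125); each rule
# returns the value rounded to integer tenths (125 -> 12 or 13).


def round_half_up(h: int) -> int:
    """Ties go toward +infinity (1.25 -> 1.3, -1.25 -> -1.2)."""
    return (h + 5) // 10


def round_half_even(h: int) -> int:
    """Ties go to the even tenth (1.25 -> 1.2, 1.35 -> 1.4)."""
    q, r = divmod(h, 10)  # floor division, so r is always in 0..9
    if r > 5 or (r == 5 and q % 2 == 1):
        return q + 1
    return q


def round_stochastic(h: int, rng: random.Random) -> int:
    """Round up with probability r/10, which has the same distribution as adding
    a uniform random number in [0, 1) tenth and taking the floor."""
    q, r = divmod(h, 10)
    return q + (1 if rng.random() < r / 10 else 0)


def mean_error_hundredths(values, rounder) -> float:
    """Average of (rounded - original), in hundredths of a degree."""
    return sum(rounder(h) * 10 - h for h in values) / len(values)


if __name__ == "__main__":
    readings = range(-5000, 5000)  # every hundredth from -50.00 to 49.99
    rng = random.Random(42)
    print("half up:  ", mean_error_hundredths(readings, round_half_up))
    print("half even:", mean_error_hundredths(readings, round_half_even))
    print("random:   ", mean_error_hundredths(readings, lambda h: round_stochastic(h, rng)))
```

When every final digit is equally likely, as in the range used here, half up comes out at +0.5 hundredths (an upward bias of 0.005 degrees per reading), half even at exactly zero, and the random rule close to zero. Whether a bias of that size affects a trend is a separate question.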

MSimon wrote: We already know of one example (the manipulations done to create the hockey stick), so I'd say it is up to you to prove your case.

I don't understand why you keep calling scientific analysis manipulation.

In noise reduction you're manipulating the data, are you not? Using basic sound scientific principles, right?

MSimon wrote: If you do I will change my mind. And announce it.

I know you will, but I'm not really doing it for you; it's just a spare-time project for my own understanding. Plus I haven't written code in a while and I need the practice.

MSimon wrote: And one of the cabal said in the e-mails he was not giving the data to McIntyre just so he could show errors.

There's a difference between software and programs. Software must be effectively bug free, or at least, good enough so that the end user doesn't know or care. Programs, however, just have to run, and just have to perform the job they are supposed to do.

In that sense, programs have far more bugs. My parsing software will certainly break if the wrong CSV format is introduced, for example. So you work as you go, by trial and error.

This is why I am using ML rather than C: to reduce, to an extent, the errors that will assuredly exist.
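Trial and error goes faster when a parser fails loudly on the wrong format instead of quietly producing garbage. A minimal Python sketch, assuming a hypothetical four-column CSV layout (station_id, year, month, value) that is not the actual NCDC format; the names are illustrative only:

```python
import csv
import io

# Hypothetical column layout, for illustration only; not the NCDC format.
EXPECTED_HEADER = ["station_id", "year", "month", "value"]


class FormatError(ValueError):
    """Raised when the input does not match the expected layout."""


def parse_monthly_csv(text: str) -> list[tuple[str, int, int, float]]:
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise FormatError(f"unexpected header: {header!r}")
    rows = []
    for row in reader:
        line_no = reader.line_num  # physical line number in the input
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(EXPECTED_HEADER):
            raise FormatError(f"line {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}")
        station, year, month, value = row
        try:
            parsed = (station, int(year), int(month), float(value))
        except ValueError as exc:
            raise FormatError(f"line {line_no}: {exc}") from exc
        if not 1 <= parsed[2] <= 12:
            raise FormatError(f"line {line_no}: month {parsed[2]} out of range")
        rows.append(parsed)
    return rows
```

A semicolon-separated file, for instance, is rejected at the header instead of being read as one long column.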

MSimon wrote: An indication that he knew of errors. Or even manipulation.

It was eventually released, and there was no hubbub over it. I personally wouldn't want to release the code for my personal projects because it is 1) embarrassing and 2) buggy, with lots of hacks here and there to fix random bugs. The fact that I am going to release this code into the open will force me to do the exact opposite: make the code clean and make sure it is bug-free. This adds many hours to the production time.

MSimon wrote: But I'm not going to take the word of the Hide The Decline cabal for any past or future work.

I do hope that I can use their papers at least.

Post by **MSimon** » Thu Jan 21, 2010 8:23 am

Josh Cryer wrote: The adjustments to that raw data are based on peer-reviewed suggestions.

By the climate cabal? The folks involved in Hide the Decline? The hockey stick was peer reviewed and figured prominently in several IPCC reports. It now falls in the category of junk science.

The fact that the suggestion was peer reviewed makes it prima facie suspect.

In climate science forget peer review. Forget the RC cabal. Prove it is the right thing to do. I will not accept anything else.

Post by **MSimon** » Thu Jan 21, 2010 8:28 am

Josh Cryer wrote:
MSimon wrote: But I'm not going to take the word of the Hide The Decline cabal for any past or future work.
I do hope that I can use their papers at least.

Sure you can. Redo them and show your work. Then I might give the position of the paper credence. As of now it is all suspect.

Jones and Mann are being investigated for academic fraud. The whole cabal needs to be looked at. Every single paper for 30 years.

You have your work cut out for you.

Post by **Josh Cryer** » Thu Jan 21, 2010 11:17 am

MSimon,

MSimon wrote: Sure you can. Redo them and show your work. Then I might give the position of the paper credence. As of now it is all suspect.

MSimon wrote: The fact that the suggestion was peer reviewed makes it prima facie suspect.

This is a contradiction. What do you expect me to do? Simply redo the algorithms? Do you doubt that the results will come out similar, or even identical? I can already reproduce 9641C_200907_raw.avg for the values that are in the NCDC CDO. Exactly. I know exactly what they did to produce the raw.avg dataset.

Going the next step is going to be rather difficult (especially the missing-values analysis), but so far I don't see anywhere that the reproduction fails. It all comes back to the second claim here: you will reject it because it is based on peer-reviewed analysis. I honestly believe that.

I'd be happy if it made you believe in the science, but I'm not going to expect that as an outcome, because I'd only be disappointed if it wasn't the outcome. (I do value and respect your opinion even if I disagree with it.)

Note, what I am doing is leagues beyond ccc-gistemp, as I am not basing anything I am doing off of previous code. It's all 1) reverse engineering and 2) following the algorithms in the papers.

The reverse engineering part is necessary because no papers deal with taking NCDC data and converting it into a format usable by GISTEMP (or similar programs). This is a procedural thing and not really covered in papers (there are dozens of ways to go about it, csv, xml, yaml, whatever).

MSimon wrote: Jones and Mann are being investigated for academic fraud. The whole cabal needs to be looked at. Every single paper for 30 years.

Every single paper is regularly looked at, not just for errors, but for citations. This is how most corrections are made: you have information you want to cite, you do your experiments, and if the results don't fit the citation, either the citation or the experiment is wrong, so you have to look at both. Generally, new science overturns old science as new information is found. See: The Missing Carbon Sink for an example. A paper noting the missing carbon sink is overturned by one that finds it, and so on.

I think Jones may get in a little bit of trouble for his private emails, but Mann looks to be fine.

Post by **MSimon** » Thu Jan 21, 2010 12:01 pm

Look, I have followed the whole homogenization debate.

I think it is shoddy practice.

I think the surface station records are a mess.

And I think it is FRAUD to leave out stations that were used earlier and for which later records exist.

Compare like to like.

Dropping stations that are still producing data? OK. Then drop them in the whole record.

Let me show you the trick. Stations A and B are "hot". Station C is "cold".

I use all stations up to 1990. Then I drop C and invent its record from 1990 on using A and B. Station C is still reporting.

That, my friend, IS fraud. I don't care if it heats or cools the final record.
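MSimon's scenario can be put into numbers. A toy Python sketch: the station values are made up, the function names are hypothetical, and it uses a plain average of absolute temperatures, which is a simplification of how real analyses combine stations:

```python
# Toy illustration of the A/B/C scenario; all station values are made up.


def network_mean(readings: dict[str, float]) -> float:
    """Simple unweighted mean over all stations in the network."""
    return sum(readings.values()) / len(readings)


def infill_from_neighbours(readings: dict[str, float], dropped: str,
                           neighbours: list[str]) -> dict[str, float]:
    """Return a copy with one station's value replaced by its neighbours' mean."""
    filled = dict(readings)
    filled[dropped] = sum(readings[n] for n in neighbours) / len(neighbours)
    return filled


if __name__ == "__main__":
    post_1990 = {"A": 15.0, "B": 14.0, "C": 8.0}  # hypothetical annual means, deg C
    infilled = infill_from_neighbours(post_1990, "C", ["A", "B"])
    print(f"C still reporting:    {network_mean(post_1990):.2f}")
    print(f"C infilled from A, B: {network_mean(infilled):.2f}")
```

With these made-up values the network mean moves from about 12.33 to 14.50 °C purely because C's real readings were replaced by its neighbours'. Analyses such as GISTEMP combine station anomalies (departures from each station's own baseline) rather than absolute temperatures, so in a real analysis the size of the shift would depend on how C's anomalies differ from its neighbours', not on its absolute temperature.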

Post by **MSimon** » Thu Jan 21, 2010 12:04 pm

Josh Cryer wrote: Mann looks to be fine.

Other than the trick he used to Hide The Decline.

Post by **Josh Cryer** » Thu Jan 21, 2010 12:16 pm

Give me a link to this station dropping allegation, I could use something to blog about.

Post by **MSimon** » Thu Jan 21, 2010 12:25 pm

The world’s climate data has become increasingly sparse, with a big dropoff around 1990. There was also a tenfold increase in missing months around the same time. Stations (90% in the United States, which has the Cadillac data system) are poorly to very poorly sited and not properly adjusted for urbanization. Numerous peer-reviewed papers suggest an exaggeration of the warming by 30%, 50% or even more.

I agree with the above. Even if D'Aleo wrote it. Why drop stations out of the record that are still reporting? It is FRAUD if the stations are still reporting.
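The dropout question can be checked by comparing, year by year, which stations have raw data with which stations an analysis actually uses. A Python sketch, assuming the station lists have already been extracted into plain dicts mapping a station id to a set of years; the ids, function names and data structures are hypothetical, and no real GHCN inventory is used:

```python
from collections import Counter


def reporting_counts(raw: dict[str, set[int]]) -> dict[int, int]:
    """Number of stations with raw data in each year."""
    counts = Counter(year for years in raw.values() for year in years)
    return dict(sorted(counts.items()))


def excluded_while_reporting(raw: dict[str, set[int]],
                             used: dict[str, set[int]]) -> dict[int, list[str]]:
    """For each year, the stations that have raw data but are not used that year."""
    result: dict[int, list[str]] = {}
    for station, years in raw.items():
        for year in years - used.get(station, set()):
            result.setdefault(year, []).append(station)
    return {year: sorted(ids) for year, ids in sorted(result.items())}


if __name__ == "__main__":
    raw = {"A": {1989, 1990, 1991}, "B": {1989, 1990, 1991}, "C": {1989, 1990, 1991}}
    used = {"A": {1989, 1990, 1991}, "B": {1989, 1990, 1991}, "C": {1989}}
    print(reporting_counts(raw))
    print(excluded_while_reporting(raw, used))
```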

===

There are problems with using trees as thermometers, especially if the tree-derived temperatures are declining while real temperatures are rising.

What does that mean? Trees are not good thermometers.

===

Another problem is the way they select the trees. A selection is made of trees that match the "known" temperature record, which means some trees follow temperature and some don't. If only some of the trees follow temperature, then trees are not good thermometers.

In fact McIntyre has done some work on using ALL the trees and found no trend.

It may have been a good idea. It didn't work. Drop it.
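The selection effect described above can be sketched with pure noise: generate random walks (standing in for tree series that contain no temperature signal at all), keep only those that correlate with a rising "instrumental" record over the final stretch, and average the survivors. The parameter values (1000 series, 200 steps, a 50-step calibration window, a 0.5 correlation cutoff) and function names are arbitrary assumptions; this is a generic illustration of screening, not a reconstruction of any published tree-ring study:

```python
import random


def random_walk(length: int, rng: random.Random) -> list[float]:
    """Cumulative sum of Gaussian steps: a noise series with no temperature signal."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def screened_composite(n_series=1000, length=200, calib=50, cutoff=0.5, seed=1):
    """Keep only noise series that correlate with a rising target over the last
    `calib` steps, then average the survivors point by point."""
    rng = random.Random(seed)
    target = [float(t) for t in range(calib)]  # stand-in for a steadily warming record
    kept = []
    for _ in range(n_series):
        series = random_walk(length, rng)
        if pearson(series[-calib:], target) > cutoff:
            kept.append(series)
    if not kept:
        raise ValueError("no series passed the screen; lower the cutoff")
    composite = [sum(values) / len(kept) for values in zip(*kept)]
    return composite, len(kept)


if __name__ == "__main__":
    composite, n_kept = screened_composite()
    print(f"{n_kept} of 1000 noise series passed the screen")
    print(f"change before calibration window: {composite[-50] - composite[0]:+.2f}")
    print(f"change inside calibration window: {composite[-1] - composite[-50]:+.2f}")
```

Because correlation ignores a constant offset, the screen only looks at each series' shape inside the calibration window. The survivors therefore tend to rise there, while their earlier segments, which the screen never saw, average out to roughly flat.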

===

And then there is the Mann algorithm (filter) that flattens the past (including the MWP) and exaggerates the present.

When fed red noise (sometimes also referred to as pink noise), it produces a hockey stick.

http://en.wikipedia.org/wiki/Colors_of_noise
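For readers unfamiliar with the term: in climate statistics, "red noise" usually refers to a first-order autoregressive (AR(1)) process, where each value is a fraction phi of the previous value plus fresh white noise, so slow wanderings dominate. (The linked page defines red noise by its 1/f² spectrum, which an AR(1) series with phi near 1 approximates over a middle band of frequencies.) The minimal Python generator here only produces such noise, with phi = 0.9 as an arbitrary illustrative choice; it does not implement the filter under discussion, and the function names are hypothetical:

```python
import random


def ar1_red_noise(n: int, phi: float = 0.9, sigma: float = 1.0, seed: int = 0) -> list[float]:
    """AR(1) series x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Start in the stationary distribution (variance sigma^2 / (1 - phi^2)).
    x = rng.gauss(0.0, sigma / (1 - phi ** 2) ** 0.5)
    out = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def lag1_autocorrelation(xs: list[float]) -> float:
    """Sample correlation between each value and the next one."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    print("red  :", round(lag1_autocorrelation(ar1_red_noise(10000, phi=0.9)), 3))
    print("white:", round(lag1_autocorrelation(ar1_red_noise(10000, phi=0.0)), 3))
```

Setting phi to 0 gives white noise, whose successive values are uncorrelated; raising phi toward 1 makes each value increasingly tied to the last.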

Post by **MSimon** » Thu Jan 21, 2010 12:26 pm

Josh Cryer wrote: Give me a link to this station dropping allegation, I could use something to blog about.

Go back to the map post. It is in there. With fookin maps. The map I posted was "after".

Edit: This is the danm map post. Go up and follow the link.

Post by **MSimon** » Thu Jan 21, 2010 12:31 pm

Josh Cryer wrote: The reverse engineering part is necessary because no papers deal with taking NCDC data and converting it into a format usable by GISTEMP (or similar programs). This is a procedural thing and not really covered in papers (there are dozens of ways to go about it, csv, xml, yaml, whatever).

Well, that is shoddy practice, isn't it? ALL methods that could affect the data MUST be disclosed, or it is not science.

ALL data. ALL code. ALL results (especially adverse to the hypothesis results) must be disclosed.