Science and Statistics – An Unholy Alliance?

I came across this very interesting article the other day, written in 2010. It basically reinforces my own long-held sense of unease with regard to statistical analysis. My reaction against statistics started early, in school, when I was first presented with its somewhat bizarre pseudo-mathematical methodology and nomenclature. My early rejection of the subject was more visceral and emotive than factual and logical. It just hit a raw nerve with me somehow, and all these years later, reading this article by Tom Siegfried, I begin to see perhaps why.

So let me begin by quoting a few passages from Siegfried’s text:

“During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.”

So it’s not statistics itself, but the misuse of this analytical toolbox which is the problem. To this I would add over-reliance, especially evident in the field of climate science. Too often in the peer reviewed climate science literature we find papers which base their conclusions almost totally on the results of some new statistical analysis/re-analysis of existing data. In order to fully appreciate what they are saying and, more importantly, in order to question what they are saying, one needs to be an expert not primarily in climate science, but in statistical analysis.

“Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.”

This does not inspire confidence.

“Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.”

With the increasingly pervasive use of statistical analysis in climate science, backed up by increasingly complex computer models, the above statement is magnified ten-fold when we consider the results of the latest peer-reviewed research. Much of that research is aimed at pointing the finger at man as being responsible for the majority of post-1950 global warming, and claims that we will continue to drive climate significantly into the future. Yet much of it is based upon statistical reanalysis of existing data.

“Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.””

A perfect illustration: the recently released paper by Marotzke and Forster. The main impetus for the paper was to address the apparent mismatch between climate models and real world observations (in particular the ‘pause’) which sceptics use to question the validity of the AGW theory. The paper concludes:

“The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

So, climate models do not overestimate the response to GHG forcing, even though the CMIP5 model mean is increasingly diverging from actual recorded global mean surface temperatures (GMST) and even though almost all models clearly run ‘too hot’ when compared with actual GMSTs. Apparently, this impression is not borne out by statistically analysing the past temperature record and comparing that with the models [?]. It’s opaque to me, and probably to a lot of other people besides. Nic Lewis thinks it is plain wrong, and says so at Climate Audit, laying out his reasons. He gave Marotzke and Forster the opportunity to reply to his concerns before he published at Climate Audit, but they failed to respond; instead, they have chosen to issue a rebuttal of Lewis’ rebuttal at Climate Lab Book here. I’ve no idea who will eventually be proved right or wrong in this kerfuffle, but I quote statistical expert Gordon Hughes (Edinburgh University), one of two people whom Nic Lewis asked to review his conclusions about M & F, 2015:

“The statistical methods used in the paper are so bad as to merit use in a class on how not to do applied statistics.

All this paper demonstrates is that climate scientists should take some basic courses in statistics and Nature should get some competent referees.”


The wider point here is that we have yet another paper which relies almost exclusively upon statistical methodology to draw conclusions about the real world – another paper which may have to be withdrawn. Science – and climate science in particular – is suffering from the all too pervasive influence of statistics. There is a place for statistics in the analysis of real-world data, and even I must (reluctantly) acknowledge this. However, science has, as Tom Siegfried points out, become “seduced” by the false promise of this “mutant” form of mathematics and is suffering from its misuse and its overuse.

3 comments

  1. We are living in an age of multiple “Truths”. M & F stand for “Computational” truth: the math they do works, and the results are 100% correct. Nic Lewis stands for “Representational” truth: he looks for ideas that explain in a causal and predictive way. M & F are speaking to what is sufficient, while Lewis says necessary and sufficient are requirements for reasonable people faced with unreasonable demands in a resource-limited world.

    The climate wars are fought over a word that has four variants: truth. Besides computational and representational, we have ideological and emotional. Greenpeace fights for ideological truth: humans are destroying the purity of Nature. Each man-made CO2 molecule is a grievous assault on the Earth. Hansen fights for emotional truth: his grandchildren will inherit a world more human-impacted than natural. All four “truths” are correct within their own view. There is no common ground. But the fighters of all four positions act as if they come from only one: the representational. Why? The Enlightenment, which said that the use of reason is the only legitimate way for men to prove that they are the product of a God worthy of His Name, and not just the rutting of beasts in the field.

    In this Ball there are many revelers but only one mask. It is essential for all of us to look at the face behind the mask before we vote on which will be the King of the Ball.


  2. All mathematical equations or tools need to be used with understanding. Mathematical proofs always start with a list of assumptions, and most end with a list of “shouldn’t be used for x, y, z”. Even number theory still has problems – obscure ones which won’t disrupt the normal use of numbers.

    Like you, I’ve always had reactions against statistics, so I spent much of my 2nd year at university (Maths major) plaguing my stats tutor with ‘proofs’ that it wasn’t ‘proper maths’. 47 years later, I can still remember his exasperated sighs of “but it works…” and his glum face.

    Statistics starts with using probabilities for independent events acting on independent variables at indeterminate times, e.g. tossing a perfectly balanced coin with identical tosses over and over again. You can probably see the most obvious problem for surface temperatures – they’re time-dependent. There are ways around it, and theories about the best approaches to use. But I shake my head in disbelief when I see responses from climate scientists along the lines of “We’ve used it before, and it turned out all right”. Does this mean they predicted something that happened? Or that they predicted what they expected to find? Or that everyone else thinks it’s OK? And so on.

    So, the first two things stats users need to do are (i) ensure variables are independent and (ii) arrange to take samples so as to avoid the events and variables having any commonality – the sketch at the end of this comment shows what goes wrong when the first condition fails.

    And secondly, always remember correlation says absolutely zilch about causation.

    I like the example of carbon dioxide. We know that a warming sea will give off carbon dioxide to the atmosphere, so we expect to find a correlation between increases in temperature and carbon dioxide – with a time-lag (CO2 later than temperature) of indeterminate size. We do, over the long term.
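    A minimal sketch of that point, using purely made-up numbers (nothing to do with any real temperature or CO2 record): generate pairs of series that are independent of each other by construction but strongly autocorrelated in time, then apply the textbook correlation test, which assumes independent observations. The test declares “significant” correlations far more often than the nominal 5% – the arithmetic is fine, the independence assumption isn’t.

    ```python
    # Toy illustration only: pairs of AR(1) series, independent of each other
    # by construction, fed to a correlation test that assumes independent samples.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)

    def ar1(n, phi=0.9):
        """x[t] = phi * x[t-1] + white noise -- strongly time-dependent for phi near 1."""
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.standard_normal()
        return x

    n_trials, n_points, alpha = 1000, 120, 0.05
    false_hits = sum(pearsonr(ar1(n_points), ar1(n_points))[1] < alpha
                     for _ in range(n_trials))
    print(f"'Significant' at the 5% level in {false_hits / n_trials:.0%} of trials")
    ```

    Set phi to 0 (genuinely independent draws) and the rate falls back to roughly the nominal 5%.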

