I recently started reading Deborah G. Mayo’s new book Statistical Inference as Severe Testing. So far, so good. It’s thought-provoking, and I wholeheartedly agree with her that it’s important for researchers to understand the philosophical foundations of statistics (and of science in general). I also happen to think that the logic of test severity is important and worth understanding.

Perhaps I will write more about my agreements with her arguments in this book at a later date, but today I am writing about what seems to me to be a pretty obvious error in an early chapter. Or maybe it’s not an error, but, rather, a little bit of logical and mathematical sleight of hand.

On pages 50 and 51, Mayo is discussing the likelihood principle (LP). She writes:

… Royall, who obeys the LP, speaks of "the irrelevance of the sample space" once the data are in hand. It’s not so obvious what’s meant. To explain, consider Jay Kadane: "Significance testing violates the Likelihood Principle, which states that, having observed the data, inference must rely only on what happened, and not on what might have happened but did not" (Kadane 2011, p. 439). According to Kadane, the probability statement \Pr(|d(\mathbf{X})|\geq 1.96)=0.05 "is a statement about d(\mathbf{X}) before it is observed. After it is observed, the event {d(\mathbf{X})>1.96} either happened or did not happen and hence has probability either one or zero" (ibid.).

Knowing d(\mathbf{X})=1.96, Kadane is saying there’s no more uncertainty about it. But would he really give it probability 1? That’s generally thought to invite the problem of "known (or old) evidence" made famous by Clark Glymour (1980). If the probability of the data \mathbf{x} is 1, Glymour argues, then \Pr(\mathbf{x}|H) also is 1, but the \Pr(H|\mathbf{x})=\Pr(H)\Pr(\mathbf{x}|H)/\Pr(\mathbf{x})=\Pr(H), so there is no boost in probability given \mathbf{x}. So does that mean known data don’t supply evidence? Surely not. …

From a quick glance at the linked Glymour chapter (specifically, p. 86), it’s not clear to me that this is actually an example of his "old evidence" problem. Glymour seems to be concerned with whether or not information that is known before a theory is developed can count as evidence for that theory under a Bayesian confirmation model of evidence (and, for what it’s worth, while looking for a relevant Glymour link, I found plenty of other people who are concerned with addressing his "old evidence" argument, e.g., van Fraassen, Howson, and whoever wrote this). By way of contrast, Mayo seems to conflate the temporal relationship between putative evidence and theory development, on the one hand, and the presence or absence of randomness in an observed test statistic, on the other.

Mayo also conflates the event {d(\mathbf{X})>1.96} and the likelihood \Pr(\mathbf{x}|H). She steps quickly from the former to d(\mathbf{X})=1.96 to "the probability of the data \mathbf{x}" to the likelihood \Pr(\mathbf{x}|H). (This is what feels like sleight of hand to me.) But even if we accept Kadane’s argument that {d(\mathbf{X})>1.96} has probability zero or one once we’ve observed the data, this doesn’t imply that the likelihood takes the same value. The probability of the event that a test statistic is more extreme than some criterion is not equivalent to the likelihood of the data conditional on some hypothesis, a point that is not lost on Mayo elsewhere, even earlier in this same chapter.
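To make the distinction concrete, here is a minimal sketch in Python. It is my own illustration, not anything from Mayo or Kadane, and it rests on assumptions I am adding: that d(\mathbf{X}) is standard normal under H, that the observed statistic stands in for the data, and that a hypothetical alternative puts the mean of d(\mathbf{X}) at 2. It computes three different quantities that the quoted passage runs together: the pre-data tail probability, the post-data indicator of the event, and the likelihood at the observed value.

```python
import math

def normal_pdf(z, mean=0.0):
    """Density of a unit-variance normal with the given mean, evaluated at z."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(z, mean=0.0):
    """Cumulative distribution function of a unit-variance normal."""
    return 0.5 * (1.0 + math.erf((z - mean) / math.sqrt(2.0)))

d_obs = 1.96  # the observed value of the test statistic in the quoted passage

# 1. Pre-data probability of the event {|d(X)| >= 1.96} under H
#    (assumption: d(X) is standard normal under H).
tail_area = 2.0 * (1.0 - normal_cdf(d_obs))

# 2. Post-data status of that same event: it either happened or it did not.
event_occurred = 1.0 if abs(d_obs) >= 1.96 else 0.0

# 3. Likelihood of the observed value under H, and under a hypothetical
#    alternative with mean 2 (an assumption made purely for illustration).
lik_h = normal_pdf(d_obs, mean=0.0)
lik_alt = normal_pdf(d_obs, mean=2.0)

print(f"pre-data tail area under H:  {tail_area:.4f}")   # roughly 0.05
print(f"post-data event indicator:   {event_occurred}")
print(f"likelihood under H:          {lik_h:.4f}")
print(f"likelihood under alternative: {lik_alt:.4f}")
```

Under these assumptions, the event indicator is one after the data are observed, but the likelihood under H is a density value nowhere near one, and it still differs across hypotheses. So granting Kadane’s point about the event does not force \Pr(\mathbf{x}|H) to equal one.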

To be clear, I like Mayo’s (and Popper’s, and probably various others’) notion of severity, the basic idea of which is that the result of a test doesn’t provide corroboration of a hypothesis unless it would have (probably) provided refutation if the hypothesis were false. I recently read Theory and Reality, in which similar notions are discussed, and I am convinced that it’s a crucial component of science in general and statistical analysis in particular. I’m still working out my thoughts on how this fits with my (mostly skeptical) attitude towards a Bayesian confirmation model of evidence and the likelihoodist school of statistical inference.

But regardless of where I end up with respect to all this, it does no one any good to (deliberately or accidentally) marshal such clearly flawed arguments.