It's clear that you have your own concept of a p-value, quite different from the one everyone else uses (including its proper interpretation and the usual misinterpretations).
You disagree with all the provided examples, but you have not given any concrete example of how the p-value would be used in a "well-formed experiment" (another concept that seems unique to you).
Of course you're free to redefine concepts as you please, if it makes you happy or it is useful to you in any other way.
I have redefined nothing. Go pull the Higgs data; it will be as I say. Go read how to form experiments and calculate p-values; nothing will be substantially different from what I have said here.
I already explained a few messages ago that the null hypothesis was indeed that there is no Higgs boson and the peak observed in the LHC data is just noise. They made their analysis and rejected the null hypothesis (p-value less than 0.000001).
"If you correctly measure a p-value of 0.05, that measurement explicitly means that you expect 95% of your future observations to be consistent with the hypothesis which you used to determine that p-value of 0.05."
Did they correctly measure a p-value of 0.000001? They (and everyone else, apart from maybe you) think that they did.
Do you think they expect 95% of their future observations to be consistent with the null hypothesis (that they used to determine that p-value)?
I would say they were quite confident that the observed peak was not noise, and therefore they expected the signal to be there again if the experiment were repeated, rejecting the null hypothesis again. Which is why they announced they had discovered the Higgs boson. But maybe you can convince the Royal Swedish Academy of Sciences to take Higgs' prize back...
Look at Figures 8 and 9. They show the p-value (at whatever level of data was collected when the paper was written) over the parameter space that is being searched. You can see that the values observed have a clear separation -- most are close to 1 (null-hypothesis holds) with just one significant dip towards 0 (null-hypothesis doesn't hold). If you were to animate this graph with the p-values over time (as more observations are made), you would see the trend towards 0 or 1 much clearer.
The Boson experimenters would expect a (1 - p) reproduction rate for the next observation made (if the null-hypothesis holds). That is, the next observation has a p probability of fitting within the parameters of the null-hypothesis and a (1-p) probability of being inconsistent with the null-hypothesis. Why would they expect that? Because the math that tells you whether that is what you should expect is exactly what the p-value calculates (again, assuming a well-formed experiment -- which the Higgs experiments probably are).
But again, when the null-hypothesis doesn't hold, p-value tells you very little (it's actually undefined in the math).
> But again, when the null-hypothesis doesn't hold, p-value tells you very little (it's actually undefined in the math).
The p-value is well defined whether the null hypothesis holds or not. You calculate it assuming it does. There you go, you have a properly calculated p-value. That's what physicists do:
"Taking into account the entire mass range of the search, 110–600 GeV, the global significance of the excess is 5.1 σ, which corresponds to p0 = 1.7 × 10−7."
You see, they have calculated a p-value. Does the null hypothesis hold? I don't think they had any expectations consistent with the null hypothesis being true before the experiment. After the experiment they clearly think that the null hypothesis is false:
"These results provide conclusive evidence for the discovery of a new particle with mass 126.0 ± 0.4 (stat) ± 0.4 (sys) GeV."
They don't see any problem in stating a p-value and rejecting the null hypothesis at the same time (in fact, it's because the p-value that they calculated is very small that they conclude that the null hypothesis doesn't hold). Apparently you see a problem, because if the Higgs boson exists and produces the signal in the experiment then the null hypothesis is false and all the p-value calculations they did to get to that conclusion are "wrong".
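Incidentally, the quoted "5.1 σ corresponds to p0 = 1.7 × 10−7" is just the standard one-sided tail probability of a standard normal distribution. A quick sketch of that conversion, using only the Python standard library (the helper name is mine, not from the paper):

```python
from math import erfc, sqrt

def significance_to_p(sigma):
    """One-sided tail probability P(Z >= sigma) for Z ~ Normal(0, 1)."""
    return 0.5 * erfc(sigma / sqrt(2))

print(significance_to_p(5.1))  # ~1.7e-7, matching the quoted p0
print(significance_to_p(5.0))  # ~2.9e-7, the conventional "5 sigma" threshold
```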
Anyway, I have no need to convince you of anything. I can live with people being wrong on the internet.
Well, before you go, I implore you to look into the actual computation and theory of 'p-value'.
A p-value is simply P(X|H). P(X|H) only means something when H is true. If H is false, P(X|H) tells you nothing. Since H is your null-hypothesis, if it does not actually hold in the real-world, P(X|H) is meaningless.
If you read the paper I linked, they never explicitly call out the null hypothesis (nor, I believe, do they show the work for their calculations). There should be another paper somewhere that describes exactly what it is, in the terms I am using. So phrases like "[t]hey don't see any problem in stating a p-value and rejecting the null hypothesis" make me think you have no idea what you're talking about.
The null hypothesis can never be 'rejected' (ie. p-value can never reach 0). I don't think you will find anyone working on the Higgs boson that will claim otherwise.
I think we agree that their null hypothesis is "there is a background, with events coming from all the known particles". I think we agree that their conclusion is "these results provide conclusive evidence for the discovery of a new particle". I don't see how can they say that there is a new particle without rejecting the hypothesis that there is no such new particle. Of course you can say that the null hypothesis can never be rejected (relevant Dilbert strip: http://dilbert.com/strip/2001-10-25) but then they can never discover a new particle either.
Regarding p-values in general, your definition is the same I've been using all along. But I don't think it is meaningless when the null hypothesis does not hold. The meaning is clear: "the probability of getting a value for the statistic as high as the observed one if the null hypothesis was true". For example, there would be one chance in several millions of observing the kind of data they found at the LHC if the Higgs boson didn't exist.
You might want to look into the theory yourself, because the notion of p-values trending towards 1 if the null hypothesis is true is nonsense. By definition, if the null hypothesis is true the p-value is uniformly distributed between 0 and 1. If you have at some point a p-value close to one (or to any other number for that matter) and keep adding data, in the long run it will still be uniformly distributed between 0 and 1.
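That uniformity is easy to check numerically. A small simulation sketch (Python standard library only; the setup, a two-sided z-test for a zero mean, is mine for illustration):

```python
import math
import random

random.seed(0)

def p_value_two_sided(z):
    # Two-sided p-value for a standard-normal test statistic.
    return math.erfc(abs(z) / math.sqrt(2))

# Simulate many experiments in which the null hypothesis is true:
# each draws n samples from Normal(0, 1) and tests "mean = 0".
n, experiments = 100, 2000
p_values = []
for _ in range(experiments):
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)  # standardized sample mean
    p_values.append(p_value_two_sided(z))

# Bin the p-values into tenths; under a true null each bin holds ~10%.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
print([round(b / experiments, 2) for b in bins])
```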
If the null-hypothesis is true, every observation made should be consistent with it. This will result in P(X|H) trending to 1 (since there will be experimental variance). No other experimental design makes sense.
>The meaning [of p-value when H is false] is clear: "the probability of getting a value for the statistic as high as the observed one if the null hypothesis was true"
This is a logical fallacy. It is counterfactual to consider a world where the null hypothesis is true, when it is not.
In fact, this is precisely the feature of the universe that p-value based experimentation exploits and is essentially the only way for us to gain any information about 'reality'.
>By definition, if the null hypothesis is true the p-value is uniformly distributed between 0 and 1.
I don't think so. If that were true, p-value would be entirely useless.
You definitely do not know what a p-value is. When you wrote "P(X|H)" I thought X was shorthand for T>T(X), where T is the statistic, not that you were referring to the actual data X.
P(X|H) doesn't have the properties you claim, anyway. P(X|H)=1 corresponds to the case where only one outcome is possible. In non-trivial cases, the more data you add the lower this number will be.
Assume H = "you have a fair coin". You throw it once: heads. P(X|H) = P(h|fair coin) = 1/2. You throw it again: tails. P(X|H) = P(ht|fair coin) = 1/4. You throw it again: tails. P(X|H) = P(htt|fair coin) = 1/8. I guess the experiment is not well-formed...
Ok, so you're thinking about a random variable which converges to some value when the null hypothesis is true. This is fine, but it has nothing to do whatsoever with p-values.
Let me say that your notation is not very appropriate. It makes no sense to say that P(X|H) converges to 1. If you expect X to converge to C if the null hypothesis is true, you can simply say X->C. A proper notation involving probabilities would be P(|X-C|>epsilon)->0 for any positive epsilon (convergence in probability) or maybe P(X->C)=1 (convergence almost surely).
Taking as you suggest X = (#tails/#heads), you expect that X->1 if the coin is fair (I'm not sure why you find this is not a well-defined null hypothesis, but I don't really care). However, P(X)<1 for every X. In fact, P(X=1)->0 as the number of trials increases (X will get closer to 1 on average, but getting exactly 1 will get more and more unlikely).
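The "P(X=1)->0" point is easy to check directly: even the single most likely outcome for a fair coin, an exactly balanced count, becomes ever less probable as trials accumulate. A quick sketch (fair-coin setup as above, stdlib only):

```python
from math import comb, pi, sqrt

def prob_exactly_balanced(n):
    """Probability of exactly n heads in 2n flips of a fair coin."""
    return comb(2 * n, n) / 4 ** n

for n in (10, 100, 1000):
    # Shrinks roughly like 1 / sqrt(pi * n), so it never stabilizes above 0.
    print(n, prob_exactly_balanced(n), 1 / sqrt(pi * n))
```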
As I said, you're free to prefer your converging statistics and your well-defined null hypothesis. But you should be aware that people are talking about something completely different when discussing things like the 1e-7 p-value in the Higgs boson discovery or the reproducibility of statistically significant results.
EDIT: Another example, maybe better-defined: a random variable distributed (under the null hypothesis) x ~ Normal(mu=0, sigma=1). Let's say you take N samples (I let you choose the number, so I don't pick one which is not good enough). The statistic is the mean X = (x_1 + x_2 + ... + x_N)/N. If the null hypothesis is true, X->mu=0. You get X = 1/sqrt(N). What's your "p-value" in that case?
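For reference, the standard calculation in that example gives the same answer for every N: under the null the sample mean is Normal(0, 1/N), so an observed mean of 1/sqrt(N) sits exactly one standard error out, and the one-sided p-value is about 0.159 regardless of N. A sketch:

```python
from math import erfc, sqrt

def p_value_for_example(n):
    """One-sided p-value for an observed mean of 1/sqrt(n) over n Normal(0,1) samples."""
    observed_mean = 1 / sqrt(n)
    standard_error = 1 / sqrt(n)        # sd of the mean under the null
    z = observed_mean / standard_error  # always exactly 1
    return 0.5 * erfc(z / sqrt(2))

print(p_value_for_example(10))      # ~0.1587
print(p_value_for_example(100000))  # same value; N drops out
```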
>Ok, so you're thinking about a random variable which converges to some value when the null hypothesis is true. This is fine, but it has nothing to do whatsoever with p-values.
Well, the variable itself doesn't, just the observed value of P(X|H). X can be any random variable, but typically it will need to be transformed to have a normal distribution about 0 with a standard deviation of 1 (since this is what the typical null-hypothesis predicts).
To effectively use p-value analysis, it is typically assumed that your null-hypothesis predicts that your observations will be normally distributed with a mean of 0 and a standard deviation of 1. The total count of heads observed will not be distributed that way. Neither will the probability of a particular sequence (what your example seemed to be calculating). I say your null hypothesis is not well-defined because the term 'fair' remains undefined (though we could guess at the meaning) and in fact makes no predictions about the world. You need to apply transformations to your random variable so that it will appear normally distributed about 0 with a standard deviation of 1 if the hypothesis is true.
>Let me say that your notation is not very appropriate. It makes no sense to say that P(X|H) converges to 1.
My notation is perfectly appropriate. X is a random variable and a random variable is the only thing that can go there (if you are doing p-value analysis). X is not assumed to be uniform or simple (although it certainly could be). P(T>T(X)|H) can be replaced with P(Y|H) every time (Y = T>T(X)).
>As I said, you're free to prefer your converging statistics and your well-defined null hypothesis. But you should be aware that people are talking about something completely different when discussing things like the 1e-7 p-value in the Higgs boson discovery or the reproducibility of statistically significant results.
I'm glad that we finally agree on this (although I dispute that anyone working on the Higgs boson discovery disagrees with me). One of my first claims was that others may not be calculating true p-values, but may calculate something and call it 'p-value' and then think that it means something it does not. In fact, this entire topic even links to an article in a prominent publisher claiming the same.
Do you think it is purely coincidental that the figures I showed you from the Higgs experiment show the lines converging towards only two different numbers: 1 and 0?
Edit: You'll have to give me some time on your edit. It's not something I typically calculate and I have other business to attend to today.
Ok, so maybe your definition does correspond to a p-value after all. It's hard to say as you have refused to discuss concrete cases (like the fair coin or the loaded die, which are standard examples to introduce p-values). But if you're actually calculating a p-value then it won't behave as you expect. It won't converge to anything (edit: if the null hypothesis holds). P-values are by definition uniformly distributed when the null hypothesis is true. If your "p-value" is not, then it's not a p-value. It really is that simple. Or maybe everyone else is using the wrong "p-values" and yours are the real thing. You can believe it if you want.
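You can watch this non-convergence directly: accumulate data under a true null hypothesis and recompute the p-value after every new observation. A simulation sketch (seeded for reproducibility; the z-test setup is mine for illustration):

```python
import math
import random

random.seed(42)

# H0 is true: the samples really do come from Normal(0, 1).
# Recompute the two-sided z-test p-value for "mean = 0" after each draw.
total = 0.0
p_trajectory = []
for n in range(1, 5001):
    total += random.gauss(0, 1)
    z = (total / n) * math.sqrt(n)  # standardized sample mean
    p_trajectory.append(math.erfc(abs(z) / math.sqrt(2)))

# The trajectory keeps wandering over (0, 1) rather than converging to 0 or 1.
print(min(p_trajectory), max(p_trajectory))
```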
Please disregard my previous questions, I see no point in continuing this discussion. But you might want to read a bit more about p-values: you won't find anyone (I hope!) sharing your point of view. Once you understand what the p-value is, and what it is not, you might indeed conclude that they are entirely useless. Of course it's your right to avoid learning what p-values really are, and keep the faith. It's your choice.
Do you think it's purely coincidental that this figure https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATL... includes the sigma=0 (null hypothesis) line at 0.5 and not at 1? (Hint: the expected value of the p-value under the null hypothesis is 0.5.) (That's a rhetorical question: I already know it's because this is not a well-formed experiment or something.)
>P-values are by definition uniformly distributed when the null hypothesis is true.
Where are you getting this from? When the null hypothesis is true, the p-value should be 1. This follows directly from the definition. If the p-value is not 1 and the null hypothesis is in fact true, your experiment or calculations are wrong. You might also just have the wrong null hypothesis (eg. sensors have more noise than assumed).
I'm willing to accept that in some designs, the first observation of p-value may be uniformly distributed over (0,1], but, as additional observations are made, the value should converge to 0 or 1. What would be the purpose, or usefulness of p-value being uniformly distributed if the null-hypothesis is true? It's much simpler to design things to converge to a single number.
Edit: I have also considered that p-value could be uniformly distributed if the null-hypothesis is false (where you claimed true). I don't know the answer to that.
> If the null-hypothesis is true, every observation made should be consistent with it. This will result in P(X|H) trending to 1
Every observation being consistent with H doesn't mean that for each event X that occurs, the conditional probability of X given H will be, or "trend toward", 1. Assuming a perfectly deterministic universe, the P(X|everything else that is true) will be 1 for every X that occurs, but that doesn't mean P(X|H) for any particular true proposition H will be anything like that.
>Every observation being consistent with H doesn't mean that for each event X that occurs, the conditional probability of X given H will be, or "trend toward", 1.
Agreed. If you plot P(X|H) (computed over all observations) over time, in a well-formed experiment the line will trend to 1 if the null-hypothesis is true and to 0 if the null-hypothesis is false.
> A p-value is simply P(X|H). P(X|H) only means something when H is true. If H is false, P(X|H) tells you nothing.
If you know H is false, P(X|H) tells you nothing. But then H wouldn't be a hypothesis, null or otherwise.
If you don't know whether H is true, but you do know something about X, P(X|H) tells you something useful about whether the positive hypothesis to which H is the alternative has an effect apparent in the world to explain.
> The null hypothesis can never be 'rejected' (ie. p-value can never reach 0).
Rejection of the null hypothesis does not mean p-value = 0. Scientific progress is not based on logical certainty, but rather practical utility.
Necessary truths are the domain of pure logic, not empirical science.
What is the interpretation of a p-value = 0 then? Empirical science can never reject any theory; it is not powerful enough. At best it can provide a selection of least worst explanations.
The H in P(X|H) does not mean 'assumed to be true', it means 'is in fact true'. If H is in fact false, it is counter-factual to assume it is true and therefore any conclusions drawn from the assumption are invalid. This is independent of belief in H.
>Necessary truths are the domain of pure logic, not empirical science.
Science can never deliver truth, which is why it can never truly reject anything (including null-hypotheses).
More generally, this is referred to as the problem of induction.