If the p-values were accurate and averaged around 0.05, ~95% of results should b...

gwern · on Aug 30, 2015

No. P-values don't work that way and don't mean what you think they mean. Read OP or heck, any of the classics like "Why most published research findings are false" http://dx.plos.org/10.1371/journal.pmed.0020124

(36% may or may not be bad, but you can't know without additional stuff like power or prior probability of hypotheses being true; p-values have no intuitive meaning and aren't an answer to any question that people are asking, which is a major reason why Bayesian approaches can be useful. And from a Bayesian perspective, I find 36% totally unsurprising - if anything, substantially better than I had expected given the gross underpowering of most psych studies, the statistical-significance publication filter, and the dubiousness of most hypotheses.)

washedup · on Aug 30, 2015

I agree, 36% ain't too bad. But,it requires that any literature you use in your research should have been reproduced a few times by other researchers.

jsprogrammer · on Aug 30, 2015

A proper rebuttal would show what a p-value actually is and how it differs from what I claimed. Now, since a p-value is exactly what I previously claimed, you obviously can't do that. I'm not even sure what you are arguing against me here.

Dylan16807 · on Aug 30, 2015

The p-value is the chance of a false positive. But you don't know what the rate of true positives is, or the rate of false negatives.

In a world where there are only false positives and true negatives, and people publish all positive and negative results, then reproduction of a paper should be 95%.

But the reproduction rate when there actually is an effect is not 95%. Depending on sample size, I might get a true positive 20% of the time and a false negative 80% of the time, or I might get a true positive 99.8% of the time and a false negative .2% of the time.

So the average reproduction rate, where an effect actually exists, can be almost any number between 5 and 100. There is no reason to assume it will be 95%.

So the average reproduction rate, where some effects are real and some are imaginary, will almost certainly not be exactly 95%, and that is not a problem in and of itself.

(And when you talk about an average p-value of .05, that sounds like only publishing positive results, which is blatantly going to fail reproduction. 100 false hypotheses -> 5 publications, all false positives -> 5% reproduction rate)

jsprogrammer · on Aug 30, 2015

>In a world where there are only false positives and true negatives, and people publish all positive and negative results, then reproduction of a paper should be 95%.

This is the world p-value assumes and is therefore the only one worth considering in relation to my comment.

If an experiment is not well-formed then of course you won't see reproduction at the expected rate. This is what I'm referring to when I say that the low reproduction rate points to deep, fundamental flaws in the experiments.

I agree that the reproduction rate will never be exactly 95% (or 1 - p) due to the discrete nature of experimentation [that's why I used a ~ in front :)], but the reproduction rate of a well-formed experiment should very closely track 1 - p.

Dylan16807 · on Aug 30, 2015

>This is the world p-value assumes and is therefore the only one worth considering in relation to my comment.

I'm not sure if that was clear enough. In that world, no one has ever had a hypothesis that was correct. The whole field is useless, measuring things that are wrong and getting the occasional false positive.

You can talk about that world if you want, but it has no connection to reality. It's not p-values that assume that world, it's your misunderstanding of p-values.

>If an experiment is not well-formed then of course you won't see reproduction at the expected rate. This is what I'm referring to when I say that the low reproduction rate points to deep, fundamental flaws in the experiments.

Experiments don't have to have enormous sample sizes to be well-formed. That's the whole point of having a cutoff value.

It's not like an experiment that reproduces 80% of the time disproves the result the rest of the time, it just doesn't quite reach .05 on those trials

>the reproduction rate of a well-formed experiment should very closely track 1 - p

I'm suspicious of this. I don't have time to do the math right now, but an experiment that averages .01 might clear a .05 hurdle far more than 99% of the time, and would definitely be well-formed. And if you set a hurdle at .01 it would only clear it half the time, but it would still be well-formed.

jsprogrammer · on Aug 30, 2015

Hypotheses can never be proven to be correct. I don't want to be in any world where it is believed that a hypothesis is or could be correct.

This is a fundamental tenant of science. All that can be done is to reject hypotheses.

You (along with Gwern) have now claimed that I don't understand p-values, but you present no alternative understanding. The reason, of course, is that when you look at the mathematics behind p-value, it is obvious that it is exactly as I claim.

Edit to address your edit:

>I'm suspicious of this. I don't have time to do the math right now, but an experiment that averages .01 might clear a .05 hurdle far more than 99% of the time, and would definitely be well-formed. And if you set a hurdle at .01 it would only clear it half the time, but it would still be well-formed.

You are right that you need to be careful here about what you are comparing across instances. There will be variability since you are only sampling a distribution (most likely at a very low rate) and not observing the entire distribution (which, for continuous distributions, is impossible).

Dylan16807 · on Aug 30, 2015

On a certain philosophical level you can never be absolutely sure of anything, and p-values are meaningless.

On a practical level, p-values are the chance that a correlation is reported where 'reality' does not have a correlation. This is not the same number as the chance that the result agrees with 'reality'.

You can reject the concept of objectivity, but you cannot reject that logic. So I have explained the alternative understanding fine, just go back and replace 'true' and 'false' and 'correct' with a philosophically-hedged version.

jsprogrammer · on Aug 30, 2015

On a practical level, people may not be able execute a well formed experiment. I completely agree with that.

However, that doesn't change the meaning of the mathematics, only that your reality has diverged from what you originally intended/believed.

What is the meaning of the number that people call 'p-value' when it is not calculated on a well-formed experiment? I'm not sure if there is a general formula, but you may be able to find some meaning in a particular instance.

Dylan16807 · on Aug 30, 2015

You're either defining "well-formed" as there being no such thing as a true hypothesis, or you have completely lost me. Either way I don't think there's anything more I can say.

p does not tell you how likely a result is to be true.

jsprogrammer · on Aug 30, 2015

A well formed experiment tests only a null hypothesis.

p-value is exactly the probability that you observed X given that the previously stated null hypothesis was true at the time of observation. The value (1 - p-value) is exactly the probability that you will make an observation consistent with your hypothesis (ie. expected replication rate).

Wikipedia has a decent treatment that might help: https://en.wikipedia.org/wiki/P-value#Definition_and_interpr...

Dylan16807 · on Aug 30, 2015

But the importance of a p-value is showing when it's not the null hypothesis.

The only time you get 95% reproduction is a result that says the null hypothesis is true.

You're entirely right about that specific case.

But this only happens when nothing correlates. (And almost no science has been done, because most things in fact don't correlate.)

A result that disagrees with the null hypothesis at .05 does not imply any particular chance of another result that also disagrees with the null hypothesis at .05

If there is no correlation, then replication will happen 5% of the time. If there is correlation, it will be somewhere over 5%, but no particular value.

When people talk about reproduction, they talk about that chance. It will only be 95% by coincidence.

jsprogrammer · on Aug 30, 2015

>You're entirely right about that specific case.

In fact, this is the only case that matters. All other (valid) cases can be reduced to a single, null hypothesis design.

p-value is undefined for hypotheses that are not a null hypothesis. It is also undefined for hypotheses which do not hold.

Sure, you can walk through the motions, put some numbers together, and eventually produce a number between 0 and 1. However that does not mean you have computed a p-value. If you are testing a non-null hypothesis you have not computed a p-value. If you are testing a null hypothesis that doesn't hold, you have not computed a p-value.

Dylan16807 · on Aug 30, 2015

The null hypothesis is where nothing happens. You're supposed to be showing evidence against it. If you redefine things so your "null hypothesis" is where something happens, and you're showing evidence for it, you have done something very very wrong, and you should not be using a .05 threshold either.

Paradigma11 · on Aug 31, 2015

"The p-value is the chance of a false positive."

Nope, It's the chance of getting a result as this or more extreme under the assumption of the null hypotheses.

kgwgk · on Aug 30, 2015

Let me join the club of people claiming that you don't understand p-values.

It's not clear if you are talking about the rate of reproduction for a subset of the possible experimental outcomes (those rejecting the null at the alpha=5% level) or for the whole set.

When the null hypothesis is true (remember that there are fields where this is the norm, v.g. ESP), you would only reproduce (reject again) 5% of the rejections.

Of course you would reproduce (non-reject for the second time) 95% of the non-rejections. The global reproduction rate would be 0.95 x 0.95+0.05 x 0.5=0.905 (90.5% doesn't look ~95% either).

When the null hypothesis is not true, the probability of reproducing (in either sense) the result of a test depends on the effect size.

If the effect is huge, the test will be rejected with probability ~100% and the result will be reproduced with probability ~100%.

Or maybe you mean by reproducing "getting a lower p-value" in the the second trial? If the null is true, the probability of getting p-value2<p-value1 is precisely p-value1. If the null is not true, it will depend on the effect size. If you assume the effect size is the observed one, you expect p-value2 to be smaller than p-value1 with probability 50%.

jsprogrammer · on Aug 30, 2015

>If the null is true, the probability of getting p-value2<p-value1 is precisely p-value1.

Precisely. P-value is only defined when the null hypothesis is true.

Apparently everyone else is overlooking this fact.

kgwgk · on Aug 30, 2015

I don't see how does it contradict anything that I (and everyone else) wrote.

You seem to agree that when the null is true and the original result was p-value=0.05, the probability of reproducing the result (getting p-value<0.05 on a second trial) is 5%.

This seems incompatible with your original claim: "If the p-values were accurate and averaged around 0.05, ~95% of results should be reproducible."

Could you explain exactly what do the following mean:

p-values were accurate (that the null is true?)

p-values averaged around 0.05 (that you're taking the subset of outcomes with p-value 0.05?)

~95% of results should be reproducible (that if you take the previous subset you will get p-value<0.05 always in 95% of them? or exactly 95% of the time in all of them?)

jsprogrammer · on Aug 30, 2015

95% of the repeated observations that you make (in the same manner as the observations used to calculate a valid p-value of 0.05) will be consistent with the relevant null hypothesis.

What other meaning could there be? The result of an experiment is not a p-value, but a series of observations. Those are what need to be compared.

kgwgk · on Aug 30, 2015

I guess the bit "results should be reproducible" made us think that you were talking about reproducing the previous results (i.e. if the null hypothesis was rejected in the first trial, obtaining again a rejection if the trial was repeated).

If I understand your point, you're saying: "If the null hypothesis is true then with 95% probability it won't be rejected. And, independently of the result of the first trial, if we do a second trial and the null hypothesis is true then with probability 95% it won't be rejected".

Which seems correct, but you might be overlooking the fact that it's not very interesting and unrelated to the discussion.

jsprogrammer · on Aug 30, 2015

It's highly relevant to the discussion (interestingly titled: "P-value not as reliable as many scientists assume"), which is entirely about p-value, as I am describing to you the limits of p-value analysis.

That some perform calculations that are not p-value and call them p-value is not exactly my problem to solve. That others perform meta-analyses with numbers that others call p-values, but which aren't actually p-values isn't really my problem either.

I'll say it again. If you correctly measure (exercise left to reader) a p-value of 0.05, that measurement explicitly means that you expect that 95% of your future observations to be consistent with the hypothesis which you used to determine that p-value of 0.05.

Making future observations that are consistent with a known hypothesis is exactly what reproducibility refers to within the context of science.

If you expect 95% of observations (p = 0.05) to be consistent with previous findings, but only 36% are...you did not calculate a valid p-value (or are now testing something other than your hypothesis).

kgwgk · on Aug 30, 2015

> I'll say it again. If you correctly measure (exercise left to reader) a p-value of 0.05, that measurement explicitly means that you expect that 95% of your future observations to be consistent with the hypothesis which you used to determine that p-value of 0.05.

What do you mean with "measure a p-value"? You make your observation, calculate a statistic (a function of the observation), and look at the distribution of that statistic under the null. The p-value is, by definition, the percentile of the value you got in that distribution (which might or might not be the actual distribution).

You want to check if a die is loaded to yield 6 more often than it should. The null hypothesis is that the die is fair. You can calculate the distribution for the number of 6's in 3 rolls (0: 58%, 1:35%, 2: 7%, 3: 0.5%). You roll the die three times, you get three 6's. The p-value is 0.005. Do you agree? The p-value is 0.005 whether the die is fair (the null hypothesis is true) or loaded. Do you agree?

> Making future observations that are consistent with a known hypothesis is exactly what reproducibility refers to within the context of science.

Scientific experiments are usually about rejecting the null hypothesis. For example, the null hypothesis might be that there is no Higgs boson and the peak observed in the LHC data is just noise. They made their analysis and rejected the null hypothesis (p-value less than 0.000001, do you think they calculated it properly?). In this context, reproducibility means "finding the Higgs boson again if the experiment is repeated" and not "repeat the experiment and get a result consistent with the null hypothesis".

According to your description of the limits of p-value analysis, the only conclusion that physicists should get out of the experiment is that if they do it again they should expect to get results consistent with the null hypothesis (i.e. no Higgs boson) with 95% probability. But they see it as evidence that the null hypothesis is false and the Higgs boson real.

jsprogrammer · on Aug 31, 2015

Measuring a p-value is equivalent to calculating a p-value (ie. calculate the conditional probability P(X|H)).

I don't really agree that your die experiment is well-formed. For one, you are grossly under-sampling. It's known a priori that there are at least six possible outcomes, yet you are only considering three rolls, so you don't even have the possibility of observing each distinct value even once.

The p-value of a well-formed experiment should converge towards a fixed value as more observations are made. You will experience variance in the computed value due to the inherently discrete nature of experimentation. This will be especially pronounced for the first observations that are made.

I do not know if the Higgs boson experiment is well-formed. If it is well-formed and their null-hypothesis is true, their p-values will trend towards 1.

If their null-hypothesis is not true then the p-values do not mean much and will trend towards 0.

>In this context, reproducibility means "finding the Higgs boson again if the experiment is repeated"

"The Higgs boson exists" is not a valid hypothesis. Usually the null-hypothesis is "the explanation is measurement/background noise". Since that is really the only valid null-hypothesis, it is most likely what they are using.

kgwgk · on Aug 31, 2015

It's clear that you have your own concept of a p-value, which is quite different from the one used by all the other people (including the proper interpretation and the usual misinterpretations).

You disagree with all the provided examples, but you have not given any concrete example of how the p-value would be used in a "well-formed experiment" (another concept that seems unique to you).

Of course you're free to redefine concepts as you please, if it makes you happy or it is useful to you in any other way.

jsprogrammer · on Aug 31, 2015

I have redefined nothing. Go pull the the Higgs data; it will be as I say. Go read how to form experiments and calculate p-values, nothing will be substantially different than what I have said here.

kgwgk · on Aug 31, 2015

I already explained a few messages ago that the null hypothesis was indeed that there is no Higgs boson and the peak observed in the LHC data is just noise. They made their analysis and rejected the null hypothesis (p-value less than 0.000001).

"If you correctly measure a p-value of 0.05, that measurement explicitly means that you expect that 95% of your future observations to be consistent with the hypothesis which you used to determine that p-value of 0.05."

Did they correctly measure a p-value of 0.000001? They (and everyone else, apart from maybe you) think that they did.

Do you think they expect 95% of their future observations to be consistent with the null hypothesis (that they used to determine that p-value)?

I would say they were quite confident that the observed peak was not noise, and therefore they expected the signal to be there again if the experiment was to be repeated, rejecting again the null hypothesis. Which is why they announced they had discovered the Higgs boson. But maybe you can convince the Swedish Academy of Science to take Higgs' prize back...

jsprogrammer · on Aug 31, 2015

Here is one paper: http://arxiv.org/pdf/1207.7214v2.pdf

Look at Figures 8 and 9. They show the p-value (at whatever level of data was collected when the paper was written) over the parameter space that is being searched. You can see that the values observed have a clear separation -- most are close to 1 (null-hypothesis holds) with just one significant dip towards 0 (null-hypothesis doesn't hold). If you were to animate this graph with the p-values over time (as more observations are made), you would see the trend towards 0 or 1 much clearer.

The Boson experimenters would expect a (1 - p) reproduction rate for the next observation made (if the null-hypothesis holds). That is, the next observation has a p probability of fitting within the parameters of the null-hypothesis and (1-p) probability that it is inconsistent with the null-hypothesis. Why would they expect that? Because the math involved in telling you whether or not that is what you should expect is exactly what p-value calculates (again, assuming a well-formed experiement -- which the Higgs experiments probably are).

But again, when the null-hypothesis doesn't hold, p-value tells you very little (it's actually undefined in the math).

kgwgk · on Aug 31, 2015

> But again, when the null-hypothesis doesn't hold, p-value tells you very little (it's actually undefined in the math).

The p-value is well defined whether the null hypothesis holds or not. You calculate it assuming it does. There you go, you have a properly calculated p-value. That's what physicists do:

"Taking into account the entire mass range of the search, 110– 600 GeV, the global significance of the excess is 5.1 σ, which corresponds to p0 = 1.7 × 10−7."

You see, they have calculated a p-value. Does the null hypothesis hold? I don't think they had any expectations consistent with the null hypothesis being true before the experiment. After the experiment they clearly think that the null hypothesis is false:

"These results provide conclusive evidence for the discovery of a new particle with mass 126.0 ± 0.4 (stat) ± 0.4 (sys) GeV."

They don't see any problem in stating a p-value and rejecting the null hypothesis at the same time (in fact, it's because the p-value that they calculated is very small that they conclude that the null hypothesis doesn't hold). Apparently you see a problem, because if the Higgs boson exists and produces the signal in the experiment then the null hypothesis is false and all the p-value calculations they did to get to that conclusion are "wrong".

Anyway, I have no need to convince you of anything. I can live with people being wrong on the internet.

jsprogrammer · on Aug 31, 2015

Well, before you go, I implore you to look into the actual computation and theory of 'p-value'.

A p-value is simply P(X|H). P(X|H) only means something when H is true. If H is false, P(X|H) tells you nothing. Since H is your null-hypothesis, if it does not actually hold in the real-world, P(X|H) is meaningless.

If you read the paper I linked, they never explicitly call out the null hypothesis (nor do, I believe, they show the work for their calculations). There should be another paper somewhere that describes exactly what it is, in the terms I am using. So, phrases like, "[t]hey don't see any problem in stating a p-value and rejecting the null hypothesis" make me think you have no idea what you're talking about.

The null hypothesis can never be 'rejected' (ie. p-value can never reach 0). I don't think you will find anyone working on the Higgs boson that will claim otherwise.

kgwgk · on Sept 1, 2015

I think we agree that their null hypothesis is "there is a background, with events coming from all the known particles". I think we agree that their conclusion is "these results provide conclusive evidence for the discovery of a new particle". I don't see how can they say that there is a new particle without rejecting the hypothesis that there is no such new particle. Of course you can say that the null hypothesis can never be rejected (relevant Dilbert strip: http://dilbert.com/strip/2001-10-25) but then they can never discover a new particle either.

Regarding p-values in general, your definition is the same I've been using all along. But I don't think it is meaningless when the null hypothesis does not hold. The meaning is clear: "the probability of getting a value for the statistic as high as the observed one if the null hypothesis was true". For example, there would be one chance in several millions of observing the kind of data they found at the LHC if the Higgs boson didn't exist.

You might want to look into the theory yourself, because the notion of p-values trending towards 1 if the null hypothesis is true is nonsense. By definition, if the null hypothesis is true the p-value is uniformly distributed between 0 and 1. If you have at some point a p-value close to one (or to any other number for that matter) and keep adding data, in the long run it will still be uniformly distributed between 0 and 1.

jsprogrammer · on Sept 1, 2015

If the null-hypothesis is true, every observation made should be consistent with it. This will result in P(X|H) trending to 1 (since there will be experimental variance). No other experimental design makes sense.

>The meaning [of p-value when H is false] is clear: "the probability of getting a value for the statistic as high as the observed one if the null hypothesis was true"

This is a logical fallacy. It is counterfactual to consider a world where the null hypothesis is true, when it is not.

In fact, this is precisely the feature of the universe that p-value based experimentation exploits and is essentially the only way for us to gain any information about 'reality'.

>By definition, if the null hypothesis is true the p-value is uniformly distributed between 0 and 1.

I don't think so. If that were true, p-value would be entirely useless.

kgwgk · on Sept 2, 2015

You definitely do not know what a p-value is. When you wrote "P(X|H)" I though X was shorthand for T>T(X) where T is the statistic, not that you were referring to the actual data X.

P(X|H) doesn't have the properties you claim, anyway. P(X|H)=1 corresponds to the case where only one outcome is possible. In non-trivial cases, the more data you add the lower this number will be.

jsprogrammer · on Sept 2, 2015

X can be any random variable that satisfies the requirements of the null-hypothesis.

A more appropriate variable for your experiment would probably be the ratio of heads to tails (may need to add a bias to avoid division by 0).

"you have a fair coin" is not a hypothesis, at least not a well-defined one.

kgwgk · on Sept 2, 2015

Ok, so you're thinking about a random variable which converges to some value when the null hypothesis is true. This is fine, but it has nothing to do whatsoever with p-values.

Let me say that your notation is not very appropriate. It makes no sense to say that P(X|H) converges to 1. If you expect X to converge to C if the null hypothesis is true, you can simply say X->C. A proper notation involving probabilities would be P(|X-C|>epsilon)->0 for any positive epsilon (convergence in probability) or maybe P(X->C)=1 (convergence almost surely).

Taking as you suggest X=(#tails/#heads), you expect that X->1 if the coin is fair (I'm not sure why you find this is not a well-defined null hypothesis, but I don't really care). However, P(X)<1 for every X. In fact, P(X=1)->0 as the number if trials increases (X will get closer to 1 on average, but getting exactly 1 will get more and more unlikely).

As I said, you're free to prefer your converging statistics and your well-defined null hypothesis. But you should be aware that people are talking about something completely different when discussing things like the 1e-7 p-value in the Higgs boson discovery or the reproducibility of statistically significant results.

EDIT: Another example, maybe better-defined: a random variable distributed (under the null hypothesis) x~Normal(mu=0,sigma=1). Let's say you take N samples (I let you choose the number, so I don't pick one which is not good enough).The statistic is the mean X=(x_1+x_2+..+x_N)/N. If the null hypothesis is true, X->mu=0. You get X=1/sqrt(N). What's your "p-value" in that case?

jsprogrammer · on Sept 2, 2015

>Ok, so you're thinking about a random variable which converges to some value when the null hypothesis is true. This is fine, but it has nothing to do whatsoever with p-values.

Well, the variable itself doesn't, just the observed value of P(X|H). X can be any random variable, but typically it will need to be transformed to have a normal distribution about 0 with a standard deviation of 1 (since this is what the typical null-hypothesis predicts).

To effectively use p-value analysis, it is typically assumed that your null-hypothesis predicts that your observations will be normally distributed with a mean of 0 and a standard deviation of 1. The total count of heads observed will not be distributed that way. Neither will the probability of a particular sequence (what your example seemed to be calculating). I say your null hypothesis is not well-defined because the term 'fair' remains undefined (though we could guess at the meaning) and in fact makes no predictions about the world. You need to apply transformations to your random variable so that it will appear normally distributed about 0 with a standard deviation of 1 if the hypothesis is true.

>Let me say that your notation is not very appropriate. It makes no sense to say that P(X|H) converges to 1.

My notation is perfectly appropriate. X is a random variable and a random variable is the only thing that can go there (if you are doing p-value analysis). X is not assumed to be uniform or simple (although it certainly could be). P(T>T(X)|H) can be replaced with P(Y|H) every time (Y = T>T(X)).

>As I said, you're free to prefer your converging statistics and your well-defined null hypothesis. But you should be aware that people are talking about something completely different when discussing things like the 1e-7 p-value in the Higgs boson discovery or the reproducibility of statistically significant results.

I'm glad that we finally agree on this (although I dispute that anyone working on the Higgs boson discovery disagrees with me). One of my first claims was that others may not be calculating true p-values, but may calculate something and call it 'p-value' and then think that it means something it does not. In fact, this entire topic even links to an article in a prominent publisher claiming the same.

Do you think it is purely coincidental that the figures I showed you from the Higgs experiment show the lines converging towards only two different numbers: 1 and 0?

Edit: You'll have to give me some time on your edit. It's not something I typically calculate and I have other business to attend to today.

kgwgk · on Sept 2, 2015

Ok, so maybe your definition does correspond to a p-value after all. It's hard to say as you have refused to discuss concrete cases (like the fair coin or the loaded die, which are standard examples to introduce p-values). But if you're actually calculating a p-value then it won't behave as you expect. It won't converge to anything (edit: if the null hypothesis holds). P-values are by definition uniformly distributed when the null hypothesis is true. If your "p-value" is not, then it's not a p-value. It really is that simple. Or maybe everyone else is using the wrong "p-values" and yours are the real thing. You can believe it if you want.

Please disregard my previous questions, I see no point in continuing this discussion. But you might want to read a bit more about p-values: you won't find anyone (I hope!) sharing your point of view. Once you understand what the p-value is, and what it is not, you might indeed conclude that they are entirely useless. Of course it's your right to avoid learning what p-values really are, and keep the faith. It's your choice.

Do you think it's purely coincidental that this this figure https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATL... includes the sigma=0 (null hypothesis) line at 0.5 and not at 1? (Hint: the expect value of the p-value under the null hypothesis is 0.5.) (That's a rhetorical question: I already know it's because this is not a well-formed experiment or something.)

dragonwriter · on Sept 1, 2015

> If the null-hypothesis is true, every observation made should be consistent with it. This will result in P(X|H) trending to 1

Every observation being consistent with H doesn't mean that for each event X that occurs, the conditional probability of X given H will be, or "trend toward", 1. Assuming a perfectly deterministic universe, the P(X|everything else that is true) will be 1 for every X that occurs, but that doesn't mean P(X|H) for any particular true proposition H will be anything like that.

jsprogrammer · on Sept 2, 2015

>Every observation being consistent with H doesn't mean that for each event X that occurs, the conditional probability of X given H will be, or "trend toward", 1.

Agreed. If you plot P(X|H) (computed over all observations) over time, in a well-formed experiment the line will trend to 1 if the null-hypothesis is true and to 0 if the null-hypothesis is false.

It really is that simple.

dragonwriter · on Sept 1, 2015

> A p-value is simply P(X|H). P(X|H) only means something when H is true. If H is false, P(X|H) tells you nothing.

If you know H is false, P(X|H) tells you nothing. But then H wouldn't be a hypothesis, null or otherwise.

If you don't know whether H is true, but you do know something about X, P(X|H) tells you something useful about whether the positive hypothesis to which H is the alternative has an effect apparent in the world to explain.

> The null hypothesis can never be 'rejected' (ie. p-value can never reach 0).

Rejection of the null hypothesis does not mean p-value = 0. Scientific progress is not based on logical certainty, but rather practical utility.

Necessary truths are the domain of pure logic, not empirical science.

jsprogrammer · on Sept 2, 2015

What is the interpretation of a p-value = 0 then? Empirical science can never reject any theory, it is not powerful enough. At best it can provide a selection of least worst explanations.

The H in P(X|H) does not mean 'assumed to be true', it means 'is in fact true'. If H is in fact false, it is counter-factual to assume it is true and therefore any conclusions drawn from the assumption are invalid. This is independent of belief in H.

>Necessary truths are the domain of pure logic, not empirical science.

Science can never deliver truth, which is why it can never truly reject anything (including null-hypotheses).

More generally, this is referred to as the problem of induction.