You're absolutely right! My example was mainly for illustration; I was not sure it would give exactly the same lower bound (but I was indeed surprised that it's below the 29% from those papers, which I had thought was a "hard" bound).
It seems the bound that you are calculating (that I have reproduced, R code below) was already published more than 50 years ago for this specific case of a normal distribution. See slide 10 in http://www.biostat.uzh.ch/teaching/master/previous/seminarba...
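The R snippet isn't reproduced here, but the same bound can be sketched in Python (my assumptions: a 50/50 prior on H0, a one-sided z-test with alternative N(mu1, 1), the 0.05 cutoff hardcoded as roughly 1.6449, and a simple grid minimisation instead of a closed form):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

z = 1.6449  # approximate one-sided 0.05 cutoff (95th percentile of N(0,1))

def post_h0(mu1):
    # Posterior P(H0 | z) with a 50/50 prior when the alternative is N(mu1, 1)
    return phi(z) / (phi(z) + phi(z - mu1))

# Minimise over mu1 on a grid; the likelihood ratio peaks at mu1 = z
mu_grid = [i / 1000 for i in range(0, 5001)]
lower_bound = min(post_h0(m) for m in mu_grid)
print(round(lower_bound, 4))  # 0.2054
```

So the least favourable alternative puts mu1 right at the observed statistic, and the resulting ~20.5% is indeed below the 29% figure.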
I have not read the Sellke et al. paper in its entirety, but it seems the "calibration" they propose is more general; however, it makes some assumptions about the distribution of the p-value and is therefore approximate.
I don't know R, but I found a few sites that happily run R code for me. I find the shape of that curve rather pretty. 29% clearly can't be a hard bound, since we can get 4.8% by assuming no false negatives. I just wish I understood whether there is anything particularly natural about the number 29, or whether they made their distributional assumptions for the same reason you did: "mainly for illustration". If so, then the Nature article was terribly misleading by presenting that number as some kind of "speed of light"-type limit, because that makes p-values look worse than they really are. It seems that p-values are bad enough without making up more bad stuff about them! :)
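For what it's worth, the 4.8% can be reproduced with the same 50/50-prior bookkeeping (a sketch; the prior and the reading of "no false negatives" as power = 1 are my assumptions):

```python
alpha, power, prior_h0 = 0.05, 1.0, 0.5  # "no false negatives" read as power = 1

false_pos = prior_h0 * alpha        # share of all experiments: significant, H0 true
true_pos = (1 - prior_h0) * power   # share of all experiments: significant, H0 false
fdr = false_pos / (false_pos + true_pos)
print(round(fdr, 3))  # 0.048
```

That is, 0.05 / (0.05 + 1) ≈ 4.8% of significant results are false positives when the test never misses a true effect.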
Anyway, thanks for all your help. Your ability to dig up references (and pump out R code) at a moment's notice makes me think you are someone who knows quite a bit of statistics. I'll happily look at anything else you care to point me to.
Assuming no false negatives is just not an option :-) I think that can only happen if the situation is such that there can be no false positives either (i.e. the p-value when the null hypothesis is not true is always zero). EDIT: What I wrote is true only if the distributions under the null and the alternative are completely disjoint. You can actually have very low false negative rates if the distributions are not symmetric, and if you allow different kinds of distributions you can do even better: imagine the null hypothesis is x ~ Normal(0,1) and the alternative is x = C0 = 1.64 (exactly the cutoff value for 0.05 significance). If we get exactly p = 0.05, then the probability of the null being true is 0%. I mean, we get x in [C0-epsilon, C0+epsilon] with probability 1 under the alternative, but with probability → 0 under the null as epsilon → 0. Of course, this alternative is very unlikely, and mixing continuous and discrete distributions is always tricky. This is why it makes sense to average over prior distributions for the alternative.
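The shrinking-window argument can be made concrete numerically (a sketch; 50/50 prior, C0 hardcoded as roughly 1.6449, and the null probability of the window approximated as 2·eps·phi(C0)):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

c0 = 1.6449  # approximate cutoff for one-sided 0.05 significance

posteriors = []
for eps in (0.1, 0.01, 0.001):
    # P(x in [c0-eps, c0+eps] | H0) ~ 2*eps*phi(c0) for small eps
    p_window_h0 = 2 * eps * phi(c0)
    p_window_h1 = 1.0  # the point mass at c0 always lands in the window
    # posterior P(H0 | x in window) with a 50/50 prior
    posteriors.append(p_window_h0 / (p_window_h0 + p_window_h1))
print(posteriors)  # shrinks toward 0 as eps -> 0
```

Each tenfold shrink of the window divides the posterior probability of the null by roughly ten, so it vanishes in the limit.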
As you can see in slide 14, there are multiple calibrations proposed under different assumptions. I agree it is misleading to present one of them as the "real" error rate, but it's interesting that all of them give rates well above the nominal alpha. EDIT: note as well that this is for the case where the null hypothesis is true in 50% of the cases (nowhere in the calculation of p-values do we consider how often the null hypothesis is true, but obviously if it's always true, 100% of the significant results will be false positives, and if it's never true, 0% of the significant results will be false positives).
In slide 11 there are other calculations for the normal case, this time for a two-sided test. But instead of looking for the mu1 giving the lowest bound, they calculate the aggregate error rate under some assumptions about the distribution of mu1. For example, if I understand the results correctly, assuming mu1 is normally distributed around mu0 = 0, if you get a p-value of 0.05 (in the two-sided test; some modifications are required to the calculation we did) you should expect the null hypothesis to be true with probability at least 32.1%. (If the distribution of mu1 is very concentrated around 0, the 50% rejection rate on the left side of the chart dominates; if the standard deviation is very high, the region of almost 100% rejection rate far from mu0 on the right of the chart dominates; for some intermediate standard deviation one gets the 32.1% lower bound.)
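That 32.1% also falls out of a short computation, if I'm reading the setup correctly (my assumptions: 50/50 prior, mu1 ~ N(0, tau^2) so the statistic is marginally N(0, 1 + tau^2) under the alternative, the two-sided 0.05 cutoff hardcoded as 1.96, and a grid search over tau):

```python
import math

def norm_pdf(x, sd=1.0):
    """Density of N(0, sd^2) at x."""
    return math.exp(-x * x / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

z = 1.96  # approximate two-sided 0.05 cutoff (97.5th percentile of N(0,1))

def post_h0(tau):
    # If mu1 ~ N(0, tau^2), the statistic under the alternative is
    # marginally N(0, 1 + tau^2); 50/50 prior on H0 assumed.
    sd1 = math.sqrt(1 + tau * tau)
    return norm_pdf(z) / (norm_pdf(z) + norm_pdf(z, sd1))

taus = [i / 1000 for i in range(1, 5001)]
bound = min(post_h0(t) for t in taus)
print(round(bound, 3))  # 0.321
```

The worst-case tau turns out to make the marginal standard deviation equal to the observed statistic, which is where the "intermediate standard deviation" lands.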
Unfortunately, I think the assumption behind the nice result 1/(1 - 1/(e·p·log(p))) is that p-values follow a beta distribution when the null hypothesis is not true, and I don't think there is a clear interpretation of that. (Note that log(p) is negative, so the expression does land between 0 and 1.)
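Plugging p = 0.05 into that expression does recover the 29% figure (a quick Python check; the formula is only meant for p < 1/e, where log(p) < -1):

```python
import math

def calibration(p):
    """1 / (1 - 1 / (e * p * log(p))); log(p) < 0, so this is in (0, 0.5) for p < 1/e."""
    return 1 / (1 - 1 / (math.e * p * math.log(p)))

print(round(calibration(0.05), 3))  # 0.289
```

Smaller p-values give smaller bounds, as you'd expect, but even p = 0.05 already maps to nearly 29%.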
Ok, so the pretty result is mostly arbitrary. Fair enough. Re: false negatives... you seem to be living in a world of bell curves, or at least a mostly continuous world. I can easily make (very contrived) experiments where false negatives just don't happen. For instance: I have two coins. One is a perfectly fair coin. The other is a two-headed coin. You see me flip one of them. The null hypothesis is that I flipped the fair coin. A false negative means deciding the coin is fair but it's really not. This will never happen, because you will only decide that if it lands tails, and then it must be fair. (If I only flip it once, the false positive rate is something like 1/3, not 0.) But this is probably much too contrived for your taste, and maybe even for mine. But it's almost 5am, and I must go to sleep now, or else it will get bright soon, and I never will. I now appreciate the value of the 20min procrastination setting.
I agree with your point: if we are sufficiently creative we can get many extreme results. For example, I made an addition to the first paragraph of my previous comment, which you might have missed, giving an example where the probability of the null hypothesis being true when p=0.05 is zero (or arbitrarily small, if we replace the discrete probability lump under the alternative hypothesis by a continuous distribution which is concentrated enough). I also added a comment to the second paragraph, by the way.
One minor comment on your example. If H0 is the fair coin, H1 is the two-headed coin, and the statistic is the number of heads h, I cannot reject the null (at the 0.05 level) when n (the number of flips) is small, even if I'm only getting heads. For one flip, P[h=1|H0] = 0.5. For two flips, P[h=2|H0] = 0.25. For n ≥ 5 (where 1/2^n ≤ 0.05) you will of course reject the null hypothesis in every case where H1 is true (and in ~5% of the cases where H0 is true). There will be no false negatives. But I guess you have noticed that this doesn't help with the false discovery rate in this example: when H1 is true the p-value will be very small (1/2^n), so if the observed p-value is ~0.05 (or any other value larger than 1/2^n), then it's for sure a false positive (because there will have been at least one occurrence of tails).
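A quick check of the all-heads threshold (a sketch, assuming the usual "reject when p ≤ 0.05" convention):

```python
def p_all_heads(n):
    """p-value for observing n heads in n flips under H0 (fair coin)."""
    return 0.5 ** n

# smallest n at which an all-heads run is significant at the 0.05 level
n = 1
while p_all_heads(n) > 0.05:
    n += 1
print(n, p_all_heads(n))  # 5 0.03125
```

Four heads in a row (p = 0.0625) is not enough; the fifth head pushes the p-value to 0.03125, below the cutoff.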