Misinterpretations of P-values and statistical tests persist among researchers and professionals working with statistics and epidemiology
Background: The aim was to investigate how persons with varying levels of statistical education and research experience interpret statistically significant test results.
Methods: A total of 75 doctoral students and 64 statisticians/epidemiologists responded to a web questionnaire about inferences from statistically significant findings. Participants were asked about their education and research experience, and whether a ‘statistically significant’ test result (P = 0.024, α-level 0.05) could be interpreted as proof of, or as a probability statement about, the truth or falsehood of the null hypothesis (H0) and the alternative hypothesis (H1).
Results: Almost all participants reported having a university degree, and most statisticians/epidemiologists reported having a university degree in statistics and working professionally with statistics. Overall, 9.4% of statisticians/epidemiologists and 24.0% of doctoral students responded that the statistically significant finding proved that H0 is not true, and 73.4% of statisticians/epidemiologists and 53.3% of doctoral students responded that the statistically significant finding indicated that H0 is improbable. For the alternative hypothesis (H1), 12.0% of doctoral students and 6.2% of statisticians/epidemiologists responded that the finding proved H1 to be true, and 62.7% and 62.5%, respectively, concluded that H1 is probable. Correct answers to both questions, namely that a statistically significant finding can be interpreted neither as proof nor as a measure of a hypothesis’ probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists.
Conclusions: Misinterpretation of P-values and statistically significant test results persists even among persons who have substantial statistical education and who work professionally with statistics.
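Illustration: The following minimal Python sketch indicates why a significant result does not by itself give the probability that H0 is true. It is not part of the study. The function name, the prior probabilities, and the power of 0.8 are assumptions chosen for illustration, and the calculation conditions on the event "test significant at the α-level" rather than on the exact P-value of 0.024.

```python
def prob_h0_given_significant(prior_h0, alpha, power):
    """Return P(H0 | test is significant) using Bayes' theorem.

    prior_h0: assumed prior probability that H0 is true (hypothetical input)
    alpha:    significance level, i.e. P(significant | H0 true)
    power:    assumed P(significant | H1 true) (hypothetical input)
    """
    false_positive = alpha * prior_h0          # H0 true and test significant
    true_positive = power * (1.0 - prior_h0)   # H1 true and test significant
    return false_positive / (false_positive + true_positive)


if __name__ == "__main__":
    # Same alpha and power, different assumed priors for H0.
    for prior_h0 in (0.1, 0.5, 0.9):
        posterior = prob_h0_given_significant(prior_h0, alpha=0.05, power=0.8)
        print(f"P(H0) = {prior_h0:.1f} -> P(H0 | significant) = {posterior:.3f}")
```

Under these assumed inputs, the probability of H0 after a significant result ranges from below 0.01 (prior 0.1) to 0.36 (prior 0.9). The same test result is therefore compatible with very different probabilities for H0, depending on information that the test itself does not supply.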
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright of their work, with first publication rights granted to Upsala Medical Society.