Misinterpretations of P-values and statistical tests persist among researchers and professionals working with statistics and epidemiology

Per Lytsy; Mikael Hartman; Ronnie Pingel

doi:10.48101/ujms.v127.8760

Per Lytsy, Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden. https://orcid.org/0000-0003-1949-6299
Mikael Hartman, Independent Researcher
Ronnie Pingel, Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden; and Department of Statistics, University of Uppsala, Uppsala, Sweden. https://orcid.org/0000-0002-4140-1981

Keywords: Statistical inference, null hypothesis significance testing, statistics, frequentist, P-value

Abstract

Background: The aim was to investigate how persons with varying levels of statistical education and research experience interpret statistically significant test results.

Methods: A total of 75 doctoral students and 64 statisticians/epidemiologists responded to a web questionnaire about inferences from statistically significant findings. Participants were asked about their education and research experience, and whether a ‘statistically significant’ test result (P = 0.024, α-level 0.05) could be interpreted as proof of, or as a probability statement about, the truth or falsehood of the null hypothesis (H₀) and the alternative hypothesis (H₁).

Results: Almost all participants reported having a university degree; among statisticians/epidemiologists, most reported having a university degree in statistics and working professionally with statistics. Overall, 9.4% of statisticians/epidemiologists and 24.0% of doctoral students responded that the statistically significant finding proved that H₀ is not true, and 73.4% of statisticians/epidemiologists and 53.3% of doctoral students responded that it indicated that H₀ is improbable. For the alternative hypothesis (H₁), 12.0% of doctoral students and 6.2% of statisticians/epidemiologists responded that the finding proved H₁ to be true, and 62.7% and 62.5%, respectively, concluded that H₁ is probable. The correct inference for both questions, namely that a statistically significant finding can be taken neither as proof nor as a measure of a hypothesis’ probability, was given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists.

Conclusions: Misinterpretation of P-values and statistically significant test results persists even among persons who have substantial statistical education and who work professionally with statistics.
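
The correct inference described in the Results can be illustrated with a small calculation. The sketch below is not part of the study: it applies Bayes' rule to show that the probability of H₀ being true after a significant result depends on quantities the test itself does not supply. The function name, the prior probabilities of H₀, and the power of 0.8 are illustrative assumptions, and the calculation conditions on "significant at α = 0.05" rather than on the exact value P = 0.024.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.8):
    """Return P(H0 true | test is significant) using Bayes' rule.

    prior_null: assumed prior probability that H0 is true (hypothetical input)
    alpha:      significance level, i.e. P(significant | H0 true)
    power:      assumed P(significant | H1 true) (hypothetical input)
    """
    false_positive = alpha * prior_null          # H0 true and test significant
    true_positive = power * (1.0 - prior_null)   # H1 true and test significant
    return false_positive / (false_positive + true_positive)


# The same significance threshold yields very different posterior
# probabilities for H0, depending on the assumed prior.
for prior_null in (0.1, 0.5, 0.9):
    posterior = prob_null_given_significant(prior_null)
    print(f"P(H0) = {prior_null:.1f} -> P(H0 | significant) = {posterior:.3f}")
```

Under these assumptions, the posterior probability of H₀ ranges from under 1% to 36% across the three priors, even though the significance level is identical in every case. A significant result alone therefore fixes neither the probability of H₀ nor that of H₁.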

