Bonferroni correction…

When using Null Hypothesis Significance Testing (NHST), a researcher states that the null hypothesis will be rejected if a certain likelihood criterion is met; most often p < .05. A threshold of p = .05 equates to a 1/20 chance that a true null hypothesis will be rejected in error (Type I error). But what happens if a DV is tested 9 times? Each test carries its own 5% risk, so the chance of at least one Type I error across the whole set grows well beyond 5%. One remedy is a p-value correction method attributed to the Italian mathematician Carlo Bonferroni (the Bonferroni correction). Under the Bonferroni correction, to maintain the limit of a 5% likelihood of a Type I error across multiple tests, one divides the significance threshold (.05) by the number of tests.
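To make the arithmetic concrete, here is a minimal Python sketch of the division described above. The function names are my own, and the uncorrected family-wise figure assumes the 9 tests are independent, which is a simplification that real survey items rarely satisfy.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def uncorrected_familywise_error(alpha, n_tests):
    """Chance of at least one Type I error when every test uses alpha.

    Assumption: the tests are independent. This is for illustration only.
    """
    return 1 - (1 - alpha) ** n_tests


ALPHA = 0.05   # conventional significance threshold
N_TESTS = 9    # the DV is tested 9 times

corrected = bonferroni_alpha(ALPHA, N_TESTS)
fwer = uncorrected_familywise_error(ALPHA, N_TESTS)

print(f"Bonferroni-corrected threshold: {corrected:.4f}")   # 0.0056
print(f"Uncorrected family-wise error rate: {fwer:.3f}")    # 0.370
```

In other words, running 9 tests at .05 each leaves roughly a 37% chance of at least one false rejection under the independence assumption, while testing each at about .0056 keeps the overall risk at or below 5%.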

Schvom (2019) explored customer satisfaction in the US airline industry. The emerging scholar focused on two types of analysis: differences in service elements (a survey item-level metric) by groups, and customer satisfaction by groups. There were 9 different service elements. This is when a p-value correction method should be employed.

In this situation, a learned committee member should have advised the emerging scholar to review the Bonferroni correction and its application to control the experiment-wise error rate. In this example, the emerging scholar should have used p = .0056 (.05 divided by 9) as the test of significance rather than p = .05. Thus, the rejection of the null hypothesis in H2B for On-Time Arrival satisfaction by Gender (p = .009) does not survive the corrected threshold and is exposed to a Type I error; however, the null hypothesis for satisfaction with Number of Layovers by Gender (p = .001) was correctly rejected (p. 29; see Appendix F). Was Bonferroni advised and not discussed by the student? Or was this just a review/oversight error by the student, faculty, and University quality control? Who knows? I usually look at the size of the effect rather than the p-values, because the p-value is influenced by the size of the sample.
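As a quick check of the two results cited above, the following Python sketch compares each reported p-value with the corrected threshold. Only the two p-values quoted in this post are included; the dictionary labels and variable names are my own shorthand, not the dissertation's.

```python
ALPHA = 0.05
N_TESTS = 9                      # 9 service elements tested
threshold = ALPHA / N_TESTS      # about .0056

# p-values as quoted in this post for the Gender comparisons
reported = {
    "On-Time Arrival by Gender": 0.009,
    "Number of Layovers by Gender": 0.001,
}

# True means the null hypothesis is rejected at the corrected threshold
decisions = {name: p < threshold for name, p in reported.items()}

for name, p in reported.items():
    verdict = "reject" if decisions[name] else "fail to reject"
    print(f"{name}: p = {p:.3f} vs {threshold:.4f} -> {verdict} the null")
```

On-Time Arrival (p = .009) falls above .0056, so the null is not rejected, while Number of Layovers (p = .001) falls below it and the rejection stands.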

Note: Admittedly, the Bonferroni correction is the easiest for faculty to explain and, in my opinion, for business students to understand. However, there are many methods of addressing family-wise errors. See the methods of Dunn (1959, 1961) and Sidak (1967), and the sequentially rejective procedure of Holm (1979).
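For readers curious how two of these alternatives differ from plain Bonferroni, here is a rough Python sketch. The function names are my own. The Sidak threshold shown assumes independent tests, and the Holm function follows the usual step-down description (smallest p-value first, compared against alpha divided by the number of hypotheses still in play). Treat it as an illustration rather than a vetted implementation.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test threshold under the Sidak correction (assumes independence)."""
    return 1 - (1 - alpha) ** (1 / n_tests)


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order of p_values,
    where True means the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject


ALPHA = 0.05
N_TESTS = 9

sidak = sidak_alpha(ALPHA, N_TESTS)
holm_thresholds = [ALPHA / (N_TESTS - rank) for rank in range(N_TESTS)]

print(f"Sidak per-test threshold: {sidak:.6f}")
print("Holm step-down thresholds:", [round(t, 4) for t in holm_thresholds])
```

With 9 tests the Sidak threshold is only slightly more lenient than Bonferroni's .0056. Holm starts at the same .05/9 for the smallest p-value and relaxes the threshold at each step, so it never rejects fewer hypotheses than Bonferroni does.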


Dunn, O. J. (1959). Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association, 54(287), 695-698.

Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52-64.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65-70.

Schvom, A. F. (2019). A critical evaluation of service elements related to customer satisfaction in the U.S. Airline industry (Doctoral dissertation). ProQuest Dissertations & Theses Global: The Humanities and Social Sciences Collection. (13856698)

Sidak, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626-633.

