# Bonferroni correction…

When using Null Hypothesis Significance Testing (NHST), a researcher states in advance that the null hypothesis will be rejected if a certain likelihood criterion is met; most often p < .05. A threshold of .05 equates to a 1 in 20 chance of rejecting a true null hypothesis in error (a Type I error). But what happens if a DV is tested 9 times? Each test carries its own 5% risk, so, if the tests are independent, the chance of at least one Type I error across the whole family of tests rises to 1 − (1 − .05)^9 ≈ .37. One remedy is a p-value correction method attributed to the Italian mathematician Carlo Bonferroni (the Bonferroni correction). Under the Bonferroni correction, to keep the likelihood of a Type I error across all tests at or below 5%, one divides the significance threshold (α = .05) by the number of tests.
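A minimal Python sketch of this arithmetic follows. The function names are illustrative, not taken from any statistics library, and the family-wise error formula assumes the tests are independent (the Bonferroni bound itself holds regardless of dependence).

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold that caps the family-wise error rate at alpha."""
    return alpha / m


alpha, m = 0.05, 9
print(f"Uncorrected FWER for {m} tests: {familywise_error_rate(alpha, m):.3f}")
print(f"Bonferroni per-test threshold:  {bonferroni_threshold(alpha, m):.4f}")
# Running every test at the corrected threshold keeps the FWER at or below alpha.
print(f"FWER at the corrected threshold: "
      f"{familywise_error_rate(bonferroni_threshold(alpha, m), m):.4f}")
```

For 9 tests at .05 this gives an uncorrected family-wise rate of about .370, a per-test threshold of about .0056, and a corrected family-wise rate of about .049.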

Schvom (2019) explored customer satisfaction in the U.S. airline industry. The emerging scholar focused on two types of analysis: differences in service elements (a survey item-level metric) by group, and customer satisfaction by group. There were 9 different service elements, and therefore 9 tests for each group comparison. This is exactly the situation in which a p-value correction method should be employed.

In this situation, a learned committee member should have advised the emerging scholar to review the Bonferroni correction and its application to control the experiment-wise (family-wise) error rate. Here, the emerging scholar should have used .05 / 9 ≈ .0056 as the threshold for significance rather than .05. Thus, in H2B, rejecting the null hypothesis for On-Time Arrival satisfaction by Gender (p = .009) was not warranted and risks a Type I error, because .009 exceeds .0056; however, the null hypothesis for satisfaction with Number of Layovers by Gender (p = .001) was correctly rejected (p. 29; see Appendix F). Was Bonferroni advised and simply not discussed by the student? Or was this just a review oversight by the student, faculty, and University quality control? Who knows? For my part, I usually look at the size of the effect rather than the p-value, because the p-value is influenced by the size of the sample.
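The decision rule can be checked with a short Python sketch. Only the two p-values quoted (.009 and .001) come from the dissertation; the dictionary labels are my own shorthand, and the other seven service-element tests are not shown.

```python
ALPHA = 0.05
N_TESTS = 9  # one test per service element
threshold = ALPHA / N_TESTS  # Bonferroni per-test threshold, about .0056

# p-values for the two Gender comparisons discussed in the text
p_values = {
    "On-Time Arrival by Gender": 0.009,
    "Number of Layovers by Gender": 0.001,
}

decisions = {}
for test, p in p_values.items():
    uncorrected = p < ALPHA    # decision without any correction
    corrected = p < threshold  # decision under the Bonferroni correction
    decisions[test] = corrected
    print(f"{test}: p = {p:.3f} | reject at .05? {uncorrected} | "
          f"reject at {threshold:.4f}? {corrected}")
```

Equivalently, multiply each p-value by 9 (capping at 1) and compare the result with .05: .009 × 9 = .081 is not significant, while .001 × 9 = .009 is.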

Note: Admittedly, the Bonferroni correction is the easiest for faculty to explain and, in my opinion, for business students to understand. However, there are many methods of controlling family-wise error. See the methods of Dunn (1959, 1961) and Šidák (1967), and the sequential (step-down) procedure of Holm (1979).
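For comparison, here is a minimal Python sketch of two of the alternatives named above: the Šidák per-test threshold, 1 − (1 − α)^(1/m), which assumes independent tests, and Holm's step-down procedure. These are hand-rolled illustrations rather than any library's API, and the nine p-values in the example are hypothetical.

```python
from typing import List


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-test threshold under Šidák's correction; exact for m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure.

    Visit p-values from smallest to largest. The k-th smallest (k = 0, 1, ...)
    is rejected if it is <= alpha / (m - k); at the first failure, that
    p-value and every larger one are retained.
    Decisions are returned in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


# Hypothetical p-values for nine tests, made up for illustration only.
p = [0.001, 0.004, 0.006, 0.009, 0.02, 0.04, 0.10, 0.30, 0.60]
alpha = 0.05
m = len(p)
print(f"Bonferroni threshold: {alpha / m:.5f}")
print(f"Šidák threshold:      {sidak_threshold(alpha, m):.5f}")
print("Bonferroni rejects:", [x <= alpha / m for x in p])
print("Šidák rejects:     ", [x <= sidak_threshold(alpha, m) for x in p])
print("Holm rejects:      ", holm_reject(p, alpha))
```

With these made-up values, Bonferroni (about .00556) and Šidák (about .00568) both reject only the two smallest p-values, while Holm also rejects .006 (≤ .05/7 ≈ .0071) before stopping at .009 (> .05/6 ≈ .0083). Holm's procedure controls the family-wise error rate at α without any independence assumption and never rejects fewer hypotheses than Bonferroni.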

References:

Dunn, O. J. (1959). Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association, 54(287), 695-698. https://doi.org/10.1080/01621459.1959.10501524

Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52-64. https://doi.org/10.1080/01621459.1961.10482090

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65-70. https://www.jstor.org/stable/4615733

Schvom, A. F. (2019). A critical evaluation of service elements related to customer satisfaction in the U.S. Airline industry (Doctoral dissertation). ProQuest Dissertations & Theses Global: The Humanities and Social Sciences Collection. (13856698)

Šidák, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626-633. https://doi.org/10.1080/01621459.1967.10482935