Some clarifications regarding power and Type I error control for pairwise comparisons of three groups


Abstract


A previous study in this journal used Monte Carlo simulations to compare the power and familywise Type I error rates of ten multiple-testing procedures in the context of pairwise comparisons in balanced three-group designs. The authors concluded that the Benjamini–Hochberg procedure was the "best." However, they did not compare the Benjamini–Hochberg procedure to commonly used multiple-testing procedures that were developed specifically for pairwise comparisons, such as Fisher's protected least significant difference and Tukey's honest significant difference. Simulations in the present study show that in the three-group case, Fisher's method is more powerful than both Tukey's method and the Benjamini–Hochberg procedure. Compared to the Benjamini–Hochberg procedure, Tukey's method is shown to be less powerful in terms of per-pair power (average probability of significance across the tests of false null hypotheses), but more powerful in terms of any-pair power (probability of significance in at least one test of a false null hypothesis). Additionally, the present study shows that small deviations from normality in the population distributions have little effect on the power of pairwise comparisons, and that the previous study's finding to the contrary was based on a methodological inconsistency.
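The two power definitions above, and two of the decision rules being compared, can be illustrated with a small Monte Carlo sketch. The Python below is a minimal, hypothetical example and not the simulation code of the present study. It assumes normal populations with unit variance, computes every pairwise t statistic from the pooled ANOVA mean square error, and applies both Fisher's protected LSD (omnibus F test first) and the Benjamini–Hochberg step-up rule to those same pairwise p-values; applying Benjamini–Hochberg to pooled-MSE p-values is itself an assumption of this sketch. It then reports per-pair power (average share of false null hypotheses rejected) and any-pair power (share of replications with at least one false null rejected). The function names (`anova_pairwise`, `fisher_lsd`, `bh_method`, `simulate_power`) and the mean configuration are illustrative. Tukey's HSD is left out because it requires the studentized range distribution; the t and F tail probabilities are computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import random
from itertools import combinations
from math import exp, lgamma, log, sqrt

def _betacf(a, b, x):
    """Continued fraction for I_x(a, b) via the modified Lentz method (Numerical Recipes style)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        # Even- and odd-indexed coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:  # last factor is ~1: converged
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def two_sided_t_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) variable."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def anova_pairwise(groups):
    """Omnibus F p-value plus pooled-MSE t-test p-values for every pair (i < j); an assumed test setup."""
    k, n_tot = len(groups), sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(map(sum, groups)) / n_tot
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_w = n_tot - k
    mse = ss_w / df_w
    p_f = f_sf((ss_b / (k - 1)) / mse, k - 1, df_w)
    p_pairs = [two_sided_t_p((means[i] - means[j])
                             / sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j]))), df_w)
               for i, j in combinations(range(k), 2)]
    return p_f, p_pairs

def benjamini_hochberg(pvals, q=0.05):
    """Step-up BH: reject the k smallest p-values, k = max{r : p_(r) <= r*q/m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = max((r + 1 for r, i in enumerate(order) if pvals[i] <= (r + 1) * q / m), default=0)
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

def bh_method(groups, alpha=0.05):
    return benjamini_hochberg(anova_pairwise(groups)[1], alpha)

def fisher_lsd(groups, alpha=0.05):
    """Protected LSD: unadjusted pairwise tests count only if the omnibus F test rejects."""
    p_f, p_pairs = anova_pairwise(groups)
    return [p_f <= alpha and p <= alpha for p in p_pairs]

def simulate_power(means, n, reps, method, seed=1):
    """Per-pair and any-pair power, assuming normal populations with unit variance."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(means)), 2))
    false_nulls = [idx for idx, (i, j) in enumerate(pairs) if means[i] != means[j]]
    per_pair_total, any_pair_count = 0.0, 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in means]
        rejected = method(groups)
        hits = sum(rejected[idx] for idx in false_nulls)
        per_pair_total += hits / len(false_nulls)  # share of false nulls rejected
        any_pair_count += hits > 0                 # at least one false null rejected
    return per_pair_total / reps, any_pair_count / reps

if __name__ == "__main__":
    for name, method in [("Fisher LSD", fisher_lsd), ("Benjamini-Hochberg", bh_method)]:
        print(name, simulate_power([0.0, 0.5, 1.0], n=20, reps=2000, method=method))
```

With the same seed, both rules are evaluated on identical simulated data, so any difference in the two estimates comes from the decision rule alone. Under these definitions per-pair power can never exceed any-pair power, because a replication contributes a positive share of rejected false nulls only when at least one of them is rejected.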

DOI: 10.1285/i20705948v12n1p55

Keywords: Type I error; multiple comparisons; multiple testing; multiplicity; power

References


Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1):289–300.

Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50(272):1096–1121.

Félix, V. B. and Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1):74–91.

Finner, H. and Roters, M. (2001). On the false discovery rate and expected Type I errors. Biometrical Journal, 43(8):985–1005.

Fisher, R. A. (1935). The design of experiments. Oliver and Boyd.

Frane, A. V. (2015a). Are per-family Type I error rates relevant in social and behavioral science? Journal of Modern Applied Statistical Methods, 14(1):12–23.

Frane, A. V. (2015b). Power and Type I error control for univariate comparisons in multivariate two-group designs. Multivariate Behavioral Research, 50(2):233–247.

Hancock, G. R. and Klockars, A. J. (1996). The quest for α: Developments in multiple comparison procedures in the quarter century since Games (1971). Review of Educational Research, 66(3):269–306.

Hayter, A. J. (1986). The maximum familywise error rate of Fisher's least significant difference test. Journal of the American Statistical Association, 81(396):1000–1004.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75(2):383–386.

Keselman, H. J., Cribbie, R., and Holland, B. (1999). The pairwise multiple comparison multiplicity problem: An alternative approach to familywise/comparisonwise Type I error control. Psychological Methods, 4(1):58–69.

Keuls, M. (1952). The use of Studentized range in connection with an analysis of variance. Euphytica, 1(2):112–122.

Kramer, C. Y. (1956). Extensions of multiple range tests to group means with unequal number of replications. Biometrics, 12(3):307–310.

Li, D. (2008). A two-step rejection procedure for testing multiple hypotheses. Journal of Statistical Planning and Inference, 138(6):1521–1527.

Newman, D. (1939). The distribution of the range in samples from a normal population, expressed in terms of an independent estimate of standard deviation. Biometrika, 31(1/2):20–30.

Phillips, A., Fletcher, C., Atkinson, G., Channon, E., Douiri, A., Jaki, T., Maca, J., Morgan, D., Roger, J. H., and Terrill, P. (2013). Multiplicity: Discussion points from the Statisticians in the Pharmaceutical Industry Multiplicity Expert Group. Pharmaceutical Statistics, 12(5):255–259.

Ramsey, P. H. (1978). Power differences between pairwise multiple comparisons. Journal of the American Statistical Association, 73(363):479–485.

Ramsey, P. H., Barrera, K., Hachimine-Semprebom, P., and Li, C.-C. (2011). Pairwise comparisons of means under realistic nonnormality, unequal variances, outliers and equal sample sizes. Journal of Statistical Computation and Simulation, 81(2):125–135.

Ramsey, P. H. and Ramsey, P. P. (2008). Power of pairwise comparisons in the equal variance and unequal sample size case. British Journal of Mathematical and Statistical Psychology, 61(1):115–131.

Ramsey, P. H. and Ramsey, P. P. (2009). Power and Type I errors for pairwise comparisons of means in the unequal variances case. British Journal of Mathematical and Statistical Psychology, 62(2):263–281.

R Core Team (2017). R: A language and environment for statistical computing. https://www.R-project.org/

Richter, S. J. and McCann, M. H. (2012). Using the Tukey–Kramer omnibus test in the Hayter–Fisher procedure. British Journal of Mathematical and Statistical Psychology, 65(3):499–510.

Seaman, M. A., Levin, J. R., and Serlin, R. C. (1991). New developments in pairwise multiple comparisons: Some powerful and practicable procedures. Psychological Bulletin, 110(3):577–586.

Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81(395):826–831.

Tamhane, A. C. (2009). Statistical analysis of designed experiments: Theory and applications. Wiley.

Tukey, J. W. (1953). The problem of multiple comparisons. In H. I. Braun (Ed.), The collected works of John W. Tukey, Volume VIII: Multiple comparisons, 1948–1983. Wiley.




This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.