Examination of Entropy balancing technique for estimating some standard measures of treatment effects: A simulation study


In observational studies, propensity score weighting methods are regarded as the conventional standard for estimating the effects of treatments on outcomes. We introduce entropy balancing, which, despite its excellent conceptual properties, has been under-utilized in applied studies. Using an extensive series of Monte Carlo simulations, we evaluated the performance of entropy balancing in estimating differences in means, marginal odds ratios, rate ratios, and hazard ratios, and compared it with that of inverse probability of treatment weighting (IPW) using the propensity score. We found that entropy balancing outperformed IPW in estimating differences in means, marginal odds ratios, and hazard ratios, whereas IPW performed better in estimating marginal rate ratios. Entropy balancing produced more biased estimates in many cases. However, the entropy balancing algorithm can control this bias by loosening the pre-specified tolerance on covariate balance. We report when one technique performs better than the other, without claiming that either method is superior in every case. Entropy balancing merits more widespread adoption in applied studies.
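
To make the comparison concrete, the following is a minimal Python sketch of the two weighting schemes. It is not the simulation code used in the study. It assumes a single confounder, balances only its first moment, targets the effect among the treated, and uses the true propensity score for the IPW weights, which is known here only because the data are simulated. The function names `tilted_weights` and `entropy_balance`, the `tol` parameter, and the data-generating model are all hypothetical choices made for illustration.

```python
import math
import random


def tilted_weights(centered, lam):
    """Weights proportional to exp(lam * c), normalized to sum to 1."""
    shift = max(lam * c for c in centered)  # subtract the max to avoid overflow
    raw = [math.exp(lam * c - shift) for c in centered]
    total = sum(raw)
    return [r / total for r in raw]


def entropy_balance(x_control, target_mean, tol=1e-8):
    """Reweight controls so their weighted mean of x matches target_mean.

    The weights have the exponential-tilting form that minimizes the
    Kullback-Leibler divergence from uniform weights. The single tilting
    parameter is found by bisection; tol is the balance tolerance.
    """
    centered = [x - target_mean for x in x_control]
    if not (min(centered) < 0.0 < max(centered)):
        raise ValueError("target mean must lie strictly inside the control range")

    def imbalance(lam):
        w = tilted_weights(centered, lam)
        return sum(wi * c for wi, c in zip(w, centered))

    # The imbalance increases monotonically in lam, so bracket the root first.
    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0.0:
        lo *= 2.0
    while imbalance(hi) < 0.0:
        hi *= 2.0
    mid = 0.5 * (lo + hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        g = imbalance(mid)
        if abs(g) <= tol:
            break
        if g > 0.0:
            hi = mid
        else:
            lo = mid
    return tilted_weights(centered, mid)


def mean(values):
    return sum(values) / len(values)


# Simulated observational data (hypothetical model): x confounds treatment and outcome.
random.seed(1)
true_effect = 2.0
x_t, y_t, x_c, y_c, ps_c = [], [], [], [], []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    ps = 1.0 / (1.0 + math.exp(-x))  # true propensity score
    treated = random.random() < ps
    y = true_effect * treated + 3.0 * x + random.gauss(0.0, 1.0)
    if treated:
        x_t.append(x)
        y_t.append(y)
    else:
        x_c.append(x)
        y_c.append(y)
        ps_c.append(ps)

# Unadjusted difference in means.
naive = mean(y_t) - mean(y_c)

# Entropy balancing: controls reweighted to match the treated mean of x.
w_eb = entropy_balance(x_c, mean(x_t))
att_eb = mean(y_t) - sum(w * y for w, y in zip(w_eb, y_c))

# IPW for the same estimand: controls weighted by the odds ps / (1 - ps).
odds = [p / (1.0 - p) for p in ps_c]
att_ipw = mean(y_t) - sum(o * y for o, y in zip(odds, y_c)) / sum(odds)

print("naive:", naive)
print("entropy balancing:", att_eb)
print("IPW (true propensity score):", att_ipw)
```

In this sketch a larger `tol` stops the search earlier and accepts a looser match on the covariate mean, which is the kind of tolerance adjustment the abstract refers to. With several covariates or higher moments, the one-dimensional bisection would have to be replaced by a multivariate optimizer.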

DOI Code: 10.1285/i20705948v12n2p491

Keywords: Entropy balancing; Monte Carlo simulation; Observational studies; Propensity score weighting; Treatment effect; Odds ratios; Hazard ratios; Rate ratios


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.