Associated kernel discriminant analysis for multivariate mixed data


Abstract


Associated kernels were introduced to improve on the classical (symmetric) continuous kernels for smoothing any functional on several kinds of supports, such as bounded continuous sets and discrete sets. In this paper, an associated-kernel approach to discriminant analysis with multivariate mixed variables is proposed. These variables are of three types: continuous, categorical and count. The method uses a product of adapted univariate associated kernels together with an estimate of the misclassification rate. A new profile version of the cross-validation procedure for selecting bandwidth matrices is introduced for multivariate mixed data, while classical cross-validation is used for homogeneous data sets having the same reference measures. Simulations and validation results show the relevance of the proposed method. The method has been validated on real coronary heart disease data in comparison with classical kernel discriminant analysis.
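The product-kernel idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: a Gaussian kernel stands in for the continuous component, the Aitchison and Aitken kernel for the categorical component, and a binomial kernel for the count component. The function names, the single diagonal bandwidth vector `h` shared by all classes (the paper works with bandwidth matrices), and the leave-one-out error estimate are likewise hypothetical simplifications.

```python
import math


def gaussian_kernel(x, xi, h):
    """Classical Gaussian kernel (assumed stand-in for the continuous part)."""
    u = (x - xi) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))


def aitchison_aitken_kernel(x, xi, h, n_categories):
    """Categorical kernel: mass 1 - h on the target category, h shared by the others."""
    return 1.0 - h if x == xi else h / (n_categories - 1)


def binomial_kernel(x, xi, h):
    """Count kernel: Binomial(x + 1, (x + h) / (x + 1)) pmf at xi, with 0 < h <= 1."""
    n = x + 1
    if xi < 0 or xi > n:
        return 0.0
    p = (x + h) / (x + 1)
    return math.comb(n, xi) * p**xi * (1.0 - p) ** (n - xi)


def product_kernel(x, xi, h, n_categories):
    """Product of univariate kernels; each point is (continuous, categorical, count)."""
    return (
        gaussian_kernel(x[0], xi[0], h[0])
        * aitchison_aitken_kernel(x[1], xi[1], h[1], n_categories)
        * binomial_kernel(x[2], xi[2], h[2])
    )


def class_density(x, sample, h, n_categories):
    """Kernel estimate of one class density at the target point x."""
    return sum(product_kernel(x, xi, h, n_categories) for xi in sample) / len(sample)


def classify(x, samples_by_class, h, n_categories):
    """Assign x to the class maximising (empirical prior) * (estimated density)."""
    n_total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: len(s) / n_total * class_density(x, s, h, n_categories)
        for label, s in samples_by_class.items()
    }
    return max(scores, key=scores.get)


def loo_misclassification_rate(samples_by_class, h, n_categories):
    """Leave-one-out error estimate; every class needs at least two observations."""
    errors, total = 0, 0
    for label, sample in samples_by_class.items():
        for i, x in enumerate(sample):
            held_out = dict(samples_by_class)
            held_out[label] = sample[:i] + sample[i + 1:]
            errors += classify(x, held_out, h, n_categories) != label
            total += 1
    return errors / total


# Toy data, invented for illustration only.
data = {
    "a": [(0.1, 0, 1), (0.3, 0, 2), (-0.2, 0, 1), (0.0, 1, 0)],
    "b": [(2.1, 1, 5), (1.8, 1, 6), (2.4, 2, 4), (2.0, 1, 5)],
}
h = (0.5, 0.1, 0.2)
predicted = classify((0.2, 0, 1), data, h, n_categories=3)
error_rate = loo_misclassification_rate(data, h, n_categories=3)
```

In a bandwidth-selection setting, one would evaluate `loo_misclassification_rate` over a grid of candidate `h` values and keep the minimiser. This is only a rough analogue of cross-validation and does not reproduce the profile procedure proposed in the paper.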

DOI Code: 10.1285/i20705948v9n2p385

Keywords: Bandwidth matrix, non-classical kernel, profile cross-validation





This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.