An Enhanced Fuzzy K-means Clustering with Application to Missing Data Imputation


Abstract


This paper proposes an adjustment to the Fuzzy K-means (FKM) clustering method intended to improve the clustering process. It also proposes a novel technique for missing data imputation, which was implemented twice: (1) using FKM and (2) using the Enhanced Fuzzy K-means (EFKM) clustering. The proposed imputation model consists of three phases: (1) Input Vectors Partitioning, (2) Enhanced Fuzzy Clustering, and (3) Missing Data Imputation. In the experiments, EFKM gave clearly better imputation accuracy than FKM, as measured by the root mean square error (RMSE).
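As a rough illustration of the three-phase flow named in the abstract, the following is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not describe the EFKM adjustment, so the clustering step here uses the standard fuzzy c-means membership and centroid updates instead. The function names (`memberships`, `fuzzy_kmeans`, `impute`, `rmse`), the deterministic initialization, the fuzzifier `m=2.0`, and the choice to fill a missing value with the membership-weighted average of the centroid values are all assumptions made for illustration.

```python
import math

def memberships(x, centroids, m=2.0, dims=None):
    """Standard fuzzy c-means memberships of x, using only the dimensions in dims."""
    dims = range(len(x)) if dims is None else dims
    d2 = [sum((x[j] - c[j]) ** 2 for j in dims) for c in centroids]
    if any(d == 0.0 for d in d2):
        # x coincides with one or more centroids: share full membership among them
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]

def fuzzy_kmeans(rows, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on complete rows (not the paper's EFKM variant)."""
    step = max(1, len(rows) // c)
    # Deterministic initialization from evenly spaced rows (an assumption)
    centroids = [list(rows[i * step]) for i in range(c)]
    for _ in range(iters):
        u = [memberships(x, centroids, m) for x in rows]
        new = []
        for k in range(c):
            w = [u_i[k] ** m for u_i in u]
            new.append([sum(wi * x[j] for wi, x in zip(w, rows)) / sum(w)
                        for j in range(len(rows[0]))])
        shift = max(abs(a - b) for cn, co in zip(new, centroids)
                    for a, b in zip(cn, co))
        centroids = new
        if shift < tol:
            break
    return centroids

def impute(data, c=2, m=2.0):
    """Fill None entries with membership-weighted centroid values."""
    # Phase 1: partition input vectors into complete and incomplete rows
    complete = [r for r in data if None not in r]
    # Phase 2: fuzzy clustering on the complete rows
    centroids = fuzzy_kmeans(complete, c, m)
    # Phase 3: impute using memberships computed from the observed features
    filled = []
    for r in data:
        if None not in r:
            filled.append(list(r))
            continue
        obs = [j for j, v in enumerate(r) if v is not None]
        u = memberships(r, centroids, m, dims=obs)
        filled.append([v if v is not None
                       else sum(uk * ck[j] for uk, ck in zip(u, centroids))
                       for j, v in enumerate(r)])
    return filled

def rmse(true_vals, est_vals):
    """Root mean square error between true and imputed values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true_vals, est_vals))
                     / len(true_vals))

# Toy data (made up for illustration): two groups, two rows with a missing value
data = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
        [5.0, 5.1], [4.9, 5.0], [5.1, 4.9],
        [1.0, None], [5.0, None]]
filled = impute(data, c=2)
print(filled[6], filled[7])
```

To compare two clustering variants in the way the abstract describes, one would hide known values, impute them with each variant, and compare the resulting `rmse` scores; the lower score indicates the more accurate imputation.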

DOI Code: 10.1285/i20705948v11n2p674

Keywords: Missing Data Imputation, Cluster Analysis, Fuzzy K-means clustering, Data mining, Fuzzy sets, Fuzzy C-means.




Creative Commons License
This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.