Voting-based Approach in Consensus Clustering through q-fold cross-validation


Abstract


Over the past 50 years, extensive research has been carried out to understand how clustering works in classifying data into meaningful groups. Various clustering algorithms and cluster validity indexes have been proposed and refined to obtain the best clustering result. However, no clustering method gives consistent results on datasets of similar structure. Consensus clustering offers an alternative mechanism for controlling this variation in results and improving on the quality of traditional clustering. In this paper, we generate the multiple partitions needed for consensus clustering through a resampling method based on q-fold cross-validation. The q-fold cross-validation approach speeds up the generation of consensus partitions because it requires only q iterations. To handle the differing numbers of cluster labels that occur across the partitions, we employ a voting-based method in the second stage of consensus clustering to obtain the optimal consensus partition. The performance of the optimal consensus partitions is evaluated using silhouette plots.

DOI Code: 10.1285/i20705948v12n3p657

Keywords: consensus clustering; resampling; k-medoids; optimal consensus partition; voting-based method
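
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration: each of the q iterations holds out one fold, runs a simple alternating k-medoids on the remaining points, and assigns every point to its nearest medoid; labels are aligned to the first partition by brute-force permutation matching before a majority vote; the number of clusters k is fixed across partitions; and a farthest-point initialization is used for reproducibility. All function names are hypothetical.

```python
from itertools import permutations
import numpy as np

def pairwise(A, B):
    """Euclidean distance matrix between the rows of A and the rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def k_medoids(X, k, rng, n_iter=50):
    """Simple alternating k-medoids; returns medoid indices into X."""
    D = pairwise(X, X)
    medoids = [int(rng.integers(len(X)))]
    while len(medoids) < k:  # farthest-point initialization (an assumption)
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # new medoid minimizes total distance to members
                new[c] = members[D[np.ix_(members, members)].sum(axis=0).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

def qfold_partitions(X, k, q, seed=0):
    """Stage 1: one partition of all points per fold (q partitions in total)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), q)
    partitions = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        medoids = train[k_medoids(X[train], k, rng)]
        partitions.append(pairwise(X, X[medoids]).argmin(axis=1))
    return np.array(partitions)

def vote(partitions, k):
    """Stage 2: align labels to the first partition, then take a majority vote."""
    ref = partitions[0]
    n = partitions.shape[1]
    votes = np.zeros((n, k), dtype=int)
    for p in partitions:
        overlap = np.zeros((k, k), dtype=int)
        np.add.at(overlap, (p, ref), 1)  # contingency table: p labels vs ref labels
        # Brute-force matching is fine for small k; the Hungarian method scales better.
        best = max(permutations(range(k)),
                   key=lambda perm: sum(overlap[c, perm[c]] for c in range(k)))
        votes[np.arange(n), np.array(best)[p]] += 1
    return votes.argmax(axis=1)

def mean_silhouette(X, labels):
    """Average silhouette width of a partition (needs at least two clusters)."""
    D = pairwise(X, X)
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == l].mean() for l in set(labels) - {li})
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Toy data: three well-separated Gaussian blobs of 30 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
parts = qfold_partitions(X, k=3, q=5)
consensus = vote(parts, k=3)
print(mean_silhouette(X, consensus))
```

On data this well separated every fold yields the same grouping, so the vote only has to resolve label permutations. With overlapping clusters, or partitions that disagree on the number of clusters, the alignment step would need the more careful treatment described in the paper.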





This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.