A STUDY OF MODEL-BASED METHODS IN CLUSTER ANALYSIS USING THE MCLUST SOFTWARE
Keywords: BIC, EM algorithm, K-means clustering method, model-based clustering method, Ward clustering method
Abstract
The Ward method and the K-means method are clustering methods that group objects using only a distance measure between the observed objects, without considering statistical aspects. Model-based clustering instead rests on a statistical foundation, namely the maximum likelihood criterion. The method offers ten models with a variety of geometric characteristics. The data are partitioned with the expectation-maximization (EM) algorithm, and the best model is then selected with the Bayesian Information Criterion (BIC). This research aimed to assess the effectiveness of the ten models in model-based clustering and then to compare its groupings with those of Ward clustering and K-means clustering. The study used simulated data and applied data. The simulated data were generated with R version 2.14.1, and the analysis was performed with Mclust version 4.0 through its interface to R version 2.14.1. The results showed that model-based clustering was more effective than Ward clustering and K-means clustering at separating overlapping groups, specifically in the condition with one separate group and two overlapping groups.
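The workflow the abstract describes (partition the data with EM, then pick the model by BIC) can be illustrated with a minimal sketch. This is not the Mclust implementation and does not reproduce the study: it is a pure-Python, univariate example in which only the number of components G varies, so the ten covariance structures are not represented. The simulated data (two normal groups centred at 0 and 8), the quantile-based starting values, the variance floor, and all function names are assumptions made for illustration. BIC is written here as 2 log L minus k log n, the sign convention in which larger values are better.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_em(data, g, iters=200):
    """Fit a g-component univariate Gaussian mixture with EM.

    Returns (log-likelihood, weights, means, variances).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed starting values: means at evenly spaced quantiles,
    # every component starting with the overall sample variance.
    means = [ordered[int((j + 0.5) * n / g)] for j in range(g)]
    grand_mean = sum(data) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [pooled_var] * g
    weights = [1.0 / g] * g

    for _ in range(iters):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from those probabilities.
        for j in range(g):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an emptied component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a degenerate component

    loglik = sum(math.log(mixture_density(x, weights, means, variances)) for x in data)
    return loglik, weights, means, variances


def bic(loglik, g, n):
    """BIC = 2 log L - k log n, with k = (g - 1) weights + g means + g variances."""
    k = 3 * g - 1
    return 2.0 * loglik - k * math.log(n)


# Simulated data (an assumption for this sketch): two well-separated groups.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(150)] + \
       [random.gauss(8.0, 1.0) for _ in range(150)]

fits = {g: fit_em(data, g) for g in (1, 2, 3)}
bic_by_g = {g: bic(fits[g][0], g, len(data)) for g in fits}
best_g = max(bic_by_g, key=bic_by_g.get)

for g in sorted(bic_by_g):
    print("G =", g, "BIC =", round(bic_by_g[g], 1))
print("Selected number of components:", best_g)
```

Distance-based methods such as Ward and K-means have no likelihood, so they offer no comparable criterion for choosing the number of groups; that is the practical difference the abstract draws.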