OVERCOMING MULTICOLLINEARITY AND OUTLIER PROBLEMS WITH THE ROBPCA APPROACH (CASE STUDY: REGRESSION ANALYSIS OF THE INFANT MORTALITY RATE IN EAST JAVA)
Keywords: infant mortality rate, multicollinearity, outlier, regression analysis, ROBPCA
Abstract
Multicollinearity and outliers in data can be detected by various techniques. Principal Component Analysis (PCA) is a statistical technique for data reduction that can also handle multicollinearity. However, PCA is very sensitive to outliers because it is based on the mean and the covariance matrix. Hubert et al. (2005) developed ROBPCA, a PCA method that is robust to outliers. ROBPCA combines the projection pursuit (PP) technique with the Minimum Covariance Determinant (MCD) method to deal with outliers. In the present study, ROBPCA is applied to a case study: a regression analysis of the infant mortality rate in East Java Province in 2009. The results show that ROBPCA is more robust than PCA when the data contain outliers. ROBPCA explains 85.6 percent of the variation with 2 principal components, whereas PCA needs 3 principal components to explain 86.6 percent. Moreover, ROBPCA yields a higher coefficient of determination, which means that the regression model based on ROBPCA explains the response variable better. The findings also reveal that the average duration of exclusive breastfeeding makes the largest contribution to lowering the infant mortality rate, followed by the percentage of deliveries assisted by a medical provider and the percentage of households with access to safe drinking water.
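The sensitivity of classical PCA to outliers can be illustrated with a small sketch. It is not the ROBPCA algorithm of Hubert et al. (2005) and it does not use the East Java data. It assumes a made-up two-variable data set, scores each point with a projection-pursuit-style outlyingness (largest robustly standardized distance over a grid of directions), and then applies ordinary PCA to the least outlying points. The MCD refinement step is omitted, and the names `pc1_share`, `outlyingness`, `least_outlying`, and the parameters `n_dirs` and `keep_fraction` are hypothetical choices made for this illustration.

```python
import math
from statistics import median


def cov2(points):
    """Sample variances and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def pc1_share(points):
    """Proportion of total variance explained by the first principal component."""
    a, b, c = cov2(points)
    # Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]].
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return lam1 / (a + c)


def outlyingness(points, n_dirs=36):
    """Max over directions of |projection - median| / MAD (assumed direction grid)."""
    scores = [0.0] * len(points)
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        vx, vy = math.cos(theta), math.sin(theta)
        proj = [p[0] * vx + p[1] * vy for p in points]
        med = median(proj)
        mad = median([abs(t - med) for t in proj])
        if mad == 0:
            continue  # degenerate direction, skip it
        for i, t in enumerate(proj):
            scores[i] = max(scores[i], abs(t - med) / mad)
    return scores


def least_outlying(points, keep_fraction=0.75):
    """Keep the h points with the smallest outlyingness."""
    scores = outlyingness(points)
    h = max(3, int(keep_fraction * len(points)))
    order = sorted(range(len(points)), key=lambda i: scores[i])
    return [points[i] for i in order[:h]]


# Toy data: ten points close to the line y = x, plus two gross outliers.
clean = [(float(i), i + 0.3 * (-1) ** i) for i in range(1, 11)]
contaminated = clean + [(1.0, 20.0), (10.0, -10.0)]

share_clean = pc1_share(clean)
share_classical = pc1_share(contaminated)
kept = least_outlying(contaminated)
share_robust = pc1_share(kept)

print(f"PC1 share, clean data:           {share_clean:.3f}")
print(f"PC1 share, classical PCA:        {share_classical:.3f}")
print(f"PC1 share, least outlying subset: {share_robust:.3f}")
```

On this toy data the two outliers noticeably reduce the share of variance captured by the first classical component, while the subset-based estimate stays close to the value for the clean data. This mirrors, in a very simplified form, why a robust method may need fewer components than classical PCA to explain a similar proportion of variation.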
References
Hubert, M., Rousseeuw, P.J., & Vanden Branden, K. (2005). ROBPCA: A new approach to robust principal component analysis. Technometrics, 47, 64-79.
Hubert, M., Rousseeuw, P.J., & Van Aelst, S. (2008). High-breakdown robust multivariate methods. Statistical Science, 23(1), 92-119.
Jimenez, L.O. & Landgrebe, D. (1995). High dimensional feature reduction via projection pursuit. Purdue University.
Johnson, R.A. & Wichern, D.W. (2002). Applied multivariate statistical analysis (5th ed.). New Jersey: Prentice Hall.
Naes, T., Isaksson, T., Fearn, T., & Davies, T. (2002). Multivariate calibration and classification. West Sussex: NIR Publication.
Rousseeuw, P.J. & Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212-223.
Winarno, D. (2009). Analisis angka kematian bayi di Jawa Timur dengan pendekatan model regresi spasial. Unpublished master's thesis, Institut Teknologi Surabaya.