The Implementation of Random Under-Sampling and Synthetic Minority Oversampling Techniques to Evaluate the Performance of the Classification and Regression Tree Method
Keywords: random under sampling (RUS), synthetic minority over-sampling technique (SMOTE), classification and regression tree (CART), data imbalance
Abstract
Class imbalance poses a significant challenge for classification models, including the Classification and Regression Tree (CART) method. This study evaluates the performance of CART combined with two data balancing techniques: Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE). The dataset used is the Heart Failure Clinical Records dataset from Kaggle, which contains 5,000 records: 1,568 deceased patients (minority class) and 3,432 survivors (majority class). RUS reduced the total number of records to 2,526, with each class containing 1,263 records. Conversely, SMOTE increased the total number of records to 5,474, with each class containing 2,737 records. Model performance was evaluated using precision, recall, and F1-score, both before and after applying the balancing techniques. CART combined with SMOTE recognized the minority class better than CART combined with RUS, achieving an accuracy of 88.203% and an F1-score of 88.195%, whereas RUS achieved an accuracy of 86.345% and an F1-score of 86.332%. SMOTE therefore outperformed RUS by about 1.86 percentage points in both accuracy and F1-score. These results support the use of SMOTE with CART on imbalanced data and add to the literature on applying CART together with data balancing techniques.
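The two balancing techniques can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which the abstract does not describe. The function names `random_under_sample` and `smote`, the two synthetic features, and the neighbour count `k=5` are assumptions made for illustration. The class sizes 1,263 and 2,737 are taken from the post-balancing counts reported above; they sum to 4,000, which would be consistent with balancing an 80% training split, although the abstract does not state this.

```python
import heapq
import random


def random_under_sample(X, y, rng):
    """Randomly drop rows until every class is as small as the smallest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, rng, k=5):
    """Add synthetic minority rows until both classes of a binary problem match.

    Each synthetic row lies on the segment between a minority row and one of
    its k nearest minority neighbours (brute-force Euclidean search).
    """
    labels = sorted(set(y))
    minority = min(labels, key=y.count)
    majority_size = max(y.count(label) for label in labels)
    seeds = [row for row, label in zip(X, y) if label == minority]
    needed = majority_size - len(seeds)
    X_out, y_out = list(X), list(y)
    for _ in range(needed):
        base = rng.choice(seeds)
        neighbours = heapq.nsmallest(
            k,
            (s for s in seeds if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


rng = random.Random(0)
# Hypothetical data: 1,263 deceased (label 1) and 2,737 survivors (label 0).
X = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(1263)]
X += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2737)]
y = [1] * 1263 + [0] * 2737

X_rus, y_rus = random_under_sample(X, y, rng)
X_sm, y_sm = smote(X, y, rng)

print(len(y_rus), y_rus.count(0), y_rus.count(1))  # 2526 1263 1263
print(len(y_sm), y_sm.count(0), y_sm.count(1))     # 5474 2737 2737
```

RUS discards majority records, so the balanced set shrinks to twice the minority size, while SMOTE keeps every original record and generates new minority records, so the balanced set grows to twice the majority size. This matches the totals of 2,526 and 5,474 given in the abstract.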
Copyright (c) 2025 Jurnal Matematika Sains dan Teknologi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.