The Impact of Data Splitting on ANN Performance in Predicting Foreign Tourist Visits to Indonesia
Keywords: Artificial Neural Network (ANN), data splitting, prediction, tourism
Abstract
Data splitting is an important step in building Artificial Neural Network (ANN) models, because it helps avoid the overfitting and underfitting that degrade model performance. A proper split aims to ensure that the model generalizes well to data it has never seen. Data is commonly split into two parts, training and testing data. To better guard against overfitting, some practitioners instead split the data into three parts: training, validation, and testing. This study evaluates the performance of ANN modelling under these two splitting schemes. Prediction error is measured with the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE). The data are foreign tourist arrivals to Indonesia, a series that fluctuates and is influenced by calendar effects. The results show that the two-part split generally produces a smaller MAPE than the three-part split. However, the models trained with the two-part split fail to capture the seasonal pattern in the data, whereas the models trained with the three-part split handle it better. The best model was obtained with 80% training data, 10% validation data, and 10% test data, which resulted in a MAPE of 24.45%.
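As a rough illustration of the two splitting schemes and the two error metrics named in the abstract, the following Python sketch splits a time-ordered series and computes RMSE and MAPE. It is not the study's code: the function names are hypothetical, and the sketch assumes the split is made in chronological order without shuffling, which the abstract does not state.

```python
import math


def chronological_split(series, train_frac=0.8, val_frac=0.1):
    """Split a time-ordered sequence into train, validation, and test parts.

    Assumption: observations stay in time order (no shuffling).
    Setting val_frac=0 gives the two-part (train/test) scheme.
    """
    n = len(series)
    n_train = int(n * train_frac)  # floor of the training share
    n_val = int(n * val_frac)      # floor of the validation share
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]  # whatever remains is the test part
    return train, val, test


def rmse(actual, predicted):
    """Root Mean Squared Error, in the units of the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Calling `chronological_split(series, 0.8, 0.1)` yields the 80%, 10%, 10% proportions reported for the best model, while `chronological_split(series, 0.8, 0.0)` yields a two-part split with an empty validation part.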
Copyright (c) 2025 Jurnal Matematika Sains dan Teknologi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.