The Impact of Data Splitting on ANN Performance in Predicting Foreign Tourist Visits to Indonesia

Akbar Rizki; Muhammad Dzakwan Alifi; Haidar Ramdhani; Lilis Indra Purnama; Shalma Kaisya Candradewi; Farid Yafi Suwandi; Adelia Putri Pangestika

doi:10.33830/jmst.v26i1.11104.2025

Akbar Rizki
IPB University
Muhammad Dzakwan Alifi
IPB University
Haidar Ramdhani
IPB University
Lilis Indra Purnama
IPB University
Shalma Kaisya Candradewi
IPB University
Farid Yafi Suwandi
IPB University
Adelia Putri Pangestika
IPB University

Keywords: Artificial Neural Network (ANN), data splitting, prediction, tourism

Abstract

Data splitting is an important step in building Artificial Neural Network (ANN) models because it reduces the risk of overfitting and underfitting, both of which degrade model performance. A proper split aims to ensure that the model generalizes well to data it has never seen. The dataset is commonly split into two parts, training and testing data. To address overfitting more effectively, some studies instead split the data into three parts: training, validation, and testing. This study evaluates the performance of ANN modelling under these two splitting schemes. Prediction error is measured with the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE). The data are foreign tourist arrivals to Indonesia, a series that fluctuates and is influenced by calendar effects. The results show that the two-part split generally yields a smaller MAPE than the three-part split. However, models trained with the two-part split fail to capture the seasonal pattern in the data, whereas models trained with the three-part split handle it better. The best model was obtained with training, validation, and test proportions of 80%, 10%, and 10%, respectively, giving a MAPE of 24.45%.
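
The two splitting schemes and the two error metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (chronological_split, rmse, mape), the rounding of split sizes, and the choice to split in time order without shuffling are all assumptions made for illustration, and the sample series is a placeholder rather than the tourist arrival data.

```python
import math


def chronological_split(series, train_frac, val_frac=0.0):
    """Split a time-ordered sequence into (train, validation, test).

    Assumption: observations are kept in time order (no shuffling).
    With val_frac=0.0 this reduces to the two-part train/test split.
    """
    n = len(series)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Placeholder series of 120 values standing in for monthly observations.
series = list(range(1, 121))

# Three-part split with the 80% / 10% / 10% proportions reported in the abstract.
train, val, test = chronological_split(series, train_frac=0.8, val_frac=0.1)

# Two-part split; the 80% / 20% proportion here is only an example.
train2, _, test2 = chronological_split(series, train_frac=0.8)
```

In the three-part scheme the validation part is typically used during training (for example, for tuning or early stopping), while the test part is held back for the final RMSE and MAPE figures.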



Published Mar 31, 2025


How to Cite

Rizki, A., Alifi, M. D., Ramdhani, H., Purnama, L. I., Candradewi, S. K., Suwandi, F. Y., & Pangestika, A. P. (2025). The Impact of Data Splitting on ANN Performance in Predicting Foreign Tourist Visits to Indonesia. Jurnal Matematika Sains Dan Teknologi, 26(1), 36–46. https://doi.org/10.33830/jmst.v26i1.11104.2025


Issue

Vol. 26 No. 1 (2025)

Section

Articles

License & Copyright Holder

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.