USING N-MERS FREQUENCY AND THE AGNES ALGORITHM TO CONSTRUCT A PHYLOGENETIC TREE OF DEADLY VIRUSES
Keywords: n-mers frequency, AGNES, DNA sequences, deadly viruses
Abstract
Every organism has DNA (deoxyribonucleic acid), which carries its genetic information. One method for analyzing DNA sequences is n-mers frequency, a data mining method that converts strings of DNA sequences into numerical data. We studied 13 deadly viruses: Rabies, HIV, Ebola, Smallpox, Marburg, Herpes B, Lujo, Avian Influenza, Spanish Flu H1N1, Dengue, HPV, SARS-CoV, and SARS-CoV-2. This study aims to construct a phylogenetic tree and determine the genetic relationships among these viruses. First, we collected viral DNA sequences from the NCBI database. The sequences were then converted into numerical data using n-mers frequency. Next, the dissimilarity matrix was calculated and the phylogenetic tree was built using the AGNES algorithm. Based on the phylogenetic tree, the 13 viruses were grouped into three clusters: cluster 1 from the realm Riboviria, cluster 2 from the realm Duplodnaviria, and cluster 3 from the realm Varidnaviria. The clustering results are valid because each virus is clustered according to its taxon. In addition, viruses with the closest genetic relationships are grouped first, while more distantly related viruses are grouped later.
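The pipeline described in the abstract (n-mers frequency, then a dissimilarity matrix, then AGNES) can be illustrated with a minimal sketch. The abstract does not state the value of n, the dissimilarity measure, or the linkage criterion, so the choices here (n = 2, Euclidean distance, average linkage) are assumptions made only for illustration. The function names and the three toy sequences are hypothetical and are not data from the study.

```python
from itertools import product
from math import sqrt


def nmer_frequency(seq, n=2):
    """Return the relative frequency of every length-n word over A, C, G, T."""
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in keys]


def euclidean(u, v):
    """Euclidean distance between two frequency vectors (assumed dissimilarity)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agnes(names, vectors):
    """Agglomerative clustering with average linkage (assumed linkage).

    Returns the merge history as (sorted member names, merge distance) pairs,
    in the order the clusters were joined.
    """
    clusters = {i: [i] for i in range(len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average pairwise distance between members of the two clusters.
                d = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
                d /= len(a) * len(b)
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, p, q = best
        merged = clusters.pop(p) + clusters.pop(q)
        merges.append((sorted(names[i] for i in merged), d))
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical toy sequences, not real viral genomes.
toy = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGAACGT",
    "C": "TTTTGGGGTTTT",
}
names = list(toy)
vectors = [nmer_frequency(toy[name], n=2) for name in names]
merges = agnes(names, vectors)
for members, distance in merges:
    print(members, round(distance, 3))
```

In this toy run the two nearly identical sequences are joined before the dissimilar one, which mirrors the abstract's observation that closely related viruses are grouped first. The merge history is the information a dendrogram (phylogenetic tree) would be drawn from.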
Copyright (c) 2020 Jurnal Matematika Sains dan Teknologi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.