POSITION-BASED DYNAMIC WEIGHTING ON APPROXIMATE STRING MATCHING
Abstract
Written communication on social media emphasizes the speed of information dissemination, so non-standard language often occurs at the level of sentences, clauses, phrases, and words. As a data source, social media with this phenomenon poses challenges for information extraction. Normalizing non-standard language into standard language begins with word normalization, in which non-standard words (NSW) are converted to their standard forms (standard words, SW). Normalization using edit distance is limited by its static weighting of the mismatch, match, and gap values. In calculating the mismatch value, static weighting cannot assign a different weight to errors caused by incorrect keystrokes on the keyboard, especially on adjacent keys. To address this limitation of edit distance weighting, this research proposes a dynamic weighting method for mismatch weights. The result is a new dynamic weighting method, based on the position of the keyboard keys, that can be used to normalize NSW with approximate string matching.
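The idea of a position-based mismatch weight can be illustrated with a minimal Python sketch. It is not the implementation from this research: the QWERTY grid, the adjacency rule, the weights `near=0.5` and `far=1.0`, and the function names `mismatch_cost`, `weighted_edit_distance`, and `normalize` are all assumptions chosen for illustration.

```python
# Assumed QWERTY layout, modeled as a simple grid (row stagger is ignored).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each key to its (row, column) position on the grid.
KEY_POS = {ch: (r, c)
           for r, row in enumerate(QWERTY_ROWS)
           for c, ch in enumerate(row)}


def mismatch_cost(a: str, b: str, near: float = 0.5, far: float = 1.0) -> float:
    """Substitution cost that depends on key position (hypothetical weights)."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa is None or pb is None:
        return far  # characters outside the layout get the static weight
    # Assumption: keys count as adjacent when they are at most
    # one row and one column apart on the grid.
    if abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1:
        return near
    return far


def weighted_edit_distance(nsw: str, sw: str, gap: float = 1.0) -> float:
    """Edit distance with a static gap weight and a dynamic mismatch weight."""
    nsw, sw = nsw.lower(), sw.lower()
    prev = [j * gap for j in range(len(sw) + 1)]
    for i, a in enumerate(nsw, start=1):
        curr = [i * gap]
        for j, b in enumerate(sw, start=1):
            curr.append(min(
                prev[j] + gap,                      # gap in the standard word
                curr[j - 1] + gap,                  # gap in the non-standard word
                prev[j - 1] + mismatch_cost(a, b),  # match or mismatch
            ))
        prev = curr
    return prev[-1]


def normalize(nsw: str, vocabulary: list[str]) -> str:
    """Pick the standard word with the lowest weighted distance."""
    return min(vocabulary, key=lambda sw: weighted_edit_distance(nsw, sw))


print(weighted_edit_distance("bida", "bisa"))  # 'd' and 's' are adjacent keys
print(weighted_edit_distance("bida", "bila"))  # 'd' and 'l' are far apart
print(normalize("bida", ["bila", "bisa"]))
```

With a static mismatch weight, both candidates would be one substitution away from "bida" and therefore tied. Under the assumed position-based weights, the candidate that differs only by an adjacent key is preferred.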
Copyright (c) 2021 J-Icon : Jurnal Komputer dan Informatika
This work is licensed under a Creative Commons Attribution 4.0 International License.
The author submitting the manuscript must understand and agree that if accepted for publication, authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.