POSITION-BASED DYNAMIC WEIGHTING ON APPROXIMATE STRING MATCHING

  • Sebastianus A S Mola(1)
    Universitas Nusa Cendana
  • Meiton Boru(2)
    Universitas Nusa Cendana
  • Emerensye Sofia Yublina Pandie(3*)
    Universitas Nusa Cendana
  • (*) Corresponding Author
Keywords: dynamic weighting, mismatch weight, NSW normalization, approximate string matching

Abstract

In written communication on social media, which emphasizes the speed of information dissemination, non-standard language often occurs at the level of sentences, clauses, phrases, and words. As a data source, social media with this phenomenon presents challenges for information extraction. Normalization of non-standard language into standard language begins with word normalization, in which non-standard words (NSW) are converted to their standard forms (standard words, SW). Normalization using edit distance is limited by its static weighting of mismatch, match, and gap values. In calculating the mismatch value, static weighting cannot assign different weights to errors caused by incorrect keystrokes, especially on adjacent keys. To address this limitation, this research proposes a dynamic weighting method for mismatch weights. The result is a new dynamic weighting method based on the position of keyboard keys that can be used to normalize NSW with approximate string matching.
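
The idea of a position-based mismatch weight can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything below is an assumption made for illustration: a QWERTY layout with approximate row offsets, a mismatch cost equal to the Euclidean distance between two keys divided by the largest distance on the layout, a fixed gap cost of 1.0, and the hypothetical names `KEY_POS`, `mismatch_weight`, `weighted_edit_distance`, and `normalize`. It is not the authors' implementation.

```python
import math

# Assumed QWERTY letter layout; the row offsets only approximate the physical stagger.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]

# Map each letter to an (x, y) coordinate on the keyboard.
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}

# Largest key-to-key distance, used to scale mismatch weights into [0, 1].
MAX_DIST = max(math.dist(a, b) for a in KEY_POS.values() for b in KEY_POS.values())


def mismatch_weight(a: str, b: str) -> float:
    """Dynamic substitution cost: small for nearby keys, up to 1.0 for distant ones."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return 1.0  # fall back to the static weight for characters off the layout
    return math.dist(KEY_POS[a], KEY_POS[b]) / MAX_DIST


def weighted_edit_distance(source: str, target: str, gap: float = 1.0) -> float:
    """Edit distance with a static gap cost and a position-based mismatch cost."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * gap
    for j in range(1, cols):
        d[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + gap,  # deletion
                d[i][j - 1] + gap,  # insertion
                d[i - 1][j - 1] + mismatch_weight(source[i - 1], target[j - 1]),
            )
    return d[-1][-1]


def normalize(word: str, vocabulary: list[str]) -> str:
    """Return the standard word (SW) closest to a non-standard word (NSW)."""
    return min(vocabulary, key=lambda sw: weighted_edit_distance(word, sw))
```

Under these assumptions, the mistyped "makab" is closer to "makan" than to "makam", because "b" sits next to "n" on the keyboard and one key further from "m". With a static mismatch weight both candidates would tie at a distance of 1.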

Published
2021-10-13
How to Cite
[1]
S. Mola, M. Boru, and E. Pandie, “POSITION-BASED DYNAMIC WEIGHTING ON APPROXIMATE STRING MATCHING”, jicon, vol. 9, no. 2, pp. 168-175, Oct. 2021.
Section
Articles