NAZIEF-ADRIANI STEMMER WITH NON-STANDARD AFFIXES FOR NORMALIZING CONVERSATIONAL LANGUAGE ON SOCIAL MEDIA
Abstract
Non-standard language is increasingly prevalent in communication on social media. It appears not only at the level of sentences, clauses, and phrases but also in individual word usage. In this study, non-standard words (NSW) are normalized to Indonesian standard words (SW). The Nazief-Adriani stemmer (NAS) is extended into a non-standard stemmer (NSS) by adding the ability to detect non-standard affixes. The Needleman-Wunsch similarity algorithm is used to weight the matches. In Mean Reciprocal Rank (MRR) tests on 3,438 NSWs, NSS with the number of queries set to 9 (Q = 9) reached a highest MRR of 79.26% and an average of 50.48%. NAS with Q = 9 reached a highest MRR of 72.87% and an average of 47.23%. In both MRR tests, with NAS and with NSS, three initial letters gave the highest stemming results: 'r', 'f', and 'j'. The most significant increases in MRR occur for the initial letters 'd', 'n', and 't', which are the initial letters of some non-standard affixes.
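The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring values (match = 1, mismatch = -1, gap = -1), the function names, the toy lexicon, and the reading of Q as the number of top-ranked candidates kept per word are all assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (scoring values are assumed)."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def rank_candidates(word, lexicon, top_q=9):
    """Return up to top_q lexicon entries, best alignment score first.

    Treating Q as a cut-off on the ranked list is an assumption.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(lexicon, key=lambda c: (-needleman_wunsch(word, c), c))
    return ranked[:top_q]


def mean_reciprocal_rank(ranked_lists, gold):
    """Average of 1/rank of the correct answer; 0 when it is not in the list."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold)


# Hypothetical toy data: one non-standard word and a three-word lexicon.
lexicon = ["makan", "minum", "main"]
ranking = rank_candidates("mkn", lexicon)
print(ranking)
print(mean_reciprocal_rank([ranking], ["makan"]))
```

In a full pipeline, the candidate list would come from the stemmer output rather than a fixed lexicon, and MRR would be averaged over all test words.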
Copyright (c) 2021 J-Icon : Jurnal Komputer dan Informatika
This work is licensed under a Creative Commons Attribution 4.0 International License.
The author submitting the manuscript must understand and agree that if accepted for publication, authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.