E-MAIL SPAM CLASSIFICATION USING THE TRANSFORMED COMPLEMENT NAÏVE BAYES (TCNB) METHOD
Abstract
Classification is one way to organize text so that documents with similar content are grouped into the same category. One well-known text classification method is Naïve Bayes. It is computationally efficient and gives good predictions, but it performs poorly on unbalanced datasets. A modified version, Transformed Complement Naïve Bayes (TCNB), was developed to overcome this weakness. In this research, TCNB was applied to an unbalanced spam e-mail dataset consisting of 481 spam e-mails and 2412 legitimate e-mails (2893 e-mails in total). Classification was performed both with and without cross-validation. With cross-validation, k was varied from k=2 to k=10. Without cross-validation, the data were split into 80% training data and 20% testing data. The results showed that TCNB with cross-validation achieved its best accuracy of 93.917% at k=10, while classification without cross-validation achieved a best accuracy of 92.760%. It can therefore be concluded that TCNB can handle unbalanced datasets with good prediction accuracy.
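To make the method named in the abstract more concrete, the following is a minimal sketch of one commonly described formulation of TCNB: a log term-frequency transform, inverse document frequency weighting, length normalization, complement-class counts with smoothing, and weight normalization. It is not the authors' implementation. The function names `tcnb_train` and `tcnb_predict`, the smoothing value `alpha=1.0`, and the pre-tokenized input format are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tcnb_train(docs, labels, alpha=1.0):
    """Fit per-class TCNB weights (illustrative sketch, not the paper's code).

    docs   -- list of token lists (tokenization is assumed to be done already)
    labels -- list of class names, one per document
    alpha  -- smoothing constant per term (assumed value)
    """
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)

    # Document frequency of each term, used for the IDF transform.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    def transform(tokens):
        # TF transform log(1 + f), IDF weighting, then L2 length normalization.
        tf = Counter(tokens)
        vec = {t: math.log(1 + f) * idf[t] for t, f in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    vectors = [transform(d) for d in docs]

    weights = {}
    for c in set(labels):
        # Complement counts: accumulate features from every document NOT in c.
        comp = defaultdict(float)
        for vec, y in zip(vectors, labels):
            if y != c:
                for t, v in vec.items():
                    comp[t] += v
        total = sum(comp.values()) + alpha * len(vocab)
        # Log of the smoothed complement-class term estimates.
        w = {t: math.log((comp[t] + alpha) / total) for t in vocab}
        # Weight normalization so no class dominates through larger weights.
        z = sum(abs(v) for v in w.values())
        weights[c] = {t: v / z for t, v in w.items()}
    return weights


def tcnb_predict(weights, tokens):
    """Return the class whose complement matches the document worst."""
    counts = Counter(tokens)
    scores = {
        c: sum(f * w[t] for t, f in counts.items() if t in w)
        for c, w in weights.items()
    }
    # Lowest score = poorest fit to the complement = predicted class.
    return min(scores, key=scores.get)
```

A call such as `tcnb_predict(tcnb_train(docs, labels), tokens)` returns a class label for one tokenized e-mail. Because each class is scored against the documents outside it, the small spam class is estimated from the much larger legitimate class, which is the property usually credited with making the complement formulation more robust on unbalanced data.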
Copyright (c) 2019 Jurnal Komputer dan Informatika
This work is licensed under a Creative Commons Attribution 4.0 International License.