Data Resampling Approach to Handle the Imbalanced Class Problem
Abstract
The imbalanced class problem in machine learning arises when the number of instances in the minority class differs significantly from the number in the majority class. An imbalanced class ratio biases the classifier toward the majority class, so it tends to misclassify or ignore minority-class instances. To tackle this problem, we use a data resampling approach with six popular resampling techniques: (i) random oversampling (ROS), (ii) random undersampling (RUS), (iii) synthetic minority oversampling technique (SMOTE), (iv) adaptive synthetic sampling (ADASYN), (v) SMOTETomek, and (vi) SMOTEENN, to balance the class ratio of 15 datasets. Each balanced dataset is then classified using a random forest classifier. Performance is measured with the geometric mean (G-Mean). To compare the six resampling techniques, the G-Mean values were tested using Friedman's nonparametric statistical test and, where the null hypothesis was rejected, followed by Nemenyi's post hoc test. Based on the mean rank values (lower is better), the techniques rank from best to worst as follows: SMOTEENN (1.700), ADASYN (2.767), RUS (3.333), SMOTETomek (3.867), SMOTE (4.000), and ROS (5.333).
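The evaluation steps described in the abstract can be sketched in Python using only the standard library. This is a minimal illustration, not the authors' implementation. It assumes a binary problem in which G-Mean is the square root of sensitivity times specificity, uses the textbook Friedman statistic without tie correction, and takes the Nemenyi critical value 2.850 (six methods, alpha = 0.05) from the studentized-range table commonly used for this test. The function names and the 0.05 significance level are assumptions, since the abstract does not state them.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """G-Mean for a binary problem: sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def friedman_chi2(mean_ranks, n_datasets):
    """Friedman chi-square computed from mean ranks (no tie correction)."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k, n_datasets, q_alpha):
    """Nemenyi critical difference between two mean ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Mean ranks and dataset count as reported in the abstract.
MEAN_RANKS = {
    "SMOTEENN": 1.700,
    "ADASYN": 2.767,
    "RUS": 3.333,
    "SMOTETomek": 3.867,
    "SMOTE": 4.000,
    "ROS": 5.333,
}
N_DATASETS = 15
# Assumed critical value for k = 6 methods at alpha = 0.05.
Q_ALPHA_K6 = 2.850

chi2 = friedman_chi2(list(MEAN_RANKS.values()), N_DATASETS)
cd = nemenyi_cd(len(MEAN_RANKS), N_DATASETS, Q_ALPHA_K6)
print(f"Friedman chi-square: {chi2:.3f}, Nemenyi critical difference: {cd:.3f}")
```

Under these assumptions, two techniques would be considered significantly different when their mean ranks differ by more than the value returned by nemenyi_cd.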
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
The author who submits the manuscript must understand and agree that, if the manuscript is accepted for publication, the copyright of the article belongs to JICON and Nusa Cendana University as the journal publisher. Copyright includes the exclusive right to reproduce and provide articles in all forms and media, including reprints, photographs, microfilm, and any other similar reproductions, as well as translations. The author retains the right to:
1. reproduce all or part of published material for the author's own use as classroom teaching materials or oral presentation materials in various forums;
2. reuse part or all of the material as compilation material for the author's written work;
3. make copies of published material for distribution within the institution where the author works.
JICON, Nusa Cendana University, and the Editors make every effort to ensure that no wrong or misleading data, opinion, or statement is published in this journal. The content of articles published in JICON is the sole and exclusive responsibility of their respective authors.