A COMPARATIVE STUDY OF SUPERVISED FEATURE SELECTION METHODS FOR PREDICTING UANG KULIAH TUNGGAL (UKT) GROUPS
Abstract
Indonesian state universities still set the Uang Kuliah Tunggal (UKT, single tuition fee) by manually verifying socio-economic documents. This process is prone to subjectivity, takes a long time, and often leads to appeals. This study examines how five feature selection techniques affect the performance of five classification algorithms (Decision Tree, Random Forest, SVM-RBF, K-Nearest Neighbor, Naïve Bayes) on the UNESA UKT dataset (9,369 entries × 53 variables). The techniques are filter (Chi-Square), embedded (Random Forest Importance, LASSO), wrapper (Recursive Feature Elimination), and unsupervised reduction (Exploratory Factor Analysis). The data were preprocessed with imputation, scaling, encoding, and SMOTE-NC. They were then evaluated using stratified 5-fold cross-validation and an 80:20 hold-out test. Using all 53 features (the baseline) yields a weighted mean accuracy of 0.6244 ± 0.0057. Feature selection with LASSO-13 and Chi-Square-13 significantly raises the mean accuracy to 0.7300 and 0.6775, respectively, and reduces training time by 40–70%. SVM-RBF with LASSO-13 achieves the highest accuracy (0.7939), followed by Decision Tree with LASSO (0.7111) and Random Forest with Chi-Square (0.6987). A Friedman test on the distribution of model scores across the six conditions confirms a significant difference (χ² = 15.06; p = 0.010). These findings show that feature selection, particularly LASSO and Chi-Square, can reduce data complexity (from 53 to 13 features) without sacrificing the performance of UKT prediction models, and in fact improves it. The recommendations are to integrate feature selection methods into automated UKT verification and to publish the selected feature lists for transparency.
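Below is a minimal standard-library sketch of the Chi-Square filter step. It assumes each candidate variable is categorical (or already discretized). Each variable is scored with Pearson's χ² test of independence against the UKT group label, and the k highest-scoring variables are kept. Here k = 13 mirrors the Chi-Square-13 subset. The function names `chi2_independence` and `select_top_k` and the toy columns are hypothetical. The abstract does not state which χ² variant the authors used. For example, scikit-learn's `SelectKBest(chi2, k=13)` treats feature values as non-negative counts instead of building a contingency table.

```python
from collections import Counter


def chi2_independence(feature, labels):
    """Pearson chi-square statistic of a categorical feature against the class label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))   # observed contingency-table cells
    feature_counts = Counter(feature)       # row totals
    label_counts = Counter(labels)          # column totals
    stat = 0.0
    for f, n_f in feature_counts.items():
        for y, n_y in label_counts.items():
            expected = n_f * n_y / n
            observed = joint.get((f, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, k=13):
    """Return the names of the k columns with the largest chi-square scores."""
    scores = {name: chi2_independence(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: one column that tracks the label, one that does not.
labels = ["A", "A", "B", "B"]
columns = {
    "informative": ["x", "x", "y", "y"],
    "noise": ["p", "q", "p", "q"],
}
print(select_top_k(columns, labels, k=1))  # ['informative']
```

The score is computed one variable at a time, so the filter is cheap but ignores interactions between variables. When it is combined with cross-validation, computing the scores on the training folds only prevents the held-out fold from influencing the selection.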
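The embedded LASSO step keeps the variables whose coefficients survive an L1 penalty. The sketch below illustrates this mechanism with cyclic coordinate descent and soft-thresholding. It assumes a least-squares LASSO with objective (1/(2n))·‖y − Xβ‖² + α·‖β‖₁ on centered data without an intercept. The authors may instead have used an L1-penalized logistic model. The function names, α, and toy data are hypothetical. Getting exactly 13 non-zero coefficients, as in LASSO-13, would require tuning α or ranking coefficient magnitudes, and the abstract does not describe either.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n))||y - X b||^2 + alpha * ||b||_1 for centered X and y."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero (constant before centering) column never enters
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(
                row[j] * (y_i - sum(row[l] * beta[l] for l in range(p) if l != j))
                for row, y_i in zip(X, y)
            )
            beta[j] = soft_threshold(rho, n * alpha) / col_sq[j]
    return beta


# Hypothetical centered toy data: column 0 drives y, column 1 is orthogonal noise.
X = [[-3.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [3.0, 1.0]]
y = [-6.0, -2.0, 2.0, 6.0]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # [0]
```

The penalty shrinks the informative coefficient slightly (from 2 to about 1.98) and sets the noise coefficient to exactly zero. This exact zeroing is what lets LASSO act as a feature selector.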
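SMOTE-NC oversamples minority UKT groups in data that mix continuous and categorical variables. The sketch below follows the original SMOTE-NC idea in three steps:

1. Neighbors are found with a distance that adds the squared median standard deviation of the continuous columns for every categorical mismatch.
2. Continuous values are interpolated between a sample and a randomly chosen neighbor.
3. Each categorical value is set to the most frequent one among the k nearest neighbors.

This is a simplified illustration, not the authors' implementation. The function name, column indices, and toy rows are hypothetical. In Python practice, the `SMOTENC` class from imbalanced-learn is the usual choice.

```python
import random
import statistics
from collections import Counter


def smote_nc(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from the minority-class rows (lists of values)."""
    rng = random.Random(seed)
    # Median of the per-column standard deviations of the continuous features.
    med = statistics.median(
        statistics.pstdev([row[c] for row in minority]) for c in cont_idx
    )

    def dist2(a, b):
        # Squared Euclidean distance on continuous columns plus med^2 per categorical mismatch.
        d = sum((a[c] - b[c]) ** 2 for c in cont_idx)
        return d + sum(med ** 2 for c in cat_idx if a[c] != b[c])

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]  # exclude the base row itself
        neighbors = sorted(others, key=lambda r: dist2(base, r))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        new = list(base)
        for c in cont_idx:
            new[c] = base[c] + gap * (partner[c] - base[c])
        for c in cat_idx:
            # Majority vote over the k nearest neighbors.
            new[c] = Counter(r[c] for r in neighbors).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Hypothetical minority rows: [income ratio, dependents, housing status].
minority = [
    [1.2, 3.0, "rent"],
    [1.5, 2.0, "own"],
    [0.9, 4.0, "rent"],
    [1.1, 3.0, "rent"],
    [1.4, 2.0, "own"],
    [1.0, 5.0, "rent"],
]
synthetic = smote_nc(minority, cont_idx=[0, 1], cat_idx=[2], n_new=4, k=3)
for row in synthetic:
    print(row)
```

The categorical majority vote is what distinguishes SMOTE-NC from plain SMOTE. Plain SMOTE would interpolate category codes as if they were numbers.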
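Stratified 5-fold cross-validation keeps the proportion of each UKT group roughly the same in every fold. The minimal standard-library sketch below shuffles the indices of each class and deals them round-robin across the folds. The function name, seed, and toy label counts are hypothetical. In practice, scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True)` provides comparable stratified splits. Each fold holds about one fifth of the data, so holding one stratified fold out gives an approximately 80:20 stratified train/test split.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    fold_of = [0] * len(labels)
    offset = 0
    for y in sorted(by_class):          # fixed class order for reproducibility
        members = by_class[y]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            fold_of[i] = (pos + offset) % n_splits   # deal members round-robin
        offset += len(members)          # next class continues where this one stopped

    for f in range(n_splits):
        test = [i for i, g in enumerate(fold_of) if g == f]
        train = [i for i, g in enumerate(fold_of) if g != f]
        yield train, test


# Hypothetical labels for three UKT groups with 50, 30 and 20 students.
labels = [1] * 50 + [2] * 30 + [3] * 20
for train, test in stratified_kfold(labels):
    # prints 80 20 [(1, 10), (2, 6), (3, 4)] for each of the five folds
    print(len(train), len(test), sorted(Counter(labels[i] for i in test).items()))
```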
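The Friedman test ranks the conditions within each block (for example, each classifier or each fold; which blocking the authors used is an assumption). It then compares the rank sums using χ²_F = 12/(n·k·(k+1)) · Σ R_j² − 3·n·(k+1), where n is the number of blocks, k the number of conditions, and R_j the rank sum of condition j. This statistic is referred to a χ² distribution with k − 1 degrees of freedom. The abstract reports six conditions, so df = 5. The sketch below checks that χ² = 15.06 indeed corresponds to p ≈ 0.010. The helper names and the toy score matrix are hypothetical. The code uses the large-sample χ² approximation and applies no tie correction.

```python
import math


def average_ranks(row):
    """Rank the values in one block (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_chi2(scores):
    """Friedman statistic for an n-blocks x k-conditions score matrix; returns (stat, df)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    return stat, k - 1


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2:                              # odd df: start from df = 1
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:                                   # even df: start from df = 2
        q, k = math.exp(-x / 2), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Reported statistic with k = 6 conditions, i.e. df = 5.
print(round(chi2_sf(15.06, 5), 3))  # 0.01

# Hypothetical 5 x 6 matrix in which every block ranks the six conditions identically.
toy = [[0.60, 0.62, 0.65, 0.68, 0.70, 0.73]] * 5
print(friedman_chi2(toy))  # approximately (25.0, 5): perfect agreement gives the maximum n(k - 1)
```

If SciPy is available, `scipy.stats.friedmanchisquare` computes the same statistic (with a tie correction) together with its p-value.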
Authors who submit a manuscript must understand and agree that, if it is accepted for publication, they retain the copyright and grant the journal the right of first publication. The work is simultaneously licensed under the Creative Commons Attribution (CC-BY) 4.0 License, which allows others to share it with acknowledgment of its authorship and its initial publication in this journal.
Windy Chikita Cornia Putri (1*)

