A COMPARATIVE STUDY OF SUPERVISED FEATURE SELECTION METHODS FOR PREDICTING UANG KULIAH TUNGGAL (UKT) GROUPS
Abstract
The manual classification of Uang Kuliah Tunggal (UKT, single tuition fee) groups at Indonesian public universities is laborious, subjective, and error-prone, particularly now that online admission portals capture large volumes of socio-economic data. In this study, we evaluate five feature selection techniques (Chi-Square filter, Random Forest importance, Recursive Feature Elimination, LASSO embedded selection, and Exploratory Factor Analysis (EFA)) on a dataset of 9,369 applicants described by 53 socio-economic variables. The classifiers (Decision Tree, Random Forest, SVM-RBF, K-Nearest Neighbor, and Naïve Bayes) were tuned via stratified 5-fold cross-validation within the training portion of an 80:20 train-test split. Performance was measured by accuracy, macro-F1, and training time, and differences in weighted-average accuracy across feature-selection scenarios were assessed using the Friedman test (χ² = 15.06, p = 0.010). Results show that reducing the input to 13 features via LASSO (weighted-average accuracy 0.730) or Chi-Square (0.678) significantly outperforms both the full-feature baseline (0.624) and the EFA baseline (0.303), while cutting computational cost by over 40%. We conclude that supervised feature selection, particularly LASSO and Chi-Square, enables simpler, faster, and more transparent UKT prediction without sacrificing accuracy. The novelty of this study lies in comparing five feature-selection methods within a standardized preprocessing pipeline on real UKT data from UNESA, yielding a 13-feature subset aligned with the current UKT policy. This subset is ready to be integrated into an automated UKT verification system to improve decision accuracy and efficiency.
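The Chi-Square filter can be illustrated with a minimal, dependency-free sketch. It assumes every feature is categorical (continuous variables binned beforehand), scores each feature with Pearson's chi-square statistic of independence against the UKT group, and keeps the k highest-scoring features (k = 13 in the study). The feature names and toy records are hypothetical, and this contingency-table formulation is not necessarily the exact one used in the study; for instance, scikit-learn's `chi2` scorer treats feature values as frequencies and can rank features differently.

```python
from collections import Counter


def chi2_independence(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(feature, labels))
    stat = 0.0
    for f_val, f_n in feature_counts.items():
        for y_val, y_n in label_counts.items():
            expected = f_n * y_n / n  # expected cell count under independence
            observed = joint_counts[(f_val, y_val)]  # Counter returns 0 for unseen cells
            stat += (observed - expected) ** 2 / expected
    return stat


def select_k_best_chi2(columns, labels, k):
    """Rank features by chi-square score; return the top-k names and all scores."""
    scores = {name: chi2_independence(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: two features, four applicants, two UKT groups.
labels = [1, 1, 2, 2]
columns = {
    "income_band": ["A", "A", "B", "B"],   # perfectly associated with the group
    "owns_vehicle": ["y", "n", "y", "n"],  # independent of the group
}
selected, scores = select_k_best_chi2(columns, labels, k=1)
print(selected, scores)
```

In this toy case `income_band` scores 4.0 and `owns_vehicle` scores 0.0, so only the associated feature survives the filter.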
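Macro-F1 averages the per-class F1 scores with equal weight, so it penalizes a model that ignores a small UKT group even when its accuracy looks high. The sketch below computes both metrics with the standard library on hypothetical, deliberately imbalanced labels; precision or recall with an empty denominator is treated as 0, which matches scikit-learn's default `zero_division` behavior apart from the warning it emits.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced example: 8 applicants in group 1, 2 in group 2,
# and a model that always predicts the majority group.
y_true = [1] * 8 + [2] * 2
y_pred = [1] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Here accuracy is 0.8 while macro-F1 is about 0.44, because the minority group's F1 is 0.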
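The Friedman test ranks the feature-selection scenarios within each block and checks whether their mean ranks differ more than chance would allow. The sketch below assumes one block per classifier and one column per scenario (the abstract does not state how blocks were formed), uses average ranks for ties, and applies no tie correction (scipy's `friedmanchisquare` does apply one, so results can differ when ties occur). Assuming six scenarios (the full-feature baseline plus the five methods), the test has k − 1 = 5 degrees of freedom, and `chi2_sf(15.06, 5)` evaluates to about 0.0101, consistent with the reported p = 0.010.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        q, start = math.exp(-half), 2            # Q(x; 2) = e^(-x/2)
    else:
        q, start = math.erfc(math.sqrt(half)), 1  # Q(x; 1) = erfc(sqrt(x/2))
    # Recurrence: Q(x; d + 2) = Q(x; d) + (x/2)^(d/2) e^(-x/2) / Gamma(d/2 + 1)
    for d in range(start, df, 2):
        q += half ** (d / 2) * math.exp(-half) / math.gamma(d / 2 + 1)
    return q


def friedman_test(scores):
    """scores[b][j]: score of scenario j in block b. Returns (chi-square, p-value)."""
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, chi2_sf(stat, k - 1)


# Hypothetical scores: 3 classifiers (rows) x 3 scenarios (columns).
stat, p = friedman_test([[0.9, 0.8, 0.7]] * 3)
print(stat, p, chi2_sf(15.06, 5))
```

With three blocks that rank the scenarios identically, the statistic reaches its maximum N(k − 1) = 6 and p = e^(−3) ≈ 0.0498.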
Copyright (c) 2025 Windy Chikita Cornia Putri, Wiyli Yustanti, Ervin Yohannes

This work is licensed under a Creative Commons Attribution 4.0 International License.
The author submitting the manuscript must understand and agree that, if the manuscript is accepted for publication, the authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution (CC BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.