THE IMPACT OF CATEGORICAL DATA ENCODING METHODS ON ARTIFICIAL INTELLIGENCE ALGORITHMS
Keywords:
KNN, Decision Trees, Label Encoding, One-Hot Encoding, Frequency Encoding, Target Encoding
Abstract
This study analyzes the effectiveness of categorical data encoding methods in artificial intelligence algorithms. It examines how four widely used encoding techniques (Label Encoding, One-Hot Encoding, Frequency Encoding, and Target Encoding) operate and how they affect results when applied to the Decision Tree and K-Nearest Neighbors (KNN) algorithms. Using a real-world dataset, each encoding method was applied separately and evaluated with both algorithms. Model performance was assessed using conventional evaluation metrics such as accuracy, precision, recall, and F1-score. The results indicate that the combination of encoding method and algorithm has a significant effect on model quality. In particular, One-Hot Encoding yielded the best results with Decision Trees, while Target Encoding was most effective for KNN. The study concludes with key considerations and practical recommendations for selecting an appropriate encoding method.
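To make the four encoding techniques named above concrete, the following is a minimal pure-Python sketch of each one. The toy `colors` column, the `target` labels, and all function names are hypothetical and chosen only for illustration; they are not the study's dataset or implementation, and the target encoder shown is the naive, unsmoothed variant.

```python
from collections import Counter

# Toy data, invented purely for illustration (not the study's dataset).
colors = ["red", "blue", "red", "green", "blue", "red"]
target = [1, 0, 1, 0, 1, 0]  # hypothetical binary labels


def label_encode(values):
    """Map each category to an integer, in sorted order of the category names."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Return one 0/1 column per category, columns in sorted category order."""
    cats = sorted(set(values))
    return [[1 if v == cat else 0 for cat in cats] for v in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the data."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def target_encode(values, y):
    """Replace each category with the mean target value observed for it.

    No smoothing or out-of-fold fitting is applied here, so on real data
    this naive version can leak target information.
    """
    sums, counts = Counter(), Counter()
    for v, t in zip(values, y):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


label_encoded = label_encode(colors)
one_hot_encoded = one_hot_encode(colors)
frequency_encoded = frequency_encode(colors)
target_encoded = target_encode(colors, target)
```

Label Encoding assigns integers whose order is arbitrary, and a distance-based method such as KNN will treat that order as meaningful. One-Hot Encoding avoids this at the cost of one extra column per category.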
License
Copyright (c) 2025 Tursunov Sherzod Abduvakil o'g'li

This work is licensed under a Creative Commons Attribution 4.0 International License.