STATISTICAL POS-TAGGING ALGORITHMS (HMM, CRF) AND THEIR MATHEMATICAL MODELS

Authors

  • Botir Elov, Tashkent State University of Uzbek Language and Literature named after Alisher Navoi

Keywords:

POS tagging, hidden Markov model, HMM, conditional random field, CRF, statistical NLP, Viterbi algorithm, low-resource language

Abstract

This article gives a mathematical analysis of the two main paradigms of statistical part-of-speech (POS) tagging for Uzbek, the hidden Markov model (HMM) and the conditional random field (CRF), and evaluates their practical effectiveness experimentally. First, the probabilistic and functional structure of each model is stated formally: transition and emission probabilities for the HMM, and feature functions with weight coefficients for the CRF. The study uses three manually POS-tagged datasets in CoNLL-U format containing 17,038, 56,616, and 77,821 sentences. The HMM was trained with Laplace smoothing and the CRF with the L-BFGS optimizer, and both were decoded with the Viterbi algorithm. On the test set, the HMM reached 82% accuracy and the CRF 88%; the CRF's 6-percentage-point advantage is attributed to its ability to use more context and more linguistic features. Applying the BiLSTM-CRF and BERT-CRF neural models to the three datasets yielded an F1 score of 93.4%. The results show that statistical models perform reliably even for agglutinative, low-resource languages such as Uzbek, but that they are sensitive to the choice of features. The article concludes by discussing model selection, the main types of tagging errors, and the advantages of moving to deep-learning-based approaches.
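
The abstract names the two models and their decoder only in outline. As a minimal sketch (not the author's implementation), the Python below shows how both can share one Viterbi decoder: the HMM scores a tag as log P(t_i | t_(i-1)) + log P(w_i | t_i), here assuming add-one (Laplace) smoothing on both transitions and emissions, while the linear-chain CRF scores it as a weighted sum of feature functions, sum_k w_k * f_k(t_(i-1), t_i, x, i). The toy sentences, the `<s>` start tag, the feature templates (word, two-letter suffix), and the CRF weights are all hypothetical; in the article the CRF weights are learned with L-BFGS, which this sketch does not implement.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sentence pseudo-tag


def viterbi(tokens, tags, score):
    """Return the highest-scoring tag sequence for `tokens`.

    score(prev_tag, tag, tokens, i) is the log-score of `tag` at position i
    after `prev_tag`; the HMM and the CRF below differ only in this function.
    """
    if not tokens:
        return []
    # best[i][t] = (best log-score of a path ending in tag t at i, backpointer)
    best = [{t: (score(START, t, tokens, 0), None) for t in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + score(p, t, tokens, i))
            column[t] = (best[i - 1][prev][0] + score(prev, t, tokens, i), prev)
        best.append(column)
    # Follow the backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train_hmm(sentences):
    """Bigram HMM with add-one (Laplace) smoothed transitions and emissions.

    `sentences` is a list of [(word, tag), ...] lists, e.g. the FORM and UPOS
    columns of a CoNLL-U file.
    """
    trans, emit = Counter(), Counter()
    prev_count, tag_count = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev, tag] += 1
            prev_count[prev] += 1
            emit[tag, word] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(tag_count)
    n_words = len(vocab) + 1  # +1 reserves an "unknown word" slot

    def score(prev, tag, tokens, i):
        # log P(tag | prev) + log P(word | tag), both add-one smoothed
        p_trans = (trans[prev, tag] + 1) / (prev_count[prev] + len(tags))
        p_emit = (emit[tag, tokens[i]] + 1) / (tag_count[tag] + n_words)
        return math.log(p_trans) + math.log(p_emit)

    return tags, score


def crf_scorer(weights):
    """Linear-chain CRF potential: sum_k w_k * f_k(prev, tag, x, i).

    Features are binary indicators named by string; `weights` maps a feature
    name to its weight. Here the weights are hand-set, not learned.
    """
    def features(prev, tag, tokens, i):
        word = tokens[i].lower()
        yield f"trans:{prev}->{tag}"
        yield f"word:{word}|{tag}"
        yield f"suffix2:{word[-2:]}|{tag}"  # e.g. dative -ga, verb ending -di

    def score(prev, tag, tokens, i):
        return sum(weights.get(f, 0.0) for f in features(prev, tag, tokens, i))

    return score


# Hypothetical toy data and weights, not the article's corpus or model.
TOY_TRAIN = [
    [("men", "PRON"), ("kitob", "NOUN"), ("o'qiyman", "VERB")],
    [("u", "PRON"), ("maktabga", "NOUN"), ("boradi", "VERB")],
    [("u", "PRON"), ("kitob", "NOUN"), ("o'qiydi", "VERB")],
]
TOY_WEIGHTS = {
    "trans:<s>->PRON": 1.0, "trans:PRON->NOUN": 1.0, "trans:NOUN->VERB": 1.0,
    "word:u|PRON": 2.0, "suffix2:ga|NOUN": 2.0, "suffix2:di|VERB": 2.0,
}

if __name__ == "__main__":
    tags, hmm_score = train_hmm(TOY_TRAIN)
    print(viterbi(["u", "daftar", "o'qiydi"], tags, hmm_score))  # ['PRON', 'NOUN', 'VERB']
    print(viterbi(["U", "uyga", "boradi"], tags, crf_scorer(TOY_WEIGHTS)))  # ['PRON', 'NOUN', 'VERB']
```

Decoding needs only the argmax, so the CRF's normalizer Z(x) is never computed here; it matters only when the weights are trained (for example with L-BFGS, as the abstract reports). Without the add-one smoothing, the unseen word `daftar` would give the HMM a zero emission probability and `math.log` would raise an error.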

Published

2025-08-09

How to Cite

Elov, B. (2025). STATISTIK POS-TEGLASH ALGORTIMLARI (HMM, CRF) VA MATEMATIK MODELLARI [Statistical POS-tagging algorithms (HMM, CRF) and their mathematical models]. DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 3(4), 47–61. Retrieved from https://dtai.tsue.uz/index.php/dtai/article/view/v3i48