MODERN METHODS OF TOPIC MODELING (TEMATIK MODELLASHTIRISHNING ZAMONAVIY USULLARI)

Authors

Keywords:

Topic modeling, Latent Dirichlet Allocation (LDA), LSA, PLSA, lda2vec, tBERT, language corpus, unsupervised machine learning methods

Abstract

Topic modeling is an important component of natural language processing (NLP) tasks and can be used in a variety of fields. Topic models face two main problems: identifying topics that can be used to predict an outcome, and making already discovered topics easier to interpret. This article discusses several modern topic modeling methods: Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), the lda2vec deep learning model, and tBERT. Topic modeling is usually applied to a large language corpus. It produces three kinds of word clusters: words that co-occur, the distribution of words, and a histogram of words per topic. The article also presents results obtained by applying the LDA method to sample texts from an Uzbek language corpus.
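As an illustration of how LDA groups co-occurring words into topics, the following is a minimal sketch of collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the four toy documents are all assumptions made for this example and are not taken from the article or its corpus.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Assign every token to a random topic to start.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Hypothetical toy documents (not from the article's corpus).
docs = [
    ["kitob", "adabiyot", "sher", "kitob", "yozuvchi"],
    ["dastur", "algoritm", "kompyuter", "dastur"],
    ["adabiyot", "yozuvchi", "sher", "kitob"],
    ["kompyuter", "algoritm", "dastur", "kompyuter"],
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print("topic", k, [w for w, _ in counts.most_common(3)])
for d, row in enumerate(theta):
    print("doc", d, [round(p, 2) for p in row])
```

The per-topic word counts correspond to the "histogram of words per topic" mentioned above, and each row of `theta` is one document's estimated topic mixture. On a real corpus a library implementation would normally be preferable to this sketch.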

Author Biographies

Elov Botir Boltayevich, Tashkent State University of Uzbek Language and Literature named after Alisher Navo'i

Doctor of Philosophy (PhD) in Technical Sciences, Associate Professor

Alayev Ruhillo Habibovich, National University of Uzbekistan named after Mirzo Ulugbek

Doctor of Philosophy (PhD) in Technical Sciences

Published

2024-02-16

How to Cite

Elov, B., Ruhillo, A., & Aloyev, N. (2024). TEMATIK MODELLASHTIRISHNING ZAMONAVIY USULLARI. DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 2(1), 8–16. Retrieved from https://dtai.tsue.uz/index.php/dtai/article/view/v2i12