MODERN METHODS OF TOPIC MODELING
Keywords:
Topic modeling, Latent Dirichlet Allocation (LDA), LSA, PLSA, lda2vec, tBERT, language corpus, unsupervised machine learning methods
Abstract
Topic modeling is an important component of natural language processing (NLP) tasks and can be used in a variety of fields. Topic models face two main problems: identifying topics that can be used to predict an outcome, and making already discovered topics easier to understand. This article discusses modern topic modeling methods: non-negative matrix factorization (NMF), latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), the lda2vec deep learning model, and tBERT. Topic modeling is usually applied to a large language corpus. It produces three kinds of word clusters: co-occurring words, the distribution of words, and a histogram of words per topic. The article presents results obtained by applying the LDA method to sample texts from an Uzbek language corpus.
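As a rough illustration of what applying LDA to a handful of texts involves, the sketch below fits a tiny LDA model with collapsed Gibbs sampling using only the Python standard library. It is not the authors' implementation: the abstract does not say which inference method or software was used, and the toy documents, the function names `fit_lda` and `top_words`, and the values chosen for `alpha`, `beta`, and `num_topics` are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy corpus (already tokenised); not taken from the article's data.
docs = [
    ["kitob", "maktab", "talaba", "kitob", "dars"],
    ["futbol", "o'yin", "jamoa", "gol", "o'yin"],
    ["maktab", "dars", "talaba", "o'qituvchi"],
    ["jamoa", "gol", "futbol", "murabbiy"],
]


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns the count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words of each topic, ignoring zero counts."""
    return [
        [w for w, c in counts.most_common() if c > 0][:n]
        for counts in topic_word
    ]


alpha, num_topics = 0.1, 2
doc_topic, topic_word = fit_lda(docs, num_topics=num_topics, alpha=alpha)

# Smoothed per-document topic proportions.
theta = [
    [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
    for row, doc in zip(doc_topic, docs)
]

print(top_words(topic_word))
print(theta)
```

On a real corpus, a library implementation with proper preprocessing (tokenisation, stop-word removal, and for Uzbek possibly stemming or lemmatisation) would normally replace this sampler; the sketch only shows the two outputs the abstract refers to, namely words grouped per topic and a topic distribution per document.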
License
Copyright (c) 2024 Elov Botir Boltayevich, Alayev Ruhillo Habibovich, Aloyev Narzillo Raxmatilloyevich
This work is licensed under a Creative Commons Attribution 4.0 International License.