TOPIC MODELING OF LANGUAGE CORPUS TEXTS USING THE LATENT DIRICHLET ALLOCATION METHOD

Authors

Keywords:

Topic modeling, topic models, machine learning, Latent Dirichlet Allocation (LDA), text clustering, topic classification, NLP algorithms

Abstract

Topic modeling is an unsupervised task: it is performed on unlabeled data, and it differs clearly from text classification and clustering. Classification and clustering aim to make information retrieval easier and to form clusters of documents, whereas topic modeling does not aim to find similarities between documents. Topic modeling is usually applied to large language corpora. It produces three kinds of word clusters: co-occurring words, word distributions, and per-topic word histograms. Several models are used in topic modeling, for example the bag-of-words model, the unigram model, and the generative model. Algorithms used today for topic modeling tasks include Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Correlated Topic Modeling, and Probabilistic Latent Semantic Analysis (PLSA). This article analyzes an approach to topic modeling of language corpus texts using LDA.
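To make the LDA idea in the abstract concrete, here is a minimal Python sketch that fits a topic model to bag-of-words documents and returns per-document topic distributions and per-topic word distributions. It is illustrative only and not the authors' implementation: the inference method (collapsed Gibbs sampling), the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, and the toy corpus are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (bag-of-words; word order is ignored).
    Returns (vocab, theta, phi): theta[d][k] is the share of topic k in
    document d, phi[k][v] is the probability of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token
                # (unnormalized; random.choices normalizes the weights).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed point estimates of the two families of distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Hypothetical toy corpus, used only to exercise the function.
docs = [
    "corpus text word corpus token".split(),
    "word token text corpus".split(),
    "bank credit loan bank".split(),
    "loan credit bank interest".split(),
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)

for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print("topic", k, [vocab[v] for v in top])
```

Each row of `theta` and `phi` is a probability distribution, so it sums to 1. On a real corpus, tokenization, stop-word removal, and the number of topics would need to be chosen for the language and data at hand.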

Published

2024-04-27

How to Cite

Elov, B., Ruhillo, A., & Aloyev, N. (2024). YASHIRIN DIRIXLE TAQSIMOTI USULI YORDAMIDA TIL KORPUSI MATNLARINI TEMATIK MODELLASHTIRISH. DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 2(2), 7–18. Retrieved from https://dtai.tsue.uz/index.php/dtai/article/view/v2i22