STATISTICAL MODELS OF NATURAL LANGUAGE
Keywords:
Punctuation restoration, Automatic Speech Recognition, ASR, spoken-language recognition, speech transcripts, UzbPunct dataset
Abstract
A statistical language model (SLM) is a modern tool in natural language processing that estimates the probability of word sequences in a given language. Given the words that precede a position in a sentence, an SLM predicts the word most likely to come next. These probabilities are estimated from how often word sequences occur in a natural language corpus: by analyzing a large volume of text, the model learns patterns of word usage and uses those patterns to predict the most probable next word. As the field of NLP continues to develop, statistical language models remain a core tool for understanding and processing language, and they make it possible to keep extending what natural language technology can do and to build more innovative and capable NLP applications. This article presents methods for building an N-gram model, one kind of statistical language model, from an Uzbek language corpus. It also covers the mathematical description and evaluation methods of N-gram models, together with the problems of generalization, sensitivity, OOV (out-of-vocabulary) words, and special contexts, and ways of addressing them.
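To make the idea concrete, the following is a minimal sketch of a bigram (N = 2) model, and it is not the implementation described in the article. It assumes add-one (Laplace) smoothing and a `<unk>` token for out-of-vocabulary words, and it uses perplexity as the evaluation measure; the article itself may use different choices. The three-sentence corpus and the function names `train_bigram`, `prob`, and `perplexity` are made up for illustration.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"  # sentence boundary and OOV markers


def train_bigram(sentences, min_count=1):
    """Count bigrams; words rarer than min_count are mapped to UNK."""
    word_counts = Counter(w for s in sentences for w in s)
    vocab = {w for w, c in word_counts.items() if c >= min_count} | {BOS, EOS, UNK}
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = [BOS] + [w if w in vocab else UNK for w in s] + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1          # how often prev is followed by something
            bigrams[(prev, cur)] += 1    # how often prev is followed by cur
    return vocab, contexts, bigrams


def prob(cur, prev, model):
    """Add-one smoothed estimate of P(cur | prev)."""
    vocab, contexts, bigrams = model
    prev = prev if prev in vocab else UNK
    cur = cur if cur in vocab else UNK
    v = len(vocab) - 1  # BOS is never predicted, so it is excluded
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)


def perplexity(sentence, model):
    """Perplexity of one tokenized sentence, including the EOS prediction."""
    tokens = [BOS] + list(sentence) + [EOS]
    log_sum = sum(math.log(prob(cur, prev, model))
                  for prev, cur in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


# Toy, made-up corpus of tokenized Uzbek sentences (illustration only).
corpus = [
    ["men", "kitob", "oldim"],
    ["men", "maktabga", "bordim"],
    ["u", "kitob", "oldi"],
]
model = train_bigram(corpus)

print(prob("kitob", "men", model))                   # seen bigram: (1 + 1) / (2 + 9)
print(prob("zzz", "men", model))                     # OOV word falls back to <unk>
print(perplexity(["men", "kitob", "oldim"], model))  # lower is better
```

Smoothing is what lets the sketch assign a non-zero probability to word pairs that never occurred in the corpus, which is one simple way of handling the generalization and OOV problems mentioned above. On a real corpus, raising `min_count` above 1 would give `<unk>` actual training counts.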
License
Copyright (c) 2024 Botir Elov, Ruhillo Alayev, Abdulla Abdullayev
This work is licensed under a Creative Commons Attribution 4.0 International License.