DEVELOPING THE MASTAT SYSTEM FOR ANNOTATING TEXTS USING ABSTRACTIVE ANNOTATION
Keywords:
abstractive, annotation, extraction, MASTAT, tokenization
Abstract
In recent years, models, algorithms, and software tools for the semantic and morphological analysis of electronic texts in foreign languages have been developed worldwide to advance scientific research. Unfortunately, no software has yet been developed that automatically performs semantic analysis of an electronic text in Uzbek, extracts its keywords, and automatically writes an annotation or summary based on its content. Developing the MASTAT annotation system is therefore highly relevant. Computational linguistics draws on natural language understanding (NLU), which is concerned with semantic analysis, that is, determining the intended meaning of a text, and on natural language generation (NLG), which is concerned with generating text by machine. NLP is distinct from speech recognition, which segments spoken language into words and converts sound into text and vice versa, but the two are often used together. Constituency parsing focuses mainly on syntax, whereas dependency parsing can carry out both syntactic and semantic analysis. This article reviews models of constituency and dependency parsing, as well as dependency parsing with rich semantics. We also examine models for analyzing large and small texts, tokenization methods, and work on building corpora for the Uzbek language. A number of studies are currently under way on the grammatical structure of Uzbek sentences, error detection, and tools for the automatic syntactic and morphological analysis of words. The MASTAT system is significant in that it enables automatic text shortening and keyword-based summarization of large volumes of data. For computational linguistics, this research helps solve a number of theoretical and practical problems.
In this respect, developing software, algorithms, and models for the syntactic analysis and automatic shortening of texts is of great importance.
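The abstract mentions tokenization and keyword-based summarization without illustrating either. The sketch below is a minimal illustration under stated assumptions. It is not the MASTAT implementation, which the title describes as abstractive. It normalizes the apostrophe variants found in Uzbek Latin script (the original abstract itself mixes ', ‘, ’ and ʼ) so that words such as soʻz and maʼno stay single tokens. It then counts non-stop-word tokens as keywords and builds a simple frequency-based extractive summary. The stop-word list, the function names (normalize, tokenize, keywords, summarize), and the rule that an apostrophe directly after o or g denotes the letters oʻ/gʻ are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Apostrophe variants seen in Uzbek Latin text: ' U+0027, ` (backtick),
# ‘ U+2018, ’ U+2019, ʻ U+02BB and ʼ U+02BC.
_APOS = "'`\u2018\u2019\u02bb\u02bc"
_APOS_RE = re.compile("([oOgG])?[" + _APOS + "]")

# A letter (other than ʻ/ʼ) followed by letters or the modifier letters ʻ/ʼ,
# so "soʻz" and "maʼno" are not split at the apostrophe.
_TOKEN_RE = re.compile(r"[^\W\d_\u02bb\u02bc](?:[^\W\d_]|[\u02bb\u02bc])*")

# Hypothetical, deliberately tiny list of Uzbek function words; a real system
# would need a complete stop-word list.
STOPWORDS = {"va", "bu", "bilan", "uchun", "ham", "esa", "u", "bir"}


def normalize(text: str) -> str:
    """Map o'/g' variants to oʻ/gʻ (U+02BB) and other apostrophes to ʼ (U+02BC)."""
    # Heuristic assumption: an apostrophe right after o/g is the oʻ/gʻ letter,
    # any other apostrophe is the tutuq belgisi (ʼ).
    return _APOS_RE.sub(
        lambda m: (m.group(1) + "\u02bb") if m.group(1) else "\u02bc", text
    )


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; digits and punctuation are dropped."""
    return _TOKEN_RE.findall(normalize(text).lower())


def split_sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def keywords(text: str, k: int = 5) -> list[tuple[str, int]]:
    """The k most frequent non-stop-word tokens with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(k)


def summarize(text: str, n: int = 1) -> str:
    """Extractive baseline: keep the n sentences with the highest summed
    keyword frequency, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    # Stop words are absent from freq, and a Counter returns 0 for missing keys.
    scores = [sum(freq[t] for t in tokenize(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))
```

For example, with text = "Matn tahlili muhim. Matn tahlili va matn qisqartirish dasturi. Bugun havo yaxshi.", keywords(text, 2) returns [("matn", 3), ("tahlili", 2)] and summarize(text, 1) returns "Matn tahlili va matn qisqartirish dasturi.". Scoring by summed frequency favours longer sentences; dividing by sentence length is a common alternative. An abstractive system such as the one the title describes would instead generate new sentences rather than select existing ones.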
License
Copyright (c) 2024 Fayzullayeva Zarnigor Inatillayevna, Olimova Muxlisa Vohidjon qizi, Karimov Nodirbek Nosirjon o‘g‘li
This work is licensed under a Creative Commons Attribution 4.0 International License.