IDENTIFYING PARAPHRASE UNITS IN UZBEK-LANGUAGE TEXTS

Authors

  • Axmedova Xusniya Xusanovna, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi
  • Sultanov Djamshid Baxodirovich, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

Keywords:

paraphrase sentences, semantic similarity, Jaccard algorithm, GloVe algorithm, machine learning

Abstract

Solving the problem of identifying paraphrase pairs in Uzbek-language texts makes many other NLP tasks easier to address. Unlike plagiarism-detection systems, this approach detects duplicates within a single text. Many ML algorithms are used to detect paraphrased sentences. This article describes the sequence of steps for using the Jaccard algorithm to measure the similarity of sentences and the GloVe algorithm to build sets of semantically similar words. It also describes the users of the paraphrase-detection information system and the processes involved in using it.
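
As a rough illustration of the idea in the abstract, the minimal Python sketch below computes the Jaccard similarity of two sentences, first on raw tokens and then after mapping words onto shared representatives. It is not the authors' implementation. The `SIMILAR_WORDS` dictionary is a hypothetical, hand-written stand-in for the sets of semantically similar words that the article builds with GloVe, and the tokenizer and the sample sentences are likewise assumptions made for this example.

```python
import re

# Hypothetical stand-in for GloVe-derived similarity sets:
# each word maps to a representative of its group.
SIMILAR_WORDS = {
    "ulkan": "katta",   # "huge" is treated as equivalent to "big"
    "katta": "katta",
}

def tokenize(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    # Apostrophe-like characters are kept so that words such as "o‘zbek" stay whole.
    return set(re.findall(r"[\w‘’ʻ']+", sentence.lower()))

def canonicalize(tokens, similar_words):
    """Replace each token with its group representative, if it has one."""
    return {similar_words.get(token, token) for token in tokens}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

first = tokenize("Bu katta shahar")
second = tokenize("Bu ulkan shahar")

plain_score = jaccard(first, second)
semantic_score = jaccard(
    canonicalize(first, SIMILAR_WORDS),
    canonicalize(second, SIMILAR_WORDS),
)
print(plain_score, semantic_score)
```

In this toy case the two sentences differ in one word, so the plain score is lower than the score obtained after the similar words are merged. Where to place the decision threshold between "paraphrase" and "not a paraphrase" is not stated in the abstract and would have to be tuned on labeled data.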

Published

2025-06-15

How to Cite

Axmedova, X., & Sultanov, D. (2025). O‘ZBEK TILIDAGI MATNLARIDAGI PEREFRAZ BIRLIKLARNI ANIQLASH. DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 3(3), 105–111. Retrieved from https://dtai.tsue.uz/index.php/dtai/article/view/V3I316