AN APPROACH TO ANALYZING THE RISK LEVEL OF SOCIAL NETWORK CORRESPONDENCE BASED ON CLUSTERING (K=4)

Authors

  • Babomurodov Ozod Jurayevich, Jizzakh branch of Kazan Federal University
  • Qo’yliyeva Feruzaxon Alisher qizi, Tashkent State Agrarian University

Keywords:

K-means, clustering, harmful content, social network, text analysis, semantic grouping, BERT

Abstract

This article addresses the problem of automatically grouping text messages circulating on social networks, in particular on the Telegram platform, by risk level. The main objective of the study is to evaluate the performance of the K-means clustering algorithm with the parameter k = 4 and to assess clustering quality and stability on datasets of different sizes. Four datasets of 1,000, 5,000, 10,000, and 20,000 messages were compiled, and cluster quality, semantic boundaries, and internal cohesion were analyzed on each. The results showed that clustering on the small dataset was insufficiently stable, while on the medium-sized dataset the clusters were still in the process of forming. The cleanest, most distinct, and most stable semantic grouping with k = 4 was obtained on the dataset of 10,000 messages; on the 20,000-message dataset, increased semantic variability reduced cluster stability. The findings indicate that the clusters produced by K-means provide an effective labeling base for subsequently training transformer models (XLM-R, mBERT, BERTbek). This approach has practical value for improving systems that automatically detect and monitor harmful content on social networks.
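The pipeline described in the abstract — vectorize the messages, cluster with K-means at k = 4, and measure internal cohesion — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the TF-IDF features stand in for whatever text representation the study used, and the sample messages are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative corpus; the study used datasets of 1,000-20,000
# Telegram messages.
texts = [
    "free money click this link now",
    "meet me at the cafe tomorrow",
    "buy cheap followers instantly",
    "happy birthday, have a great day",
    "send your password to verify the account",
    "the lecture starts at nine",
    "limited offer, transfer funds today",
    "thanks for the photos from the trip",
] * 25  # repeat so K-means has enough points

# TF-IDF features as a stand-in for the embedding step
X = TfidfVectorizer().fit_transform(texts)

# K-means with k = 4, as in the study
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km.fit_predict(X)

# One common measure of internal cohesion: the silhouette coefficient
print("silhouette:", round(silhouette_score(X, labels), 3))
```

A higher silhouette value indicates tighter, better-separated clusters; computing it for each dataset size is one way to reproduce the stability comparison the abstract reports.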



Published

2026-02-16

How to Cite

AN APPROACH TO ANALYZING THE RISK LEVEL OF SOCIAL NETWORK CORRESPONDENCE BASED ON CLUSTERING (K=4). (2026). DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 4(1), 32-37. https://dtai.tsue.uz/index.php/dtai/article/view/v4i14