USING ECAPA-TDNN-BASED FEATURE VECTORS IN SPEAKER DIARIZATION

Authors

  • Umidjon Xasanov Komiljon o‘g‘li, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

Keywords:

Speaker diarization, speaker embedding vectors, ECAPA-TDNN, x-vector, spectral clustering

Abstract

Constructing speaker embedding vectors is one of the most important stages of speaker diarization. Modern deep neural networks can capture and represent a speaker's individual characteristics with high accuracy. In particular, popular embeddings such as the x-vector have served as a core component of current systems. In recent years, various refinements of the TDNN architecture have been developed; among them, the ECAPA-TDNN model has demonstrated high performance in speaker verification tasks through its characteristic channel attention mechanism and multi-layer feature aggregation. This study demonstrates how this model can be applied to speaker diarization.
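
To make the described pipeline concrete, the following is a minimal sketch, not the paper's exact implementation: ECAPA-TDNN embeddings are extracted from sliding windows with SpeechBrain's pretrained model and then grouped by spectral clustering, matching the keywords above. The input file name, window and hop lengths, and the fixed speaker count are illustrative assumptions.

    # A minimal sketch, not the paper's exact pipeline: extract ECAPA-TDNN
    # speaker embeddings from sliding windows and group them with spectral
    # clustering. File name, window/hop sizes, and the speaker count are
    # illustrative assumptions.
    import numpy as np
    import torch
    import torchaudio
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics.pairwise import cosine_similarity
    from speechbrain.pretrained import EncoderClassifier  # speechbrain.inference in SpeechBrain >= 1.0

    # Pretrained ECAPA-TDNN speaker-embedding extractor (VoxCeleb recipe).
    encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

    signal, sr = torchaudio.load("meeting.wav")  # hypothetical input file
    signal = signal.mean(dim=0)                  # mix down to mono

    win, hop = int(1.5 * sr), int(0.75 * sr)     # 1.5 s windows, 50% overlap (assumed)
    embeddings = []
    with torch.no_grad():
        for start in range(0, signal.numel() - win + 1, hop):
            chunk = signal[start:start + win].unsqueeze(0)  # [1, win]
            emb = encoder.encode_batch(chunk)               # [1, 1, 192]
            embeddings.append(emb.squeeze().cpu().numpy())
    X = np.stack(embeddings)

    # Cosine-similarity affinity matrix, clipped to non-negative values,
    # then spectral clustering with a fixed (assumed) number of speakers.
    affinity = np.clip(cosine_similarity(X), 0.0, None)
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(affinity)

    for i, lab in enumerate(labels):
        print(f"{i * hop / sr:6.2f}s  speaker {lab}")

In practice the number of speakers is often estimated from the eigenvalue spectrum of the affinity matrix rather than fixed in advance.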

References

1. X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals, “Speaker diarization: A review of recent research,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 356–370, 2012.

2. T. J. Park, N. Kanda, D. Dimitriadis, K. J. Han, S. Watanabe, and S. Narayanan, “A review of speaker diarization: Recent advances with deep learning,” 2021, arXiv:2101.09624.

3. G. Sell, D. Snyder, A. McCree, D. Garcia-Romero, J. Villalba, M. Maciejewski, V. Manohar, N. Dehak, D. Povey, S. Watanabe, and S. Khudanpur, “Diarization is hard: Some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge,” in Proc. Interspeech, 2018, pp. 2808–2812.

4. N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, and M. Liberman, “The second DIHARD diarization challenge: Dataset, task, and baselines,” in Proc. Interspeech, 2019, pp. 978–982.

5. L. Wan, Q. Wang, A. Papir, and I. Lopez-Moreno, “Generalized end-to-end loss for speaker verification,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4879–4883.

6. G. Sun, C. Zhang, and P. C. Woodland, “Speaker diarisation using 2D self-attentive combination of embeddings,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5801–5805.

7. D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-vectors: Robust DNN embeddings for speaker recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5329–5333.

8. B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification,” in Proc. Interspeech, 2020, pp. 3830–3834.

9. H. Zeinali, K. A. Lee, J. Alam, and L. Burget, “SdSV challenge 2020: Large-scale evaluation of short-duration speaker verification,” in Proc. Interspeech, 2020, pp. 731–735.

10. J. Thienpondt, B. Desplanques, and K. Demuynck, “Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization,” in Proc. Interspeech, 2020, pp. 756–760.

11. D. Garcia-Romero, G. Sell, and A. McCree, “Magneto: X-vector magnitude estimation network plus offset for improved speaker recognition,” in Proc. Odyssey, 2020, pp. 1–8.

12. J. Hu, L. Shen, and G. Sun, “Squeeze-and-Excitation networks,” in IEEE/CVF CVPR, 2018, pp. 7132–7141.

13. S. Gao, M.-M. Cheng, K. Zhao, X. Zhang, M.-H. Yang, and P. H. S. Torr, “Res2Net: A new multi-scale backbone architecture,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 652–662, 2021.

14. Z. Gao, Y. Song, I. McLoughlin, P. Li, Y. Jiang, and L.-R. Dai, “Improving aggregation and loss function for better embedding learning in end-to-end speaker verification system,” in Proc. Interspeech, 2019, pp. 361–365.

15. J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition,” in IEEE/CVF CVPR, 2019, pp. 4685–4694.

16. D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” in Proc. Interspeech, 2019, pp. 2613–2617.

17. N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, “Novel architectures for unsupervised information bottleneck based speaker diarization of meetings,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 14–27, 2021.

18. E. Hoffer, T. Ben-Nun, I. Hubara, N. Giladi, T. Hoefler, and D. Soudry, “Augment your batch: better training with larger batches,” CoRR, vol. abs/1901.09335, 2019.

19. K. E. Shukurov, A. A. Kaxxarov, and U. K. Xasanov, “Speaker diarization based on unsupervised learning” (in Uzbek), in Digital Transformation: A New Era in Information Technologies, Artificial Intelligence, and the Economy, 2025, pp. 442–447.

20. U. K. Xasanov, X. Sh. To‘rayev, and N. D. Tolibova, “Preprocessing of speech signals in speaker diarization processes” (in Uzbek), 2025, pp. 153–157.

Published

2025-12-25

How to Cite

USING ECAPA-TDNN-BASED FEATURE VECTORS IN SPEAKER DIARIZATION. (2025). DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 3(6), 152-158. https://dtai.tsue.uz/index.php/dtai/article/view/v3i623