FORMING FEATURES AND PARAMETERS OF SPEECH SIGNALS FOR PERSON RECOGNITION BASED ON INTELLIGENT ALGORITHMS
Keywords:
Person recognition, speech signal, Mel-frequency cepstral coefficients (MFCC), Linear Predictive Coding (LPC), prosodic features, machine learning, artificial intelligence, neural networks, identification, acoustic analysis

Abstract
This article examines the extraction of informative features from speech signals and the formation of person-specific parameters in speaker recognition systems. The study applies artificial intelligence technologies, in particular algorithms based on machine learning and deep neural networks, to determine statistical and spectral characteristics of the speech signal such as timbre, tone, formant frequencies, and the energy spectrum. It also analyzes approaches to selecting optimal attributes and building classification models for user identification based on Mel-frequency cepstral coefficients (MFCC), Linear Predictive Coding (LPC), and prosodic parameters. Experimental results are presented that demonstrate the accuracy, stability, and adaptability of the developed intelligent algorithms under varying acoustic conditions. The findings of this study can contribute significantly to improving the effectiveness of person recognition systems. Key scientific works on recognition methods and algorithms are also reviewed.
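The MFCC pipeline named in the abstract (framing and windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched as follows. This is a minimal illustrative implementation in NumPy/SciPy; the frame length, hop size, and filter counts are common textbook defaults, not values taken from the article.

```python
import numpy as np
from scipy.fftpack import dct


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Convert mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for a 1-D signal."""
    # 1) Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)

    # 2) Per-frame power spectrum via the real FFT.
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3) Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising slope
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # 4) Log mel-filterbank energies, then DCT-II -> cepstral coefficients.
    mel_energies = np.log(spec @ fbank.T + 1e-10)
    return dct(mel_energies, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

For a one-second 16 kHz signal these defaults yield 98 frames of 13 coefficients each; in a recognition system such frame-level vectors are then pooled or fed to a classifier as the person-specific parameters.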
References
Bai, S., Zhang, C. and Koishida, K., 2023. Speaker verification based on depthwise separable convolutions and channel attention. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Bhattacharya, G., Alam, J. and Kenny, P., 2017. Deep speaker embeddings for short-duration speaker verification. In: Proceedings of Interspeech.
Cai, W., Cai, D. and Li, M., 2020. Deep speaker embedding learning with multi-level pooling for text-independent speaker verification. In: Proceedings of Interspeech.
Cummins, N., Schuller, B. and Christensen, H., 2018. Low-level descriptors for computational paralinguistics: The INTERSPEECH 2018 baseline. In: Proceedings of Interspeech.
Dehak, N., Kenny, P., Dehak, R., Dumouchel, P. and Ouellet, P., 2011. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), pp.788–798.
Desplanques, B., Thienpondt, J. and Demuynck, K., 2020. ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN-Based Speaker Verification. In: Proceedings of Interspeech.
Heigold, G., Moreno, P., Bengio, S. and Shazeer, N., 2016. End-to-end text-dependent speaker verification. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Huang, Z., Wang, S. and Liu, Z., 2020. Knowledge distillation for speaker recognition. In: Proceedings of Interspeech.
Jung, J., Heo, H.S., Yang, I. and Han, S., 2022. Large margin softmax loss for speaker verification. IEEE Signal Processing Letters.
Kanagasundaram, A., Sridharan, S. and Dean, D., 2019. Improving speaker recognition performance in noisy conditions using multi-condition training. Computer Speech & Language.
Koluguri, S.K. and Sahu, S., 2020. Residual LSTM network for speaker verification. IEEE Access.
Nagrani, A., Chung, J.S. and Zisserman, A., 2017. VoxCeleb: A large-scale speaker identification dataset. In: Proceedings of Interspeech.
Okabe, K., Koshinaka, T. and Shinoda, K., 2018. Attentive statistics pooling for deep speaker embedding. In: Proceedings of Interspeech.
Panayotov, V., Chen, G., Povey, D. and Khudanpur, S., 2015. LibriSpeech: An ASR corpus based on public domain audio books. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Ravanelli, M. and Bengio, Y., 2021. Speaker recognition from raw waveform with SincNet. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Snyder, D., Garcia-Romero, D., Sell, G., Povey, D. and Khudanpur, S., 2018. X-vectors: Robust DNN embeddings for speaker recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Wang, D., Li, J. and Liu, Y., 2021. Noise robust speaker verification with multi-task learning. IEEE Access.
Xie, W., Nagrani, A., Chung, J.S. and Zisserman, A., 2019. Utterance-level aggregation for speaker recognition using attentive pooling. In: Proceedings of Interspeech.
Yadav, S. and Rai, R., 2021. Deep learning techniques for speaker recognition: A review. Journal of Intelligent Systems.
Zhang, C. and Koishida, K., 2017. End-to-end text-independent speaker verification with triplet loss on short utterances. In: Proceedings of Interspeech.
Rahman, A.A.A., Hamid, N.A. and Ahmad, A., 2021. Speaker Identification Using LPC and MFCC Methods. Malaysian Journal of Computer Science, [online] Available at: https://mjcs.fsktm.um.edu.my/article/view/34091 [Accessed 5 Apr. 2025].
License
Copyright (c) 2025 Raxmatov Furqat Abdurazzoqovich, Abdirazakov Faxriddin Bekpulatovich, Temirov Azizbek Abdumannob o‘g‘li, Nasirov Sulton Uali o‘g‘li

This work is licensed under a Creative Commons Attribution 4.0 International License.