AN INTELLIGENT APPROACH TO DETECTING AND IDENTIFYING SYNTHETIC SPEECH BASED ON THE SVM ALGORITHM
Keywords:
SVM (RBF), TTS, MFCC±Δ, F₀, jitter, HNR, H (entropy), E_log, ZCR, z-score, CV, ROC-AUC, EER/HTER
Abstract
This article proposes an approach to detecting and identifying synthetic speech based on the SVM (Support Vector Machine) algorithm. The proposed pipeline is tailored to speech recordings sampled at 8 kHz and uses the following acoustic features: MFCC₁–MFCC₁₃, F₀, H (entropy), E_log, ZCR, ΔMFCC, jitter, and HNR. Together these features cover the cepstral, phonatory, and prosodic characteristics of the speech signal and give the model an informative representation. Features are extracted at the frame level and converted into fixed-length vectors at the utterance level through statistical aggregation (mean, variance, percentiles, range). The resulting vectors are standardized with z-score normalization. Classification is performed by an SVM with an RBF kernel. Model hyperparameters are optimized with nested k-fold cross-validation, and class imbalance is compensated with class weights. Evaluation uses Accuracy, Macro-F1, and ROC-AUC, together with the EER/HTER criteria specific to the anti-spoofing task. In addition, an ablation analysis examines the contribution of each feature to the model's results: features are systematically removed (or added individually) and the resulting changes in the metrics are compared to determine which ones actually help. The experimental results show that enriching the MFCC features with additional features such as ΔMFCC, jitter, and HNR significantly improves the model's discriminative ability. The proposed approach is lightweight, stable, and ready for integration into real-time systems, and it achieves high accuracy in synthetic speech detection.
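The utterance-level aggregation, z-score standardization, and EER scoring steps described above can be illustrated with a short sketch. This is a minimal standard-library example under stated assumptions, not the authors' implementation: the function names, the choice of the 5th, 50th, and 95th percentiles, and the convention that a higher score means "synthetic" are all hypothetical. The frame-level feature extraction and the RBF SVM stage itself are omitted.

```python
import statistics
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def aggregate_utterance(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse an (n_frames x n_features) matrix into one fixed-length vector.

    Per feature: mean, variance, 5th/50th/95th percentiles, range.
    The percentile choice is an assumption made for illustration.
    """
    vector: List[float] = []
    for column in zip(*frames):  # one column per acoustic feature
        vector += [
            statistics.fmean(column),
            statistics.pvariance(column),
            percentile(column, 5),
            percentile(column, 50),
            percentile(column, 95),
            max(column) - min(column),
        ]
    return vector


def zscore_fit(vectors: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Estimate per-dimension (mean, std) on the training vectors only."""
    stats = []
    for column in zip(*vectors):
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column) or 1.0  # guard constant dimensions
        stats.append((mu, sigma))
    return stats


def zscore_apply(vector: Sequence[float],
                 stats: Sequence[Tuple[float, float]]) -> List[float]:
    """Standardize one vector with previously fitted statistics."""
    return [(x - mu) / sigma for x, (mu, sigma) in zip(vector, stats)]


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER; label 1 = synthetic, higher score = more synthetic.

    Sweeps every observed score as a threshold and returns the mean of the
    false-positive and false-negative rates where the two are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in neg) / len(neg)  # genuine flagged as synthetic
        fnr = sum(s < t for s in pos) / len(pos)   # synthetic speech missed
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    # Toy utterance: 3 frames, 2 features (values are made up).
    utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(aggregate_utterance(utterance))
```

Fitting the standardization statistics on the training folds only, and then applying them unchanged to the held-out fold, keeps the cross-validation estimate free of leakage from the evaluation data.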
License
Copyright (c) 2025 Raxmatov Furqat Abdurazzoqovich, Abdirazakov Faxriddin Bekpulatovich, Nasirov Sulton Uali o‘g‘li, Javliyev Shahzod Alisher o‘g‘li, Sayfullaeva Nargiza Akromovna

This work is licensed under a Creative Commons Attribution 4.0 International License.







