1.Neural network for auditory speech enhancement featuring feedback-driven attention and lateral inhibition.
Yudong CAI ; Xue LIU ; Xiang LIAO ; Yi ZHOU
Journal of Biomedical Engineering 2025;42(1):82-89
The human brain's mechanisms for processing speech are an important source of inspiration for speech enhancement technology. Attention and lateral inhibition are key mechanisms in auditory information processing that selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and feedback-driven attention. Noisy speech signals are fed into the first branch of the U-Net, and time-frequency units with high confidence are selectively fed back. The resulting activation-layer gradients, together with the lateral inhibition mechanism, are used to calculate attention maps. These maps are then concatenated to the second branch of the U-Net, directing the network's focus and achieving selective enhancement of auditory speech signals. Speech enhancement was evaluated with five metrics, including perceptual evaluation of speech quality (PESQ), and the method was compared with five other methods: Wiener, SEGAN, PHASEN, Demucs and GRN. The experimental results showed that, across multiple performance metrics, the proposed method improved speech enhancement in various noise scenarios by 18% to 21% over the baseline network. The improvement was particularly notable at low signal-to-noise ratios, where the proposed method showed a significant performance advantage over the other methods. Speech enhancement based on lateral inhibition and feedback-driven attention holds significant potential for auditory speech enhancement, making it suitable for clinical practice involving cochlear implants and hearing aids.
Humans; Attention/physiology*; Speech Perception/physiology*; Neural Networks, Computer; Speech; Noise; Feedback
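The abstract above does not give the exact lateral inhibition formula, so the following is only a generic center-surround sketch of the idea: each unit of a saliency profile (standing in for gradient magnitudes along one axis of a time-frequency map) is suppressed by a fraction of its neighbours' mean, then rectified and normalised. The function names, the `radius` and `strength` parameters, and the toy values are all assumptions for illustration, not the authors' implementation.

```python
def lateral_inhibition(saliency, radius=1, strength=0.5):
    """Subtract a fraction of the neighbours' mean from each unit, then rectify at zero."""
    n = len(saliency)
    inhibited = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbours = [saliency[j] for j in range(lo, hi) if j != i]
        surround = sum(neighbours) / len(neighbours) if neighbours else 0.0
        inhibited.append(max(0.0, saliency[i] - strength * surround))
    return inhibited


def normalise(values):
    """Scale to [0, 1] by the peak value; an all-zero input is returned unchanged."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


# Toy gradient magnitudes (hypothetical): one strong unit flanked by weak ones.
gradient_magnitudes = [0.1, 0.2, 1.0, 0.2, 0.1]
attention = normalise(lateral_inhibition(gradient_magnitudes))
```

In this toy case the strong unit keeps full weight while its weaker neighbours are pushed toward zero, which is the contrast-sharpening effect usually attributed to lateral inhibition.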
2.Research on bimodal emotion recognition algorithm based on multi-branch bidirectional multi-scale time perception.
Peiyun XUE ; Sibin WANG ; Jing BAI ; Yan QIANG
Journal of Biomedical Engineering 2025;42(3):528-536
Emotion reflects human psychological and physiological health, and it is expressed mainly through voice and facial expression. Extracting and effectively integrating emotional information from these two modalities is one of the main challenges in emotion recognition. This paper proposes a multi-branch bidirectional multi-scale time perception model that processes the speech Mel-frequency spectrum coefficients in both the forward and reverse directions along the time dimension. The model also uses causal convolution to obtain temporal correlation information between features at different scales and assigns attention maps to them accordingly, yielding a multi-scale fusion of speech emotion features. In addition, this paper proposes a bimodal dynamic feature fusion algorithm, which draws on the advantages of AlexNet and uses overlapping max-pooling layers to obtain richer fusion features from the concatenated feature matrices of the two modalities. Experimental results show that the proposed multi-branch bidirectional multi-scale time perception bimodal emotion recognition model reaches accuracies of 97.67% and 90.14% on two public audio-visual emotion datasets, outperforming other common methods. This indicates that the proposed model can effectively capture emotional feature information and improve the accuracy of emotion recognition.
Humans; Emotions; Algorithms; Facial Expression; Time Perception; Neural Networks, Computer; Speech
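As a rough illustration of the causal convolution and forward/reverse processing mentioned in the abstract above, the sketch below applies a one-dimensional causal filter to a frame sequence and to its time-reversed copy. The kernel, the `dilation` parameter, and the way the two directions are produced are assumptions; the abstract does not specify the model's actual layers or how the branches are combined.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * signal[t - k * dilation], treating samples before t = 0 as zero."""
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # never look at future samples
                acc += weight * signal[idx]
        output.append(acc)
    return output


# Toy per-frame feature values (hypothetical) and a two-tap averaging kernel.
frames = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]

forward = causal_conv1d(frames, kernel)
# Reverse branch: filter the time-reversed sequence, then restore the original order.
backward = causal_conv1d(frames[::-1], kernel)[::-1]
```

Each forward output depends only on the current and earlier frames, while the reverse branch sees the current and later frames; increasing `dilation` widens the temporal scale without adding taps.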
3.Perception of Mandarin aspirated/unaspirated consonants in children with cochlear implants.
Yani LI ; Qun LI ; Jian WEN ; Lin LI ; Yun ZHENG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(4):312-318
Objective: This study aims to investigate the perception of Mandarin aspirated and unaspirated consonants by children with cochlear implants (CIs) under quiet and noisy conditions. It also examines factors that may affect their acquisition, such as auditory conditions, place of articulation, manner of articulation, chronological age, age at implantation, and non-verbal intelligence. Methods: Twenty-eight CI children aged 3 to 5 years who received implantation from 2018 to 2023 were recruited. Additionally, 88 peers with normal hearing (NH) were recruited as controls. Both groups participated in a perception test for aspirated/unaspirated consonants under quiet and noisy conditions, along with tests for speech recognition, speech production, and non-verbal intelligence. The study analyzed the effects of group (CI vs. NH), auditory condition, and consonant characteristics on children's perception of aspirated/unaspirated consonants in Mandarin, as well as the factors contributing to CI children's acquisition of these consonants. Results: ① CI children's ability to perceive aspirated/unaspirated consonants was significantly poorer than that of their NH peers (χ² = 14.16, P<0.01), and their perception accuracy was influenced by the acoustic features of consonants (P<0.01); ② CI children's consonant perception abilities were adversely affected by noise (P<0.01), with accuracy in noisy conditions particularly influenced by the manner of articulation (P<0.05); ③ The age at implantation significantly affected CI children's ability to perceive aspirated/unaspirated consonants (β = -0.223, P=0.012), with earlier implantation associated with better performance. Conclusion: CI children need time to acquire Mandarin aspirated/unaspirated consonants, and early implantation offers many advantages, especially for the perception of fine speech features.
Humans; Cochlear Implants; Child, Preschool; Speech Perception; Cochlear Implantation; Male; Female; Language
4.Comparison and study of multiple scales results in children with cochlear reimplantation, mainly the speech, spatial, and other qualities of hearing scale for parents.
Tian NI ; Jinyuan SI ; Haotian LIU ; Xinyi YAO ; Xiangling ZHANG ; Huilin YIN ; Lin ZHANG ; Xiuyong DING ; Yu ZHAO
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(5):433-442
Objective: To compare the outcomes of multiple scales, primarily the speech, spatial, and other qualities of hearing scale for parents (SSQ-P), in children with ipsilateral vs. contralateral cochlear reimplantation (CRI). Methods: A total of 69 children who received cochlear implantation surgery from April 1999 to June 2024 were included. Patients were divided into two groups according to whether the reimplantation was on the same side as the initial implantation. General information such as gender, age, and age at initial implantation and at reimplantation was collected. The primary caregivers of the children were followed up by telephone using the categories of auditory performance (CAP), speech intelligibility rating (SIR), and SSQ-P questionnaires. Statistical methods including stepwise regression, linear regression, and permutation tests were used to investigate whether the scores for CAP, SIR, SSQ-P total, SSQ-P speech perception, SSQ-P spatial hearing, and SSQ-P auditory quality differed significantly between the ipsilateral and contralateral reimplantation groups. Results: Of the 69 children included, 62 were in the ipsilateral reimplantation group with a mean age of 11.1 years, and 7 were in the contralateral reimplantation group with a mean age of 11.7 years. Statistical analysis showed that patients in the contralateral reimplantation group had significantly lower SSQ-P total scores (P<0.05) and spatial hearing dimension scores (P<0.05) than those in the ipsilateral reimplantation group after controlling for the corresponding confounders. Conclusion: For children, ipsilateral cochlear reimplantation is superior to contralateral reimplantation in terms of overall auditory function and spatial hearing in daily life, but the mechanisms require further investigation.
Humans; Cochlear Implantation; Child; Parents; Speech Perception; Male; Cochlear Implants; Female; Hearing; Surveys and Questionnaires; Speech; Child, Preschool
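The abstract above names permutation tests among its statistical methods without giving details. As a minimal sketch of the general technique, the code below runs a two-sided permutation test on the difference in group means. The test statistic, the number of permutations, the seed, and the toy scores are all assumptions for illustration; the scores are made up and are not data from the study, and the study's adjustment for confounders is not reproduced here.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up questionnaire scores for two hypothetical groups of unequal size.
toy_group_a = [7, 8, 6, 9, 8, 7, 9, 8]
toy_group_b = [5, 6, 4]
p_value = permutation_test(toy_group_a, toy_group_b)
```

A permutation test makes no normality assumption, which is one common reason to choose it when one group is small, as with the 7 contralateral cases reported above.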
5.Analyzing the factors influencing speech recognition ability in patients with age-related hearing loss.
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(7):657-666
Objective: To explore various factors influencing speech recognition ability in patients with age-related hearing loss (ARHL) and to investigate the correlation between speech recognition ability and cognitive function. Methods: This case-control study enrolled 150 ARHL patients (experimental group) and 132 normal-hearing controls. Participants underwent relevant assessments of auditory function, cognitive function, and tinnitus severity. Various statistical analyses were performed to evaluate the results. Results: ① The PBmax and MoCA scores were significantly lower in the ARHL group than in the control group (P<0.05). ② PBmax in the ARHL group was significantly influenced by multiple factors (P<0.05). ③ Negative correlations were observed between PBmax in the ARHL group and age, degree of hearing loss, duration of the disease, duration of the worst hearing loss, smoking status, and tinnitus severity (P<0.05), while positive correlations were found between PBmax and education level, occupation type, frequency of verbal communication, and cognitive function level (P<0.05). ④ Higher education level, frequent verbal communication, and high cognitive function level were protective factors for PBmax in ARHL patients (P<0.05), whereas the other factors were independent risk factors (P<0.05). ⑤ A significant correlation was found between PBmax and MoCA scores in the ARHL group, and this correlation between cognitive function and speech recognition ability remained significant across different degrees of hearing loss (P<0.05). Conclusion: Speech recognition ability in ARHL patients is influenced by multiple factors. Cognitive function demonstrates a robust, bidirectional association with speech recognition ability, even after adjusting for hearing loss severity.
Humans; Case-Control Studies; Middle Aged; Male; Female; Aged; Speech Perception; Cognition; Presbycusis/physiopathology*; Adult; Hearing Loss
6.Compensation or Preservation? Different Roles of Functional Lateralization in Speech Perception of Older Non-musicians and Musicians.
Xinhu JIN ; Lei ZHANG ; Guowei WU ; Xiuyi WANG ; Yi DU
Neuroscience Bulletin 2024;40(12):1843-1857
Musical training can counteract age-related decline in speech perception in noisy environments. However, it remains unclear whether older non-musicians and musicians rely on functional compensation or functional preservation to counteract the adverse effects of aging. This study utilized resting-state functional connectivity (FC) to investigate functional lateralization, a fundamental organization feature, in older musicians (OM), older non-musicians (ONM), and young non-musicians (YNM). Results showed that OM outperformed ONM and achieved comparable performance to YNM in speech-in-noise and speech-in-speech tasks. ONM exhibited reduced lateralization than YNM in lateralization index (LI) of intrahemispheric FC (LI_intra) in the cingulo-opercular network (CON) and LI of interhemispheric heterotopic FC (LI_he) in the language network (LAN). Conversely, OM showed higher neural alignment to YNM (i.e., a more similar lateralization pattern) compared to ONM in CON, LAN, frontoparietal network (FPN), dorsal attention network (DAN), and default mode network (DMN), indicating preservation of youth-like lateralization patterns due to musical experience. Furthermore, in ONM, stronger left-lateralized and lower alignment-to-young of LI_intra in the somatomotor network (SMN) and DAN and LI_he in DMN correlated with better speech performance, indicating a functional compensation mechanism. In contrast, stronger right-lateralized LI_intra in FPN and DAN and higher alignment-to-young of LI_he in LAN correlated with better performance in OM, suggesting a functional preservation mechanism. These findings highlight the differential roles of functional preservation and compensation of lateralization in speech perception in noise among elderly individuals with and without musical expertise, offering insights into successful aging theories from the lens of functional lateralization and speech perception.
Humans; Speech Perception/physiology*; Music; Male; Functional Laterality/physiology*; Female; Aged; Adult; Young Adult; Aging/physiology*; Middle Aged; Magnetic Resonance Imaging; Brain/physiology*
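The abstract above relies on lateralization indices (LI) but does not state how they are computed. A widely used generic form is LI = (L - R) / (L + R), where L and R are left- and right-hemisphere measures and positive values indicate left lateralization. The sketch below implements that generic form with absolute values in the denominator so the index stays within [-1, 1] when connectivity values are negative. This is an assumption for illustration and may differ from the definitions of LI_intra and LI_he used in the study; the function names and toy values are hypothetical.

```python
def lateralization_index(left, right):
    """Generic LI: positive means left-lateralized, negative means right-lateralized."""
    total = abs(left) + abs(right)
    if total == 0:
        return 0.0  # no signal on either side, so no lateralization
    return (left - right) / total


def network_li(left_fc, right_fc):
    """LI of a network from per-connection FC values within each hemisphere."""
    left_mean = sum(left_fc) / len(left_fc)
    right_mean = sum(right_fc) / len(right_fc)
    return lateralization_index(left_mean, right_mean)


# Toy functional-connectivity values (hypothetical, not from the study).
toy_left_fc = [0.6, 0.5, 0.7]
toy_right_fc = [0.2, 0.3, 0.1]
toy_li = network_li(toy_left_fc, toy_right_fc)
```

With these toy values the index is positive, that is, left-lateralized; swapping the two hemispheres flips its sign.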
7.Influence of hearing aid on speech recognition ability, psychology and cognitive function of presbycusis.
Lin Lan JIANG ; Yue Nong JIAO ; Jin Yu WANG ; Mei Chan ZHU ; Ying LIN
Chinese Journal of Otorhinolaryngology Head and Neck Surgery 2023;58(2):160-165
Humans; Presbycusis; Speech Perception; Hearing Aids; Cognition; Noise
8.A multiscale feature extraction algorithm for dysarthric speech recognition.
Jianxing ZHAO ; Peiyun XUE ; Jing BAI ; Chenkang SHI ; Bo YUAN ; Tongtong SHI
Journal of Biomedical Engineering 2023;40(1):44-50
In this paper, we propose a multi-scale mel-domain feature map extraction algorithm to address the difficulty of improving the recognition rate for dysarthric speech. First, we used empirical mode decomposition to decompose the speech signals and extracted Fbank features and their first-order differences for each of the three effective components to construct a new feature map, which could capture details in the frequency domain. Second, to address the loss of effective features and the high computational complexity in training a single-channel neural network, we proposed a speech recognition network model. Finally, training and decoding were performed on the public UA-Speech dataset. The experimental results showed that the accuracy of the speech recognition model reached 92.77%. Therefore, the proposed algorithm can effectively improve the recognition rate for dysarthric speech.
Humans; Dysarthria/diagnosis*; Speech; Speech Perception; Algorithms; Neural Networks, Computer
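To illustrate the feature-map construction described in the abstract above, the sketch below computes first-order differences of per-frame features and stacks them with the originals, once per decomposed component. It uses a simple frame-to-frame difference; speech toolkits often use a regression-window delta instead, and the abstract does not say which variant the authors used. The empirical mode decomposition and Fbank extraction steps are not implemented here, and the function names and toy values are hypothetical.

```python
def first_order_difference(frames):
    """Delta features: frame[t] - frame[t - 1], with zeros for the first frame."""
    deltas = [[0.0] * len(frames[0])]
    for previous, current in zip(frames, frames[1:]):
        deltas.append([c - p for p, c in zip(previous, current)])
    return deltas


def stack_with_deltas(frames):
    """Concatenate each frame's features with its deltas along the feature axis."""
    return [frame + delta for frame, delta in zip(frames, first_order_difference(frames))]


def build_feature_map(components):
    """One stacked (features + deltas) matrix per decomposed component."""
    return [stack_with_deltas(frames) for frames in components]


# Toy filter-bank frames (hypothetical): 3 frames x 2 bands.
toy_frames = [[1.0, 2.0], [2.0, 4.0], [4.0, 3.0]]
stacked = stack_with_deltas(toy_frames)
# Three identical toy components stand in for the three effective components.
feature_map = build_feature_map([toy_frames, toy_frames, toy_frames])
```

The result has one channel per component, each with twice as many feature columns as the input frames.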
9.Intervention effects of bone conduction hearing aids in patients with single-sided deafness and asymmetric hearing loss.
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2023;37(11):927-933
The incidence of single-sided deafness (SSD) is increasing year by year. Because of hearing loss in one ear, patients with SSD experience varying degrees of impairment in sound localization, speech recognition in noise, and quality of life. This article reviews the intervention effects of different types of bone conduction hearing aids in patients with SSD and asymmetric hearing loss, and the differences in intervention effects among bone conduction hearing aids, contralateral routing of signal (CROS) aids, and cochlear implants (CI), to provide a reference for the auditory intervention and clinical treatment of SSD and asymmetric hearing loss.
Humans; Quality of Life; Bone Conduction; Hearing Loss, Unilateral/therapy*; Speech Perception; Hearing Aids; Hearing Loss; Sound Localization; Deafness; Treatment Outcome
10.Analysis of rehabilitation effects of cochlear implantation in elderly patients with prelingual deafness.
Haijuan WU ; Tongli LI ; Guodong LI ; Jingjing HUO
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2023;37(6):478-482
Objective: To assess auditory and speech rehabilitation after cochlear implantation (CI) in prelingually deaf elderly patients using the Categories of Auditory Performance (CAP) and the speech intelligibility rating scale (SIR), administered by telephone follow-up or face-to-face conversation. Methods: The clinical data of prelingually deaf patients who underwent unilateral CI in the Department of Otorhinolaryngology and Head and Neck Surgery, Shanxi People's Hospital, from December 2016 to December 2021 were collected. Thirty-eight patients were divided into Group A (SIR 1, 17 cases), Group B (SIR 2, 10 cases) and Group C (SIR 3, 11 cases) according to the preoperative SIR score. Nineteen patients with post-lingual hearing impairment were selected as the control group (Group D, 19 cases). The effects of hearing and speech rehabilitation were evaluated using CAP and SIR scores before surgery, 6 months after startup, and 1 year after startup. Results: There were no significant differences in CAP scores among the three groups of prelingually deaf patients at 6 months and 1 year after startup (P>0.05), but there were significant differences between Group A and Group D at 6 months and 1 year after startup (P<0.05). The SIR score of Group A differed significantly between before surgery and 6 months after startup (P<0.05), and that of Group B differed significantly between before surgery and 1 year after startup (P<0.05), whereas Groups C and D showed no significant difference between before surgery and either 6 months or 1 year after startup (P>0.05). Conclusion: For prelingually deaf elderly patients, hearing will develop rapidly 6 months after startup, and the effect of postoperative auditory rehabilitation was positively correlated with preoperative speech ability. In terms of speech, prelingually deaf elderly patients with poor preoperative speech ability could benefit more from CI early after surgery. CI is not contraindicated in prelingually deaf elderly patients, even those with poor preoperative speech function.
Humans; Aged; Cochlear Implantation/methods*; Cochlear Implants; Speech Perception; Deafness/rehabilitation*; Hearing Tests; Speech Intelligibility; Treatment Outcome
