1.Neural network for auditory speech enhancement featuring feedback-driven attention and lateral inhibition.
Yudong CAI ; Xue LIU ; Xiang LIAO ; Yi ZHOU
Journal of Biomedical Engineering 2025;42(1):82-89
The way the human brain processes speech information is an important source of inspiration for speech enhancement technology. Attention and lateral inhibition are key mechanisms in auditory information processing that selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and feedback-driven attention mechanisms. Noisy speech signals were fed into the first branch of the U-Net, and time-frequency units with high confidence were selectively fed back. The resulting activation-layer gradients, together with the lateral inhibition mechanism, were used to calculate attention maps. These maps were then concatenated to the second branch of the U-Net, directing the network's focus and achieving selective enhancement of auditory speech signals. Speech enhancement performance was evaluated with five metrics, including perceptual evaluation of speech quality. The method was compared with five other methods: Wiener, SEGAN, PHASEN, Demucs and GRN. The experimental results showed that, across multiple performance metrics, the proposed method improved speech enhancement in various noise scenarios by 18% to 21% compared with the baseline network. The improvement was particularly notable at low signal-to-noise ratios, where the proposed method showed a significant performance advantage over the other methods. The speech enhancement technique based on lateral inhibition and feedback-driven attention holds significant potential for auditory speech enhancement and may be suitable for clinical applications involving cochlear implants and hearing aids.
Humans; Attention/physiology*; Speech Perception/physiology*; Neural Networks, Computer; Speech; Noise; Feedback
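The abstract of entry 1 does not give the formula used to turn activation-layer gradients into attention maps. As a rough illustration of the general idea only, the sketch below applies a generic center-surround form of lateral inhibition to a small time-frequency map: each unit is reduced by a scaled mean of its neighbours, rectified, and normalised. The function names, the `radius` and `strength` parameters, and the sample gradient values are all assumptions made for illustration, not details taken from the paper.

```python
# Illustrative only: generic center-surround lateral inhibition on a
# time-frequency saliency map. This is NOT the formulation used in the paper.

def lateral_inhibition(saliency, radius=1, strength=0.5):
    """Subtract a scaled mean of each unit's neighbours, then rectify at zero."""
    rows, cols = len(saliency), len(saliency[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the surrounding units inside the (clipped) window.
            neighbours = [
                saliency[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            surround = sum(neighbours) / len(neighbours) if neighbours else 0.0
            out[r][c] = max(0.0, saliency[r][c] - strength * surround)
    return out


def normalise(attention):
    """Scale a map so its peak is 1; an all-zero map is returned unchanged."""
    peak = max(max(row) for row in attention)
    if peak == 0:
        return [row[:] for row in attention]
    return [[v / peak for v in row] for row in attention]


# Hypothetical gradient magnitudes for a 3x4 patch (frequency x time).
gradients = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
attention_map = normalise(lateral_inhibition(gradients))
```

In this toy patch the strongest unit keeps the peak weight while its weaker neighbours are suppressed more than proportionally, which is the contrast-sharpening behaviour lateral inhibition is usually credited with.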
2.Research on bimodal emotion recognition algorithm based on multi-branch bidirectional multi-scale time perception.
Peiyun XUE ; Sibin WANG ; Jing BAI ; Yan QIANG
Journal of Biomedical Engineering 2025;42(3):528-536
Emotion reflects the psychological and physiological health of human beings, and it is expressed mainly through voice and facial expression. Extracting and effectively integrating emotion information from these two modalities is one of the main challenges in emotion recognition. This paper proposes a multi-branch bidirectional multi-scale time perception model that processes the speech Mel-frequency spectrum coefficients in both the forward and reverse directions along the time dimension. The model uses causal convolution to obtain temporal correlation information between features at different scales and assigns attention maps to them according to this information, yielding a multi-scale fusion of speech emotion features. In addition, the paper proposes a bimodal dynamic feature fusion algorithm that draws on the strengths of AlexNet and uses overlapping max-pooling layers to obtain richer fusion features from the concatenated feature matrices of the different modalities. Experimental results show that the proposed multi-branch bidirectional multi-scale time perception bimodal emotion recognition model reaches accuracies of 97.67% and 90.14% on two public audio-visual emotion datasets, respectively, outperforming other common methods. This indicates that the proposed model can effectively capture emotional feature information and improve the accuracy of emotion recognition.
Humans; Emotions; Algorithms; Facial Expression; Time Perception; Neural Networks, Computer; Speech
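Entry 2 relies on causal convolution applied in both the forward and reverse time directions. The abstract does not specify the layer configuration, so the following is only a minimal sketch of the underlying operation: a 1-D causal convolution whose output at frame t depends solely on frames at or before t, run once on the sequence and once on its time reversal. The function names, the `dilation` parameter (one common way to obtain several temporal scales) and the sample values are assumptions for illustration.

```python
# Illustrative only: plain-Python causal convolution over a 1-D feature
# sequence, applied forwards and backwards in time.

def causal_conv1d(signal, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * signal[t - k*dilation], with zero left-padding."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # never look at frames after t
                acc += weight * signal[idx]
        out.append(acc)
    return out


def bidirectional_causal(signal, kernel, dilation=1):
    """Run the causal filter on the sequence and on its time reversal."""
    forward = causal_conv1d(signal, kernel, dilation)
    # Reverse, filter causally, then reverse back so indices align with time.
    backward = causal_conv1d(signal[::-1], kernel, dilation)[::-1]
    return forward, backward


# Hypothetical per-frame feature values.
frames = [1.0, 2.0, 3.0, 4.0]
fwd, bwd = bidirectional_causal(frames, kernel=[0.5, 0.5])
print(fwd)  # [0.5, 1.5, 2.5, 3.5]
print(bwd)  # [1.5, 2.5, 3.5, 2.0]
```

The forward branch summarises each frame with its past, the backward branch with its future; calling the same function with larger `dilation` values would widen the temporal context without changing the kernel size.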
3.Analysis on trend of hearing changes in infants with p.V37I mutation in GJB2 gene at different months of age.
Shan GAO ; Cheng WEN ; Yiding YU ; Yue LI ; Lin DENG ; Yu RUAN ; Jinge XIE ; Lihui HUANG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(1):10-18
Objective:To explore the trend of hearing changes in infants with the GJB2 gene p.V37I mutation at different months of age. Methods:The subjects were 54 children (108 ears) with homozygous or compound heterozygous p.V37I mutations in the GJB2 gene. All subjects underwent auditory brainstem response (ABR), auditory steady-state response (ASSR), acoustic immittance and other audiological tests. The children were divided into three groups by age: 26 cases in group A were ≤3 months old, 17 cases in group B were >3 to ≤6 months old, and 11 cases in group C were >6 months old. The ABR response threshold, hearing level, ASSR average response threshold across four frequencies, and ASSR response thresholds at 500, 1 000, 2 000 and 4 000 Hz were compared statistically among the three groups. Results:Among the 54 cases, 35 were male and 19 were female, with an age range of 2-27 months and a median age of 4 months. The ABR response thresholds of the three groups ranked from low to high as group A, group B and group C, and the difference was statistically significant (P<0.05). Pairwise comparison showed that the ABR response threshold of group C was higher than that of group A (P=0.006). The proportion of confirmed hearing loss in the three groups was 34.61%, 50.00% and 63.64%, respectively, and the difference in hearing level among the three groups was statistically significant (P<0.05). Pairwise comparison showed that the difference between group A and group C was statistically significant (P=0.012); normal hearing accounted for the highest proportion in group A (65.39%), while mild hearing loss accounted for the highest proportion in group C (45.46%).
The ASSR average response thresholds across the four frequencies ranked from low to high as group A, group B and group C, and the difference was statistically significant (P<0.05). Pairwise comparison showed that the ASSR average response threshold of group C was higher than that of group A (P=0.002). The ASSR response thresholds at each frequency also ranked from low to high as group A, group B and group C, and the differences were statistically significant (P<0.05). In pairwise comparisons, the ASSR response threshold of group C was higher than those of group A (P=0.003) and group B (P=0.015) at 500 Hz, and higher than that of group A at 1 000 Hz (P=0.010) and 2 000 Hz (P<0.001); there was no statistically significant difference at 4 000 Hz. Conclusion:In infants with the GJB2 gene p.V37I mutation, both the incidence and the degree of hearing loss increased with age. Hearing progression occurred mainly at 500, 1 000 and 2 000 Hz, which suggests that regular follow-up and vigilance for hearing changes are needed.
Humans; Connexin 26; Male; Female; Infant; Child, Preschool; Mutation; Evoked Potentials, Auditory, Brain Stem; Connexins/genetics*; Auditory Threshold; Hearing/genetics*; Hearing Loss/genetics*
4.Amplification effect of hearing mechanics in unilateral hearing loss.
Quanran LIN ; Kai FANG ; Wendi SHI ; Yuan WANG ; Shihua ZHA ; Yang LI ; Yonghua WANG ; Zhengnong CHEN
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(3):239-242
Objective:To evaluate the effectiveness of amplification intervention with hearing aids for restoring binaural auditory function in patients with unilateral moderate to severe sensorineural hearing loss. Methods:Thirty patients with normal hearing in one ear and moderate to severe sensorineural hearing loss in the other ear were selected. They were fitted with hearing aids for the worse ear and underwent more than half a year and one year of adaptation training. The Chinese translation of the twelve-item version of the SSQ (C-SSQ12), an angle identification test, speech recognition score (SRS) at different signal-to-noise ratios (SNR=5 and SNR=10) and audiometric thresholds were used to compare results before and after hearing aid use and thereby evaluate the effectiveness of the intervention for unilateral hearing loss. Results:After hearing aid use, the audiometric thresholds, C-SSQ12 scores, angle identification test results, and SRS at SNR=5 and SNR=10 in the worse ear of patients with unilateral hearing loss all differed significantly from those before hearing aid use (P<0.01). Conclusion:Amplification intervention with hearing aids has significant effects on restoring binaural auditory function in patients with unilateral moderate to severe sensorineural hearing loss.
Humans; Hearing Aids; Hearing Loss, Unilateral/therapy*; Middle Aged; Hearing Loss, Sensorineural/rehabilitation*; Adult; Female; Male; Auditory Threshold; Young Adult; Aged
5.Perception of Mandarin aspirated/unaspirated consonants in children with cochlear implants.
Yani LI ; Qun LI ; Jian WEN ; Lin LI ; Yun ZHENG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(4):312-318
Objective:This study aims to investigate the perception of Mandarin aspirated and unaspirated consonants by children with cochlear implants (CIs) under quiet and noisy conditions. It also examines factors that may affect their acquisition, such as auditory conditions, place of articulation, manner of articulation, chronological age, age at implantation, and non-verbal intelligence. Methods:Twenty-eight CI children aged 3 to 5 years who received implantation from 2018 to 2023 were recruited. Additionally, 88 peers with normal hearing (NH) were recruited as controls. Both groups participated in a perception test for aspirated/unaspirated consonants under quiet and noisy conditions, along with tests for speech recognition, speech production, and non-verbal intelligence. The study analyzed the effects of group (CI vs. NH), auditory condition, and consonant characteristics on children's perception of aspirated/unaspirated consonants in Mandarin, as well as the factors contributing to CI children's acquisition of these consonants. Results:①CI children's ability to perceive aspirated/unaspirated consonants was significantly poorer than that of their NH peers (χ²= 14.16, P<0.01), and their perception accuracy was influenced by the acoustic features of consonants (P<0.01); ②CI children's consonant perception abilities were adversely affected by noise (P<0.01), with accuracy in noisy conditions particularly influenced by the manner of articulation (P<0.05); ③The age at implantation significantly affected CI children's ability to perceive aspirated/unaspirated consonants (β= -0.223, P=0.012), with earlier implantation associated with better performance. Conclusion:It takes time for CI children to acquire Mandarin aspirated/unaspirated consonants, and early implantation shows many advantages, especially for the perception ability of fine speech features.
Humans; Cochlear Implants; Child, Preschool; Speech Perception; Cochlear Implantation; Male; Female; Language
6.Comparison and study of multiple scales results in children with cochlear reimplantation, mainly the speech, spatial, and other qualities of hearing scale for parents.
Tian NI ; Jinyuan SI ; Haotian LIU ; Xinyi YAO ; Xiangling ZHANG ; Huilin YIN ; Lin ZHANG ; Xiuyong DING ; Yu ZHAO
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(5):433-442
Objective:To compare the outcomes of multiple scales, primarily the speech, spatial, and other qualities of hearing scale for parents(SSQ-P), in children with ipsilateral vs. contralateral cochlear reimplantation(CRI). Methods:A total of 69 children who received cochlear implantation surgery from April 1999 to June 2024 were included. Patients were divided into two groups based on whether the implantation was on the same side. General information such as gender, age, age at initial implantation and reimplantation was collected. The primary caregivers of the children were followed up by telephone using the categories of auditory performance(CAP), speech intelligibility rating(SIR), and SSQ-P questionnaires. Statistical methods including stepwise regression, linear regression, and permutation tests were employed to investigate if there were any statistically significant differences in the scores of CAP, SIR, SSQ-P total, SSQ-P speech perception, SSQ-P spatial hearing, and SSQ-P auditory quality dimensions between the ipsilateral and contralateral reimplantation groups. Results:Of the 69 children included, 62 were in the ipsilateral reimplantation group with a mean age of 11.1 years, and 7 were in the contralateral reimplantation group with a mean age of 11.7 years. Statistical analysis showed that patients in the contralateral reimplantation group had significantly lower SSQ-P total scores (P<0.05) and spatial hearing dimension scores (P<0.05) than those in the ipsilateral reimplantation group after controlling for the corresponding confounders. Conclusion:The effect of ipsilateral reimplantation of cochlear implants is superior to that of contralateral reimplantation in terms of overall auditory function and spatial hearing in daily life for children, but the mechanisms require further investigation.
Humans; Cochlear Implantation; Child; Parents; Speech Perception; Male; Cochlear Implants; Female; Hearing; Surveys and Questionnaires; Speech; Child, Preschool
7.Analyzing the factors influencing speech recognition ability in patients with age-related hearing loss.
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(7):657-666
Objective:To explore various factors influencing speech recognition ability in patients with age-related hearing loss(ARHL) and to investigate the correlation between speech recognition ability and cognitive function. Methods:This case-control study enrolled 150 ARHL patients(experimental group) and 132 normal-hearing controls. Participants underwent relevant assessments of auditory function, cognitive function, and tinnitus severity. Various statistical analyses were performed to evaluate the results. Results:①The PBmax and MoCA scores were significantly lower in the ARHL group compared to the control group(P<0.05). ②PBmax in the ARHL group was significantly influenced by multiple factors(P<0.05). ③Negative correlations were observed between PBmax in the ARHL group and age, degree of hearing loss, duration of the disease, duration of the worst hearing loss, smoking status, and tinnitus severity(P<0.05), while positive correlations were found between PBmax and education level, occupation type, frequency of verbal communication, and cognitive function level(P<0.05). ④Higher education level, frequent verbal communication, and high cognitive function level were protective factors for PBmax in ARHL patients(P<0.05), whereas the other factors were independent risk factors(P<0.05). ⑤A significant correlation was found between PBmax and MoCA scores in the ARHL group, and this correlation between cognitive function and speech recognition ability remained significant across different degrees of hearing loss(P<0.05). Conclusion:Speech recognition ability in ARHL patients is influenced by multiple factors. Cognitive function demonstrates a robust, bidirectional association with speech recognition ability, even after adjusting for hearing loss severity.
Humans; Case-Control Studies; Middle Aged; Male; Female; Aged; Speech Perception; Cognition; Presbycusis/physiopathology*; Adult; Hearing Loss
8.Prediction of hearing change in children with enlarged vestibular aqueduct with different genotypes by linear mixed-effects model.
Lin DENG ; Lihui HUANG ; Xiaohua CHENG ; Yiding YU ; Yue LI ; Shan GAO ; Yu RUAN ; Jinge XIE
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(8):717-723
Objective:To explore the hearing changes of children with enlarged vestibular aqueduct(EVA) and different SLC26A4 genotypes using the linear mixed-effects model(LMM), providing evidence for the risk prediction of progressive hearing loss. Methods:A total of 48 children with EVA diagnosed in our hospital from January 2017 to January 2024 were included. All subjects underwent two or more auditory tests. According to the results of deafness gene screening and sequencing, the genotypes were divided into type A: homozygous c.919-2A>G mutation; type B: compound heterozygous or heterozygous mutation containing c.919-2A>G; and type C: no c.919-2A>G mutation site in the SLC26A4 gene. LMM was used to analyze how the hearing thresholds at 500 Hz, 1 000 Hz, 2 000 Hz and 4 000 Hz and the average threshold changed with age in children with different genotypes. Results:A total of 92 ears and 314 audiograms from 48 children were included; the median number of audiograms was 3, the median age at initial diagnosis was 4 months, and the median follow-up time was 13 months. According to the LMM, the standard deviation of the random effects between patients and between ears was large. There was no significant difference among genotypes A, B and C in the hearing thresholds at the individual frequencies or in the average threshold, indicating that genotype had no effect on hearing threshold. There was an interaction between age and genotype. Taking genotype C as the reference, children with genotype B had the smallest increase in the 500 Hz, 1 000 Hz and average hearing thresholds, followed by genotype A. Conclusion:EVA children exhibit substantial inter-individual/ear hearing threshold variability. Low-frequency thresholds progress slower than high frequencies. Genotype modulates progression rates, with wild-type(Type C) demonstrating fastest deterioration, supporting personalized auditory monitoring strategies.
Humans; Vestibular Aqueduct/abnormalities*; Genotype; Sulfate Transporters; Mutation; Auditory Threshold; Hearing Loss, Sensorineural/genetics*; Male; Female; Child; Child, Preschool; Hearing Loss/genetics*; Hearing Tests; Linear Models; Infant
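The abstract of entry 8 describes a linear mixed-effects model with random effects for patients and ears, an age-by-genotype interaction, and genotype C as the reference level, but it does not state the exact specification. One plausible form consistent with that description is sketched below; the symbols, the nesting of ears within patients, and the choice of random intercepts only are assumptions, not the authors' stated model.

```latex
% Assumed specification: y_{ijt} is the hearing threshold (dB) of ear j of
% child i at visit t; G_i is the child's genotype, with type C as reference.
\begin{aligned}
y_{ijt} &= \beta_0 + \beta_1\,\mathrm{age}_{ijt}
  + \sum_{g \in \{A,B\}} \gamma_g\,\mathbf{1}[G_i = g]
  + \sum_{g \in \{A,B\}} \delta_g\,\mathbf{1}[G_i = g]\,\mathrm{age}_{ijt}
  + u_i + v_{ij} + \varepsilon_{ijt}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_{ij} \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ijt} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this reading, $\gamma_g$ captures a baseline threshold difference between genotype $g$ and type C (reported as non-significant), $\delta_g$ captures how the rate of change with age differs from type C (the reported interaction), and $\sigma_u$ and $\sigma_v$ correspond to the large between-patient and between-ear variability noted in the results.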
9.Application of P1 response threshold of cortical auditory evoked potential in rehabilitation evaluation of young children with cochlear implant.
Hui JI ; Yaofeng JIANG ; Fei ZHONG ; Baona LI ; Ye FAN ; Shiyu TAO ; Liping MENG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(10):962-966
Objective:To explore the application value of the P1 response threshold of cortical auditory evoked potential(CAEP) in evaluating the rehabilitation effect of cochlear implants(CI) in young children. Methods:Thirty-three young children who had undergone cochlear implantation were divided into groups according to hearing age: Group A(hearing age 1-<2 years), 10 children; Group B(hearing age 2-<3 years), 13 children; Group C(hearing age 3-<4 years), 10 children. Subjective assessment was carried out using the assessment tool for hearing-impaired children, "Criteria and Methods for assessing Auditory and language ability of hearing-impaired children", and objective electrophysiological examination was carried out using CAEP to evaluate the rehabilitation effect. SPSS 25.0 software was used for statistical analysis. Results:The subjective assessment results for auditory ability and language ability in each group showed an increasing trend with increasing hearing age. In this study, the P1 response threshold of CAEP in children with CIs had a significant positive correlation with the 2 kHz hearing threshold after intervention, and was negatively correlated with many items in the subjective auditory ability and language ability assessments. Conclusion:The P1 response threshold of CAEP has a stable correlation with the results of speech audiometry and can effectively and objectively evaluate the postoperative rehabilitation effect in young children with cochlear implants.
Humans; Child, Preschool; Infant; Male; Female; Evoked Potentials, Auditory; Cochlear Implantation/rehabilitation*; Cochlear Implants; Auditory Threshold
10.Rhythm Facilitates Auditory Working Memory via Beta-Band Encoding and Theta-Band Maintenance.
Suizi TIAN ; Yu-Ang CHENG ; Huan LUO
Neuroscience Bulletin 2025;41(2):195-210
Rhythm, as a prominent characteristic of auditory experiences such as speech and music, is known to facilitate attention, yet its contribution to working memory (WM) remains unclear. Here, human participants temporarily retained a 12-tone sequence presented rhythmically or arrhythmically in WM and performed a pitch change-detection task. Behaviorally, while having comparable accuracy, rhythmic tone sequences showed a faster response time and lower response boundaries in decision-making. Electroencephalographic recordings revealed that rhythmic sequences elicited enhanced non-phase-locked beta-band (16 Hz-33 Hz) and theta-band (3 Hz-5 Hz) neural oscillations during sensory encoding and WM retention periods, respectively. Importantly, the two-stage neural signatures were correlated with each other and contributed to behavior. As beta-band and theta-band oscillations denote the engagement of motor systems and WM maintenance, respectively, our findings imply that rhythm facilitates auditory WM through intricate oscillation-based interactions between the motor and auditory systems that facilitate predictive attention to auditory sequences.
Humans; Memory, Short-Term/physiology*; Male; Beta Rhythm/physiology*; Female; Theta Rhythm/physiology*; Young Adult; Auditory Perception/physiology*; Adult; Electroencephalography; Acoustic Stimulation; Reaction Time/physiology*; Brain/physiology*; Attention/physiology*
