1. Speech Perception in Older Listeners with Normal Hearing: Conditions of Time Alteration, Selective Word Stress, and Length of Sentences.
Soojin CHO ; Jyaehyoung YU ; Hyungi CHUN ; Hyekyung SEO ; Woojae HAN
Korean Journal of Audiology 2014;18(1):28-33
BACKGROUND AND OBJECTIVES: Deficits of the aging auditory system negatively affect older listeners in terms of speech communication, resulting in limitations to their social lives. With the aim of improving their perceptual skills, this study investigated the effects of time alteration, selective word stress, and varying sentence lengths on the speech perception of older listeners. SUBJECTS AND METHODS: Seventeen older people with normal hearing were tested under seven time-alteration conditions (i.e., +/-60%, +/-40%, +/-20%, 0%), two selective word stress conditions (i.e., no stress and stress), and three sentence lengths (i.e., short, medium, and long) at each individual's most comfortable level in quiet. RESULTS: As time compression increased, sentence perception scores decreased significantly. Compared to a natural (no stress) condition, selectively stressed words significantly improved the perceptual scores of these older listeners. Long sentences yielded the worst scores under all time-altered conditions. Interestingly, there was a noticeable positive effect of selective word stress at 20% time compression. CONCLUSIONS: This pattern of results suggests that a combination of time compression and selective word stress is more effective for speech understanding in older listeners than using the time-expanded condition alone.
Aging; Auditory Perception; Hearing; Speech Perception*
2. A review of the categorical features of tone perception.
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2015;29(15):1396-1400
Categorical perception (CP) is the unique phenomenon whereby a gradually morphing physical feature in a stimulus continuum tends to be perceived as discrete representations. CP has been demonstrated in several sensory modalities. The first study of CP in phonetic perception was performed in 1957; however, the early CP studies focused on segmental features, and the first study of CP of pitch contours was not performed until 1976. This article reviews the results of previous studies, focusing on how categorical perception applies to lexical tone perception.
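For illustration, categorical perception is commonly quantified by fitting identification responses along a stimulus continuum with a logistic function, whose steep transition marks the category boundary. A minimal sketch in Python; the 7-step continuum and the response proportions below are made-up illustration data, not results from any study reviewed here:

```python
# Fit a logistic identification function along a hypothetical tone continuum;
# the boundary (x0) and slope (k) summarize how categorical the perception is.
import numpy as np
from scipy.optimize import curve_fit

steps = np.arange(1, 8)                 # stimulus continuum (e.g., tone 1 -> tone 2)
p_tone2 = np.array([0.02, 0.05, 0.10, 0.48, 0.90, 0.96, 0.99])  # hypothetical data

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

(x0, k), _ = curve_fit(logistic, steps, p_tone2, p0=[4.0, 1.0])
print(f"category boundary near step {x0:.2f}, slope {k:.2f}")
```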
Humans; Language; Phonetics; Speech Perception
3. Temporal Processing in the Auditory System.
Korean Journal of Otolaryngology - Head and Neck Surgery 2011;54(9):585-591
The auditory system recognizes sound waves. Sound waves are longitudinal waves in air, in which the pressure varies over time. Two aspects are distinguished: the rapid pressure changes, referred to as the 'fine structure', and the slower overall amplitude fluctuations, referred to as the 'envelope'. The auditory system has a limited ability to follow the time-varying envelope, and this ability is known as 'temporal resolution'. Our auditory system analyzes sound waves in the frequency, intensity, and time domains. Understanding the frequency and intensity domains is relatively easy compared to the time domain: the hearing threshold is measured by sound intensity in the frequency domain, whereas speech discrimination and sentence understanding in quiet and in noise are associated with temporal resolution. Thus, for a comprehensive understanding of the auditory system and hearing ability, we must extend our knowledge to the temporal abilities of the auditory system.
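The envelope/fine-structure split described above can be made concrete with the analytic signal (Hilbert transform). A minimal sketch in Python; the sample rate, the toy two-component signal, and the 50 Hz smoothing cut-off are illustrative assumptions, not values from the article:

```python
# Separate a signal into envelope and temporal fine structure via the
# analytic signal, then low-pass the envelope to mimic the auditory
# system's limited temporal resolution.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000                                   # assumed sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 4 * t) * np.sin(2 * np.pi * 1000 * t)  # toy modulated tone

analytic = hilbert(x)
envelope = np.abs(analytic)                  # slow amplitude fluctuations
fine_structure = np.cos(np.angle(analytic))  # rapid pressure changes (unit amplitude)

# Smooth the envelope with an assumed 50 Hz low-pass to approximate what
# the auditory system can follow over time.
b, a = butter(4, 50 / (fs / 2))
smoothed_envelope = filtfilt(b, a, envelope)
```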
Hearing; Noise; Sound; Speech Perception
4. Pure tone hearing threshold and maximum speech discrimination score in sensorineural hearing loss.
Korean Journal of Otolaryngology - Head and Neck Surgery 1992;35(2):242-247
No abstract available.
Hearing Loss*; Hearing*; Speech Perception*
5. Comparison of Speech Rate and Long-Term Average Speech Spectrum between Korean Clear Speech and Conversational Speech
Jeeun YOO ; Hongyeop OH ; Seungyeop JEONG ; In Ki JIN
Journal of Audiology & Otology 2019;23(4):187-192
BACKGROUND AND OBJECTIVES: Clear speech is an effective communication strategy for difficult listening situations that draws on techniques such as accurate articulation, a slow speech rate, and the inclusion of pauses. Although excessively slow speech and improperly amplified spectral information can degrade overall speech intelligibility, moderate amplitude increments in the mid-frequency bands (1 to 3 dB) and speech rates around 50% slower than conversational speech have been reported as factors of clear speech that improve intelligibility. The purpose of this study was to identify whether the amplitude increments in mid-frequency areas and the slower speech rates evident in English clear speech also appear in Korean clear speech. SUBJECTS AND METHODS: To compare the acoustic characteristics of the two methods of speech production, the voices of 60 participants were recorded during conversational speech and then again during clear speech using standardized sentence material. RESULTS: The speech rate and long-term average speech spectrum (LTASS) were analyzed and compared. Speech rates for clear speech were slower than those for conversational speech, and increased amplitudes in the mid-frequency bands were evident in the LTASS of clear speech. CONCLUSIONS: The observed differences in acoustic characteristics between the two types of speech production suggest that Korean clear speech can be an effective communication strategy for improving speech intelligibility.
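As a rough illustration of the two measures compared in this study, the sketch below approximates speech rate as syllables per second (given a known syllable count) and estimates the LTASS with Welch's method. The file name, syllable count, FFT length, and band edges are assumptions for demonstration, not the authors' analysis settings:

```python
# Estimate speech rate and the long-term average speech spectrum (LTASS)
# of a recorded sentence, then report the mean level in the 1-3 kHz band
# where clear speech showed amplitude increments.
import numpy as np
import soundfile as sf
from scipy.signal import welch

x, fs = sf.read("sentence.wav")             # hypothetical mono recording
n_syllables = 12                            # assumed known from the sentence material
speech_rate = n_syllables / (len(x) / fs)   # syllables per second

f, psd = welch(x, fs=fs, nperseg=4096)      # long-term average spectrum
ltass_db = 10 * np.log10(psd + 1e-12)

mid = (f >= 1000) & (f <= 3000)             # mid-frequency band of interest
print(f"rate: {speech_rate:.2f} syll/s, mid-band level: {ltass_db[mid].mean():.1f} dB")
```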
Acoustics; Rehabilitation; Speech Acoustics; Speech Intelligibility; Speech Perception; Voice
6. Research progress of microphone array-based front-end speech enhancement technology for cochlear implants.
Yousheng CHEN ; Weifang CHEN ; Pu ZHANG ; Peipei CHEN
Journal of Biomedical Engineering 2019;36(4):696-704
Microphone array-based methods have gradually been applied to front-end speech enhancement and speech recognition improvement for cochlear implants in recent years. By placing several microphones at different locations in space, these methods can collect multi-channel signals containing rich spatial position and orientation information. A microphone array can also form a specific beamforming pattern to enhance the desired signal and suppress ambient noise, which is particularly suitable for face-to-face conversation by cochlear implant users, and its application value has attracted increasing attention from researchers. In this paper, we describe the principle of microphone array methods, analyze the microphone array-based speech enhancement technologies in the current literature, and present the technical difficulties and development trends.
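A minimal sketch of the basic principle described here: a frequency-domain delay-and-sum beamformer that steers an array toward a target direction, reinforcing the desired talker while averaging down signals from other directions. The geometry, sample rate, and steering angle are illustrative assumptions, not any device's actual front end:

```python
# Delay-and-sum beamforming for a uniform linear microphone array:
# compensate each microphone's steering delay in the frequency domain,
# then average the aligned channels.
import numpy as np

fs = 16000            # assumed sample rate (Hz)
c = 343.0             # speed of sound (m/s)
d = 0.01              # assumed inter-microphone spacing (m)
theta = 0.0           # steering angle (radians); 0 = frontal talker

def delay_and_sum(x_mics: np.ndarray) -> np.ndarray:
    """x_mics: (n_mics, n_samples) array of simultaneously recorded signals."""
    n_mics, n = x_mics.shape
    freqs = np.fft.rfftfreq(n, 1 / fs)
    out = np.zeros(len(freqs), dtype=complex)
    for m in range(n_mics):
        tau = m * d * np.sin(theta) / c              # per-mic steering delay
        out += np.fft.rfft(x_mics[m]) * np.exp(2j * np.pi * freqs * tau)
    return np.fft.irfft(out / n_mics, n)             # average of aligned channels
```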
Cochlear Implantation; Cochlear Implants; Humans; Speech; Speech Perception
7. Research on front-end speech enhancement and beamforming algorithms based on dual microphones for cochlear implants.
Journal of Biomedical Engineering 2019;36(3):468-477
Speech enhancement methods based on microphone arrays use multiple microphones to record the speech signal simultaneously. Because spatial information is increased, these methods can improve speech recognition for cochlear implants in noisy environments. Due to size limitations, the number of microphones used in a cochlear implant cannot be too large, which constrains the design of microphone array beamforming. To balance the size limitation of the cochlear implant against the spatial orientation information of signal acquisition, we propose a speech enhancement and beamforming algorithm based on dual thin uni-directional / omni-directional microphone pairs (TP) in this paper. Each TP microphone contains two sound tubes for signal acquisition, which increases the overall spatial orientation information. In this paper, we discuss the beamforming characteristics under different gain vectors and the influence of the inter-microphone distance on beamforming, which provides valuable theoretical analysis and engineering parameters for the application of dual-microphone speech enhancement technology in cochlear implants.
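To illustrate how a gain vector and the inter-microphone distance shape the beam, the sketch below computes the directivity pattern of a generic two-element differential beamformer; it is not the authors' TP algorithm, and all parameter values are assumptions:

```python
# Directivity pattern of a two-microphone differential beamformer.
# The gain vector w and the electronic delay tau determine the pattern:
# w = [1, -1] with tau = 0 gives a dipole; tau = d / c gives a cardioid.
import numpy as np

c = 343.0                      # speed of sound (m/s)
f = 1000.0                     # analysis frequency (Hz)
d = 0.01                       # assumed inter-microphone distance (m)
angles = np.linspace(0, 2 * np.pi, 360)

w = np.array([1.0, -1.0])      # gain vector applied to the two microphones
tau = 0.0                      # extra electronic delay on mic 1 (s)

k = 2 * np.pi * f / c
response = np.abs(w[0] + w[1] * np.exp(-1j * (k * d * np.cos(angles) + 2 * np.pi * f * tau)))
response /= response.max()     # normalized beam pattern over azimuth
```

Varying `d` in this sketch shows the trade-off the abstract points to: smaller spacings ease the device-size constraint but reduce the inter-microphone phase difference available for beamforming.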
Algorithms; Cochlear Implants; Equipment Design; Humans; Noise; Speech; Speech Perception
8. Psychosis speech recognition algorithm based on deep embedded sparse stacked autoencoder and manifold ensemble.
Yi ZHANG ; Xiaolin QIN ; Yuan LIN ; Yongming LI ; Pin WANG ; Zuwei ZHANG ; Xiaofei LI
Journal of Biomedical Engineering 2021;38(4):655-662
Speech feature learning is at the core of speech-based recognition methods for mental illness. Deep feature learning can extract speech features automatically, but it is limited by small sample sizes. Traditional feature extraction (original features) avoids the small-sample problem but relies heavily on experience and adapts poorly. To solve this problem, this paper proposes a deep embedded hybrid-feature sparse stacked autoencoder manifold ensemble algorithm. First, based on prior knowledge, psychotic speech features are extracted and the original feature set is constructed. Second, the original features are embedded in a sparse stacked autoencoder (a deep network), and the hidden-layer outputs are filtered to enhance the complementarity between the deep features and the original features. Third, an L1-regularized feature selection mechanism is designed to compress the dimensionality of the mixed feature set composed of deep and original features. Finally, a weighted locality preserving projection algorithm and an ensemble learning mechanism are designed to construct a manifold projection classifier ensemble model, which further improves the classification stability of feature fusion under small samples. In addition, this paper designs a medium-to-large-scale psychotic speech collection program for the first time, and collects and constructs a large-scale Chinese psychotic speech database for validating psychotic speech recognition algorithms. The experimental results show that the main innovations of the algorithm are effective, and its classification accuracy is better than that of other representative algorithms, with a maximum improvement of 3.3%. In conclusion, this paper proposes a new psychotic speech recognition method based on an embedded hybrid sparse stacked autoencoder and manifold ensemble, which effectively improves the recognition rate of psychotic speech.
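A hedged sketch of the feature-fusion idea described above, not the authors' implementation: handcrafted "original" features are concatenated with deep features taken from an autoencoder's hidden layer, then compressed by L1-regularized feature selection. The stand-in autoencoder, data shapes, and hyperparameters are all illustrative assumptions:

```python
# Fuse handcrafted and deep features, then select a compact subset via L1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_orig = rng.normal(size=(200, 40))     # hypothetical handcrafted speech features
y = rng.integers(0, 2, size=200)        # hypothetical labels (patient / control)

# Stand-in for the sparse stacked autoencoder: an MLP trained to reconstruct
# its input; its hidden-layer activations serve as the deep features.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X_orig, X_orig)
W, b = ae.coefs_[0], ae.intercepts_[0]
X_deep = np.maximum(0, X_orig @ W + b)  # hidden activations (default ReLU)

X_fused = np.hstack([X_orig, X_deep])   # mixed feature set (original + deep)

# L1 regularization as the feature-selection mechanism.
selector = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
X_selected = selector.fit_transform(X_fused, y)
```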
Algorithms; Databases, Factual; Humans; Psychotic Disorders; Speech; Speech Perception
9. The neural encoding of continuous speech - recent advances in EEG and MEG studies.
Xun-Yi PAN ; Jia-Jie ZOU ; Pei-Qing JIN ; Nai DING
Acta Physiologica Sinica 2019;71(6):935-945
Speech comprehension is a central cognitive function of the human brain. A fundamental question in cognitive neuroscience is how neural activity encodes the acoustic properties of a continuous speech stream while resolving multiple levels of linguistic structure at the same time. This paper reviews recently developed research paradigms that employ electroencephalography (EEG) or magnetoencephalography (MEG) to capture neural tracking of the acoustic features or linguistic structures of continuous speech. The review focuses on two questions in speech processing: (1) the encoding of the continuously changing acoustic properties of speech, and (2) the representation of hierarchical linguistic units, including syllables, words, phrases, and sentences. Studies have found that low-frequency cortical activity tracks the speech envelope, and that cortical activity on different time scales tracks multiple levels of linguistic units, constituting a representation of hierarchically organized linguistic structure. Together, these studies provide new insights into how the human brain processes continuous speech.
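A minimal sketch of the "neural tracking" analysis such studies build on: correlating low-frequency EEG with the speech envelope at several lags (published work typically estimates temporal response functions; simple lagged correlation stands in for them here). All signals below are synthetic placeholders:

```python
# Measure envelope tracking as lagged correlation between EEG and the
# band-limited (1-8 Hz) speech envelope, where cortical tracking is
# typically reported.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 100                                   # assumed common sampling rate (Hz)
rng = np.random.default_rng(1)
speech = rng.normal(size=60 * fs)          # placeholder for a speech waveform
envelope = np.abs(hilbert(speech))         # broadband envelope

b, a = butter(2, [1 / (fs / 2), 8 / (fs / 2)], btype="band")
env_lf = filtfilt(b, a, envelope)          # low-frequency envelope
eeg = np.roll(env_lf, 10) + rng.normal(scale=2.0, size=env_lf.size)  # fake EEG, 100 ms lag

lags = range(0, 31)                        # 0-300 ms in 10 ms steps
r = [np.corrcoef(env_lf[: -lag or None], eeg[lag:])[0, 1] for lag in lags]
print(f"peak tracking at lag {int(np.argmax(r)) * 10} ms")
```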
Acoustic Stimulation; Electroencephalography; Humans; Magnetoencephalography; Speech/physiology; Speech Perception
10. A multiscale feature extraction algorithm for dysarthric speech recognition.
Jianxing ZHAO ; Peiyun XUE ; Jing BAI ; Chenkang SHI ; Bo YUAN ; Tongtong SHI
Journal of Biomedical Engineering 2023;40(1):44-50
In this paper, we propose a multiscale mel-domain feature map extraction algorithm to address the difficulty of improving the speech recognition rate for dysarthric speech. First, we used the empirical mode decomposition (EMD) method to decompose the speech signal and extracted Fbank features and their first-order differences for each of the three effective components, constructing a new feature map that captures details in the frequency domain. Second, to address the loss of effective features and the high computational complexity in training single-channel neural networks, we propose a speech recognition network model. Finally, training and decoding were performed on the public UA-Speech dataset. The experimental results showed that the accuracy of the proposed speech recognition model reached 92.77%. Therefore, the algorithm proposed in this paper can effectively improve the speech recognition rate for dysarthric speech.
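A hedged sketch of the feature-map construction described above, assuming the PyEMD and librosa packages; treating the first three IMFs as the "effective components", along with the file name and frame parameters, is an illustrative reading of the text rather than the authors' exact settings:

```python
# Build a multiscale feature map: EMD-decompose the speech signal, then
# stack Fbank (log-mel) features and their first-order differences for
# each of the first three intrinsic mode functions.
import numpy as np
import librosa
from PyEMD import EMD

x, fs = librosa.load("utterance.wav", sr=16000)   # hypothetical recording
imfs = EMD().emd(x)[:3]                           # first three IMFs as components

feature_map = []
for imf in imfs:
    fbank = librosa.feature.melspectrogram(y=imf, sr=fs, n_mels=40)
    log_fbank = librosa.power_to_db(fbank)        # Fbank (log-mel) features
    delta = librosa.feature.delta(log_fbank)      # first-order differences
    feature_map.append(np.stack([log_fbank, delta]))

feature_map = np.concatenate(feature_map)         # (channels, mels, frames)
```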
Humans; Dysarthria/diagnosis*; Speech; Speech Perception; Algorithms; Neural Networks, Computer