Improving children’s speech recognition under mismatched condition using artificial band width extension

dc.contributor.author Sunil Y
dc.date.accessioned 2018-05-30T09:33:33Z
dc.date.available 2018-05-30T09:33:33Z
dc.date.issued 2017
dc.identifier.other ROLL NO.08610211
dc.description.abstract The speech production system of children differs from that of adults in having a shorter vocal tract and a higher pitch. Because of the shorter vocal tract, formant frequencies shift toward the higher band (3400-8000 Hz). The higher pitch also produces relatively larger fluctuations in the spectrum than in adults' speech. As a result, the performance of narrowband (NB, 300-3400 Hz) automatic speech recognition (ASR) on children's speech degrades significantly because of the information lost in the higher band. This work develops artificial bandwidth extension (ABWE) methods that restore the higher-band spectral information. The ASR task is connected-digit recognition, with models trained on adults' speech and tested on children's speech, a setup termed the mismatched condition. ABWE methods using class-specific, age-specific and delta features are developed and applied to children's speech recognition under the mismatched condition; all of them improve recognition performance. A computationally efficient architecture for mel-frequency cepstral coefficient (MFCC) based ABWE for ASR is developed that avoids the vocoder framework for bandwidth extension: the narrowband MFCCs are converted directly into wideband MFCCs, so the synthesis step is bypassed. A sparse representation based ABWE (SR-ABWE) algorithm using coupled dictionaries is proposed. To further enhance SR-ABWE, a least-squares transformation is developed to estimate wideband codes from NB interpolated codes. The existing semi-coupled dictionary learning (SCDL) method is also explored for ABWE (SC-ABWE), and an improvement in the performance of SC-ABWE is observed in terms of objective quality measures. The significance of SR-ABWE is also demonstrated in children's ASR. en_US
dc.description.sponsorship Supervisors: Rohit Sinha and S. R. Mahadeva Prasanna en_US
dc.language.iso en en_US
dc.relation.ispartofseries TH-1705;
dc.title Improving children’s speech recognition under mismatched condition using artificial band width extension en_US
dc.type Thesis en_US
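The abstract only outlines SR-ABWE. The following is a minimal sketch of the general idea (sparse-code an NB feature vector over an NB dictionary, map the code to the WB code space with a least-squares transform, then reconstruct with a coupled WB dictionary), written under assumptions that are not taken from the thesis: the dictionaries are given rather than learned, sparse coding uses orthogonal matching pursuit, the code map is a ridge-regularized least-squares fit, and the "interpolated codes" step is not modelled. All function names (`omp`, `sparse_codes`, `fit_code_map`, `extend_bandwidth`) and the synthetic demo data are hypothetical.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k columns (atoms) of D.

    Assumes the columns of D have unit L2 norm.
    """
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


def sparse_codes(D, X, k):
    """Sparse-code every column of X over D; returns an (n_atoms, n_frames) matrix."""
    return np.column_stack([omp(D, X[:, j], k) for j in range(X.shape[1])])


def fit_code_map(A_nb, A_wb, lam=1e-3):
    """Ridge least squares: W = argmin ||A_wb - W A_nb||_F^2 + lam ||W||_F^2.

    Closed form: W = A_wb A_nb^T (A_nb A_nb^T + lam I)^{-1}.
    """
    G = A_nb @ A_nb.T + lam * np.eye(A_nb.shape[0])
    # G is symmetric, so solving G W^T = A_nb A_wb^T yields W without an explicit inverse.
    return np.linalg.solve(G, A_nb @ A_wb.T).T


def extend_bandwidth(x_nb, D_nb, D_wb, W, k):
    """Estimate a WB feature vector from an NB one via the mapped sparse code."""
    a_nb = omp(D_nb, x_nb, k)  # NB sparse code
    a_wb = W @ a_nb            # estimated WB sparse code
    return D_wb @ a_wb         # WB reconstruction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_wb, d_nb, n_atoms, n_train, k = 20, 8, 32, 500, 2

    # Hypothetical coupled dictionaries: the NB dictionary is the low-band rows of
    # the WB dictionary, re-normalized (the thesis learns its dictionaries instead).
    D_wb = rng.standard_normal((d_wb, n_atoms))
    D_wb /= np.linalg.norm(D_wb, axis=0)
    D_nb = D_wb[:d_nb] / np.linalg.norm(D_wb[:d_nb], axis=0)

    def synth(n):
        """Synthetic WB frames that are exactly k-sparse over D_wb; NB view = low rows."""
        A = np.zeros((n_atoms, n))
        for j in range(n):
            A[rng.choice(n_atoms, size=k, replace=False), j] = rng.standard_normal(k)
        X_wb = D_wb @ A
        return X_wb[:d_nb], X_wb

    X_nb, X_wb = synth(n_train)
    W = fit_code_map(sparse_codes(D_nb, X_nb, k), sparse_codes(D_wb, X_wb, k))

    x_nb, x_wb = synth(1)
    x_hat = extend_bandwidth(x_nb[:, 0], D_nb, D_wb, W, k)
    print("relative WB error:", np.linalg.norm(x_hat - x_wb[:, 0]) / np.linalg.norm(x_wb[:, 0]))
```

The ridge term `lam` keeps the normal equations well-conditioned when some atoms are rarely used in training; in a real system the dictionaries, sparsity level and regularization would all be tuned on data.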
