Representation Learning for Spoken term Detection

Reddy, Pappagari R (2015) Representation Learning for Spoken term Detection. Masters thesis, Indian Institute of Technology Hyderabad.

EE12M1023.pdf - Submitted Version
Restricted to Registered users only until 15 July 2018.


Abstract

Spoken Term Detection (STD) is the task of searching for a given spoken query word in a large speech database. Applications of STD include speech data indexing, voice dialling, telephone monitoring and data mining. The performance of STD depends mainly on the representation of the speech signal and on the matching of the represented signals. This work investigates methods for a robust representation of the speech signal that is invariant to speaker variability, in the context of the STD task. Here the representation takes the form of templates, i.e. sequences of feature vectors. The typical representation in the speech community, Mel-Frequency Cepstral Coefficients (MFCC), carries both speech-specific and speaker-specific information, hence the need for a better representation. Searching is done by matching the feature-vector sequences of the query and reference utterances using subsequence Dynamic Time Warping (DTW). The performance of the proposed representation is evaluated on Telugu broadcast news data. In the absence of labelled data, i.e. in an unsupervised setting, we propose to capture the joint density of the acoustic space spanned by MFCCs using Gaussian Mixture Models (GMM) and Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBM). Posterior features extracted from the trained models are used to search for the query word. Improvements of 8% and 12% in STD performance over MFCC are observed with GMM and GBRBM posterior features, respectively. As transcribed data is not required, this approach is an optimal solution for low-resource languages. However, owing to its intermediate performance, it cannot be an immediate solution for high-resource languages.
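The pipeline described in the abstract (posterior features followed by subsequence DTW matching) can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `gmm_posteriors` and `subsequence_dtw`, the diagonal-covariance GMM, the negative-log inner-product local distance and the three-way step pattern are all assumptions made for illustration, and the GMM parameters are assumed to have been trained beforehand.

```python
import numpy as np


def gmm_posteriors(frames, weights, means, variances):
    """Posterior of each diagonal-covariance Gaussian for each frame.

    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D).
    Returns a (T, K) array whose rows sum to 1.
    Assumption: the GMM parameters were estimated beforehand.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2
    )                                                          # (T, K)
    log_joint = log_lik + np.log(weights)
    log_joint -= log_joint.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def subsequence_dtw(query, reference, eps=1e-10):
    """Find the region of `reference` that best matches `query`.

    query: (n, K) posteriors, reference: (m, K) posteriors.
    Returns (length-normalised cost, start frame, end frame).
    Assumption: local distance is -log of the inner product.
    """
    n, m = len(query), len(reference)
    dist = -np.log(np.clip(query @ reference.T, eps, None))    # (n, m)

    acc = np.full((n, m), np.inf)        # accumulated cost
    start = np.zeros((n, m), dtype=int)  # start frame of the best path to (i, j)

    # The query may begin at any reference frame at no extra cost.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(m)

    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = 0
        for j in range(1, m):
            prev = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            costs = [acc[p] for p in prev]
            k = int(np.argmin(costs))
            acc[i, j] = costs[k] + dist[i, j]
            start[i, j] = start[prev[k]]

    # The query may also end at any reference frame.
    end = int(np.argmin(acc[-1]))
    return acc[-1, end] / n, int(start[-1, end]), end
```

In this sketch a detection would be declared when the returned cost falls below a threshold; the threshold, like the distance measure, would need to be tuned on held-out data.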

Item Type: Thesis (Masters)
Uncontrolled Keywords: Spoken term Detection, Mel-Frequency Cepstral Coefficients, Gaussian Mixture Models, TD406
Subjects: Others > Electricity
Others > Electronic imaging & Signal processing
Divisions: Department of Electrical Engineering
Depositing User: Library Staff
Date Deposited: 30 Jul 2015 03:58
Last Modified: 06 Aug 2015 06:15
URI: http://raiith.iith.ac.in/id/eprint/1707
