Spoken Term Detection in Continuous Speech

Rout, K (2014) Spoken Term Detection in Continuous Speech. Masters thesis, Indian Institute of Technology Hyderabad.

Full text not available from this repository.


This thesis addresses speaker-independent spoken term detection (STD) using a supervised technique. The goal of STD is to retrieve occurrences of a user-spoken term from a given speech database. A multi-layer perceptron (MLP) is trained in a supervised manner using labeled speech data from a large number of speakers. The trained MLP is used to generate phoneme posterior features, i.e., the conditional probability of each phoneme for every frame in the speech utterance. The dimension of the posterior feature depends on the number of phoneme classes considered during training. The sequence of posterior features obtained from the test utterance is matched with that obtained from the query word using subsequence dynamic time warping (subDTW). The distance along the best-aligned path is used to decide whether the query word is present in or absent from the given test utterance. The performance of the proposed method is evaluated on a Telugu broadcast news database collected from several television channels. Posterior features are observed to perform significantly better than conventional mel-frequency cepstral coefficient (MFCC) features. A comparison study is carried out using both supervised and unsupervised techniques: supervised methods such as the MLP significantly improve STD performance compared to unsupervised methods such as the Gaussian mixture model (GMM). The effects of two kinds of query words are analyzed: those recorded in isolation and those cut out from continuous speech. As the durations of the phonemes in the query word vary greatly between these two modes, the sequence matching technique subDTW plays an important role in finding the true hits. This can be achieved by using different local weights in subDTW for the different recording modes.
Experiments are conducted with query words recorded in isolation and with query words cut out from continuous speech. Detection of isolated queries is found to perform worse than detection of queries cut out of continuous speech, owing to channel mismatch and lack of disparities in terms of the number of frames.
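
The matching step described in the abstract can be illustrated with a minimal sketch of subsequence DTW over posterior features. This is not the thesis implementation: the function names, the negative-log inner-product local distance, the three step directions, and the default weights are all assumptions chosen for illustration. The query may start and end at any frame of the test utterance, and the accumulated distance along the best path, normalized by query length, serves as the detection score.

```python
import math

def local_distance(p, q, eps=1e-10):
    """Assumed local distance: negative log of the inner product of two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

def sub_dtw(query, test, weights=(1.0, 1.0, 1.0)):
    """Return (score, start, end) for the best match of `query` inside `test`.

    `weights` holds hypothetical local weights for the (diagonal, vertical,
    horizontal) steps; a vertical step advances the query only, a horizontal
    step advances the test utterance only.
    """
    w_diag, w_vert, w_horiz = weights
    m, n = len(query), len(test)
    cost = [[0.0] * n for _ in range(m)]
    start = [[0] * n for _ in range(m)]
    # Free start: the first query frame may align with any test frame.
    for j in range(n):
        cost[0][j] = local_distance(query[0], test[j])
        start[0][j] = j
    for i in range(1, m):
        for j in range(n):
            d = local_distance(query[i], test[j])
            candidates = [(cost[i - 1][j] + w_vert * d, start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1] + w_diag * d, start[i - 1][j - 1]))
                candidates.append((cost[i][j - 1] + w_horiz * d, start[i][j - 1]))
            cost[i][j], start[i][j] = min(candidates)
    # Free end: the last query frame may align with any test frame.
    end = min(range(n), key=lambda j: cost[m - 1][j])
    return cost[m - 1][end] / m, start[m - 1][end], end

# Toy posteriors over three phoneme classes (made-up values, not thesis data).
query = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
utterance = [[0.05, 0.05, 0.9], [0.9, 0.05, 0.05], [0.9, 0.05, 0.05],
             [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
score, first, last = sub_dtw(query, utterance)
```

In this toy example the best path covers test frames 2 to 3. A detection decision would compare `score` with a threshold, and the `weights` tuple is where different local weights for isolated and continuous-speech queries could be supplied.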

Item Type: Thesis (Masters)
Uncontrolled Keywords: TD246
Subjects: Others > Electricity
Divisions: Department of Electrical Engineering
Depositing User: Team Library
Date Deposited: 29 Dec 2014 11:09
Last Modified: 08 Jul 2015 09:33
URI: http://raiith.iith.ac.in/id/eprint/1272
