Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications

Bhati, Saurabhchand and Nayak, Shekhar and Murty, K. Sri Rama (2017) Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 20-24 August 2017, Stockholm, Sweden.

Full text not available from this repository.

Abstract

Zero resource speech processing refers to a scenario where little or no transcribed data is available. In this paper, we propose a three-step unsupervised approach to zero resource speech processing that requires no additional information or datasets. In the first step, we segment the speech signal into phoneme-like units, resulting in a large number of varying-length segments. The second step clusters the varying-length segments into a finite number of clusters so that each segment can be labeled with a cluster index. The unsupervised transcriptions thus obtained can be thought of as sequences of virtual phone labels. In the third step, a deep neural network (DNN) classifier is trained to map the feature vectors extracted from the signal to their corresponding virtual phone labels. The virtual phone posteriors extracted from the DNN are used as features for zero resource speech processing. The effectiveness of the proposed approach is evaluated on both the ABX and spoken term discovery (STD) tasks using the spontaneous American English and Tsonga language datasets provided as part of the Zero Resource 2015 challenge. The proposed system outperforms the baselines supplied with the datasets on both tasks without any task-specific modifications.
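
The three steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the boundary detector (a threshold on frame-to-frame distance), the segment representation (mean of the frames), the clustering method (k-means with farthest-point initialisation), and the classifier (softmax regression standing in for the DNN) are all assumptions chosen to keep the example short and dependency-free. The function names, the threshold, and the synthetic 2-D "frames" are hypothetical.

```python
import math

def segment(frames, threshold):
    """Step 1 (stand-in): open a new segment where consecutive frames differ sharply."""
    bounds = [0]
    for t in range(1, len(frames)):
        if math.dist(frames[t - 1], frames[t]) > threshold:
            bounds.append(t)
    bounds.append(len(frames))
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def mean_vector(vectors):
    """Average a list of equal-length vectors dimension by dimension."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=20):
    """Step 2 (stand-in): k-means over fixed-length segment embeddings."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-point initialisation, deterministic
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels

def posteriors(weights, x):
    """Virtual phone posteriors for one frame (softmax over linear scores)."""
    scores = [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_classifier(frames, labels, k, epochs=200, lr=0.05):
    """Step 3 (stand-in): softmax regression trained by SGD, in place of the DNN."""
    dim = len(frames[0])
    weights = [[0.0] * (dim + 1) for _ in range(k)]  # last entry of each row is the bias
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            post = posteriors(weights, x)
            for c in range(k):
                grad = post[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
                weights[c][dim] -= lr * grad
    return weights

# Synthetic "utterance": four phone-like regions, the first and third being the same sound.
phones = [(0.0, 0.0), (5.0, 5.0), (0.0, 0.0), (5.0, 0.0)]
frames = [[cx + 0.05 * (t % 3 - 1), cy - 0.05 * (t % 3 - 1)]
          for cx, cy in phones for t in range(5)]

segments = segment(frames, threshold=1.0)
segment_labels = kmeans([mean_vector(frames[s:e]) for s, e in segments], k=3)
# Every frame inherits the virtual phone label of the segment it belongs to.
frame_labels = [lab for (s, e), lab in zip(segments, segment_labels) for _ in range(e - s)]
weights = train_classifier(frames, frame_labels, k=3)
features = [posteriors(weights, x) for x in frames]  # posteriors used as features

print(segments)
print(segment_labels)
```

In this sketch the per-frame posterior vectors in `features` play the role of the virtual phone posteriors that the abstract describes as the final representation; a real system would operate on acoustic features such as MFCCs rather than 2-D toy points.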

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Conference or Workshop Item (Other)
Additional Information: ISSN: 2308-457X
Uncontrolled Keywords: ABX, Deep neural network, Phonetic segmentation, Pitman-Yor language model, Spoken term discovery, Unsupervised learning
Subjects: Electrical Engineering > Wireless Communication
Electrical Engineering > Process Control
Electrical Engineering > Power System
Electrical Engineering > Automation & Control Systems
Electrical Engineering > Electrical and Electronic
Electrical Engineering > Instruments and Instrumentation
Divisions: Department of Electrical Engineering
Depositing User: LibTrainee 2021
Date Deposited: 25 May 2021 06:27
Last Modified: 25 May 2021 06:27
URI: http://raiith.iith.ac.in/id/eprint/7788
Publisher URL: http://doi.org/10.21437/Interspeech.2017-1476