Unsupervised Speech Signal-to-Symbol Transformation for Language Identification

Bhati, Saurabhchand and Nayak, Shekhar and Kodukula, Sri Rama Murty (2020) Unsupervised Speech Signal-to-Symbol Transformation for Language Identification. Circuits, Systems, and Signal Processing, 39 (10). pp. 5169-5197. ISSN 0278-081X

Full text not available from this repository.


This paper presents a new approach for unsupervised segmentation and labeling of acoustically homogeneous segments in speech signals. The virtual labels obtained in this way are used to build unsupervised acoustic models in the absence of manual transcriptions. We refer to this approach as unsupervised speech signal-to-symbol transformation. It involves three main steps: (i) segmenting the speech signal into acoustically homogeneous regions, (ii) assigning consistent labels to acoustic segments with similar characteristics, and (iii) iteratively modeling the acoustic segments that share the same label. This work focuses on improving the initial segmentation and the labeling of acoustic segments. A new kernel-Gram matrix-based approach is proposed for segmentation. It determines the number of segments automatically and achieves performance comparable to state-of-the-art algorithms. Segment labeling is formulated in a graph clustering framework. Because graph clustering methods require extensive computational resources on large datasets, a new graph-growing-based strategy is proposed to make the algorithm scalable. A two-stage iterative modeling procedure then refines the segment boundaries and the segment labels alternately. The proposed method achieves the highest normalized mutual information and purity on the TIMIT dataset. The quality of the virtual labels is assessed by building a language identification (LID) system for Indian languages, using a bigram language model built over these virtual phones. The LID system built with the virtual labels and the corresponding language model performs very close to both a system trained using manual labels and an i-vector-based LID system.
Fusing the scores of the unsupervised LID system from our approach with those of the i-vector approach outperforms the LID system built under the supervision of manual labels by a relative margin of 31.19%, demonstrating that unsupervised LID systems using virtual labels can be on par with supervised systems.
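
The abstract evaluates the virtual labels with purity and normalized mutual information (NMI) on TIMIT. As an illustration of what these two scores measure, the following is a minimal sketch of their standard definitions, computed from paired reference-phone and virtual-label sequences. It is not the authors' evaluation code. The abstract does not say whether labels were compared per frame or per segment, or which NMI normalization was used, so this sketch assumes equal-length label sequences (e.g. one label per frame) and normalizes by the arithmetic mean of the two entropies (the default in scikit-learn's normalized_mutual_info_score). The function names are hypothetical.

```python
import math
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of items whose cluster's majority reference label matches their own."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Count co-occurrences of (virtual label, reference phone).
    joint = Counter(zip(cluster_labels, true_labels))
    # For each virtual label, keep the count of its most frequent reference phone.
    best = Counter()
    for (cluster, _phone), count in joint.items():
        best[cluster] = max(best[cluster], count)
    return sum(best.values()) / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def nmi(true_labels, cluster_labels):
    """Mutual information normalized by the arithmetic mean of the two entropies."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    p_true = Counter(true_labels)
    p_cluster = Counter(cluster_labels)
    # I(T; C) = sum over (t, c) of p(t, c) * log(p(t, c) / (p(t) * p(c)))
    mi = sum(
        (n_tc / n) * math.log(n * n_tc / (p_true[t] * p_cluster[c]))
        for (t, c), n_tc in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    # Both labelings constant: treat as identical partitions.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    phones = ["aa", "aa", "iy", "iy"]
    print(purity(phones, [1, 1, 2, 2]), nmi(phones, [1, 1, 2, 2]))  # perfect match
    print(purity(phones, [1, 2, 3, 4]), nmi(phones, [1, 2, 3, 4]))  # over-segmented
```

Purity alone can be inflated by over-segmentation. In the second call every frame gets its own virtual label, so purity is 1.0, while NMI drops to 2/3 because the extra labels add entropy that is not shared with the reference phones. This is one reason the two scores are usually reported together.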

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Article
Uncontrolled Keywords: Graph clustering, Graph growing, Language identification, Speech segmentation, Unsupervised segment labeling, Virtual phonemes
Subjects: Electrical Engineering > Wireless Communication
Electrical Engineering > Electrical and Electronic
Divisions: Department of Electrical Engineering
Depositing User: . LibTrainee 2021
Date Deposited: 30 Mar 2021 05:04
Last Modified: 30 Mar 2021 05:04
URI: http://raiith.iith.ac.in/id/eprint/7722
Publisher URL: http://doi.org/10.1007/s00034-020-01408-8
OA policy: https://v2.sherpa.ac.uk/id/publication/15622