Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings

Bhati, Saurabhchand and Nayak, Shekhar and Kodukula, Sri Rama Murty and Dehak, Najim (2019) Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH, 15-19 September 2019, Graz, Austria.

Full text not available from this repository.


Unsupervised discovery of acoustic units from the raw speech signal forms the core objective of zero-resource speech processing. It involves identifying the acoustic segment boundaries and consistently assigning unique labels to acoustically similar segments. In this work, candidate segment boundaries are identified in an unsupervised manner from the kernel Gram matrix computed from the Mel-frequency cepstral coefficients (MFCCs). These candidate boundaries are used to train a siamese network that is intended to learn embeddings that minimize intrasegment distances and maximize intersegment distances. The siamese embeddings capture phonetic information from longer contexts of the speech signal and enhance intersegment discriminability. These properties make the siamese embeddings better suited for acoustic segmentation and clustering than the raw MFCC features. The Gram matrix computed from the siamese embeddings provides unambiguous evidence for boundary locations. The initial candidate boundaries are refined using this evidence, and siamese embeddings are extracted for the new acoustic segments. A graph-growing approach is used to cluster the siamese embeddings, and a unique label is assigned to acoustically similar segments. The performance of the proposed method for acoustic segmentation and clustering is evaluated on the Zero Resource 2017 database.
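
The first step of the abstract, reading candidate boundaries off a kernel Gram matrix, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the kernel or the boundary-detection rule, so the Gaussian kernel, the checkerboard novelty score along the diagonal, and all function names and parameters (`gaussian_gram`, `novelty_curve`, `find_boundary_candidates`, `sigma`, `half_width`, `rel_threshold`) are assumptions made for illustration. MFCC extraction is not shown; `features` is assumed to be an array of shape (frames, dimensions).

```python
import numpy as np


def gaussian_gram(features, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The Gaussian kernel is an assumption; the abstract only says "kernel Gram matrix".
    """
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (features @ features.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def novelty_curve(gram, half_width=4):
    """Slide a checkerboard kernel along the diagonal of the Gram matrix.

    The score at frame t is high when the frames before t and the frames from t
    onward are each self-similar but dissimilar to one another.
    This is a stand-in boundary detector, not the rule used in the paper.
    """
    n = gram.shape[0]
    w = half_width
    signs = np.concatenate([-np.ones(w), np.ones(w)])
    kernel = np.outer(signs, signs)  # +1 on same-side blocks, -1 on cross blocks
    novelty = np.zeros(n)
    for t in range(w, n - w + 1):
        window = gram[t - w:t + w, t - w:t + w]
        novelty[t] = np.sum(kernel * window)
    return novelty


def find_boundary_candidates(features, sigma=1.0, half_width=4, rel_threshold=0.5):
    """Return frame indices where the novelty curve has a local peak."""
    novelty = novelty_curve(gaussian_gram(features, sigma), half_width)
    peak_value = novelty.max()
    if peak_value <= 0.0:
        return []
    threshold = rel_threshold * peak_value
    candidates = []
    for t in range(1, len(novelty) - 1):
        is_peak = novelty[t] > novelty[t - 1] and novelty[t] >= novelty[t + 1]
        if is_peak and novelty[t] > threshold:
            candidates.append(t)
    return candidates
```

In the pipeline described above, the same kind of Gram matrix would later be recomputed from the siamese embeddings instead of the MFCC frames in order to refine these candidates; that refinement, the siamese training, and the graph-growing clustering are not covered by this sketch.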

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Representation learning, Siamese network, Spoken term discovery, Zero resource speech processing, Indexed in Scopus
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: Team Library
Date Deposited: 18 Nov 2019 05:46
Last Modified: 18 Nov 2019 05:46
URI: http://raiith.iith.ac.in/id/eprint/7025
Publisher URL: http://doi.org/10.21437/Interspeech.2019-2981
