Support vector machine with dynamic shifting window for continuous speech recognition

Bibliographic Details
Main Authors: Ahmad, Abdul Manan, Salam, Md. Sah, Samaon, Den Fairol
Format: Conference or Workshop Item
Published: 2006
Online Access: http://eprints.utm.my/25604/
Description
Summary: The Support Vector Machine (SVM) excels at classification owing to its discriminative nature. Automatic speech recognition (ASR), among many areas of pattern recognition, could well benefit from this property. The conventional method applied in continuous speech recognition (CSR) uses a frame-based statistical framework, namely the Hidden Markov Model (HMM). The frame-based approach promises accurate segmentation of sub-word units (e.g., phonemes, syllables), which eventually contributes to recognition accuracy. Despite these advantages, because each frame is very small, the system spends valuable time building up or recognizing an individual word from several frames. Although segment-based HMM overcomes this hindrance, it increases both the complexity of the training procedure and the computational load. Our solution is to extend the size of the acoustic event incrementally so that it spans several frames, thus making the SVM dynamic. We refer to the procedure as SVM-DSW (dynamic shifting window). Recognition is determined by voting for the highest posterior probability score provided by the SVM for a word segment. Word segmentation is a by-product of SVM-DSW: when the score deteriorates, it signifies the start of a new word, which is how the segmentation module works. Based on preliminary results on a subset of 16 Malay sentences, we manage to outperform HTK’s HMM in terms of both recognition and segmentation.
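
The growing-window idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of the features, SVM configuration, or voting scheme, so everything below is an assumption made for illustration. In particular, `dsw_decode`, `toy_score`, `VOCAB`, `min_len`, and `drop_tol` are hypothetical names, and `toy_score` is a plain string-matching stand-in for the posterior probability an SVM would provide. The sketch grows a window one frame at a time, keeps the best-scoring end point, and closes the segment when the score deteriorates, which is taken as the start of a new word.

```python
from typing import Callable, List, Sequence, Tuple

# (start, end_exclusive, label, score) for one recognized word segment
Segment = Tuple[int, int, str, float]

# Hypothetical toy vocabulary; stands in for the word classes of a trained SVM.
VOCAB = ["satu", "dua"]


def toy_score(window: Sequence[str]) -> Tuple[str, float]:
    """Stand-in for an SVM posterior: returns (best label, score in [0, 1]).

    The score is the length of the matching prefix divided by the longer of
    the window and the candidate word, so it peaks when the window is the word.
    """
    text = "".join(window)
    best_label, best = "", -1.0
    for word in VOCAB:
        match = 0
        for a, b in zip(text, word):
            if a != b:
                break
            match += 1
        score = match / max(len(text), len(word))
        if score > best:
            best_label, best = word, score
    return best_label, best


def dsw_decode(
    frames: Sequence[str],
    score_fn: Callable[[Sequence[str]], Tuple[str, float]],
    min_len: int = 1,
    drop_tol: float = 0.0,
) -> List[Segment]:
    """Grow a window frame by frame; cut a segment when the score drops."""
    segments: List[Segment] = []
    start, n = 0, len(frames)
    while start < n:
        best_end, best_label, best_score = None, None, float("-inf")
        end = start + min_len
        while end <= n:
            label, score = score_fn(frames[start:end])
            if score > best_score:
                # Window still improving: remember this end point.
                best_end, best_label, best_score = end, label, score
            elif best_score - score > drop_tol:
                # Score deteriorated: assume a new word has started.
                break
            end += 1
        if best_end is None:
            # Fewer than min_len frames remain; nothing left to score.
            break
        segments.append((start, best_end, best_label, best_score))
        start = best_end  # the next window begins at the segment boundary
    return segments


if __name__ == "__main__":
    # Characters play the role of acoustic frames in this toy example.
    for seg in dsw_decode(list("satudua"), toy_score):
        print(seg)
```

In a real system the scoring function would wrap a trained classifier operating on acoustic feature vectors, and a nonzero `drop_tol` could make the boundary decision less sensitive to small fluctuations in the score; both choices are assumptions here rather than details taken from the paper.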