Publication detail

Robust ASR front-end using spectral-based and discriminant features: experiments on Aurora tasks

BENITEZ, C.; BURGET, L.; CHEN, B.; DUPONT, S.; GARUDADRI, H.; HERMANSKY, H.; JAIN, P.; KAJAREKAR, S.; MORGAN, N.; SIVADAS, S.

English title

Robust ASR front-end using spectral-based and discriminant features: experiments on Aurora tasks

Type

Paper in conference proceedings indexed in WoS or Scopus

Language

en

Original abstract

This paper describes an automatic speech recognition front-end that combines low-level robust ASR feature extraction techniques with higher-level linear and non-linear feature transformations. The low-level algorithms use data-derived filters, mean and variance normalization of the feature vectors, and dropping of noise frames. The feature vectors are then linearly transformed using Principal Components Analysis (PCA). An Artificial Neural Network (ANN) is also used to compute features that are useful for classification of speech sounds. It is trained for phoneme probability estimation on a large corpus of noisy speech. These transformations lead to two feature streams whose vectors are concatenated and then used for speech recognition. This method was tested on the set of speech corpora used for the "Aurora" evaluation. Using the feature stream generated without the ANN yields an overall 41% reduction of the error rate over Mel-Frequency Cepstral Coefficients (MFCC) reference features. Adding the ANN stream further reduces the error rate, yielding a 46% reduction over the reference features.
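Two of the steps named in the abstract, mean and variance normalization of the feature vectors and concatenation of two feature streams, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_variance_normalize` and `concatenate_streams` are hypothetical, the feature values are invented toy numbers, and the data-derived filters, noise-frame dropping, PCA, and ANN stages are left out.

```python
from statistics import fmean, pstdev


def mean_variance_normalize(frames):
    """Scale each feature dimension to zero mean and unit variance over the utterance."""
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [fmean(d) for d in dims]
    # A constant dimension has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(d) or 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]


def concatenate_streams(stream_a, stream_b):
    """Join two frame-synchronous feature streams into one vector per frame."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same number of frames")
    return [a + b for a, b in zip(stream_a, stream_b)]


# Toy data (assumption): three frames, a 2-dimensional stream and a 1-dimensional stream.
spectral = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
discriminant = [[0.2], [0.5], [0.8]]

normalized = mean_variance_normalize(spectral)
combined = concatenate_streams(normalized, discriminant)
```

Here `combined` holds one 3-dimensional vector per frame. In the system the abstract describes, the vectors being joined would instead come from the PCA-transformed stream and the ANN-derived stream.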

English keywords

speech recognition, Aurora task

RIV year

2001

Published

01.01.2001

Place

Aalborg

ISBN

87-90834-09-7

Book

Proc. EUROSPEECH

Number of pages

4

BIBTEX


@inproceedings{BUT3689,
  author="Carmen {Benitez} and Lukáš {Burget} and Barry {Chen} and Stephane {Dupont} and Harinath {Garudadri} and Hynek {Hermansky} and Pratibha {Jain} and Sachin {Kajarekar} and Nelson {Morgan} and Sunil {Sivadas}",
  title="Robust ASR front-end using spectral-based and discriminant features: experiments on Aurora tasks",
  booktitle="Proc. EUROSPEECH",
  year="2001",
  month="January",
  address="Aalborg",
  isbn="87-90834-09-7"
}