Analysis of context-dependent segmental duration for automatic speech recognition

Xue Wang, Louis C. W. Pols & Louis F. M. ten Bosch



This paper presents research on integrating context-dependent durational knowledge into HMM-based speech recognition. The first part of the paper derives relations between the parameters of the context-free HMMs and their durational behaviour, in preparation for the context-dependent duration modelling presented in the second part. Durational knowledge is integrated by rescoring hypotheses in the post-processing step of our N-best monophone recogniser. All analyses use the multi-speaker TIMIT database.
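As a rough illustration of duration integration by N-best rescoring (a minimal sketch under stated assumptions, not the authors' implementation), each hypothesis's acoustic log-likelihood can be combined with a weighted duration log-likelihood, and the best combined score is kept. The Gaussian per-segment duration model, the `weight` parameter, the made-up model values, and all names (`Hypothesis`, `duration_score`, `rescore`) are hypothetical; the paper's actual duration distributions and score combination may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass
class Hypothesis:
    """One entry of a hypothetical N-best list."""
    acoustic_logprob: float
    # (duration-model key, duration in frames) per segment. The key may encode
    # context, e.g. ("iy", "stressed"), to make the model context-dependent.
    segments: List[Tuple[Hashable, int]]


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a normal distribution (an ASSUMED duration model)."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def duration_score(segments: List[Tuple[Hashable, int]],
                   duration_model: Dict[Hashable, Tuple[float, float]]) -> float:
    """Sum of per-segment duration log-likelihoods under the assumed model."""
    return sum(gaussian_log_pdf(d, *duration_model[key]) for key, d in segments)


def rescore(nbest: List[Hypothesis],
            duration_model: Dict[Hashable, Tuple[float, float]],
            weight: float = 1.0) -> Hypothesis:
    """Return the hypothesis maximising acoustic_logprob + weight * duration score."""
    return max(
        nbest,
        key=lambda h: h.acoustic_logprob
        + weight * duration_score(h.segments, duration_model),
    )


if __name__ == "__main__":
    # Illustrative (mean, std) durations in frames; NOT values from the paper.
    model = {"aa": (10.0, 3.0), "t": (5.0, 2.0)}
    h1 = Hypothesis(-100.0, [("aa", 10), ("t", 5)])  # plausible durations
    h2 = Hypothesis(-99.0, [("aa", 25), ("t", 1)])   # acoustically better, implausible durations
    print(rescore([h1, h2], model) is h1)             # True
    print(rescore([h1, h2], model, weight=0.0) is h2)  # True
```

With `weight=0.0` the rescoring reduces to the acoustic ranking, so the acoustically better hypothesis wins; a positive weight lets the duration term overturn a hypothesis whose segment durations are implausible.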

  1. Introduction
  2. Phone duration distributions
    1. Vowel duration distribution as affected by stress and location
    2. Effect of post-vocalic plosives on vowel duration
    3. Effect of speaking rate on vowel duration
  3. Hand-labelling vs. automatic segmentation
  4. Analysis of variance
  5. Discussion