Exploiting statistical characteristics of word sequences for the efficient coding of speech.
Abstract
Characteristics of natural language can be illuminated by applying well-known tools from information theory. This paper shows how some of these characteristics can be exploited in developing automated speech and language processing applications. The explicit representation of discontinuities in a temporal sequence of sounds, such as pauses in speech, can be used to improve the transmission of information. The argument rests on comparative entropy measures.
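As a rough illustration of what a comparative entropy measure can look like, the sketch below estimates two empirical quantities for a token sequence, once with pauses kept as explicit tokens and once with them removed. It is a minimal sketch under stated assumptions, not the paper's procedure: the `<pause>` marker, the helper names, and the toy sentence are all hypothetical, and the estimators are plain maximum-likelihood counts with no smoothing.

```python
import math
from collections import Counter

PAUSE = "<pause>"  # hypothetical marker for an explicit pause token


def unigram_entropy(tokens):
    """Shannon entropy, in bits per token, of the empirical unigram distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(tokens):
    """Empirical H(next token | previous token), in bits, from adjacent pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (prev, _cur), c in pairs.items():
        # joint probability of the pair times log of the conditional probability
        h -= (c / total) * math.log2(c / contexts[prev])
    return h


def strip_pauses(tokens):
    """Return the sequence with the explicit pause tokens removed."""
    return [t for t in tokens if t != PAUSE]


# Toy data, invented purely for illustration (not taken from the paper).
with_pauses = "the cat sat <pause> the dog ran <pause> the cat ran <pause>".split()
without_pauses = strip_pauses(with_pauses)

for label, seq in (("with pauses", with_pauses), ("without pauses", without_pauses)):
    print(label, round(unigram_entropy(seq), 3), round(conditional_entropy(seq), 3))
```

On a sequence this short the numbers carry no statistical weight; the point is only the shape of the comparison, in which the same estimator is applied to two representations of the same utterances.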