W3C Working Draft 3 January 2001
Andreas Kellner, Philips Research Labs
This document defines syntax for representing N-Gram (Markovian) stochastic grammars within the W3C Speech Interface Framework. The use of stochastic N-Gram models has a long and successful history in the research community and is now increasingly affecting commercial systems, as the market asks for more robust and flexible solutions. The primary purpose of specifying a stochastic grammar format is to support large vocabulary and open vocabulary applications. In addition, stochastic grammars can be used to represent concepts or semantics. This specification defines the mechanism for combining stochastic and structured (in this case Context-Free) grammars as well as methods for combined semantic definitions.
Status of this Document
This document is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C Working Drafts can be found at http://www.w3.org/TR .
This specification describes markup for representing statistical language models, and forms part of the proposals for the W3C Speech Interface Framework. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review, and comments and discussion are welcomed on the public mailing list <firstname.lastname@example.org>. To subscribe, send an email to <email@example.com> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). The archive for the list is accessible online.
Table of Contents
This document defines syntax for representing N-Gram (Markovian) stochastic grammars within the W3C Voice Browser Markup Language. The parent language for specification of a stochastic grammar is XML; however, for efficiency some variance from strict XML syntax will be used. Elements of the grammar specification already defined in the XML specification (e.g. character encoding) will not be repeated here, thus avoiding any potential inconsistency with the current or future XML specifications.
The primary purpose of specifying a stochastic grammar format is to support large vocabulary and open vocabulary applications. In addition, stochastic grammars can be used to represent concepts or semantics. This specification defines the mechanism for combining stochastic and structured (in this case Context-Free) grammars as well as methods for combined semantic definitions. Since some structured grammars are also stochastic, we will avoid confusion from here on by only referring to these grammars as N-Gram grammars, or in some cases simply N-Grams.
An N-Gram grammar is a representation of an N-th order Markov language model in which the probability of occurrence of a symbol is conditioned upon the prior occurrence of N-1 other symbols. N-Gram grammars are typically constructed from statistics obtained from a large corpus of text, using the co-occurrences of words in the corpus to determine word sequence probabilities. N-Gram grammars have the advantage of being able to cover a much larger language than would normally be derived directly from a corpus. Open vocabulary applications are easily supported with N-Gram grammars.
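The construction of an N-Gram model from corpus statistics can be illustrated with a minimal sketch. It assumes plain maximum-likelihood estimation without smoothing, a three-sentence toy corpus, and the boundary symbols `<s>` and `</s>`; the function name `train_ngram` and all data are hypothetical and are not part of this specification.

```python
from collections import Counter

def train_ngram(sentences, n=2):
    """Maximum-likelihood N-Gram estimates from tokenized sentences.

    Returns a dict mapping each observed N-Gram (a tuple of n symbols)
    to P(last symbol | preceding n-1 symbols).
    """
    context_counts = Counter()
    ngram_counts = Counter()
    for tokens in sentences:
        # Pad so that the first words also have a full (n-1)-symbol history.
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return {ng: c / context_counts[ng[:-1]] for ng, c in ngram_counts.items()}

# Toy corpus (assumed for illustration only).
corpus = [
    ["fly", "to", "boston"],
    ["fly", "to", "denver"],
    ["drive", "to", "boston"],
]
bigrams = train_ngram(corpus, n=2)
print(bigrams[("to", "boston")])  # relative frequency of "boston" after "to"
```

A practical model would add smoothing or backoff so that word sequences absent from the corpus do not receive zero probability; that is what allows an N-Gram grammar to cover more than the training text itself.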
This specification is influenced by a variety of preceding N-Gram grammar formats. This specification is not explicitly based on any particular preceding format. Concepts are similar but the syntax is largely original in this specification due to the XML parent language.
This specification is written to be consistent with the corresponding Context-Free Grammar (CFG) XML format specified in a companion document entitled "Speech Recognition Grammar Specification for the W3C Speech Interface Framework". At some point in the near future it is expected that these documents will be unified to ensure consistency among the common components of the specifications. To simplify this unification this document also borrows from some of the CFG examples. In maintaining such consistency the XML form of the deterministic grammar format will be the primary definition followed in this specification, to maintain compatibility with the XML-based N-Gram format defined here. Specifications will be defined in lavender boxes and examples will be given in green boxes.
In simple speech recognition/speech understanding systems, the expected input sentences are often modeled by a strict grammar (such as a CFG). In this case, the user is only allowed to utter those sentences that are explicitly covered by the (often hand-written) grammar. Experience shows that a context-free grammar of reasonable complexity can never foresee all the different sentence patterns users come up with in spontaneous speech input. This approach is therefore not sufficient for robust speech recognition/understanding tasks or free text input applications such as dictation.
N-Gram language models are traditionally used in large vocabulary speech recognition systems to provide the recognizer with an a-priori likelihood P(W) of a given word sequence W. The N-Gram language model is usually derived from large training texts that share the same language characteristics as the expected input. This information complements the acoustic model P(O|W), which models the articulatory features of the speakers. Together, these two components allow a system to compute the most likely input sequence W' = argmax_W P(W|O), where O is the sequence of input signal observations, as W' = argmax_W P(O|W) P(W). The two forms are equivalent by Bayes' rule, because P(O) does not depend on W.
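The combination of acoustic model and language model can be sketched as follows. The example assumes that the recognizer has already produced a small set of candidate word sequences with acoustic log-likelihoods log P(O|W); the candidates, the scores, and the function name `decode` are invented for illustration. Scores are combined in the log domain, which is equivalent to maximizing the product P(O|W) P(W).

```python
import math

def decode(acoustic_loglik, lm_prob):
    """Return the hypothesis W that maximizes log P(O|W) + log P(W)."""
    def score(w):
        return acoustic_loglik[w] + math.log(lm_prob[w])
    return max(acoustic_loglik, key=score)

# Hypothetical acoustic log-likelihoods log P(O|W) for two candidates.
acoustic_loglik = {
    ("recognize", "speech"): -12.0,
    ("wreck", "a", "nice", "beach"): -11.5,
}
# Hypothetical language model probabilities P(W) for the same candidates.
lm_prob = {
    ("recognize", "speech"): 1e-4,
    ("wreck", "a", "nice", "beach"): 1e-7,
}

best = decode(acoustic_loglik, lm_prob)
print(" ".join(best))
```

In this toy setting the second candidate has the slightly better acoustic score, but the language model prior makes the first candidate the more likely word sequence overall.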
In contrast to strict grammars, N-Gram language models rely on the likelihood of sequences of words, such as word pairs (in the case of bigrams) or word triples (in the case of trigrams), and are therefore less restrictive. The use of stochastic N-Gram models has a long and successful history in the research community and is now increasingly affecting commercial systems, as the market asks for more robust and flexible solutions.
There are many possible ways to combine N-Gram models and context-free grammars within a single voice browser system, such as:
- using an N-Gram model in the recognizer and a CFG in a (separate) understanding component
- integrating special N-Gram rules at various levels in a CFG to allow for flexible input in specific context
- using a CFG to model the structure of phrases (e.g. numeric expressions) that are incorporated in a higher-level N-Gram model (class N-Grams)
For this reason, cross-referencing between N-Gram models and CFGs is an important feature of the markup described below.
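The class N-Gram combination listed above can be illustrated with a rough sketch. It assumes a bigram over class labels in which the class `NUMBER` is expanded by a rule; a small set of phrases with uniform probability stands in for a real CFG. All names, labels, and probabilities are hypothetical and do not reflect the markup defined in this specification.

```python
# Hypothetical class bigram: P(class_i | class_{i-1}).
class_bigram = {
    ("<s>", "flight"): 1.0,
    ("flight", "NUMBER"): 0.5,
    ("NUMBER", "</s>"): 1.0,
}

# Stand-in for a CFG rule: the phrases it generates, assumed equally likely.
number_rule = {("twenty", "one"), ("twenty", "two"), ("thirty",), ("forty",)}

def sentence_prob(segments):
    """Probability of a sentence given as (class_label, words) segments.

    Each segment contributes P(class | previous class); a NUMBER segment
    additionally contributes P(words | NUMBER) from the rule stand-in.
    """
    prob = 1.0
    prev = "<s>"
    for label, words in segments:
        prob *= class_bigram.get((prev, label), 0.0)
        if label == "NUMBER":
            in_rule = tuple(words) in number_rule
            prob *= (1.0 / len(number_rule)) if in_rule else 0.0
        prev = label
    # Close the sentence with the end-of-sentence transition.
    prob *= class_bigram.get((prev, "</s>"), 0.0)
    return prob

p = sentence_prob([("flight", ["flight"]), ("NUMBER", ["twenty", "one"])])
print(p)
```

The N-Gram level only needs statistics for the class label, while the rule decides which word sequences are valid members of the class. This is the kind of cross-reference between the two grammar types that the markup has to express.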
List of tags and Attributes