
3. Flat Category Representation: An Intermediate Connecting Representation



This section describes our flat category representations: first the categories for syntactic analysis, then the categories for semantic analysis.

3.1 Categories for Flat Syntactic Analysis

Flat syntactic analysis is the assignment of syntactic categories to a sequence of words, e.g., the word hypothesis sequence generated by a speech recognizer. Flat representations up to the phrase group level support local structural decisions, that is, decisions about which phrase group (abstract syntactic category) a word belongs to. For such a decision, the directly preceding words and their phrase groups can influence the current choice. For instance, the determiner "the" could be part of a prepositional group "in the mine" or the start of a noun group "the old mine". Local structural decisions that depend on local context are therefore made on the basis of a flat analysis.
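The determiner example can be made concrete with a small sketch. It is an illustration under stated assumptions, not the learned analysis described in this paper: the labels are assigned by hand, the function names `determiner_group` and `phrase_groups` are hypothetical, and the single rule stands in for a decision that would be learned from data. The abbreviations are those of Tables 1 and 2 (R = preposition, D = determiner, J = adjective, N = noun, PG = prepositional group, NG = noun group).

```python
from itertools import groupby

# Hand-labeled (word, basic category, abstract category) triples.
# These labels are illustrative assumptions, not output of a trained model.
IN_THE_MINE = [("in", "R", "PG"), ("the", "D", "PG"), ("mine", "N", "PG")]
THE_OLD_MINE = [("the", "D", "NG"), ("old", "J", "NG"), ("mine", "N", "NG")]


def determiner_group(previous_group):
    """Toy local rule: a determiner continues a prepositional group that
    was just opened; otherwise it starts a noun group."""
    return "PG" if previous_group == "PG" else "NG"


def phrase_groups(tagged):
    """Collapse consecutive words that share an abstract category."""
    return [
        (group, [word for word, _, _ in items])
        for group, items in groupby(tagged, key=lambda triple: triple[2])
    ]


if __name__ == "__main__":
    print(determiner_group("PG"))    # determiner directly after "in"
    print(determiner_group(None))    # determiner at the start of an utterance
    print(phrase_groups(IN_THE_MINE + THE_OLD_MINE))
```

Because the sketch merges any run of identical abstract categories, two adjacent groups of the same kind would be collapsed into one; a fuller representation would need an explicit group boundary.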

For flat syntactic analysis we have developed a level of basic syntactic categories and a level of abstract syntactic categories. These categories may vary with the language and with the degree of detail of the intended structural representation. However, the general approach is largely independent of the specific categories used. In fact, we have used the same syntactic categories for two different domains: railway counter interactions and business meeting arrangements. The basic syntactic categories we used were noun, verb, preposition, pronoun, numeral, past participle, pause, adjective, adverb, conjunction, determiner, interjection and other. They are shown with their abbreviations in Table 1.

Category            Examples
noun (N)            date, April
adjective (J)       late
verb (V)            meet, choose
adverb (A)          often
preposition (R)     at, in
conjunction (C)     and, but
pronoun (U)         I, you
determiner (D)      the, a
numeral (M)         fourteenth
interjection (I)    eh, oh
participle (P)      taken
other (O)           particles
pause (/)           pause

Table 1: Basic syntactic categories

The abstract syntactic categories we used are verb group, noun group, adverbial group, prepositional group, conjunction group, modus group, special group and interjection group. These abstract syntactic categories are shown in Table 2.

Category                  Examples
verb group (VG)           mean, would propose
noun group (NG)           a date, the next possible slot
adverbial group (AG)      later, as early as possible
prepositional group (PG)  in the dining hall
conjunction group (CG)    and, either ... or
modus group (MG)          interrogatives, confirmations: when, how long, yes
special group (SG)        additives like politeness: please, then
interjection group (IG)   interjections, pauses: eh, oh

Table 2: Abstract syntactic categories

The categories should express the main syntactic properties of the phrases. Most of our basic and abstract syntactic categories are widely used in different parsers. However, the approach of flat representations does not crucially rely on this specific set of basic and abstract syntactic categories. Our goal is to train, learn and generalize a flat syntactic analysis based on abstract syntactic categories and basic syntactic categories. Syntactic decisions should be made locally wherever possible. Local syntactic ambiguities up to the phrase group level (abstract syntactic categories) can be dealt with, but more global ambiguities like prepositional phrase attachment will not be dealt with, since they need additional knowledge, e.g., from a semantics module. Whereas a complete syntax tree commits to a particular structural preference (which might turn out to be wrong based on semantic knowledge), a flat syntactic representation goes only as far as local syntactic knowledge allows for disambiguation.

3.2 Categories for Flat Semantic Analysis

Since semantic analysis is domain-dependent, the semantic categories can differ between domains. We have worked particularly on two domains: railway counter interactions (the Regensburg train corpus) and business meeting arrangements (the Blaubeuren meeting corpus). About three quarters of the semantic categories overlap between the train corpus and the meeting corpus (Wermter & Weber, 1996b). Differences occurred mainly for verbs, e.g., NEED-events are very frequent in the railway counter interactions while SUGGEST-events are frequent in the business meeting interactions. The semantic categories of the railway counter interactions were described in previous work (Weber & Wermter, 1995). Here we will primarily focus on the semantic categories of the meeting corpus. The basic semantic categories for a word are shown in Table 3.

Category                  Examples
select (SEL)              select, choose
suggest (SUG)             propose, suggest
meet (MEET)               meet, join
utter (UTTER)             say, think
is (IS)                   is, was
have (HAVE)               had, have
move (MOVE)               come, go
aux (AUX)                 would, could
question (QUEST)          question words: where, when
physical (PHYS)           physical objects: building, office
animate (ANIM)            animate objects: I, you
abstract (ABS)            abstract objects: date
here (HERE)               time or location state words, prepositions: at, in
source (SRC)              time or location source words, prepositions: from
destination (DEST)        time or location destination words, prepositions: to
location (LOC)            Hamburg, Pittsburgh
time (TIME)               tomorrow, at 3 o'clock, April
negative evaluation (NO)  no, bad
positive evaluation (YES) yes, good
nil (NIL)                 words "without" specific semantics, e.g. determiner: a

Table 3: Basic semantic categories

At a higher level of abstraction, each word can belong to an abstract semantic category. The possible abstract semantic categories are shown in Table 4.

Category              Examples
action (ACT)          action for full verb events: meet, select
aux-action (AUX)      auxiliary action for auxiliary events: would like
agent (AGENT)         agent of an action: I
object (OBJ)          object of an action: a date
recipient (RECIP)     recipient of an action: to me
instrument (INSTR)    instrument for an action: using an elevator
manner (MANNER)       how to achieve an action: without changing rooms
time-at (TM-AT)       at what time: in the morning
time-from (TM-FRM)    start time: after 6 am
time-to (TM-TO)       end time: before 8 pm
loc-at (LC-AT)        at which location: in Frankfurt, in New York
loc-from (LC-FRM)     start location: from Boston, from Dortmund
loc-to (LC-TO)        end location: to Hamburg
confirmation (CONF)   confirmation phrase: ok great, yes wonderful
negation (NEG)        negation phrase: no stop, not
question (QUEST)      question phrase: at what time
misc (MISC)           miscellaneous words, e.g., for politeness: please, eh

Table 4: Abstract semantic categories

In summary, these categories provide a basis for a flat analysis. Each word is represented in its context by four categories: a basic and an abstract syntactic category, and a basic and an abstract semantic category.
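The four-category representation of a word can be sketched as a small data structure. This is a minimal illustration, not the representation used inside the system: the class name `FlatWord` and its field names are hypothetical, and the labels given to the sample utterance "I propose a date" are hand-assigned assumptions rather than corpus annotations. The category inventories are the abbreviations of Tables 1 to 4.

```python
from dataclasses import dataclass

# Category inventories copied from Tables 1-4.
BASIC_SYN = {"N", "J", "V", "A", "R", "C", "U", "D", "M", "I", "P", "O", "/"}
ABSTRACT_SYN = {"VG", "NG", "AG", "PG", "CG", "MG", "SG", "IG"}
BASIC_SEM = {
    "SEL", "SUG", "MEET", "UTTER", "IS", "HAVE", "MOVE", "AUX", "QUEST",
    "PHYS", "ANIM", "ABS", "HERE", "SRC", "DEST", "LOC", "TIME", "NO",
    "YES", "NIL",
}
ABSTRACT_SEM = {
    "ACT", "AUX", "AGENT", "OBJ", "RECIP", "INSTR", "MANNER", "TM-AT",
    "TM-FRM", "TM-TO", "LC-AT", "LC-FRM", "LC-TO", "CONF", "NEG", "QUEST",
    "MISC",
}


@dataclass(frozen=True)
class FlatWord:
    """One word with its four flat categories (hypothetical structure)."""
    word: str
    basic_syn: str
    abstract_syn: str
    basic_sem: str
    abstract_sem: str

    def __post_init__(self):
        # Reject labels that are not in the corresponding table.
        checks = [
            (self.basic_syn, BASIC_SYN, "basic syntactic"),
            (self.abstract_syn, ABSTRACT_SYN, "abstract syntactic"),
            (self.basic_sem, BASIC_SEM, "basic semantic"),
            (self.abstract_sem, ABSTRACT_SEM, "abstract semantic"),
        ]
        for label, allowed, level in checks:
            if label not in allowed:
                raise ValueError(f"unknown {level} category: {label!r}")


# Hand-assigned labels for illustration only.
UTTERANCE = [
    FlatWord("I", "U", "NG", "ANIM", "AGENT"),
    FlatWord("propose", "V", "VG", "SUG", "ACT"),
    FlatWord("a", "D", "NG", "NIL", "OBJ"),
    FlatWord("date", "N", "NG", "ABS", "OBJ"),
]
```

Keeping the four levels as separate fields mirrors the flat analysis: each level can be assigned or revised independently, without building a tree over the utterance.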





SCREEN (screen@nats5.informatik.uni-hamburg.de)
Mon Dec 16 15:33:13 MET 1996