Description of Features

The features used to train the system are listed in Figure 9; they are based on those described by Ng1996 and Escudero2000ecml. They represent words, collocations, and POS tags in the local context. Both ``collapsed'' and ``non-collapsed'' functions are used.

Figure 9: Features Used for the Training of the System

Each item in Figure 9 actually groups several sets of features. Most of them depend on the nearest words (e.g., $s$ comprises all possible features defined by the words occurring in each sample at positions $w_{-3}$, $w_{-2}$, $w_{-1}$, $w_{+1}$, $w_{+2}$, $w_{+3}$ relative to the ambiguous word). Feature types denoted by capital letters are based on the ``collapsed'' function form; that is, these features simply recognize an attribute belonging to the training data.
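As a rough illustration of the positional word features grouped under $s$, the following Python sketch collects the words at offsets $-3$ to $+3$ around the ambiguous word. The function names, the feature-name format, the lowercasing, and the handling of positions that fall outside the sentence are all assumptions made for illustration; they are not taken from the system itself. The second function shows one possible reading of the ``collapsed'' form, in which a feature only checks whether a value was observed in the training data.

```python
def window_word_features(tokens, target_index, offsets=(-3, -2, -1, 1, 2, 3)):
    """Return a dict mapping a positional feature name to the word found there."""
    features = {}
    for offset in offsets:
        position = target_index + offset
        # Assumption: positions outside the sentence produce no feature.
        if 0 <= position < len(tokens):
            features[f"w{offset:+d}"] = tokens[position].lower()
    return features


def collapsed_feature(value, values_seen_in_training):
    """Assumed collapsed reading: fire if the value was seen in training."""
    return value in values_seen_in_training


tokens = "He paid a high interest rate on the loan".split()
feats = window_word_features(tokens, tokens.index("interest"))
```

Here `feats` maps names such as `w-1` and `w+1` to the neighbouring words of interest, while `collapsed_feature` reduces the same information to a membership test against a set of training values.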

Keyword features ($k$m) are inspired by the work of Ng1996. Nouns are filtered using frequency information about the nouns that co-occur with a particular sense. For example, suppose $m = 10$ for a set of 100 examples of interest#4: if the noun bank is found 10 times or more at any position, then a feature is defined.
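The noun filtering step can be sketched in Python as follows. This is only an illustration under stated assumptions: $m$ is read here as a percentage of the examples of a sense (which matches the example of 10 occurrences in 100 examples, although an absolute count would match it equally well), each noun is counted at most once per example, and the function name and the input format are hypothetical.

```python
from collections import Counter


def select_keyword_nouns(examples, sense, m):
    """Return nouns seen in at least m percent of the examples of `sense`.

    `examples` is a list of (sense_label, nouns_in_context) pairs
    (a hypothetical input format chosen for this sketch).
    """
    sense_examples = [nouns for label, nouns in examples if label == sense]
    counts = Counter()
    for nouns in sense_examples:
        # Assumption: a noun counts once per example, whatever its position.
        counts.update(set(nouns))
    total = len(sense_examples)
    # Integer comparison avoids floating-point rounding at the threshold.
    return {noun for noun, count in counts.items() if count * 100 >= m * total}


examples = (
    [("interest#4", ["bank", "rate"])] * 9
    + [("interest#4", ["bank"])]
    + [("interest#4", ["money"])] * 90
    + [("interest#1", ["hobby"])]
)
keywords = select_keyword_nouns(examples, "interest#4", m=10)
```

In this toy data set, bank occurs in 10 of the 100 examples of interest#4 and therefore yields a keyword feature, whereas rate occurs in only 9 and is filtered out.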

New features have also been defined using other grammatical properties: relationship features ($r$), which refer to the grammatical relationship of the ambiguous word (subject, object, complement, ...), and dependency features ($d$ and $D$), which extract the word related to the ambiguous one through the dependency parse tree.
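To make the relationship and dependency features more concrete, the sketch below derives them from a toy dependency analysis. The encoding of the parse (a list of head indices and a list of relation labels), the function name, and the choice of the head word as the ``related'' word are assumptions made for illustration only; the parser and the exact definition used by the system are not specified here.

```python
def grammatical_features(tokens, heads, relations, target_index):
    """Return hypothetical r and d features for the word at target_index.

    heads[i] is the index of the head of token i (-1 for the root);
    relations[i] is the label of the arc between token i and its head.
    """
    features = {"r": relations[target_index]}
    head_index = heads[target_index]
    # Assumption: the word "related" to the target is its syntactic head.
    if head_index >= 0:
        features["d"] = tokens[head_index].lower()
    return features


tokens = ["The", "bank", "raised", "interest"]
heads = [1, 2, -1, 2]
relations = ["det", "subject", "root", "object"]
parsed = grammatical_features(tokens, heads, relations, 3)
```

For the ambiguous word interest, this toy parse gives the relationship object and the related word raised.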