Disambiguation Algorithm


In this section, we formally describe the SM algorithm, which consists of the following five steps:

Step 1:
All nouns are extracted from a given context. These nouns constitute the input context, $Context=\{w_{1}, w_{2},..., w_{n}\}$. For example, $Context=\{plant, tree, perennial, leaf \}$.

Step 2:
For each noun $w_{i}$ in the context, all its possible senses $S_{i}=\{S_{i1}, S_{i2}, ..., S_{im}\}$ are obtained from WordNet. For each sense $S_{ij}$, the hypernym chain is obtained and stored, in order, in a stack. For example, Table 1 shows all the hypernym synsets for each sense of the word plant.


Table 1: Hypernym synsets of plant

plant#1              plant#2        plant#3                    plant#4
building complex#1   life form#1    contrivance#3              actor#1
structure#1          entity#1       scheme#1                   performer#1
artifact#1           -              plan of action#1           entertainer#1
object#1             -              plan#1                     person#1
entity#1             -              idea#1                     life form#1
-                    -              content#5                  entity#1
-                    -              cognition#1                -
-                    -              psychological feature#1    -
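As a minimal sketch of the stacks built in Step 2, the hypernym chains of Table 1 can be written as plain lists, nearest hypernym first and top synset last. The names `HYPERNYMS`, `build_stack`, and `top_marks` are illustrative assumptions, not part of the original method; a real implementation would obtain the chains from WordNet rather than hard-code them.

```python
# Hypernym chains transcribed from Table 1: nearest hypernym first, top synset last.
HYPERNYMS = {
    "plant#1": ["building complex#1", "structure#1", "artifact#1",
                "object#1", "entity#1"],
    "plant#2": ["life form#1", "entity#1"],
    "plant#3": ["contrivance#3", "scheme#1", "plan of action#1", "plan#1",
                "idea#1", "content#5", "cognition#1",
                "psychological feature#1"],
    "plant#4": ["actor#1", "performer#1", "entertainer#1", "person#1",
                "life form#1", "entity#1"],
}


def build_stack(sense):
    """Return the stack for one sense: the sense itself, then its hypernyms."""
    return [sense] + HYPERNYMS[sense]


def top_marks(senses):
    """Collect the top synset of every stack (the initial specification marks)."""
    return sorted({build_stack(s)[-1] for s in senses})


stacks = {sense: build_stack(sense) for sense in HYPERNYMS}
for sense, stack in stacks.items():
    print(sense, "->", " -> ".join(stack[1:]))

print(top_marks(HYPERNYMS))  # ['entity#1', 'psychological feature#1']
```

Under this representation, the four senses of plant reach only two distinct top synsets, which are the starting points for the descent in Step 4.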


Step 3:
With each sense appearing in the stacks, the method associates the list of senses from the context that it subsumes (see Figure 2, which illustrates the list of subsumed senses for plant#1 and plant#2).

Figure 2: Data Structure for Senses of the Word Plant
\includegraphics[width=12cm, height=5.00cm,clip]{paso_3.eps}
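The association made in Step 3 can be sketched as an index from each synset in the stacks to the context senses found at or below it. This is one possible reading, not the authors' implementation. The plant chains follow Table 1, while the single-sense chains for tree, perennial, and leaf are simplified assumptions introduced only for illustration, as are the names `CHAINS`, `word_of`, and `subsumed_senses`. The sketch also assumes that a sense counts as subsuming itself.

```python
from collections import defaultdict

# Sense -> hypernyms, nearest first. The plant entries follow Table 1; the
# entries for tree, perennial and leaf are simplified assumptions.
CHAINS = {
    "plant#1": ["building complex#1", "structure#1", "artifact#1",
                "object#1", "entity#1"],
    "plant#2": ["life form#1", "entity#1"],
    "plant#3": ["contrivance#3", "scheme#1", "plan of action#1", "plan#1",
                "idea#1", "content#5", "cognition#1",
                "psychological feature#1"],
    "plant#4": ["actor#1", "performer#1", "entertainer#1", "person#1",
                "life form#1", "entity#1"],
    "tree#1": ["woody plant#1", "vascular plant#1", "plant#2",
               "life form#1", "entity#1"],
    "perennial#1": ["plant#2", "life form#1", "entity#1"],
    "leaf#1": ["plant organ#1", "plant part#1", "natural object#1",
               "object#1", "entity#1"],
}
CONTEXT = {"plant", "tree", "perennial", "leaf"}


def word_of(sense):
    """Strip the sense number: 'plant#2' -> 'plant'."""
    return sense.rsplit("#", 1)[0]


def subsumed_senses(chains, context):
    """Map every synset found in a stack to the context senses it subsumes."""
    index = defaultdict(set)
    for sense, hypernyms in chains.items():
        if word_of(sense) in context:
            # A sense is listed under itself and under each of its hypernyms.
            for synset in [sense] + hypernyms:
                index[synset].add(sense)
    return index


index = subsumed_senses(CHAINS, CONTEXT)
for mark in ("building complex#1", "life form#1", "plant#2"):
    print(mark, sorted(index[mark]))
```

With this toy data, plant#2 subsumes tree#1 and perennial#1 in addition to itself, whereas building complex#1 subsumes only plant#1.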

Step 4:
Beginning from the initial specification marks (the top synsets), the program descends recursively through the hierarchy, from one level to another, assigning to each specification mark the number of context words subsumed.

Figure 3 shows the word counts for plant#1 through plant#4 under the specification marks entity#1, ..., life form#1, flora#2. For the entity#1 specification mark, senses #1, #2, and #4 share the same maximal word count (4). Therefore, the word plant cannot be disambiguated using the entity#1 specification mark, and it is necessary to go down one level of the hyponym hierarchy by changing the specification mark. With the specification mark life form#1, senses #2 and #4 of plant share the same maximal word count (3). Finally, the word plant can be disambiguated as sense #2 using the {plant#2, flora#2} specification mark, because this sense has the highest word density (in this case, 3).

Figure 3: Word Counts for Four Senses of the Word Plant
\includegraphics[width=8.50cm, height=7cm,clip]{paso_4.eps}

Step 5:
In this step, the method selects the word sense(s) having the greatest number of words counted in Step 4. If there is only one such sense, it is chosen. If there is more than one, we repeat Step 4, moving down one level at a time within the taxonomy until a single sense is obtained or the program reaches a leaf specification mark. Figure 3 shows the word counts for each sense of plant (#1 through #4) located within the specification marks entity#1, ..., life form#1, flora#2. If the word cannot be disambiguated in this way, the disambiguation process continues by applying a complementary set of heuristics.
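The counting and selection loop of Steps 4 and 5 can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' implementation. The plant chains come from Table 1; the chains for tree, perennial, and leaf are hypothetical, simplified data chosen so that the counts quoted in the text (4 under entity#1, 3 under life form#1, 3 under plant#2) are reproduced. The helper names `words_under` and `disambiguate` are likewise illustrative. A sense whose path is exhausted is assumed to stay at its own leaf mark.

```python
# Sense -> hypernyms, nearest first. The plant entries follow Table 1; the
# entries for tree, perennial and leaf are simplified assumptions.
CHAINS = {
    "plant#1": ["building complex#1", "structure#1", "artifact#1",
                "object#1", "entity#1"],
    "plant#2": ["life form#1", "entity#1"],
    "plant#3": ["contrivance#3", "scheme#1", "plan of action#1", "plan#1",
                "idea#1", "content#5", "cognition#1",
                "psychological feature#1"],
    "plant#4": ["actor#1", "performer#1", "entertainer#1", "person#1",
                "life form#1", "entity#1"],
    "tree#1": ["woody plant#1", "vascular plant#1", "plant#2",
               "life form#1", "entity#1"],
    "perennial#1": ["plant#2", "life form#1", "entity#1"],
    "leaf#1": ["plant organ#1", "plant part#1", "natural object#1",
               "object#1", "entity#1"],
}


def word_of(sense):
    """Strip the sense number: 'plant#2' -> 'plant'."""
    return sense.rsplit("#", 1)[0]


def words_under(mark):
    """Context words with at least one sense equal to or below `mark`."""
    return {word_of(s) for s, hyps in CHAINS.items() if s == mark or mark in hyps}


def disambiguate(word):
    """Descend level by level, keeping only the senses with the maximal count."""
    # Root-first path per sense: top synset, ..., the sense itself.
    paths = {s: list(reversed([s] + h))
             for s, h in CHAINS.items() if word_of(s) == word}
    candidates = sorted(paths)
    trace = []
    for depth in range(max(len(p) for p in paths.values())):
        counts = {}
        for sense in candidates:
            path = paths[sense]
            mark = path[min(depth, len(path) - 1)]  # stay at the leaf if exhausted
            counts[sense] = len(words_under(mark))
        best = max(counts.values())
        candidates = [s for s in candidates if counts[s] == best]
        trace.append(counts)
        if len(candidates) == 1:
            break
    return candidates, trace


senses, trace = disambiguate("plant")
print(senses)  # ['plant#2']
for level, counts in enumerate(trace):
    print(level, counts)
```

If more than one candidate remains after the deepest level, the function returns all of them, which corresponds to the case where the complementary heuristics would be needed.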