
Inductive Logic Programming

Inductive logic programming [24] is situated at the intersection of machine learning and data mining on the one hand, and logic programming on the other. It shares with the former fields the goal of finding patterns in data, patterns that can be used to build predictive models or to gain insight into the data. With logic programming it shares the use of clausal first-order logic as a representation language for both data and hypotheses. In the remainder of this text we will use some basic notions from logic programming, such as literals, conjunctive queries, and variable substitutions. We will use Prolog notation throughout the paper. For an introduction to Prolog and logic programming see [10].

Inductive logic programming can be used for many different purposes, and the problem statements found in ILP papers consequently vary. In this article we consider the so-called learning from interpretations setting [16,14]. It has been argued elsewhere that this setting, while slightly less powerful than the standard ILP setting (it has problems with, e.g., learning recursive predicates), is sufficient for most practical purposes and scales up better [6].

We formulate the learning task in such a way that it covers a number of different problem statements. More specifically, we consider the problem of determining, for a set of conjunctive queries, for which instantiations of certain variables each query succeeds. These variables are called key variables, and a grounding substitution for them is called a key instantiation. The intuition is that each example in the learning task is uniquely identified by a single key instantiation.

The link with ILP systems that learn clauses is then as follows. The search performed by an ILP system is directed by regularly evaluating candidate clauses. Let us denote such a candidate clause by $Head(X) \leftarrow Body(X,Y)$ where $X$ represents a vector of variables appearing in the head of the clause and $Y$ represents additional variables that occur in the body. We assume that the head is a single literal and that a list of examples is given, where each example is of the form $Head(X)\theta$ with $\theta$ a substitution that grounds $X$. Examples may be labelled (e.g., as positive or negative), but this is not essential in our setting. While an example can be represented as a fact $Head(X)\theta$ when learning definite Horn clauses, we can also consider it just a tuple $X\theta$. Both notations will be used in this paper.

Intuitively, when positive and negative examples are given, one wants to find a clause that covers as many positive examples as possible, while covering few or no negatives. Whether a single example $Head(X)\theta$ is covered by the clause or not can be determined by running the query $?- Body(X,Y)\theta$. In other words, evaluating a clause boils down to running a number of queries consisting of the body of the clause. For simplicity of notation, we will often denote a conjunctive query by just the conjunction (without the $?-$ symbol).
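As a minimal sketch (not taken from any particular ILP system), this coverage test can be written in Prolog as follows; the predicates body_of/2 and covered/1 and the placeholder relations r/2 and s/1 are introduced here purely for illustration.

      % Sketch only: r/2 and s/1 stand for relations in the deductive
      % database; a few facts are included so the sketch runs as-is.
      r(a, b).   r(b, c).   s(a).

      % body_of(Key, Body): the body of a candidate clause, indexed by
      % the tuple Key of its head variables.
      body_of((X,Y), (r(X,Z), r(Z,Y), s(X))).

      % covered(Key): the example identified by the ground key tuple Key
      % is covered iff the instantiated body query has a solution.
      covered(Key) :-
          body_of(Key, Body),   % binds the head variables in Body to Key
          once(Body).           % one solution is enough

      % ?- covered((a, c)).   succeeds        ?- covered((b, a)).   fails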

In some less typical ILP settings, the ILP algorithm does not search for Horn clauses but rather for general clauses, e.g., CLAUDIEN [15], or for frequent patterns that can be expressed as conjunctive queries, e.g., WARMR [18]. These settings can be handled by our approach as well: all that is needed is a mapping from hypotheses to queries that allows these hypotheses to be evaluated. Such a mapping is defined by [15] for CLAUDIEN; for WARMR it is trivial.

Given a set of queries $S$ and a set of examples $E$, the main task is to determine which queries $Q \in S$ cover which examples $e \in E$. We formalise this using the notion of a result set:

Definition 1 (Result set)   The result set of a set of queries $S$ in a deductive database $D$ for key $K$ and example set $E$ is

\begin{displaymath}RS(S,K,D,E) = \{ (K\theta,i) \vert Q_i \in S \mbox{ and } K\theta \in E \mbox{ and } Q_i\theta \mbox{ succeeds in }D\}\end{displaymath}



Similarly to the learning from interpretations setting defined in [14], the problem setting can now be stated as follows:

Given: a set of conjunctive queries $S$, a deductive database $D$, a tuple $K$ of variables that occur in each query in $S$, and an example set $E$.

Find: the result set $RS(S,K,D,E)$; i.e., find for each query $Q$ in $S$ those ground instantiations $\theta$ of $K$ for which $K\theta \in E$ and $Q\theta$ succeeds in $D$.

Example 1   Assume an ILP system learning a definition for grandfather/2 wants to evaluate the following hypotheses:
      grandfather(X,Y) :- parent(X,Z), parent(Z,Y), male(X).
      grandfather(X,Y) :- parent(X,Z), parent(Z,Y), female(X).

Examples are of the form grandfather($gf$,$gc$) where $gf$ and $gc$ are constants; hence each example is uniquely identified by a ground substitution of the tuple $(X,Y)$. So in the above problem setting the set of Prolog queries $S$ equals $\{($?- parent(X,Z), parent(Z,Y), male(X)$)$, $($?- parent(X,Z), parent(Z,Y), female(X)$) \}$ and the key $K$ equals $(X,Y)$. Given a query $Q_i \in S$, finding all tuples $(x,y)$ for which $((x,y), i) \in R$ (with $R$ the result set as defined above) is equivalent to finding which of the grandfather($x$,$y$) facts in the example set are predicted by the clause grandfather(X,Y) :- $Q_i$.
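For illustration only, the following sketch sets up a small hypothetical database together with the two query bodies of Example 1 and computes, per query, the key instantiations from the example set for which it succeeds; the predicates example/1, query/3 and covered_keys/2 and all constants are invented for this sketch.

      % Illustration only: a tiny database, a hypothetical example set,
      % and the two query bodies of S.
      parent(jef, an).   parent(an, tom).   parent(mia, an).
      male(jef).   male(tom).   female(an).   female(mia).

      example((jef, tom)).          % keys (X,Y) of the examples in E
      example((mia, tom)).

      query(1, (X,Y), (parent(X,Z), parent(Z,Y), male(X))).
      query(2, (X,Y), (parent(X,Z), parent(Z,Y), female(X))).

      % covered_keys(I, Keys): the key instantiations in E for which
      % query I succeeds, i.e. the tuples contributing (Key, I) to R.
      covered_keys(I, Keys) :-
          findall(Key,
                  ( example(Key), query(I, Key, Body), once(Body) ),
                  Keys).

      % ?- covered_keys(1, Keys).   gives Keys = [(jef,tom)]
      % ?- covered_keys(2, Keys).   gives Keys = [(mia,tom)]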

The generality of our problem setting follows from the fact that once it is known which queries succeed for which examples, the statistics and heuristics that typical ILP systems use can be readily obtained from this. A few examples:

After transforming the grandfather/2 clauses into

      grandfather((X,Y),I) :- parent(X,Z), parent(Z,Y), male(X), I = 1.
      grandfather((X,Y),I) :- parent(X,Z), parent(Z,Y), female(X), I = 2.
the result set can clearly be computed by collecting, for all grounding substitutions $\theta$ with $K\theta \in E$, the answers to the query ?- grandfather($K\theta$,I). In Section 3 the queries will have a literal I = i at the end, or another goal whose side effects collect the result set.
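Assuming the transformed grandfather/2 clauses above together with the small database and example/1 facts of the earlier sketch, this collection step could look roughly as follows (result_set/1 is again a name of our own choosing):

      % Sketch: collect, for every example key, the identifiers I of the
      % transformed clauses whose body succeeds on it.
      result_set(RS) :-
          findall(Key-I,
                  ( example(Key), grandfather(Key, I) ),
                  Pairs),
          sort(Pairs, RS).   % drop duplicate answers; only existence matters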

In practice, it is natural to compute the result set using a double loop, one over the examples and one over the queries, and one has the choice as to which is the outer loop. Both the ``examples in outer loop'' and the ``queries in outer loop'' strategies have been used in data mining systems; in the context of decision trees, see for instance [25] and [22]. We shall see further on that the redundancy removal approach we propose uses the ``examples in outer loop'' strategy. In both approaches, however, given a query and a key instantiation, we are interested only in whether the query succeeds for that key instantiation. This implies that after a particular query has succeeded on an example, its execution can be stopped.

In other words: computing the result set defined above boils down to evaluating each query on each example, where we are only interested in the existence of success for each such evaluation. Computing more than one solution for one query on one example is unnecessary.
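Reusing the hypothetical example/1 and query/3 predicates of the earlier sketch, the two loop orders differ only in the order of the enumerating goals, and in both once/1 stops each evaluation after its first success:

      % Sketch of the two strategies; both record exactly the pairs of
      % the result set and never compute more than one solution per
      % query/example combination.
      result_set_examples_outer(RS) :-          % examples in outer loop
          findall(Key-I,
                  ( example(Key), query(I, Key, Body), once(Body) ),
                  RS).

      result_set_queries_outer(RS) :-           % queries in outer loop
          findall(Key-I,
                  ( query(I, Key, Body), example(Key), once(Body) ),
                  RS).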

