Boolean Satisfiability

Definition 2.1 A variable is simply a letter (e.g.,

) that can be either true or false. A literal is either a variable or the negation of a variable. A clause is a disjunction of literals, and a Boolean satisfiability problem (in conjunctive normal form), or a SAT problem, is a conjunction of clauses.

A solution to a SAT problem is an assignment of values to each of the letters so that every clause in is satisfied.

None of this should be new. Satisfiability of SAT instances is well-known to be NP-complete [CookCook1971], and the language is a reasonably natural one for encoding real-world problems. As we remarked in the introduction, the classic algorithm for solving these problems is depth-first search augmented with an ability to set variables whose values are forced:

Procedure 2.2 (Davis-Putnam-Logemann-Loveland) Given a SAT problem

and a partial assignment

of values to variables, to compute $% latex2html id marker 2063 $Procedure~\ref{p.dpll}(C,P)$$ :

$\begin{codebox} % latex2html id marker 266\li $P \gets Procedure~\ref{p.Unit-P... ...n $Procedure~\ref{p.dpll}(C,P \cup \{l = \mbox{\tt false}\})$ \End \end{codebox}$

Variables are assigned values in two ways. In the first, unit propagation, the clause set is examined under the existing partial assignment and new consequential assignments are identified. Somewhat more specifically (see below), clauses are found that have no satisfied literals and exactly one unvalued literal. In each such clause, the unvalued literal is valued so as to satisfy the clause. This process is repeated until a contradiction is encountered, a solution is found, or no more clauses meet the necessary conditions. If the unit propagation function terminates without reaching a contradiction or finding a solution, then a variable is selected and assigned a value, and the procedure recurs.

In practice, the choice of branch literal is crucial to the performance of the algorithm. (Note that by choosing a branch literal $\neg l$ instead of

, we can also select the order in which values are tried for the underlying variable.) Relatively early work on DPLL focused on the selection of branch variables that produced as many unit propagations as possible, thus reducing the size of the residual problems that had to be solved recursively. As we will see in Section 2.3, however, more recent ideas appear to be more effective.

Missing from Procedure 2.2, however, is a description of the propagation process. Here it is:

Procedure 2.3 (Unit propagation) To compute $% latex2html id marker 2083 $Procedure~\ref{p.Unit-Propagate}(P)$$ :
$\begin{codebox} \li \While no contradiction is found \kw{and} there is a $c\in C... ...$V$\ is selected so that $c$ is satisfied} \}$ \End \li \Return $P$\end{codebox}$

DPLL has a variety of well-known theoretical properties. It is sound and complete in that every candidate solution returned is a solution of the original problem, and such a solution will be returned if one exists (and failure eventually reported if no solution exists).

From a practical point of view, the running time of DPLL is obviously potentially exponential in the size of the problem, since at each iteration, possibly only a single variable is assigned a value before the routine is invoked recursively. In practice, of course, unit propagation can reduce the number of branch points substantially, but the running time remains exponential on difficult instances [Crawford AutonCrawford Auton1996].

We also point out that on difficult problem instances, most of the running time is necessarily spent exploring portions of the search space in which there are no solutions. After all, if

is a partial assignment that can be extended to a solution, the algorithm will never backtrack away from

. (Indeed, it cannot and retain completeness, since

may be the only variable assignment that extends to a full solution at this point.) Given that there can be no backtrack away from a satisfiable partial assignment, and that the number of backtracks is exponential in the problem size, it is clear that most of the time spent by the program is indeed in evaluating unsatisfiable regions of the search space.