4.1 The Proof Trace Extension

Definition 12   Let $F$ be a CNF formula and $\pi$ a RES refutation of it. Let the last step of $\pi$ resolve $v$ with $\neg v$. Let $S = \pi \setminus (F \cup \{\neg v, \Lambda\})$. The proof trace extension $PT(F,\pi)$ of $F$ is a CNF formula over the variables of $F$ and new trace variables $t_C$ for clauses $C \in S$. The clauses of $PT(F,\pi)$ are all initial clauses of $F$ together with a trace clause $(\neg x \vee t_C)$ for each clause $C \in S$ and each literal $x \in C$.
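As a small running illustration (chosen here purely for concreteness), let $F = (a \vee b \vee c) \wedge (a \vee b \vee \neg c) \wedge (a \vee \neg b) \wedge (\neg a \vee c) \wedge (\neg a \vee \neg c)$ and let $\pi$ be the refutation that derives $(a \vee b)$ from the first two clauses, then $a$ from $(a \vee b)$ and $(a \vee \neg b)$, then $\neg a$ from $(\neg a \vee c)$ and $(\neg a \vee \neg c)$, and finally $\Lambda$ by resolving $a$ with $\neg a$. Taking $v = a$, we have $S = \{(a \vee b),\, a\}$, so $PT(F,\pi)$ consists of the five clauses of $F$ together with the trace clauses $(\neg a \vee t_{(a \vee b)})$, $(\neg b \vee t_{(a \vee b)})$, and $(\neg a \vee t_{a})$ over the two new trace variables $t_{(a \vee b)}$ and $t_{a}$.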

We first show that if a formula has a short RES refutation, then the corresponding proof trace extension has a short CL proof. Intuitively, the new trace variables allow us to simulate every resolution step of the original proof individually, without worrying about extra branches left over after learning a derived clause.

Lemma 1   Suppose a formula $F$ has a RES refutation $\pi$. Let $F' = PT(F,\pi)$. Then $\mathcal{C}_{\mathsf{CL}}(F') < \mathrm{size}(\pi)$ when CL uses the FirstNewCut scheme and no restarts.

Proof. Suppose $\pi$ contains a derived clause $C_i$ whose strict subclause $C_i'$ can be derived by resolving two previously occurring clauses. We can replace $C_i$ with $C_i'$, do trivial simplifications on further derivations that used $C_i$, and obtain a simpler proof $\pi'$ of $F$. Doing this repeatedly removes all such redundant clauses and leaves us with a simplified proof no larger in size. Hence we assume without loss of generality that $\pi$ has no such clause.

Viewing $\pi$ as a sequence of clauses, its last two elements must be, without loss of generality, a literal, say $\neg v$, and $\Lambda$. Let $S = \pi \setminus (F \cup \{\neg v, \Lambda\})$ as in Definition 12. Let $(C_1, C_2, \ldots, C_k)$ be the subsequence of $\pi$ consisting of precisely the clauses in $S$. Note that $C_i \equiv v$ for some $i$, $1 \le i \le k$. We claim that the branching sequence $\sigma = (t_{C_1}, t_{C_2}, \ldots, t_{C_k})$ induces a CL proof of $F'$ of size $k$ using the FirstNewCut scheme. To prove this, we show by induction that after $i$ branching steps, the clause learning procedure branching according to $\sigma$ has learned clauses $C_1, C_2, \ldots, C_i$, has trace variables $t_{C_1}, t_{C_2}, \ldots, t_{C_i}$ set to TRUE, and is at decision level $i$.
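In the running illustration following Definition 12, $S = \{C_1, C_2\}$ with $C_1 = (a \vee b)$ and $C_2 = a \equiv v$, so $k = 2$ and the induced branching sequence is $\sigma = (t_{C_1}, t_{C_2})$.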

The base case for induction, $i = 0$, is trivial: the clause learning procedure is at decision level zero and no clauses have been learned. Suppose the inductive claim holds after branching step $i - 1$. Let $C_i = (x_1 \vee x_2 \vee \ldots \vee x_l)$. $C_i$ must have been derived in $\pi$ by resolving two clauses $(A \vee y)$ and $(B \vee \neg y)$ coming from $F \cup \{C_1, C_2, \ldots, C_{i-1}\}$, where $C_i = (A \vee B)$. The $i$th branching step sets $t_{C_i}$ to FALSE. Unit propagation using the trace clauses $(\neg x_j \vee t_{C_i})$, $1 \le j \le l$, sets each $x_j$ to FALSE, thereby falsifying all literals of $A$ and $B$. Further unit propagation using $(A \vee y)$ and $(B \vee \neg y)$ implies $y$ as well as $\neg y$, leading to a conflict. The cut in the conflict graph with $y$ and $\neg y$ on the conflict side and everything else on the reason side yields $C_i$ as the FirstNewCut clause, which is learned from this conflict. The procedure now backtracks and flips the branch on $t_{C_i}$ by setting it to TRUE. At this stage, the clause learning procedure has learned clauses $C_1, C_2, \ldots, C_i$, has trace variables $t_{C_1}, t_{C_2}, \ldots, t_{C_i}$ set to TRUE, and is at decision level $i$. This completes the inductive step.
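Both branching steps of the running illustration follow this pattern. Branching on $t_{C_1}$ sets it to FALSE; the trace clauses $(\neg a \vee t_{C_1})$ and $(\neg b \vee t_{C_1})$ propagate $a$ and $b$ to FALSE; the clauses $(a \vee b \vee c)$ and $(a \vee b \vee \neg c)$ then imply $c$ as well as $\neg c$, and the resulting FirstNewCut clause is exactly $C_1 = (a \vee b)$, after which the branch is flipped to TRUE. Branching on $t_{C_2}$ likewise propagates $a$ to FALSE, the learned clause $(a \vee b)$ and the initial clause $(a \vee \neg b)$ imply $b$ as well as $\neg b$, and the FirstNewCut clause learned from this conflict is $C_2 = a$.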

The inductive proof above shows that when the clause learning procedure has finished branching on all $k$ literals in $\sigma$, it will have learned all clauses in $S$. Adding to this the initial clauses $F$ that are already known, the procedure will have as known clauses $v$ as well as the two unit or binary clauses used to derive $\neg v$ in $\pi$. These immediately generate $\Lambda$ in the residual formula by unit propagation using variable $v$, leading to a conflict at decision level $k$. Since this conflict does not involve any decision variable, fast backtracking retracts all $k$ branches. The conflict, however, still exists at decision level zero, thereby concluding the clause learning procedure and finishing the CL proof. $\Box$
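In the running illustration, once $t_{C_2}$ is flipped to TRUE, the known clauses include the learned unit clause $a$ together with $(\neg a \vee c)$ and $(\neg a \vee \neg c)$; unit propagation immediately yields both $c$ and $\neg c$, and since neither the propagated literals nor their antecedents involve a decision variable, the conflict persists at decision level zero and the CL proof finishes after $k = 2$ branching steps, in accordance with Lemma 1.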

Lemma 2   Let $S$ be an $f(n)$-proper natural refinement of RES whose weakness is witnessed by a family $\{F_n\}$ of formulas. Let $\{\pi_n\}$ be the family of shortest RES proofs of $\{F_n\}$. Let $\{F'_n\} = \{PT(F_n, \pi_n)\}$. For CL using the FirstNewCut scheme and no restarts, $\mathcal{C}_S(F'_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F'_n)$.

Proof. Let $\rho_n$ be the restriction that sets every trace variable of $F'_n$ to TRUE. We claim that $\mathcal{C}_S(F'_n) \ge \mathcal{C}_S(F'_n|_{\rho_n}) = \mathcal{C}_S(F_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{RES}}(F_n) > f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F'_n)$. The first inequality holds because $S$ is a natural proof system. The following equality holds because $\rho_n$ keeps the original clauses of $F_n$ intact and trivially satisfies all trace clauses, thereby reducing the initial clauses of $F'_n$ to precisely $F_n$. The next inequality holds because $S$ is an $f(n)$-proper refinement of RES. The final inequality follows from Lemma 1, because $\pi_n$ is a shortest RES proof of $F_n$ and hence $\mathrm{size}(\pi_n) = \mathcal{C}_{\mathsf{RES}}(F_n)$. $\Box$
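In the running illustration, the corresponding restriction sets $t_{(a \vee b)}$ and $t_a$ to TRUE; this satisfies the three trace clauses and leaves the five original clauses untouched, reducing the proof trace extension to precisely the original formula, which is exactly how $\rho_n$ is used in the equality above.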

This gives our first main result and its corollary using Proposition 2:

Theorem 1   For any $f(n)$-proper natural refinement $S$ of RES and for CL using the FirstNewCut scheme and no restarts, there exist formulas $\{F_n\}$ such that $\mathcal{C}_S(F_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F_n)$.

Corollary 1   CL can provide exponentially shorter proofs than tree-like, regular, and Davis-Putnam resolution.

Remark. As clause learning yields resolution proofs of unsatisfiable formulas, CL is a refinement of RES. However, it is not necessarily a natural proof system. If it were shown to be natural, Theorem 1, by a contradiction argument, would imply that CL using the FirstNewCut scheme and no restarts is as powerful as RES itself.

