4.1 The Proof Trace Extension

Definition 12   Let $F$ be a CNF formula and $\pi$ a RES refutation of it. Let the last step of $\pi$ resolve $v$ with $\neg v$. Let $S = \pi \setminus (F \cup \{\neg v, \Lambda\})$. The proof trace extension $PT(F,\pi)$ of $F$ is a CNF formula over the variables of $F$ and new trace variables $t_C$ for clauses $C \in S$. The clauses of $PT(F,\pi)$ are all initial clauses of $F$ together with a trace clause $(\neg x \vee t_C)$ for each clause $C \in S$ and each literal $x \in C$.
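As a small running illustration (chosen here purely for concreteness), let $F = (a \vee b \vee c) \wedge (a \vee b \vee \neg c) \wedge (a \vee \neg b) \wedge (\neg a \vee c) \wedge (\neg a \vee \neg c)$ and let $\pi$ be the refutation that derives $(a \vee b)$ from the first two clauses, then $a$ from $(a \vee b)$ and $(a \vee \neg b)$, then $\neg a$ from $(\neg a \vee c)$ and $(\neg a \vee \neg c)$, and finally $\Lambda$ by resolving $a$ with $\neg a$. Taking $v = a$, we have $S = \{(a \vee b),\, a\}$, so $PT(F,\pi)$ consists of the five clauses of $F$ together with the trace clauses $(\neg a \vee t_{(a \vee b)})$, $(\neg b \vee t_{(a \vee b)})$, and $(\neg a \vee t_{a})$ over the two new trace variables $t_{(a \vee b)}$ and $t_{a}$.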

We first show that if a formula has a short RES refutation, then the corresponding proof trace extension has a short CL proof. Intuitively, the new trace variables allow us to simulate every resolution step of the original proof individually, without worrying about extra branches left over after learning a derived clause.

Lemma 1   Suppose a formula $F$ has a RES refutation $\pi$. Let $F' = PT(F,\pi)$. Then $\mathcal{C}_{\mathsf{CL}}(F') < \mathrm{size}(\pi)$ when CL uses the FirstNewCut scheme and no restarts.

Proof. Suppose $\pi$ contains a derived clause $C_i$ whose strict subclause $C_i'$ can be derived by resolving two previously occurring clauses. We can replace $C_i$ with $C_i'$, do trivial simplifications on further derivations that used $C_i$, and obtain a simpler proof $\pi'$ of $F$. Doing this repeatedly removes all such redundant clauses and leaves us with a simplified proof no larger in size. Hence we assume without loss of generality that $\pi$ has no such clause.

Viewing $\pi$ as a sequence of clauses, its last two elements must be, without loss of generality, a literal, say $\neg v$, and $\Lambda$. Let $S = \pi \setminus (F \cup \{\neg v, \Lambda\})$ as in Definition 12. Let $(C_1, C_2, \ldots, C_k)$ be the subsequence of $\pi$ consisting of precisely the clauses in $S$. Note that $C_i \equiv v$ for some $i$, $1 \le i \le k$. We claim that the branching sequence $\sigma = (t_{C_1}, t_{C_2}, \ldots, t_{C_k})$ induces a CL proof of $F'$ of size $k$ using the FirstNewCut scheme. To prove this, we show by induction that after $i$ branching steps, the clause learning procedure branching according to $\sigma$ has learned clauses $C_1, C_2, \ldots, C_i$, has trace variables $t_{C_1}, t_{C_2}, \ldots, t_{C_i}$ set to TRUE, and is at decision level $i$.
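In the running illustration following Definition 12, $S = \{C_1, C_2\}$ with $C_1 = (a \vee b)$ and $C_2 = a \equiv v$, so $k = 2$ and the induced branching sequence is $\sigma = (t_{C_1}, t_{C_2})$.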

The base case for induction, $i = 0$, is trivial: the clause learning procedure is at decision level zero and no clauses have been learned. Suppose the inductive claim holds after branching step $i - 1$. Let $C_i = (x_1 \vee x_2 \vee \ldots \vee x_l)$. $C_i$ must have been derived in $\pi$ by resolving two clauses $(A \vee y)$ and $(B \vee \neg y)$ coming from $F \cup \{C_1, C_2, \ldots, C_{i-1}\}$, where $C_i = (A \vee B)$. The $i$th branching step sets $t_{C_i}$ to FALSE. Unit propagation using the trace clauses $(\neg x_j \vee t_{C_i})$, $1 \le j \le l$, sets each $x_j$ to FALSE, thereby falsifying all literals of $A$ and $B$. Further unit propagation using $(A \vee y)$ and $(B \vee \neg y)$ implies $y$ as well as $\neg y$, leading to a conflict. The cut in the conflict graph with $y$ and $\neg y$ on the conflict side and everything else on the reason side yields $C_i$ as the FirstNewCut clause, which is learned from this conflict. The procedure now backtracks and flips the branch on $t_{C_i}$ by setting it to TRUE. At this stage, the clause learning procedure has learned clauses $C_1, C_2, \ldots, C_i$, has trace variables $t_{C_1}, t_{C_2}, \ldots, t_{C_i}$ set to TRUE, and is at decision level $i$. This completes the inductive step.
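Both branching steps of the running illustration follow this pattern. Branching on $t_{C_1}$ sets it to FALSE; the trace clauses $(\neg a \vee t_{C_1})$ and $(\neg b \vee t_{C_1})$ propagate $a$ and $b$ to FALSE; the clauses $(a \vee b \vee c)$ and $(a \vee b \vee \neg c)$ then imply $c$ as well as $\neg c$, and the resulting FirstNewCut clause is exactly $C_1 = (a \vee b)$, after which the branch is flipped to TRUE. Branching on $t_{C_2}$ likewise propagates $a$ to FALSE, the learned clause $(a \vee b)$ and the initial clause $(a \vee \neg b)$ imply $b$ as well as $\neg b$, and the FirstNewCut clause learned from this conflict is $C_2 = a$.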

The inductive proof above shows that when the clause learning procedure has finished branching on all $k$ literals in $\sigma$, it will have learned all clauses in $S$. Adding to this the initial clauses $F$ that are already known, the procedure will have as known clauses $v$ as well as the two unit or binary clauses used to derive $\neg v$ in $\pi$. These immediately generate $\Lambda$ in the residual formula by unit propagation using variable $v$, leading to a conflict at decision level $k$. Since this conflict does not involve any decision variable, fast backtracking retracts all $k$ branches. The conflict, however, still exists at decision level zero, thereby concluding the clause learning procedure and finishing the CL proof. $\Box$
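In the running illustration, once $t_{C_2}$ is flipped to TRUE, the known clauses include the learned unit clause $a$ together with $(\neg a \vee c)$ and $(\neg a \vee \neg c)$; unit propagation immediately yields both $c$ and $\neg c$, and since neither the propagated literals nor their antecedents involve a decision variable, the conflict persists at decision level zero and the CL proof finishes after $k = 2$ branching steps, in accordance with Lemma 1.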

Lemma 2   Let $S$ be an $f(n)$-proper natural refinement of RES whose weakness is witnessed by a family $\{F_n\}$ of formulas. Let $\{\pi_n\}$ be the family of shortest RES proofs of $\{F_n\}$. Let $\{F'_n\} = \{PT(F_n, \pi_n)\}$. For CL using the FirstNewCut scheme and no restarts, $\mathcal{C}_S(F'_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F'_n)$.

Proof. Let $\rho_n$ be the restriction that sets every trace variable of $F'_n$ to TRUE. We claim that $\mathcal{C}_S(F'_n) \ge \mathcal{C}_S(F'_n|_{\rho_n}) = \mathcal{C}_S(F_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{RES}}(F_n) > f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F'_n)$. The first inequality holds because $S$ is a natural proof system. The following equality holds because $\rho_n$ keeps the original clauses of $F_n$ intact and trivially satisfies all trace clauses, thereby reducing the initial clauses of $F'_n$ to precisely $F_n$. The next inequality holds because $S$ is an $f(n)$-proper refinement of RES. The final inequality follows from Lemma 1, because $\pi_n$ is a shortest RES proof of $F_n$ and hence $\mathrm{size}(\pi_n) = \mathcal{C}_{\mathsf{RES}}(F_n)$. $\Box$
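In the running illustration, the corresponding restriction sets $t_{(a \vee b)}$ and $t_a$ to TRUE; this satisfies the three trace clauses and leaves the five original clauses untouched, reducing the proof trace extension to precisely the original formula, which is exactly how $\rho_n$ is used in the equality above.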

This gives our first main result and its corollary using Proposition 2:

Theorem 1   For any $f(n)$-proper natural refinement $S$ of RES and for CL using the FirstNewCut scheme and no restarts, there exist formulas $\{F_n\}$ such that $\mathcal{C}_S(F_n) \ge f(n) \cdot \mathcal{C}_{\mathsf{CL}}(F_n)$.

Corollary 1   CL can provide exponentially shorter proofs than tree-like, regular, and Davis-Putnam resolution.

Remark. As clause learning yields resolution proofs of unsatisfiable formulas, CL is a refinement of RES. However, it is not necessarily a natural proof system. If it were shown to be natural, Theorem 1, by a contradiction argument, would imply that CL using the FirstNewCut scheme and no restarts is as powerful as RES itself.

