ASSOCIATION FOR AUTOMATED REASONING Newsletter

No. 36, April 1997

From the AAR President, Larry Wos...

Featured in this newsletter are two articles. The first, by Maria Paola Bonacina, discusses the relatively new area of strategy analysis. Bonacina proposes six problems that I encourage AAR researchers to solve. This is not a contest (for that, I suggest you enter one of the contests in CADE-14), but a challenge: success in answering such questions may advance our field. The second article, by William McCune, presents two programs written in Java to implement automated deduction GUIs. McCune notes that the programs may be useful to beginning logic students and to those trying to make sense of Otter proofs.

Reminder: Early CADE registration ends May 12. For further information, see the CADE-14 web page.

IMPORTANT ANNOUNCEMENT:
AAR Newsletter to Be Distributed via the Web

Starting with the next issue, the AAR Newsletter will be distributed via the Web rather than through paper copies. The URL is http://www.mcs.anl.gov/AAR/. The newsletter will be available in html form and Postscript form, for the convenience of the reader. (For those who do not have reliable Web access, we can arrange to send the newsletter by e-mail; contact pieper@mcs.anl.gov.)

As always, AAR members will continue to be eligible for a substantially reduced subscription to the Journal of Automated Reasoning as well as a substantial deduction in the registration fee for the (now annual) CADE conference.

We will put articles on the Web as they are accepted. Periodically (approximately four times a year) we will send out a notice by e-mail that a new "issue" (identified by number) is on the Web. Our intention here is twofold: to keep the newsletter current, and to provide a means of citing a specific issue.

Having the newsletter on the Web will give contributors far greater flexibility. For example, one will be able to include links to projects for more detailed descriptions, to extended proofs, or to additional challenge problems. The Web will also allow AAR newsletter contributors to use color graphics and multimedia facilities.

AAR members are encouraged to contribute to this new AAR newsletter. Submissions are invited in html (standard LaTeX is also acceptable). Please send them to our editor Gail Pieper, pieper@mcs.anl.gov.

A Note on the Analysis of Theorem-Proving Strategies

Maria Paola Bonacina
(bonacina@cs.uiowa.edu)

In recent work [3] David Plaisted proposed an approach to analyzing the complexity of theorem-proving strategies, and applied it to Horn propositional logic. Most of the research on the complexity of theorem proving studies the complexity of propositional proofs, while most of the work on search concentrates on the design of heuristics. (See [3] and [1] for references to this work.) To my knowledge, Plaisted's approach is the first to treat the complexity of searching for a proof. This note is a comment on this ground-breaking work. The framework of [3] is a starting point for discussing open problems that need to be addressed in order to analyze theorem-proving strategies in first-order logic. These include the formalization of the search plan, the representation of contraction, the duality of forward and backward reasoning, and the definition of complexity measures for infinite search spaces.

A theorem-proving strategy consists of an inference system and a search plan. The inference system defines the possible inferences, and the search plan selects, at each step of a derivation, the inference rule and the premises for the next step. Contraction-based strategies include the strategies that integrate resolution, Knuth-Bendix completion, and term rewriting. These strategies work primarily by forward reasoning: they use expansion rules, such as resolution, to generate consequences from existing clauses. Forward-reasoning methods generally do not distinguish goal clauses from the others, seeking instead to derive a contradiction from the whole set. Generated clauses are kept and used for further inferences, so that the strategy works on a database of clauses. A major source of redundancy for these methods is that they generate and retain clauses that do not contribute to proving the theorem. Contraction-based strategies counter this problem by using contraction rules, such as simplification and subsumption, to delete redundant clauses. Since the inference system features multiple inference rules, and generated clauses are kept, creating many possible choices of premises, the search plan plays an important role. Forward-reasoning strategies admit various search plans, which sort the inferences by different criteria. Contraction-based strategies employ eager-contraction search plans that give priority to contraction over expansion, in order to keep the database minimal with respect to the contraction rules.
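To make the control structure concrete, the following is a minimal sketch of a forward-reasoning loop with eager contraction. It is not the loop of any particular prover; the names (saturate, expand, contract, is_empty_clause) are illustrative placeholders for the inference rules and tests supplied by the strategy.

    # A minimal sketch of a forward-reasoning, contraction-based loop with
    # eager contraction.  All names are illustrative placeholders.
    def saturate(input_clauses, expand, contract, is_empty_clause):
        """Derive consequences until the empty clause is found or no
        inferences remain.  expand(c, kept) generates consequences of c
        against the retained database; contract(c, kept) simplifies c with
        respect to the database, returning None if c is redundant."""
        kept = []                           # the database of retained clauses
        to_process = list(input_clauses)
        while to_process:
            # Eager contraction: contract before anything else, so the
            # database stays minimal with respect to the contraction rules.
            clause = contract(to_process.pop(0), kept)
            if clause is None:              # subsumed or simplified away
                continue
            if is_empty_clause(clause):
                return "unsatisfiable"
            kept.append(clause)
            # Expansion (e.g., resolution) against the retained clauses.
            to_process.extend(expand(clause, kept))
        return "saturated"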

Subgoal-reduction strategies include model elimination, tableau-based methods, Prolog technology theorem proving, the MESON strategy, and the problem-reduction-format strategies. These strategies work by reducing the goal to subgoals. Each inference step involves the current goal and at most one other premise. The current goal is initially a selected input clause and thereafter the most recently generated goal. If no rule applies to the current goal, the strategy backtracks to the previous one. Since the generated clauses are goals retained only for backtracking, the strategy works on a stack of goals. Since there is no database, the application of contraction is limited, and the search plan is fixed, usually depth-first search with iterative deepening. A source of redundancy for these strategies is that, by focusing on the most recently generated goal, they may repeatedly reduce the same subgoals. Subgoal-reduction methods counter this problem by using lemmaizing or caching techniques that enable them to keep track of subgoals that have already been solved or have already failed.
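For contrast, here is an equally minimal sketch of a subgoal-reduction loop with depth-first search and iterative deepening, working on a stack of goals (the stack is implicit in the recursion). The names prove, reduce_goal, and is_solved are again illustrative only; each applicable step yields an alternative next goal, to which the procedure backtracks on failure.

    # A minimal sketch of a subgoal-reduction loop with depth-first
    # iterative deepening.  reduce_goal(g) stands for the applicable
    # inference steps, each yielding an alternative next goal.
    def prove(initial_goal, reduce_goal, is_solved, max_depth=20):
        def dfs(goal, depth):
            if is_solved(goal):
                return True
            if depth == 0:
                return False                # depth bound reached; backtrack
            for next_goal in reduce_goal(goal):
                if dfs(next_goal, depth - 1):
                    return True
            return False                    # no step applies; backtrack

        # Iterative deepening: restart with a larger bound until success.
        for bound in range(1, max_depth + 1):
            if dfs(initial_goal, bound):
                return True
        return False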

Plaisted's Approach to the Analysis of Strategies

The model of the search space

A theorem-proving strategy is formalized in [3] as a 5-tuple <S,V,i,E,u>, where S is a set of states and E is a set of pairs of states, so that <S,E> forms a directed graph. The component V is a set of labels for the states in S. The component i is a mapping from the set of input clauses to S, which identifies the initial state(s). The component u is a function from S to {true, false}, where u(s)=true if and only if s is a state where unsatisfiability is detected. It is assumed that u is an almost trivial test, such as recognizing that a set of clauses contains the empty clause. A more precise definition of V and u depends on the specific strategy. A function label, such that label(s) denotes the label of state s, is also used. It is required that no two distinct edges (s1,t1) and (s2,t2) in E have t1=t2. This implies that the directed graph is a set of trees. A strategy is said to be linear if for all states s there is a unique state t such that (s,t) is in E, that is, every state has a unique successor.
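The 5-tuple can be transcribed almost literally into a small data structure. The sketch below is only a notational aid, not part of [3]; the field names follow the text above, and the linearity test checks that every state has exactly one successor (assuming the edges in E connect states in S).

    # The model of [3] as a data structure; names follow the text above.
    from dataclasses import dataclass
    from typing import Any, Callable, Dict, FrozenSet, Set, Tuple

    State = Any        # abstract state identifiers
    Label = Any        # e.g., a frozenset of clauses, or a single chain

    @dataclass
    class SearchSpaceModel:
        S: Set[State]                              # states
        label: Dict[State, Label]                  # V, via the label function
        i: Callable[[FrozenSet], Set[State]]       # input clauses -> initial state(s)
        E: Set[Tuple[State, State]]                # edges (s, t)
        u: Callable[[State], bool]                 # true iff unsatisfiability is detected

        def is_linear(self) -> bool:
            # Linear: for every state s there is exactly one t with (s, t) in E.
            out_degree = {s: 0 for s in self.S}
            for (s, t) in self.E:
                out_degree[s] += 1
            return all(n == 1 for n in out_degree.values())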

Given a strategy G=<S,V,i,E,u> and an input set of clauses R, a state s in S is reachable from R if there is a path from a state in i(R) to s. S(R) denotes the subset of S containing all states that are reachable from R, and E(R) denotes the restriction of E to S(R). Then, G(R)=<S(R),E(R)> is the search space for the theorem-proving problem R according to the strategy G.

For resolution, the labels of the states are finite sets of clauses. If R is the input set of clauses, and s0 is the state with label R, the input function i is defined by i(C)=s0 for all C in R. In other words, all input clauses are mapped to a single initial state, whose label is the set of input clauses. It follows that G(R) is a tree with R as the label of the root. An arc connects state s to state t if the label of t is the union of the label of s and the set of all resolvents generated from the clauses in s according to the inference system. This means that each state has a unique successor, that is, the tree G(R) for resolution degenerates to a list. Accordingly, all resolution-based strategies are linear in [3]. The function u is defined by u(s)=true if and only if the label of s contains the empty clause.
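Under these definitions the search space for resolution can be generated mechanically. The following sketch assumes a function resolvents(clauses) returning the one-step resolvents of a set of clauses and a test contains_empty; both names are illustrative, and the step bound only keeps the sketch finite.

    # Sketch of the (degenerate, list-shaped) search space for resolution:
    # the unique successor of a state adds all resolvents of its label.
    def resolution_successor(label, resolvents):
        return label | frozenset(resolvents(label))

    def resolution_search_space(input_clauses, resolvents, contains_empty, max_steps=50):
        states = [frozenset(input_clauses)]        # root labeled by R
        while not contains_empty(states[-1]) and len(states) <= max_steps:
            nxt = resolution_successor(states[-1], resolvents)
            if nxt == states[-1]:                  # saturated: no new resolvents
                break
            states.append(nxt)
        return states                              # labels along the unique path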

For model elimination, each state is labeled by a single chain (a chain is a sorted clause with plain literals, called B-literals, and framed literals, called A-literals, for the ancestors), which is the current goal in that state. If R = {C1, ..., Cn} is the input set of clauses, there are states s1, ..., sn with labels C1, ..., Cn, respectively, and i(Cj) = sj for all j, 1 <= j <= n. In other words, there is an initial state for each input clause. This captures the fact that any input clause can be chosen as the initial chain. It follows that G(R) is a set of trees, one for each possible initial chain. An arc connects state s to state t if the chain labeling t is generated from the chain labeling s in one ME-step (ME-extension, ME-reduction, or ME-contraction). The function u is defined in the same way as for resolution. The strategies based on model elimination are not linear, because an arc represents a single step, and more than one step may be applicable to a chain (e.g., ME-extension steps with different input clauses).
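The corresponding construction for model elimination yields a forest rather than a list, because a chain may have several successors. The sketch below assumes a function me_steps(chain, input_clauses) returning the chains produced by all applicable ME-steps; the name is illustrative, and the depth bound only keeps the sketch finite.

    # Sketch of the forest-shaped search space for model elimination:
    # one tree per initial chain, one edge per applicable ME-step.
    def me_search_space(input_clauses, me_steps, depth):
        trees = []
        for clause in input_clauses:               # one initial state per input clause
            nodes, edges = [clause], []
            frontier = [clause]
            for _ in range(depth):
                next_frontier = []
                for chain in frontier:
                    for succ in me_steps(chain, input_clauses):
                        nodes.append(succ)
                        edges.append((chain, succ))
                        next_frontier.append(succ)
                frontier = next_frontier
            trees.append((nodes, edges))
        return trees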

The complexity measures to evaluate and compare the strategies

The measures of the complexity of search proposed in [3] aim at measuring the total size of the search space. The total size of G(R) is defined as the sum of |label(s)| over all states s in S(R). For resolution, the labels are finite sets of clauses, and therefore the total size is the sum of the cardinalities of the sets of clauses labeling the nodes of G(R). For model elimination, the labels are singleton sets, so that the total size is equal to the number of nodes in G(R).

The total size is refined into three measures, called duplication by iteration, duplication by case analysis, and duplication by combination. Duplication by iteration is the maximum length of a path in G(R). Duplication by case analysis is the maximum size of a subset of S(R) no two elements of which are on the same path. Duplication by combination is the maximum cardinality of a label of a state in S(R), that is, the maximum of |label(s)| over all s in S(R). Since G(R) is a tree or a set of trees, duplication by case analysis reduces to the number of paths, and the total size is bounded by the product of duplication by iteration, duplication by case analysis, and duplication by combination. The analysis proceeds by establishing whether these measures are exponential, linear, or constant in the length of the input set R read as a single string.
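On a finite, tree-shaped search space these measures can be computed directly from the definitions. The sketch below takes a tree given by a root, a children map, and a label map (all illustrative names) and returns the total size together with the three duplication measures; path lengths are counted in states.

    # Sketch of the total size and the three duplication measures on a
    # finite tree, following the definitions above.
    def duplication_measures(root, children, label):
        def longest_path(s):                       # duplication by iteration
            kids = children.get(s, [])
            return 1 + (max(longest_path(t) for t in kids) if kids else 0)

        def count_paths(s):                        # duplication by case analysis:
            kids = children.get(s, [])             # for a tree, the number of
            if not kids:                           # root-to-leaf paths
                return 1
            return sum(count_paths(t) for t in kids)

        def all_states(s):
            yield s
            for t in children.get(s, []):
                yield from all_states(t)

        states = list(all_states(root))
        return {
            "total_size": sum(len(label[s]) for s in states),
            "iteration": longest_path(root),
            "case_analysis": count_paths(root),
            "combination": max(len(label[s]) for s in states),  # max label cardinality
        }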

The analysis of the strategies in Horn propositional logic

According to the analysis in [3], basic resolution strategies have linear duplication by iteration and exponential duplication by combination. The duplication by case analysis is trivially constant (more precisely, equal to 1), because each state has a unique successor. The exponential duplication by combination captures the complexity of generating and keeping clauses, since resolvents are generated by combining the input literals in different ways. Since one of the measures is exponential, resolution strategies are regarded as inefficient. For positive hyperresolution, however, the duplication by combination is linear, because positive hyperresolvents in Horn logic are unit clauses, so that all generated clauses are unit clauses and literals are not combined. Also, the duplication by combination of positive resolution reduces from exponential to linear if the strategy is equipped with an ordering on predicate symbols, and only the negative literals with the smallest predicate are resolved upon. In summary, resolution strategies are either efficient but not goal-sensitive (e.g., positive hyperresolution and positive resolution), or goal-sensitive but not efficient (e.g., negative resolution), or neither efficient nor goal-sensitive (e.g., ordered resolution). These results are worst-case results. For instance, ordered resolution is efficient for some sets of clauses and orderings (e.g., all well-ordered sets: a set of Horn clauses is well-ordered if there is a partial ordering < such that if P :- P1, ..., Pn is a clause in the set, then Pi < P for all i, 1 <= i <= n), but there are sets for which an ordering that makes the strategy efficient cannot be found. The same is true for goal-sensitivity.
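The observation about positive hyperresolution can be seen concretely: in Horn propositional logic it behaves like forward chaining, and every hyperresolvent is a unit clause (a derived atom). The following minimal sketch uses illustrative names only (rules as (head, body) pairs, facts as the initial positive units).

    # Positive hyperresolution in Horn propositional logic, seen as
    # forward chaining: every hyperresolvent is again a unit clause.
    def positive_hyperresolution(rules, facts):
        derived = set(facts)                       # positive unit clauses
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                # Nucleus: the rule; satellites: the positive units in `derived`.
                if head not in derived and all(b in derived for b in body):
                    derived.add(head)              # the hyperresolvent is a unit
                    changed = True
        return derived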

Model elimination has exponential duplication by iteration and exponential duplication by case analysis. The duplication by combination is trivially constant (equal to 1), because each state is a singleton. The exponential duplication by iteration captures the redundancy of solving subgoals repeatedly. Duplication by case analysis is also exponential because a state may have multiple successors. Thus, resolution and model elimination are dual in a sense: resolution has low duplication by case analysis and by iteration but high duplication by combination, and vice versa for model elimination.

If model elimination is enriched with unit lemmaizing (in Horn logic all lemmas generated by lemmaizing are unit lemmas), duplication by iteration becomes linear, because lemmaizing prevents solving the same successful subgoals repeatedly. Lemmaizing adds a forward-reasoning character to the strategy, because lemmas are generated and retained. In the framework of [3] this means exponential duplication by combination and constant duplication by case analysis, so that model elimination with lemmaizing has the same duplication as the resolution strategies. In Horn logic model elimination can be enhanced with caching. (Caching is not compatible with ME-reduction, which is necessary for the completeness of model elimination in first-order logic; in first-order logic one may use lemmaizing or more complicated caching schemes.) The use of caching reduces the duplication further, because not only successful subgoals (success caching) but also failed subgoals (failure caching) are not repeated. Assuming depth-first search with iterative deepening, duplication by combination becomes polynomial (more precisely, quadratic). Similar results apply to the other subgoal-reduction strategies. In summary, subgoal-reduction strategies are inefficient but goal-sensitive, and subgoal-reduction strategies with caching are efficient and goal-sensitive.
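A minimal sketch of unit lemmaizing in the propositional Horn case is given below: once a subgoal is solved, it is stored as a (unit) lemma and never re-derived. Failure caching would additionally record failed subgoals, qualified by the remaining depth so that a recorded failure is reused only when no more depth is available than before. The rule and goal representation is illustrative only.

    # Sketch of subgoal reduction with unit lemmaizing in propositional
    # Horn logic.  rules maps an atom to a list of bodies (lists of atoms);
    # facts is the set of atoms given as unit clauses.
    def prove_with_lemmas(goal, rules, facts, max_depth=20):
        lemmas = set(facts)                        # solved subgoals, kept as unit lemmas

        def dfs(g, depth):
            if g in lemmas:
                return True                        # reuse an already solved subgoal
            if depth == 0:
                return False
            for body in rules.get(g, []):
                if all(dfs(sub, depth - 1) for sub in body):
                    lemmas.add(g)                  # record the new lemma
                    return True
            return False

        # Depth-first search with iterative deepening, as assumed above.
        return any(dfs(goal, d) for d in range(1, max_depth + 1))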

A caveat

Working by worst-case analysis means exhibiting a set of clauses on which a strategy performs poorly. Examples are the two parametric sets of clauses that give the exponential upper bounds on duplication by combination for negative resolution and for ordered resolution, respectively. Worst-case sets may display regularities of structure, such as symmetries or recurrence relations. The worst-case sets used in [3] incorporate recurrence relations on the indices of the literals in the clauses. The recurrence relations have exponential solutions, and this is used in the proofs of the exponential behavior of the strategies. It is not known whether these patterns occur in real derivations, or how often. Also, worst-case sets are hard from a combinatorial point of view for the targeted strategy, but may be easy in the practice of theorem proving. The set used against negative resolution is satisfiable and contains no positive clauses, so that positive hyperresolution or positive resolution would not apply a single inference and would establish that the set is satisfiable in the time needed to read it. Similarly, the set used against ordered resolution is satisfiable and contains neither positive nor negative clauses, so that it is trivial also for negative resolution.

Discussion

The following comments are written with first-order logic in mind, which is also the ultimate target of the approach of [3].

Infinite search spaces

The complexity of search is measured in [3] by measuring the total size of the search space in the worst case. The total size of the search space can be measured only if the search space is finite, as in the propositional problems considered in [3]. Thus, the first problem is to define measures of the complexity of search that also apply to infinite search spaces.

The search plan

Since the intent of [3] is to measure the total size of the search space, and the latter depends on the inference system, not on the search plan, the model of search in [3] does not represent the search plan and its application. However, the search plan is important in the practice of first-order theorem proving: a prover may find a proof with a certain search plan, and with another one run out of memory or find a proof only hours later. Therefore, the following questions remain open: