

Assumptions of Direct Comparison

A canonical planner evaluation experiment follows the procedure in Table 1. The procedure is designed to compare the performance of a new planner to the previous state of the art and to highlight some set of cases in which the new planner performs better. The exact form of an experiment depends on its purpose, e.g., showing superiority on a class of problems or highlighting the effect of some design decision.


Table 1: Canonical comparative planner evaluation experiment.
  1. Select and/or construct a subset of planner domains
  2. Construct problem set by:
    • running large set of benchmark problems
    • selecting problems with desirable features
    • varying some facet of the problem to increase difficulty (e.g., number of blocks)
  3. Select other planners that are:
    • representative of the state of the art on the problems OR
    • similar to or distinct from the new planner, depending on the point of the comparison or advance of the new planner OR
    • available and able to parse the problems
  4. Run all problems on all planners, using default parameters and setting an upper limit on the time allowed
  5. Record which problems were solved, how many plan steps/actions each solution contained, and how much CPU time was required to solve the problem, fail, or time out (a minimal harness for steps 4 and 5 is sketched below the table)
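
To make steps 4 and 5 concrete, the following is a minimal sketch of such a harness, assuming Unix and Python. The planner executables, command-line flags, problem files, output filename, and the convention that a solver prints one plan step per line are all hypothetical, for illustration only; real planners differ in invocation and output format.

```python
import csv
import resource
import subprocess

# Hypothetical planner invocations and problem files (assumptions);
# real planners differ in executables, flags, and output conventions.
PLANNERS = {
    "planner_a": ["./planner_a", "--default"],
    "planner_b": ["./planner_b"],
}
PROBLEMS = ["blocks-10.pddl", "blocks-20.pddl"]
TIME_LIMIT = 300  # step 4: upper limit (seconds) on time allowed per run

def child_cpu_seconds():
    """CPU time consumed so far by finished child processes (Unix only)."""
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ru.ru_utime + ru.ru_stime

def run_trial(cmd, problem):
    """Run one planner on one problem; return (status, steps, cpu_time)."""
    before = child_cpu_seconds()
    try:
        result = subprocess.run(cmd + [problem], capture_output=True,
                                text=True, timeout=TIME_LIMIT)
        status = "solved" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status, result = "timeout", None
    cpu = child_cpu_seconds() - before
    # Assumed output convention: a solved run prints one line per plan step.
    steps = len(result.stdout.splitlines()) if status == "solved" else None
    return status, steps, cpu

# Step 5: record outcome, plan length, and CPU time for every pair.
with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["planner", "problem", "status", "steps", "cpu_seconds"])
    for name, cmd in PLANNERS.items():
        for problem in PROBLEMS:
            writer.writerow([name, problem, *run_trial(cmd, problem)])
```

Measuring CPU time via resource.getrusage with RUSAGE_CHILDREN charges the planner subprocess rather than the harness itself, which matches the protocol's focus on solver effort.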
  


The protocol depends on three selections: problems, planners, and evaluation metrics. Running all available planners on all available problems is neither practical nor even desirable, so one must make informed decisions about which to select. One purpose of this paper is to examine the assumptions underlying these decisions so that they can be made more deliberately. Not every planner comparison adopts every one of these assumptions, but the assumptions are commonly found in planner comparisons. For example, comparisons designed for a specific purpose (e.g., to show scale-up on certain problems or suitability of the planner for logistics problems) will carefully select particular types of problems from the benchmark sets.


