

Evaluating our Abstraction Techniques

To evaluate the performance of our two abstraction methods, we compare four setups of MACRO-FF. In all four setups, the planner includes the implementation enhancements described in Section 4. Setup 1 is the planner with no macros. Setup 2 includes CA-ED, the method described in Section 2. Setup 3 uses SOL-EP, the method described in Section 3. Setup 4 combines setups 2 and 3. Since both methods have benefits and limitations, it is interesting to analyze how they perform when applied together. In setup 4, we first run CA-ED, obtaining an enhanced domain. Next, we treat this as a regular domain and apply SOL-EP to generate a list of SOL-EP macros. Finally, the enhanced planner takes as input the enhanced domain, the list of SOL-EP macros, and regular problem instances, as sketched below.
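To make the order of operations in setup 4 concrete, here is a minimal Python sketch of the pipeline. The function names (run_ca_ed, extract_sol_ep_macros, run_enhanced_planner) are hypothetical stand-ins for the components described in Sections 2-4, not the actual MACRO-FF interface; the stub bodies only illustrate the dataflow.

    # Hypothetical stand-ins; MACRO-FF's real components differ.
    def run_ca_ed(domain: str) -> str:
        # Section 2: component abstraction adds CA-ED macros
        # to the domain as new operators.
        return domain + "+ca-ed-macros"

    def extract_sol_ep_macros(domain: str, problems: list) -> list:
        # Section 3: SOL-EP extracts macros from solutions
        # to training problems in the (enhanced) domain.
        return ["sol-ep-macro"]

    def run_enhanced_planner(domain: str, macros: list, problem: str) -> str:
        # Section 4: the enhanced planner consumes the macro list
        # directly, rather than as extra domain operators.
        return f"plan({problem})"

    def setup4(domain: str, problems: list) -> list:
        enhanced = run_ca_ed(domain)                        # step 1: CA-ED
        macros = extract_sol_ep_macros(enhanced, problems)  # step 2: SOL-EP
        return [run_enhanced_planner(enhanced, macros, p)   # step 3: solve
                for p in problems]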

Since component abstraction can currently be applied only to STRIPS domains with static facts in their formulation, we used as testbeds domains that satisfy these constraints. We ran this experiment on Rovers (20 problems), Depots (22 problems), and Satellite (36 problems). These domains were used in the third international planning competition (IPC-3), and Satellite was re-used in IPC-4 with an extended problem set. In our experiments, the Rovers and Depots problem sets are the same as in IPC-3, and the Satellite problem set is the same as in IPC-4.

Figures 20-23 and Table 4 summarize our results. Performance consistently improves when macros are used. Interestingly, combining CA-ED and SOL-EP often leads to better performance than either abstraction method taken separately. In Rovers, all three abstraction setups produce quite similar results, with a slight edge for the combined setup. In Depots, CA-ED is more effective than SOL-EP in terms of expanded nodes. The differences in CPU time are smaller, since adding new operators to the original domain significantly increases the cost per node in Depots (see the discussion below). Again, the overall winner in this domain is the combined setup. In Satellite, adding macros to the domain reduces the number of expanded nodes, but has a significant impact on the cost per node (see Table 4 later in this section) and on memory requirements. Setups 2 and 4, which add macros to the original domain, fail to solve three problems (32, 33, and 36) because of the large memory requirements of FF's preprocessing phase.

Table 4 evaluates how macros can affect the cost per node in the search. The cost per node is defined as the total search time divided by the number of evaluated states. A value in the table is the cost per node in the corresponding setup (i.e., CA-ED or SOL-EP) divided by the cost per node in the setup with no macros. For each of the two methods we show the minimum, the maximum, and the average value. When macros are added to the original domain (i.e., the domain is enhanced), the increase in cost per node can be significant. The average rate is 7.70 in Satellite and 6.06 in Depots. It is interesting to note that this rate is less than 1 in Rovers. This is an effect of solving a problem with fewer expanded nodes: operations such as managing the open and closed lists have costs that grow with the size of the lists at a rate that can be higher than linear, so a smaller search is also cheaper per node. The right part of the table shows much better cost rates when macros are used in an enhanced planner.
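In symbols (notation ours, not from the original text), if a setup $s$ runs for total search time $T_s$ and evaluates $N_s$ states, its table entry is
\[
  \frac{T_s / N_s}{T_0 / N_0},
\]
where the subscript $0$ denotes setup 1, the planner with no macros. A value below 1, as for CA-ED in Rovers, means the search is cheaper per node with macros than without.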

It is important to analyze why macros added as new operators generate such an increase in cost per node in Satellite and Depots. The overhead mostly arises in the relaxed graphplan algorithm that computes the heuristic value of a state. The complexity of this algorithm depends on the total number of actions instantiated during preprocessing for a given problem. Adding new operators to a domain increases the number of pre-instantiated actions. Since macros tend to have more variables than regular operators, the corresponding number of instantiations can be significantly larger. Let the action instantiation rate be the number of actions instantiated for a problem when macros are used divided by the number of actions instantiated in the original domain formulation. Our statistics show that the average action instantiation rate is 6.03 in Satellite, 3.20 in Depots, and 1.04 in Rovers.
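Written out (notation ours), with $A_M$ the set of actions instantiated when macros are added to the domain and $A_O$ the set instantiated for the original formulation, the rate is
\[
  r = \frac{|A_M|}{|A_O|}.
\]
For example, $r = 6.03$ in Satellite means the relaxed graphplan computation processes roughly six times as many pre-instantiated actions, which is consistent with the large CA-ED cost-per-node rates for this domain in Table 4.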

The results show no significant impact of macro-operators on solution quality. When macros are used, plan length varies slightly in both directions, with an average close to that of the original FF system.

Figure 20: Evaluating abstraction techniques in Rovers. We show the number of expanded nodes (top), and the CPU time (bottom).


Figure 21: Evaluating abstraction techniques in Depots. We show the number of expanded nodes (top), and the CPU time (bottom).


Figure 22: Evaluating abstraction techniques in Satellite. We show the number of expanded nodes.

Figure 23: Evaluating abstraction techniques in Satellite (continued). We show the CPU time.


Table 4: Relative cost per node (cost per node with macros divided by cost per node with no macros).

                  CA-ED                  SOL-EP
Domain       Min    Max    Avg      Min    Max    Avg
Depots      3.27   8.56   6.06     0.93   1.14   1.04
Rovers      0.70   0.90   0.83     0.85   1.14   1.00
Satellite   0.98  14.38   7.70     0.92   1.48   1.11


