

Evaluating Macros in the Competition System

In this subsection we evaluate how SOL-EP macros can improve performance in the competition system. We compare the planner with implementation enhancements against the planner with both implementation enhancements and SOL-EP macros.

For each of the seven test domains, we show the number of expanded nodes and the total CPU time, again on a logarithmic scale. A CPU time chart cannot distinguish a problem solved very quickly (in a time close to 0) from a problem that could not be solved. To tell which is the case, check the corresponding node chart, where the absence of a data point always means no solution.
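The reading rule above can be sketched in a few lines of Python. This is only an illustration: the function name `read_charts` and the `near_zero` threshold are assumptions made for the example and do not come from the experimental setup.

```python
from typing import Optional


def read_charts(nodes: Optional[int], cpu_seconds: Optional[float],
                near_zero: float = 0.01) -> str:
    """Classify one problem from its node-chart and CPU-chart entries.

    nodes is None when the node chart has no data point for the problem.
    near_zero is an assumed threshold, not a value taken from the paper.
    """
    # A missing point in the node chart always means no solution.
    if nodes is None:
        return "unsolved"
    # A solved problem with no visible CPU point was solved almost instantly.
    if cpu_seconds is None or cpu_seconds <= near_zero:
        return "solved in near-zero time"
    return "solved"


print(read_charts(None, None))   # unsolved
print(read_charts(12, 0.0))      # solved in near-zero time
print(read_charts(3400, 7.5))    # solved
```

The node chart is checked first because it is the only one of the two charts whose missing points are unambiguous.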

Figure 17 summarizes the results in Satellite, Promela Optical Telegraph, and Promela Dining Philosophers. In Satellite and Promela Optical Telegraph, macros greatly improve performance across the whole problem set, allowing MACRO-FF to win these domain formulations in the competition. In Promela Optical Telegraph, macros led to solving 12 additional problems. The savings in Promela Dining Philosophers are limited, resulting in one more problem solved.

Figure 18 shows the results in the ADL version of Airport. The savings in terms of expanded nodes are significant, but they have little effect on the total running time, because preprocessing costs dominate the total running time in this domain.

The complexity of preprocessing in Airport also limits the number of solved problems to 21. The planner can solve more problems when the STRIPS version of Airport is used, but no macros could be generated for this domain version. STRIPS Airport contains one domain definition for each problem instance, whereas our learning method requires several training problems for a domain definition.

Figure 19 contains the results in Pipesworld Non-Temporal No-Tankage, Pipesworld Non-Temporal Tankage, and PSR. In Pipesworld Non-Temporal No-Tankage, macros often lead to a significant speed-up. As a result, the system solves four new problems. On the other hand, the system with macros fails on three previously solved problems. The contribution of macros is less significant in Pipesworld Non-Temporal Tankage. The system with macros solves two new problems and fails on one previously solved instance. Out of all seven benchmarks, PSR is the domain where macros have the smallest impact. Both systems solve 29 problems using similar amounts of resources. In the official competition run, MACRO-FF solved 32 problems in this domain formulation.
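As a quick check of the arithmetic in the Pipesworld results, the short Python sketch below computes the net change in solved problems from the gains and losses quoted above. The dictionary layout and the names `changes` and `net_change` are illustrative assumptions; only the four counts come from the text.

```python
# Problems gained and lost when macros are enabled, as quoted in the text.
changes = {
    "Pipesworld Non-Temporal No-Tankage": {"gained": 4, "lost": 3},
    "Pipesworld Non-Temporal Tankage": {"gained": 2, "lost": 1},
}

# Net change in the number of solved problems per domain.
net_change = {domain: c["gained"] - c["lost"] for domain, c in changes.items()}

for domain, delta in net_change.items():
    print(f"{domain}: {delta:+d} solved")
```

In both formulations the net gain is a single problem, even though the sets of solved problems differ by more than that.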

Figure 17: Comparison of our enhanced version of FF with and without macros in Satellite (36 problems), Promela Optical Telegraph (48 problems) and Promela Dining Philosophers (48 problems).
[Six charts: expanded nodes and CPU time for Satellite, Promela Optical Telegraph, and Promela Dining Philosophers.]

Figure 18: Comparison of our enhanced version of FF with and without macros in Airport (50 problems in total).
[Two charts: expanded nodes and CPU time for Airport.]

Figure 19: Comparison of our enhanced version of FF with and without macros in Pipesworld Non-Temporal No-Tankage, Pipesworld Non-Temporal Tankage and PSR (50 problems for each domain).
[Six charts: expanded nodes and CPU time for Pipesworld Non-Temporal No-Tankage, Pipesworld Non-Temporal Tankage, and PSR.]


Table 3: Summary of training in each domain. TP is the number of training problems and TT is the total training time in seconds. The last column shows the macros selected for each domain, one macro per line. For simplicity, we do not show the variable mapping of each macro.

Domain                              TP     TT  Macros
Airport                             10    365  MOVE MOVE
                                               PUSHBACK MOVE
                                               PUSHBACK PUSHBACK
                                               MOVE TAKEOFF
Promela Optical Telegraph            5     70  QUEUE-WRITE ADVANCE-EMPTY-QUEUE-TAIL
                                               ACTIVATE-TRANS QUEUE-WRITE
                                               ACTIVATE-TRANS ACTIVATE-TRANS
                                               PERFORM-TRANS ACTIVATE-TRANS
Promela Dining Philosophers          6     10  ACTIVATE-TRANS QUEUE-READ
                                               ACTIVATE-TRANS ACTIVATE-TRANS
                                               QUEUE-READ ADVANCE-QUEUE-HEAD
Satellite                           10      8  TURN-TO SWITCH-ON
                                               SWITCH-ON TURN-TO
                                               SWITCH-ON CALIBRATE
                                               TURN-TO TAKE-IMAGE
                                               TURN-TO CALIBRATE
                                               TAKE-IMAGE TURN-TO
Pipesworld Non-Temporal No-Tankage  10    250  POP-START POP-END
                                               PUSH-START PUSH-END
                                               PUSH-START POP-START
Pipesworld Non-Temporal Tankage      5  4,206  PUSH-START PUSH-END
                                               PUSH-START POP-END
                                               PUSH-END POP-START
                                               POP-END PUSH-START
                                               PUSH-END PUSH-START
                                               PUSH-START POP-START
                                               POP-START PUSH-START
PSR                                 10  1,592  AXIOM AXIOM
                                               CLOSE AXIOM


Table 3 shows the number of training problems, the total training time, and the selected macros in each domain. The training phase uses 10 problems for each of Airport, Satellite, Pipesworld Non-Temporal No-Tankage, and PSR. We reduced the training set to 5 problems for Promela Optical Telegraph, 6 problems for Promela Dining Philosophers, and 5 problems for Pipesworld Non-Temporal Tankage. In Promela Optical Telegraph, the planner with no macros solves 13 problems, and using most of them for training would leave little room for evaluating the learned macros. The situation is similar in Promela Dining Philosophers, where the planner with no macros solves 12 problems. In Pipesworld Non-Temporal Tankage, the smaller number of training problems is due to both the long training time and the structure of the competition problem set. The first 10 problems use only a part of the domain operators, so we did not include them in the training set. Out of the remaining problems, the planner with no macros solves 11 instances. The long training times in Pipesworld Non-Temporal Tankage and PSR are caused by the increased difficulty of the training problems.
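To make the training costs easier to compare across domains, the following Python sketch divides the total training time by the number of training problems. The TP and TT values are copied from Table 3; the names `TABLE3` and `seconds_per_problem` are assumptions introduced for this example, and the per-problem average is a derived figure that is not reported in the table itself.

```python
# (TP, TT) pairs copied from Table 3: training problems, total training seconds.
TABLE3 = {
    "Airport": (10, 365),
    "Promela Optical Telegraph": (5, 70),
    "Promela Dining Philosophers": (6, 10),
    "Satellite": (10, 8),
    "Pipesworld Non-Temporal No-Tankage": (10, 250),
    "Pipesworld Non-Temporal Tankage": (5, 4206),
    "PSR": (10, 1592),
}


def seconds_per_problem(table):
    """Average training time per training problem, in seconds."""
    return {domain: tt / tp for domain, (tp, tt) in table.items()}


per_problem = seconds_per_problem(TABLE3)

# List domains from the most to the least expensive training problems.
ranked = sorted(per_problem, key=per_problem.get, reverse=True)
for domain in ranked:
    print(f"{domain}: {per_problem[domain]:.1f} s per training problem")
```

Under this measure, Pipesworld Non-Temporal Tankage and PSR remain the two most expensive domains, in line with the remark about the difficulty of their training problems.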


Adi Botea 2005-08-01