Metric Assumption 1: Does Performance Vary between Planners When Run on Different Hardware Platforms?

Often when a planner is run at a competition or in someone else's lab, the hardware and software platforms differ from the platform used during development. Clearly, a slower processor should slow down planning, requiring higher time cut-offs. A reduction in memory may well change the set of problems that can be solved, or increase the processing time through increased swapping. Changing the hardware configuration may change the way memory is cached and organized, favoring some planners' internal representations over others. Changing compilers could also affect the amount and type of optimization applied to the code. The exact effects are probably unknown. The assumption is that such changes affect all planners more or less equally.

To test this, we ran the planners on a less powerful, lower-memory machine and compared the results on the two platforms: the base Sun Ultrasparc 10/440 with 256 MB of memory and an Ultrasparc 1/170 with 128 MB of memory. The operating system and compilers were the same versions on both machines, and the same problems were run on both platforms. We followed much the same methodology as in the comparison of planner versions, comparing on both the number of problems solved and the time to solution. Table 11 shows the results as measured by problems solved, failed or timed-out for each planner on the two platforms.

Table 11: Number of problems solved, failed and timed-out for each planner on the two hardware platforms. The last column is the percentage reduction in the number solved from the faster to the slower platform.

Planner	Platform	Solved	Failed	Timed-Out	$\chi ^2$	p	% Reduction
A	Ultra 1	94	383	27
	Ultra 10	95	389	20	1.09	.58	1
B	Ultra 1	121	346	37
	Ultra 10	121	353	30	0.80	.67	0
C	Ultra 1	354	7	143
	Ultra 10	367	7	130	0.85	.65	4
D	Ultra 1	218	59	227
	Ultra 10	217	59	228	0.01	.998	-.4
E	Ultra 1	280	145	79
	Ultra 10	284	150	70	0.66	.72	1
F	Ultra 1	277	155	72
	Ultra 10	284	154	66	0.35	.84	2
G	Ultra 1	120	347	37
	Ultra 10	121	352	31	0.57	.75	1
H	Ultra 1	116	350	38
	Ultra 10	122	338	44	0.80	.67	7
I	Ultra 1	265	201	38
	Ultra 10	274	201	29	1.36	.51	3
J	Ultra 1	280	220	4
	Ultra 10	285	217	2	0.73	.69	2
K	Ultra 1	108	370	26
	Ultra 10	108	368	28	0.08	.96	0
L	Ultra 1	149	339	16
	Ultra 10	150	341	13	0.32	.85	1
M	Ultra 1	250	65	189
	Ultra 10	258	66	180	0.35	.84	3
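
The statistics in Table 11 can be recomputed from the counts. The following is a minimal sketch, assuming the test is a Pearson $\chi^2$ test on the 2 x 3 table of (solved, failed, timed-out) counts for the two platforms, which has 2 degrees of freedom; this assumption reproduces the reported values for planners A and B. The function names are ours and purely illustrative.

```python
import math


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    assert len(row_a) == len(row_b)
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # an empty column contributes nothing
        # Expected counts under independence of platform and outcome.
        exp_a = total_a * col / grand
        exp_b = total_b * col / grand
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2


def p_value_df2(chi2):
    """Upper-tail p-value for 2 degrees of freedom, which is exp(-x/2)."""
    return math.exp(-chi2 / 2.0)


def percent_reduction(solved_fast, solved_slow):
    """Percentage drop in problems solved, faster to slower platform."""
    return 100.0 * (solved_fast - solved_slow) / solved_fast


# Planner A counts from Table 11: (solved, failed, timed-out).
ultra1 = (94, 383, 27)
ultra10 = (95, 389, 20)
chi2 = chi_square_2xk(ultra1, ultra10)
print(f"chi2={chi2:.2f} p={p_value_df2(chi2):.2f} "
      f"reduction={percent_reduction(95, 94):.0f}%")
```

For planner A this gives a $\chi^2$ of about 1.09 and a p of about .58, in line with the table. The closed-form p-value holds only for 2 degrees of freedom; a table with more outcome categories would need a general $\chi^2$ distribution function.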

As before, we also looked at the change in time to solution. Table 12 shows how the time to solution changes for each planner. Not surprisingly, a faster processor and more memory nearly always lead to better performance. Somewhat surprisingly, the difference is far less than the doubling that might be expected; the mean differences are much less than the mean times on the faster processor (see Table 10 for the mean solution times).

Table 12: Improvements in execution speed moving from the slower to the faster platform. Only problems that were solved on both platforms are counted. For the faster and slower groups, the mean and standard deviation (Sd) of the difference are also provided.

Planner	Faster			Slower			Same	Total
	#	Mean $\Delta$	Sd $\Delta$	#	Mean $\Delta$	Sd $\Delta$
A	92	5.18	30.76	1			1	94
B	120	4.02	10.01	0			1	121
C	294	31.89	101.71	60	0.29	0.14	0	354
D	177	11.02	82.82	39	0.23	0.14	1	217
E	275	2.68	12.27	1			4	280
F	271	14.86	72.44	0			6	277
G	117	5.02	17.17	1			2	120
H	115	6.86	25.24	0			1	116
I	261	25.73	119.97	0			4	265
J	280	42.24	138.16	0			0	280
K	107	15.26	75.42	0			1	108
L	148	16.81	98.54	1			0	149
M	194	32.72	139.73	56	0.30	0.18	0	250
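
A summary of the kind shown in Table 12 can be sketched as follows. This is an illustration only: the problem names and run times are hypothetical, the dictionaries mapping problem to run time are our own representation, and we assume the sample standard deviation, since the text does not say which form was used.

```python
from statistics import mean, stdev


def summarize_speed_change(slow_times, fast_times):
    """Compare run times for problems solved on both platforms.

    slow_times and fast_times map a problem name to its run time on the
    slower and faster platform; unsolved problems are simply absent.
    """
    faster, slower, same = [], [], 0
    for problem, t_slow in slow_times.items():
        if problem not in fast_times:
            continue  # count only problems solved on both platforms
        delta = t_slow - fast_times[problem]
        if delta > 0:
            faster.append(delta)   # faster platform was quicker
        elif delta < 0:
            slower.append(-delta)  # faster platform was slower
        else:
            same += 1

    def stats(deltas):
        # Sample standard deviation needs at least two values.
        m = mean(deltas) if deltas else None
        sd = stdev(deltas) if len(deltas) > 1 else None
        return len(deltas), m, sd

    return {
        "faster": stats(faster),
        "slower": stats(slower),
        "same": same,
        "total": len(faster) + len(slower) + same,
    }


# Hypothetical run times, not data from the study.
slow = {"p1": 10.0, "p2": 4.0, "p3": 2.0, "p4": 1.0}
fast = {"p1": 6.0, "p2": 2.0, "p3": 2.5, "p4": 1.0, "p5": 3.0}
print(summarize_speed_change(slow, fast))
```

Here p5 is excluded because it was not solved on the slower platform, mirroring the rule stated in the caption of Table 12.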

The effect also seems to vary between the planners. Based on the counts, the Lisp-based planners appear to be less susceptible to this trend: they are the only ones that were sometimes faster on the slower platform. However, the advantages are very small and affect primarily the smaller problems. We think that this effect is due to the need to load a Lisp image at startup from a centralized server; thus, computation time for small problems will be dominated by any network delay. Older versions of planners appear to be less sensitive to the switch in platform.

In this study, the platforms make little difference to the results, despite processor speed more than doubling and memory doubling. However, the two platforms are underpowered when compared to the development platforms for some of the planners. We chose these platforms because they differed in only a few characteristics (processor speed and memory amount) and because we had access to 20 identically configured machines. To really observe a difference, 1 GB of memory or more may be needed.

Recent trends in planning technology have exploited cheap memory: translations to propositional representations, compilation of the problems, and built-in caching and memory management techniques. Thus, some planners are designed to trade off memory for time; these planners will understandably be affected by memory limitations for some problems. Given the results of this study, we considered performing a more careful study of memory by artificially limiting the memory available to the planners. We did not do so for two reasons: we did not have access to enough machines that were large enough to be likely to make a difference, and we could not devise a scheme for limiting memory fairly across all the planners, which are implemented in different languages and require different software run-time environments.

Another important factor may be memory architecture/management. Some planners include their own memory managers, which map better to some hardware platforms than to others (e.g., HSP uses a linear organization that appears to fit well with Intel's memory architecture).
