Configuration

Configuration is an area where CSP technology has been particularly effective. A configuration problem can be viewed as trying to specify a product defined by a set of attributes, where attribute values can only be combined in predefined ways. Such problems can be modelled as CSPs, where variables correspond to attributes, the domains of the variables correspond to the possible values of the attributes, and constraints specify the predefined ways in which values can be combined. In many configuration problems the constraints are expressed extensionally through lists of allowed (or disallowed) combinations of values. Alternatively, constraints are expressed as rules which can easily be transformed into an extensional representation. Consider the following example adapted from the paper by Subbarayan, Jensen, Hadzic, Andersen, Hulgaard & Moller subbarayan04.

Example 6.1   The configuration of a T-shirt requires that we specify the size (small, medium, or large), the print (``Men in Black'' - MIB or ``Save the Whales'' - STW), and the color (black, white, or red). There are the following constraints: 1) If the small size is chosen then the STW print cannot be selected. 2) If the MIB print is chosen then the black color has to be chosen as well, and if the STW print is chosen then the black color cannot be selected. This configuration problem can be modelled as a CSP with three variables $\{x_1,x_2,x_3\}$ representing size, print, and color respectively. The domains of the variables are $D(x_1)=\{small,medium,large\}$, $D(x_2)=\{MIB,STW\}$, and $D(x_3)=\{black,white,red\}$. The first constraint is a binary constraint between variables $x_1$ and $x_2$ with the following allowed tuples: $\{<small,MIB>,<medium,MIB>,<medium,STW>,<large,MIB>,<large,STW>\}$. The second constraint is a binary constraint between variables $x_2$ and $x_3$ with the following allowed tuples: $\{<MIB,black>,<STW,white>,<STW,red>\}$.

In practice, many solvers for configuration problems are able to interact with the user so that, apart from meeting the given specifications, the user's choices of values for certain attributes are also satisfied. In this study we use configuration instances to compare the non-binary representation to binary encodings on structured realistic problems. Although it would be interesting to investigate the applicability of binary encodings in an interactive configurator, such work is outside the scope of this paper.

We run experiments on five problems taken from CLib, a library of benchmark configuration problems [CLibCLib2005]. The first thing we noticed after encoding the five problems as binary and non-binary CSPs is that they are trivially solvable by all algorithms without backtracking. A closer look at the structure of CLibs's problems revealed the reason; their constraint graphs consist of various unconnected components. Each component consists of very few or, in some cases, a single variable. As a result, the problems are split into independent subproblems that are trivially solved by all algorithms. In order to obtain difficult instances for benchmarking, we made the graphs connected by adding some randomness. Each of the five problems was extended by adding to it 6 variables and 8-10 constraints so that the graph became connected8. Table 5 shows the total number of variables and constraints in the modified problems. The added constraints were of arity 2, 3, or 4 (chosen at random) and the variables on which they were posted were selected at random, making sure that the resulting graph was connected. The looseness of each added constraint was also set at random, and finally, the allowed tuples of each constraint were chosen at random according to its looseness.


Table 5: Comparison of algorithms on configuration problems. $arity$ and $dom$ are the maximum constraint arity and maximum domain size in the problem. Run times are given in seconds.
 problem   $n$   $e$   $arity$   $dom$   MGAC-2001   MAC-PW-AC   MAC-PW-ACd  
                     nodes - time   nodes - time   nodes - time  
 machine   30   22   4   9   535874 - 13.83   813 - 0.37   3367 - 1.64  
 fx   24   21   5   44   193372 - 4.10   92 - 0.01   70 - 0.01  
 fs   29   18   6   51   618654 - 30.43   41 - 0.03   193 - 0.05  
 esvs   33   20   5   61   9960160 - 332.52   7384 - 3.09   64263 - 29.86  
 bike   43   28   6   37   21098334 - 501.85   16890 - 12.23   112957 - 87.77  



Table 5 gives the average run times and node visits for algorithms MGAC-2001 in the non-binary representation, MAC-PW-AC in the DE, and MAC-PW-ACd in the double encoding. For each of the five benchmarks we repeatedly generated instances using the model described above. Each generated instance was solved by all three algorithms and was stored if the instance was hard for at least one algorithm. Otherwise, it was discarded. An instance was considered hard if at least one algorithm took more than one second to solve it. Table 5 reports averages over the first 50 hard instances generated from each benchmark. That is, we run 250 hard instances in total. Note that in the binary encodings all constraints of the original problem (even binary ones) were encoded as dual variables.

The experimental results of Table 5 show a very significant advantage in favor of the binary encodings compared to the non-binary representation, both in node visits and run times. The DE is clearly the most efficient model. MAC-PW-AC in the DE can be up to three orders of magnitude faster than MGAC-2001 in the non-binary representation. There was not a single instance among the 250 instances where MGAC-2001 was faster than MAC-PW-AC. The double encoding is also much more efficient than the non-binary representation. The main factor contributing to the performance of the encodings is the strong propagation that is achieved through the constraints between dual variables, which is reflected on the numbers of node visits. There is a number of reasons, related to the structure of these configuration problems, that can justify the strong performance of the encodings:

Note that the ``profile'' of configuration problems, as analyzed above, agrees with the conjectures we made based on results from random problems. That is, the dual and double encodings are suitable for sparse problems with tight constraints, where intersecting constraints may share more than one variable.

Nikolaos Samaras 2005-11-09