Conclusions

In this paper, we have introduced the categorizability assumption, which states that a robot can be driven to achieve a given task using only simple rules, i.e., rules involving a reduced set of feature detectors and elementary actions. This assumption is supported by our experience within the behavior-based approach, where controllers are formed by sets of rules with relatively simple conditions and actions. We have shown that a learning algorithm based on the categorizability assumption yields a large speed-up in the learning process, with respect to existing algorithms, in many realistic robotic applications.

To exploit the categorizability assumption in both the observation and action spaces, we have introduced a new representation formalism based on the concept of partial rules, rather than on the concepts of independent states and independent actions that form the kernel of many existing reinforcement-learning approaches.
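To make the formalism concrete, a minimal sketch in Python follows; the names and fields are hypothetical and do not reproduce the paper's exact data structures. A partial rule pairs a partial view of the feature detectors with a partial command to the elementary actions, together with a value estimate:

    from dataclasses import dataclass

    @dataclass
    class PartialRule:
        condition: dict          # detector index -> required value (partial view)
        action: dict             # actuator index -> commanded value (partial command)
        value: float = 0.0       # running estimate of the rule's value
        error: float = 1.0       # uncertainty attached to that estimate

        def matches(self, observation: dict) -> bool:
            # The rule is active whenever its partial condition holds,
            # regardless of the detectors it does not mention.
            return all(observation.get(f) == v
                       for f, v in self.condition.items())

A rule mentioning few detectors is active in many situations, which is precisely what makes generalization possible.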

The introduction of the partial-rule concept provides great flexibility in how problems are formalized. With the same structure and algorithms, we can confront problems with generalization on the perception side (usually considered in reinforcement learning), on the action side (usually not considered), or on both.
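Reusing the PartialRule sketch above (with hypothetical detector and actuator indices), the two sides of generalization look like this: a rule mentioning a single detector generalizes over perceptions, while a rule constraining a single actuator generalizes over actions.

    # Perception-side generalization: one detector, full motor command.
    perception_general = PartialRule(condition={2: 1},
                                     action={0: 1, 1: 0})

    # Action-side generalization: full observation, one actuator fixed;
    # the remaining actuators are left free to be set by other rules.
    action_general = PartialRule(condition={0: 1, 1: 0, 2: 1},
                                 action={1: 1})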

When no generalization at all is possible via partial rules, we have to use complete rules: rules involving all the available inputs and outputs. In this case, the partial-rule approach is equivalent to non-generalizing reinforcement learning. The algorithm we have presented can, if necessary, generate complete rules and, consequently, it can in principle solve any problem that can be solved using a traditional reinforcement-learning algorithm. However, since we take the categorizability assumption as valid, the generation of complete rules is an extreme case, only likely to occur in very limited situations. Therefore, in our approach, we forgo generality in order to increase the efficiency of the learning process in the class of problems we want to address.
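The degenerate case is easy to see in the sketch: a complete rule fixes every detector and every actuator, so one value estimate exists per state-action pair, which is exactly the table of non-generalizing reinforcement learning. An illustrative enumeration with three binary detectors and two binary actuators (assumed sizes, not taken from the paper):

    from itertools import product

    FEATURES, ACTUATORS = range(3), range(2)

    complete_rules = [
        PartialRule(condition=dict(zip(FEATURES, obs)),
                    action=dict(zip(ACTUATORS, act)))
        for obs in product([0, 1], repeat=len(FEATURES))
        for act in product([0, 1], repeat=len(ACTUATORS))
    ]
    # 2^3 states x 2^2 actions = 32 rules: a plain value table in rule form.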

Another advantage of the partial-rule framework is that it allows the easy and robust introduction of initial knowledge into the learning process, in the form of rules that are easily understood by the programmer. This contrasts with usual reinforcement-learning algorithms, where the introduction of initial knowledge is, in general, rather difficult.
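In this framework, seeding the learner amounts to placing hand-written partial rules in the initial rule set; their value estimates are then refined like those of any other rule. A hypothetical example for landmark-based navigation (the detector and actuator assignments are illustrative):

    # Prior knowledge: "if the goal is in sight, advance".
    initial_rules = [
        PartialRule(condition={0: 1},   # detector 0: goal in sight
                    action={0: 1}),     # actuator 0: advance
    ]

If such a rule turns out to be wrong, its value estimate simply degrades and the rule stops being selected, which is what makes the mechanism robust.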

In the partial-rule approach, there is a subtle change of emphasis as to the main goal of learning: while most work in reinforcement learning emphasizes learning the value of each action in each state, our main purpose is to learn the relevance of (subsets of) elementary actions and feature detectors. Once the relevant subsets of elementary actions and feature detectors are identified, learning becomes straightforward.
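As a rough illustration of what learning relevance can mean operationally (a plausible criterion, not necessarily the exact statistic used by our algorithm): a detector is relevant in the context of a rule when specializing the rule on that detector yields clearly different value estimates.

    def is_relevant(v_f0: float, v_f1: float, threshold: float) -> bool:
        # v_f0, v_f1: value estimates of the two specializations of a
        # rule on a binary feature detector. If they differ markedly,
        # the detector matters and the specializations are worth keeping.
        return abs(v_f0 - v_f1) > threshold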

The main limitation of our work is that it is not possible to know a priori (except in trivial cases) whether or not an environment is categorizable by a given robot. Non-generalizing reinforcement learning implicitly assumes that the environment is non-categorizable and that, consequently, all possible combinations of features and actions have to be taken into account separately. Our approach assumes just the opposite: the environment is categorizable, so only reduced combinations of features and actions need to be considered. The drawback of the non-generalizing approach is that robotic tasks become intractable because of the curse of dimensionality. Generalization techniques can partially alleviate this problem, but, in general, not sufficiently. We take a more radical route in order to be much less affected by the curse of dimensionality: we introduce a strong bias in the learning process that drastically limits the combinations of features and actions considered.
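A back-of-the-envelope count shows what is at stake (illustrative figures, not taken from the paper). With n binary feature detectors and m binary elementary actions, a tabular learner needs 2^(n+m) value estimates, while the number of partial rules mentioning at most k inputs and outputs grows only polynomially in n + m:

    from math import comb

    def table_size(n: int, m: int) -> int:
        # one estimate per complete state-action pair
        return 2 ** (n + m)

    def partial_rules_up_to(n: int, m: int, k: int) -> int:
        # rules mentioning at most k of the n + m binary signals,
        # each mentioned signal fixed to either 0 or 1
        return sum(comb(n + m, i) * 2 ** i for i in range(1, k + 1))

    print(table_size(20, 6))              # 67108864 estimates
    print(partial_rules_up_to(20, 6, 2))  # 1352 rules of order <= 2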

We have tested the partial-rule learning algorithm on many robotics-inspired problems, two of which have been discussed in this paper (landmark-based navigation and six-legged robot gait generation), and the categorizability assumption proved valid in all the cases we tested. The algorithm outperforms both generalizing and non-generalizing reinforcement-learning algorithms in memory requirements and convergence time. Additionally, we have shown that our approach scales well when the number of inputs increases, while the performance of existing algorithms degrades substantially. This important result suggests that our approach could be used to control more complex robots, for which existing approaches must be ruled out.

From the work presented in this paper, we can extract two main proposals. First, to apply reinforcement learning to agents with many sensors and actuators, we should concentrate our efforts on determining the relevance of inputs and outputs. Second, to achieve efficient learning in complex environments, it may be necessary to introduce additional assumptions into reinforcement-learning algorithms, even at the risk of losing generality.