Action Evaluation

The simplest procedure to obtain the estimated value of the actions is a brute-force approach consisting of the independent evaluation of each one of them. In simple cases this approach is enough but, when the number of valid combinations of elementary actions (i.e., of actions) is large, evaluating each action separately would take a long time, increasing the time needed for each robot decision and decreasing the reactivity of the controller. To avoid this, Appendix B presents a more efficient procedure to obtain the value of any action.
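For illustration only, a minimal sketch of this brute-force variant, assuming a hypothetical value(a) function that returns the guessed value of a single action (both names are illustrative, not taken from the paper):

\begin{verbatim}
def brute_force_best_action(valid_actions, value):
    """Evaluate every valid action independently and keep the best one.

    The cost grows linearly with the number of valid combinations of
    elementary actions, which is what slows down each robot decision.
    """
    best_action, best_value = None, float("-inf")
    for a in valid_actions:
        v = value(a)  # one independent evaluation per action
        if v > best_value:
            best_action, best_value = a, v
    return best_action
\end{verbatim}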

Figure 13: Action Evaluation procedure.

Figure 13 summarizes the action-evaluation procedure using partial rules. The reward for each action is estimated using the most relevant rule for that action (i.e., the winner rule). The winner rule is computed as

$\displaystyle \mathrm{winner}(C',a) = \arg\max_{w \in C'(a)} \{\rho_w\},$

where $\rho_w$ is the relevance of rule $w$:

$\displaystyle \rho_w = \frac{1}{1+\epsilon_w}.$
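As a hedged sketch (not the paper's implementation), the winner-rule selection can be written as follows, assuming each candidate rule in $C'(a)$ carries its error estimate $\epsilon_w$; the PartialRule type and its epsilon field are illustrative:

\begin{verbatim}
from collections import namedtuple

# Hypothetical minimal rule representation: only the error estimate
# epsilon_w is needed to rank rules by relevance.
PartialRule = namedtuple("PartialRule", ["epsilon"])

def relevance(epsilon_w):
    # rho_w = 1 / (1 + epsilon_w): rules with a smaller error are more relevant
    return 1.0 / (1.0 + epsilon_w)

def winner(candidate_rules):
    # candidate_rules plays the role of C'(a)
    return max(candidate_rules, key=lambda w: relevance(w.epsilon))
\end{verbatim}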

The reward estimate provided by the winner rule is drawn uniformly at random from the interval

$\displaystyle I_w = [\, q_w - 2\epsilon_w,\; q_w + 2\epsilon_w \,],$    

with

$\displaystyle \epsilon_w = e_w \: c_w + \overline{e} \: (1-c_w).$    

Here, $\overline{e}$ is the average error in the reward prediction (i.e., the reward-prediction error of the empty rule, $w_\emptyset$).
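Putting the last two formulas together, a minimal sketch of the reward guess produced by the winner rule; the Rule fields q, e and c stand for $q_w$, $e_w$ and $c_w$, and e_bar stands for $\overline{e}$ (all names are illustrative, not the paper's code):

\begin{verbatim}
import random
from dataclasses import dataclass

@dataclass
class Rule:
    q: float   # estimated reward q_w
    e: float   # reward-prediction error e_w
    c: float   # confidence c_w in [0, 1]

def epsilon(rule, e_bar):
    # epsilon_w = e_w * c_w + e_bar * (1 - c_w): blend the rule's own error
    # with the average error (that of the empty rule) according to confidence
    return rule.e * rule.c + e_bar * (1.0 - rule.c)

def guess_reward(rule, e_bar):
    # draw the guess uniformly from I_w = [q_w - 2*eps_w, q_w + 2*eps_w]
    eps = epsilon(rule, e_bar)
    return random.uniform(rule.q - 2.0 * eps, rule.q + 2.0 * eps)
\end{verbatim}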
