Statistics Update

The statistics-update procedure (Figure 14) adjusts $q_w$ and $e_w$ for every rule that was active in the previous time step and whose proposed partial command is in accordance with $a$, the last executed action.
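The selection of the rules to be updated can be illustrated with a minimal sketch. The data layout is an assumption made only for this example: a partial command is represented as a dictionary from actuator names to values, a rule carries an `active` flag for the previous time step, and "in accordance with $a$" is read as "every actuator value fixed by the partial command equals the value in the executed action". The names `Rule`, `agrees_with` and `rules_to_update` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    # Hypothetical layout: partial command as actuator name -> value.
    command: dict = field(default_factory=dict)
    # Whether the rule was active in the previous time step.
    active: bool = False
    q: float = 0.0  # value estimate q_w
    e: float = 0.0  # error estimate e_w


def agrees_with(command, action):
    """True if every actuator value fixed by the partial command
    matches the executed action (assumed reading of 'in accordance')."""
    return all(action.get(name) == value for name, value in command.items())


def rules_to_update(rules, action):
    """Rules that were active and proposed a command compatible with `action`."""
    return [r for r in rules if r.active and agrees_with(r.command, action)]


if __name__ == "__main__":
    executed = {"left_motor": 1, "right_motor": 0}
    rules = [
        Rule(command={"left_motor": 1}, active=True),   # selected
        Rule(command={"left_motor": 0}, active=True),   # contradicts the action
        Rule(command={"right_motor": 0}, active=False), # was not active
        Rule(command={}, active=True),                  # empty command agrees with any action
    ]
    print(len(rules_to_update(rules, executed)))
```

Under this reading, a rule with an empty partial command is compatible with any executed action, so it is updated whenever it was active.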

Figure 14: Statistics update procedure.

Both $q_w$ and $e_w$ are updated using a learning rate ($m_w$) computed with the MAM function. This rate is initially 1, so the first update overwrites the initial values of $q_w$ and $e_w$, which consequently have no influence on the future values of these variables. By contrast, the initial values do become relevant when a constant learning rate is used, as in many existing reinforcement-learning algorithms.
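The effect of an initial learning rate of 1 can be checked with a small sketch. Two things are assumed here and are not taken from the text above: that the statistics are updated as a weighted average of the old value and the new observation, and that the rate has the form $\max(1/(1+i), \alpha)$ for a confidence index $i$ and a lower bound $\alpha$. The only property used from the text is that the rate starts at 1. The names `mam_rate` and `weighted_update` are hypothetical.

```python
def mam_rate(index, alpha=0.1):
    """Assumed rate schedule: 1 for index 0, decreasing with the index,
    and never below the constant alpha."""
    return max(1.0 / (1.0 + index), alpha)


def weighted_update(old, observed, rate):
    """Assumed update: weighted average of the old value and the observation."""
    return (1.0 - rate) * old + rate * observed


if __name__ == "__main__":
    arbitrary_init = 123.0  # deliberately meaningless starting value
    observed = 5.0

    # Rate is 1 on the first update: the initial value is discarded.
    first = weighted_update(arbitrary_init, observed, mam_rate(0))
    print(first)  # 5.0

    # With a constant rate the initial value still weighs on the result.
    constant = weighted_update(arbitrary_init, observed, 0.1)
    print(constant > observed)  # True
```

With a constant rate, the arbitrary initial value decays only geometrically, which is why its choice matters in algorithms that use one.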

If the observed effects of the last executed action agree with the current estimate interval for the reward ($I_w$), the confidence index is increased by one unit. Otherwise, the confidence index is decreased, which allows the statistics to adapt faster to the most recent, surprising reward values.
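The confidence-index rule can be sketched as follows. The size of the decrease and the lower bound of the index are not given in the text, so the `penalty` and `floor` parameters are assumptions introduced for illustration, as is the function name `update_confidence`. The interval $I_w$ is represented as a `(low, high)` pair.

```python
def update_confidence(index, observed, interval, penalty=1, floor=0):
    """Return the new confidence index.

    index    -- current confidence index of the rule
    observed -- observed value to compare against the reward interval
    interval -- (low, high) pair standing for I_w
    penalty  -- assumed size of the decrease on a surprising observation
    floor    -- assumed lower bound of the index
    """
    low, high = interval
    if low <= observed <= high:
        # Observation agrees with the current estimate: one more unit of confidence.
        return index + 1
    # Surprising observation: lower the confidence, but not below the floor.
    return max(floor, index - penalty)


if __name__ == "__main__":
    print(update_confidence(3, 0.5, (0.0, 1.0)))  # 4
    print(update_confidence(3, 2.0, (0.0, 1.0)))  # 2
    print(update_confidence(0, 2.0, (0.0, 1.0)))  # 0
```

If the learning rate decreases as the confidence index grows, lowering the index after a surprising reward raises the rate for the following updates, which is the faster adaptation described above.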



Josep M Porta 2005-02-01