In the statistics-update procedure (Figure 14), the statistics are adjusted for every rule that was active in the previous time step and proposed a partial command in accordance with the last executed action.
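The selection of which rules to update can be sketched in Python as below. The representation is an assumption made for illustration: a partial command is a dict mapping actuator names to values, and `PartialRule`, `agrees_with` and `rules_to_update` are hypothetical names. "In accordance with" is read here as: every actuator mentioned by the partial command received the same value in the executed action.

```python
from dataclasses import dataclass


@dataclass
class PartialRule:
    # Hypothetical representation of a rule, for illustration only.
    name: str
    command: dict[str, int]  # partial command: actuator name -> value
    active_prev: bool        # did the rule's condition hold in the previous step?


def agrees_with(command: dict[str, int], action: dict[str, int]) -> bool:
    """Assumed consistency test: each actuator named in the partial
    command was set to the same value by the executed action."""
    return all(action.get(k) == v for k, v in command.items())


def rules_to_update(rules: list[PartialRule],
                    last_action: dict[str, int]) -> list[PartialRule]:
    """Keep only rules that were active and proposed a consistent command."""
    return [r for r in rules
            if r.active_prev and agrees_with(r.command, last_action)]


rules = [
    PartialRule("w1", {"left_motor": 1}, True),                    # active, consistent
    PartialRule("w2", {"left_motor": 0}, True),                    # active, inconsistent
    PartialRule("w3", {"right_motor": 1}, False),                  # inactive
    PartialRule("w4", {"left_motor": 1, "right_motor": 0}, True),  # partly inconsistent
]
last_action = {"left_motor": 1, "right_motor": 1}
print([r.name for r in rules_to_update(rules, last_action)])  # ['w1']
```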
Both statistics are updated with a learning rate computed by the MAM function. Because this rate is initially 1, the initial values of the statistics have no influence on their later values. The initial values matter only when a constant learning rate is used, as in many existing reinforcement-learning algorithms.
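A minimal sketch of why a first rate of 1 erases the initial values, assuming a MAM-style schedule of the form max(1/(c + 1), min_rate) on a confidence index c that starts at 0, and assuming the reward estimate `q` and error estimate `e` are updated as exponential moving averages with the error measured against the updated estimate. The exact MAM formula and update equations are those of Figure 14; `mam_rate`, `update_estimates` and `min_rate` are hypothetical names and choices.

```python
def mam_rate(confidence: int, min_rate: float = 0.05) -> float:
    """Assumed MAM-style rate: 1/(confidence + 1), never below min_rate.
    With confidence 0 the rate is exactly 1."""
    return max(1.0 / (confidence + 1), min_rate)


def update_estimates(q: float, e: float, reward: float,
                     rate: float) -> tuple[float, float]:
    """Move the reward estimate q and the error estimate e toward the new
    sample; the error is measured against the updated q (an assumption)."""
    q_new = q + rate * (reward - q)
    e_new = e + rate * (abs(reward - q_new) - e)
    return q_new, e_new


# With rate 1 the old values are overwritten: whatever q0 and e0 were,
# each line prints (2.5, 0.0).
for q0, e0 in [(0.0, 0.0), (10.0, 3.0), (-5.0, 1.0)]:
    print(update_estimates(q0, e0, reward=2.5, rate=mam_rate(0)))

# With a constant rate, by contrast, the initial q still shows through.
print(update_estimates(0.0, 0.0, 2.5, rate=0.1)[0],
      update_estimates(10.0, 0.0, 2.5, rate=0.1)[0])
```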
If the observed effects of the last executed action agree with the current estimated interval for the reward, the confidence index is increased by one unit. Otherwise, the confidence index is decreased, allowing the statistics to adapt faster to the surprising reward values just obtained.
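The confidence-index rule can be sketched as follows, under stated assumptions: the reward interval is taken to be [q - width·e, q + width·e] with a hypothetical `width` of 2, the decrease is a hypothetical `step_down` of one unit floored at 0, and the learning rate is the same assumed MAM-style schedule max(1/(c + 1), min_rate). The estimates are held fixed here to isolate the index.

```python
def mam_rate(confidence: int, min_rate: float = 0.05) -> float:
    # Assumed MAM-style schedule: 1/(confidence + 1), never below min_rate.
    return max(1.0 / (confidence + 1), min_rate)


def adjust_confidence(confidence: int, q: float, e: float, reward: float,
                      width: float = 2.0, step_down: int = 1) -> int:
    """Raise the index by one if the reward lies in [q - width*e, q + width*e];
    otherwise lower it by step_down, never below 0 (width and step_down are
    illustrative choices)."""
    if abs(reward - q) <= width * e:
        return confidence + 1
    return max(confidence - step_down, 0)


# A rule with q = 1.0 and e = 0.1, so its interval is [0.8, 1.2].
history = [9]
for reward in (1.05, 3.0, 3.0):  # one expected reward, then two surprises
    history.append(adjust_confidence(history[-1], q=1.0, e=0.1, reward=reward))
rates = [round(mam_rate(c), 3) for c in history]
print(history)  # [9, 10, 9, 8]
print(rates)    # [0.1, 0.091, 0.1, 0.111]
```

Because the rate in this sketch decreases as the index grows, each surprise gives more weight to the rewards that follow; a larger `step_down` would make that adaptation faster still.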