Appendix C: Notation

Uppercase letters are used for sets, and Greek letters represent parameters of the algorithms.

$ S$ Set of states.
$ s,s'$ Individual states. Full views.
$ n_s$ Number of states.
$ FD=\{\mathrm{fd}_i \: \vert \: i=1..n_f\}$ Set of feature detectors.
$ v(\mathrm{fd}_{i_1}, \ldots, \mathrm{fd}_{i_k})$ Partial view of order $ k$.
$ A$ Set of actions of the robot.
$ n_a$ Number of actions.
$ EA=\{ea_i \: \vert \: i=1..n_e\}$ Set of elementary actions.
$ n_m$ Number of motors of the robot.
$ ea_i=(m_i \leftarrow k)$ Elementary action that assigns value $ k$ to motor $ m_i$.
$ c(ea_{i_1}, \ldots, ea_{i_k})$ Partial command of order $ k$.
$ a=(ea_{1},\ldots,ea_{n_m})$ Action. Combination of elementary actions. Full command.
$ w=(v,c)$ Partial rule composed of partial view $ v$ and partial command $ c$.
$ w_{\emptyset}$ The empty partial rule.
$ w_{1} \oplus w_{2}$ Composition of two partial rules.
$ C=\{w_i \: \vert \: i=1..n_r\}$ Controller or set of partial rules.
$ \mu$ Maximum number of elements of $ C$.
$ C',C'_{ant}$ Subsets of rules active at the current time step and at the previous one, respectively.
$ C'(a)$ Active rules with a partial command in accordance with $ a$.
$ q_w$ Expected value of the partial rule $ w$.
$ e_w$ Expected error in the value estimation of the partial rule $ w$.
$ \overline{e}$ Average error in the value prediction.
$ i_w$ Confidence index.
$ c_w$ Confidence in the statistics of the partial rule $ w$.
$ \beta $ Top value of the confidence.
$ \eta $ Index where the confidence function reaches the value $ \beta $.
$ \epsilon_w = e_w \: c_w + \overline{e} \: (1-c_w)$ Error in the return prediction of the partial rule $ w$.
$ \rho_w=1/(1+\epsilon_w)$ Relevance of rule $ w$.
$ I_w=[q_w \pm 2 \epsilon_w]$ Value interval of the partial rule $ w$.
$ m_w$ Updating ratio for the statistics of the partial rule $ w$.
$ \alpha$ Learning rate. Top value of $ m_w$.
$ U(w)$ Number of times rule $ w$ has been used.
$ winner(C',a)$ Most relevant active partial rule w.r.t. action $ a$.
$ guess(a)$ Most reliable value estimation for action $ a$.
$ r_a$ Reward received after the execution of $ a$.
$ \gamma$ Discount factor.
$ v$ Goodness of a given situation.
$ q=r_a+\gamma v$ Value of executing action $ a$ in a given situation.
$ \tau$ Number of new partial rules created at a time.
$ \lambda$ Redundancy threshold used for partial-rule elimination.
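
The closed-form entries in the list above ($ \epsilon_w$, $ \rho_w$, $ I_w$ and $ q$) can be evaluated directly once the statistics of a rule are known. The following Python fragment is only a minimal sketch of those four definitions. The class `RuleStats` and all function names are hypothetical, chosen for illustration. The confidence $ c_w$ is taken as an input assumed to lie in $ [0,1]$, since the confidence function itself is not defined in this appendix.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Statistics of one partial rule w (names are illustrative only)."""
    q: float  # q_w: expected value of the rule
    e: float  # e_w: expected error in the value estimation
    c: float  # c_w: confidence in the statistics, assumed in [0, 1]


def prediction_error(rule: RuleStats, avg_error: float) -> float:
    # epsilon_w = e_w * c_w + e_bar * (1 - c_w)
    return rule.e * rule.c + avg_error * (1.0 - rule.c)


def relevance(rule: RuleStats, avg_error: float) -> float:
    # rho_w = 1 / (1 + epsilon_w)
    return 1.0 / (1.0 + prediction_error(rule, avg_error))


def value_interval(rule: RuleStats, avg_error: float) -> tuple:
    # I_w = [q_w - 2 * epsilon_w, q_w + 2 * epsilon_w]
    eps = prediction_error(rule, avg_error)
    return (rule.q - 2.0 * eps, rule.q + 2.0 * eps)


def action_value(reward: float, gamma: float, v: float) -> float:
    # q = r_a + gamma * v
    return reward + gamma * v
```

For instance, a rule with $ q_w=1$, $ e_w=0.5$ and $ c_w=0.5$, together with an average error $ \overline{e}=1.5$, gives $ \epsilon_w=1$, $ \rho_w=0.5$ and $ I_w=[-1,3]$. A low confidence pulls $ \epsilon_w$ toward the average error $ \overline{e}$, while a high confidence pulls it toward the rule's own error $ e_w$.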

Josep M Porta 2005-02-17