B Their interaction is denoted by Aξ ⊗ A. It is described by the associated Markov chain (S, P) (see the end of Sec. 2, Chap. 1). In general, this chain decomposes into ergodic classes and a set of inessential states. Let S_1, ..., S_h, S_0 be the subsets of S corresponding to this decomposition. The main requirement is that each set S_j contain elements from every set S^(i) of the decomposition (1); in other words, all actions u_1, ..., u_k can be used. We form h automata A_j = (X, S_j, U; Π_j(x)), j = 1, .
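The decomposition of a finite chain into ergodic classes and inessential states can be illustrated by a minimal sketch. It assumes that the chain is given by a row-stochastic matrix P over states 0, ..., n−1 and that an ergodic class means a closed communicating class (periodicity is not examined). The names reachable and decompose, as well as the example matrix, are hypothetical and serve only for illustration.

```python
from typing import List, Set, Tuple


def reachable(P: List[List[float]], i: int) -> Set[int]:
    """Return the set of states reachable from i with positive probability (i included)."""
    seen = {i}
    stack = [i]
    while stack:
        s = stack.pop()
        for t, p in enumerate(P[s]):
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def decompose(P: List[List[float]]) -> Tuple[List[List[int]], List[int]]:
    """Split the states into closed communicating classes and inessential states."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes: List[List[int]] = []
    inessential: List[int] = []
    assigned: Set[int] = set()
    for i in range(n):
        if i in assigned:
            continue
        # State i is essential iff every state reachable from i leads back to i.
        if all(i in reach[j] for j in reach[i]):
            cls = reach[i]  # for an essential state this is its closed class
            classes.append(sorted(cls))
            assigned |= cls
        else:
            inessential.append(i)
            assigned.add(i)
    return classes, inessential


# Hypothetical example: {0, 1} and {2} are closed classes, state 3 is inessential.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.25, 0.0, 0.25, 0.5],
]
classes, inessential = decompose(P)
print(classes, inessential)
```

In this sketch the returned classes play the role of S_1, ..., S_h and the list of inessential states plays the role of S_0.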

z_{t−1}; u_{t−l}, ..., u_{t−2}) and (z_{t−l+1}, ..., z_t; u_{t−l+1}, ..., u_{t−1}), respectively.
ch01 October 22, 2005 10:12 WSPC/SPI-B324-Mathematical Theory of Adaptive Control (Rok Ting) 26 Basic Notions and Definitions
In controlling the CRP, the system Eσ generates the next value x_t of the process (it is defined by the conditional distribution µ_t of the model) by using the control u_{t−1}. This value is then transformed into the observable value z_t, which is the input signal of the system Eσ. Further, the next control u_t is generated according to the rule σ(l), and so on.
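The cycle just described can be sketched in code. This is only an illustration under simplifying assumptions: the next value x_t depends on the previous control alone (the text allows a general conditional distribution), and the rule sees the last l observations together with the last l−1 controls. All names (run_loop, sample_x, observe, rule) are hypothetical.

```python
import random
from collections import deque
from typing import Callable, List, Tuple


def run_loop(
    sample_x: Callable[[float, random.Random], float],
    observe: Callable[[float], float],
    rule: Callable[[Tuple[float, ...], Tuple[float, ...]], float],
    l: int,
    steps: int,
    u0: float,
    rng: random.Random,
) -> List[Tuple[float, float, float]]:
    """Simulate the cycle x_t -> z_t -> u_t for a rule with memory depth l."""
    if l < 1:
        raise ValueError("memory depth l must be at least 1")
    zs: deque = deque(maxlen=l)       # z_{t-l+1}, ..., z_t
    us: deque = deque(maxlen=l - 1)   # u_{t-l+1}, ..., u_{t-1}
    us.append(u0)
    u_prev = u0
    trajectory = []
    for _ in range(steps):
        x = sample_x(u_prev, rng)         # next process value, driven by u_{t-1}
        z = observe(x)                    # observable value z_t
        zs.append(z)
        u = rule(tuple(zs), tuple(us))    # next control u_t from the window
        trajectory.append((x, z, u))
        us.append(u)
        u_prev = u
    return trajectory


# Deterministic toy example (assumed, for illustration only).
trajectory = run_loop(
    sample_x=lambda u, rng: u + 1,     # x_t = u_{t-1} + 1
    observe=lambda x: x,               # z_t = x_t
    rule=lambda zs, us: sum(zs),       # u_t = sum of the last l observations
    l=2,
    steps=3,
    u0=0,
    rng=random.Random(0),
)
print(trajectory)
```

The two bounded queues hold exactly the window (z_{t−l+1}, ..., z_t; u_{t−l+1}, ..., u_{t−1}) at the moment the rule is applied.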

i.e., to find the collection of control choice rules (σ_0, σ_1, ..., σ_{T−1}) that maximizes W_θ(σ). To construct the optimal control we shall use the Bayesian approach to problems with unknown parameters. The main hypothesis is the following: H: There exists an a priori distribution F(θ) of the parameter θ, which is assumed to be known. For simplicity, we shall assume that the distribution F(θ) has a density f(θ). Then, according to Bayes' theorem, the posterior distribution densities of the parameter θ denoted by f (θ)(1) , .
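The passage from a prior to successive posteriors by Bayes' theorem can be illustrated numerically. The sketch below replaces the density f(θ) by probability masses on a finite grid of parameter values and assumes Bernoulli observations with success probability θ. Both the discretization and the likelihood are assumptions made for illustration and are not taken from the text; the names posterior_on_grid and bernoulli_likelihood are hypothetical.

```python
from typing import Callable, List, Sequence


def bernoulli_likelihood(obs: int, theta: float) -> float:
    """Probability of a 0/1 observation when the success probability is theta."""
    return theta if obs == 1 else 1.0 - theta


def posterior_on_grid(
    thetas: Sequence[float],
    prior: Sequence[float],
    likelihood: Callable[[int, float], float],
    observations: Sequence[int],
) -> List[List[float]]:
    """Return the sequence of posteriors, one after each observation."""
    current = list(prior)
    history = []
    for obs in observations:
        # Bayes' theorem: posterior is proportional to likelihood times prior.
        unnormalized = [p * likelihood(obs, th) for p, th in zip(current, thetas)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("observation has zero probability under the prior")
        current = [w / total for w in unnormalized]
        history.append(current)
    return history


# Uniform prior on three parameter values, observations 1, 1, 0.
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]
posteriors = posterior_on_grid(thetas, prior, bernoulli_likelihood, [1, 1, 0])
final = posteriors[-1]
print(final)
```

Each posterior serves as the prior for the next observation, so the final entry equals the posterior computed from all observations at once.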