
Observing Successful Patterns of Interaction in the World

We assumed that the agent does not know about the long-term consequences of its actions. Furthermore, the reinforcement-based learning we described in the previous section assumes a Markovian environment; that is, the agent believes the world changes only as a result of its own actions. It is therefore necessary to observe interactions with the world in order to learn sequences of actions. When, over a finite number of actions, the agent observes a substantially improved situation, chances are it has found a successful Routine. We record such detected Routines and, as they re-occur, increase our confidence in them. When our confidence in a Routine reaches a certain level, a concept is created for it at the Knowledge level of GLAIR, and from then on the Routine can be treated as a single action at that level.
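The detect-record-promote cycle above can be sketched as follows. This is a minimal illustration, not the ABS implementation: the class name, the evaluation-gain margin, and the promotion threshold are all assumptions introduced here for concreteness.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3   # assumed confidence level for promotion (hypothetical)
IMPROVEMENT_MARGIN = 0.5  # assumed evaluation gain marking a "substantially improved" situation

class RoutineLearner:
    """Detects action sequences (Routines) that substantially improve the
    agent's situation and promotes them to Knowledge-level concepts once
    confidence is high enough. All names and thresholds are illustrative."""

    def __init__(self):
        self.confidence = defaultdict(int)  # Routine -> number of times observed
        self.concepts = set()               # Routines promoted to concepts

    def observe(self, actions, eval_before, eval_after):
        """Record a finite window of actions with situation evaluations
        taken before and after it."""
        if eval_after - eval_before >= IMPROVEMENT_MARGIN:
            routine = tuple(actions)
            self.confidence[routine] += 1
            if self.confidence[routine] >= PROMOTION_THRESHOLD:
                # From now on, this Routine is usable as a single action.
                self.concepts.add(routine)
```

For example, a learner that three times observes the window `["bank-left", "fire"]` followed by a large evaluation gain would promote that pair to a concept, while windows without a gain accumulate no confidence.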

We plan to explore other learning techniques, such as experimentation as a form of learning [Shen 1989]. We are also interested in developing experiments that will help with the psychological validation of GLAIR and the learning strategies used in ABS. As of this writing, ABS is fully operational, but several issues are still being investigated, as noted above.

lammens@cs.buffalo.edu