Learning Recognizers from Experience
Published: February 2007
Master of Science thesis
Man-Machine Interaction Group
Delft University of Technology
PDF (3817 KB)
Learning through interaction is a foundational form of learning. To construct
powerful autonomous agents, we need to ground their knowledge in (verifiable) observations
and in predictions over these observations. We call these observations experience.
Reinforcement learning is a computational approach that defines problems and solutions
for learning through interaction by goal-oriented agents. We provide a survey of
this field as well as of the mathematical foundation of reinforcement learning, Markov
decision processes. A key step towards making reinforcement learning easier to use
in real-world applications is the ability to use data efficiently, as well as to use data
sets generated off-line in the learning process. Off-policy learning algorithms aim to
achieve this goal.
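To make the off-policy idea concrete, the following is a minimal sketch of ordinary importance sampling on a one-step problem: data generated under a behavior policy is re-weighted to estimate the expected reward of a different target policy. The two-action setup, probabilities, and rewards are illustrative assumptions, not taken from the thesis.

```python
import random

random.seed(0)

actions = [0, 1]
behavior = {0: 0.5, 1: 0.5}   # mu: the policy that generated the data (assumed)
target = {0: 0.9, 1: 0.1}     # pi: the policy we want to evaluate (assumed)
reward = {0: 1.0, 1: 0.0}     # deterministic reward per action (assumed)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[behavior[a] for a in actions])[0]
        for _ in range(100_000)]

# Re-weight each sample by pi(a)/mu(a) to estimate E_pi[R] from mu's data.
estimate = sum(target[a] / behavior[a] * reward[a] for a in data) / len(data)
true_value = sum(target[a] * reward[a] for a in actions)
print(estimate, true_value)
```

The estimate converges to the target policy's expected reward even though no data was collected under that policy, which is exactly the property that makes off-line data sets usable.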
Recently, [Precup et al., 2005] proposed a framework called recognizers, which facilitates off-policy learning. Recognizers can be viewed as filters on actions, which can be applied to a data set in order to learn expected returns for different policies. In the work so far, recognizers were hand-specified. In this thesis, we present the first results on learning such recognizers from data. The main idea of the approach is to eliminate from the recognition function those actions that are deemed suboptimal.
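The "filter on actions" view can be sketched as follows: a binary recognizer c marks actions as recognized or not, the behavior policy's data is filtered accordingly, and the recognized samples are re-weighted by c(a)/mu(c), where mu(c) is the probability that the behavior policy takes a recognized action. The bandit setup below, its numbers, and the choice of which action is "deemed suboptimal" are all illustrative assumptions, not the thesis's experiments.

```python
import random

random.seed(1)

actions = [0, 1, 2]
mu = {0: 0.4, 1: 0.4, 2: 0.2}      # behavior policy that produced the data (assumed)
reward = {0: 0.2, 1: 1.0, 2: 0.5}  # deterministic rewards (assumed)
c = {0: 0, 1: 1, 2: 1}             # recognizer: action 0 deemed suboptimal, filtered out

# mu(c): probability that the behavior policy takes a recognized action.
mu_c = sum(mu[b] * c[b] for b in actions)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[mu[a] for a in actions])[0]
        for _ in range(100_000)]

# Correction factor rho(a) = c(a) / mu(c): unrecognized actions get weight
# zero, so their samples are simply filtered out of the estimate.
estimate = sum(c[a] / mu_c * reward[a] for a in data) / len(data)

# The recognizer induces a target policy pi(a) = c(a) * mu(a) / mu(c),
# i.e. the behavior policy restricted to recognized actions.
pi = {a: c[a] * mu[a] / mu_c for a in actions}
true_value = sum(pi[a] * reward[a] for a in actions)
print(estimate, true_value)
```

Note that the recognizer never needs to specify full action probabilities: it only accepts or rejects actions, and the induced target policy inherits its shape from the behavior policy on the recognized set. Learning a recognizer then amounts to deciding, from data, which actions to reject as suboptimal.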
We provide theoretical results regarding convergence in the limit to the optimal policy, as well as PAC-style convergence guarantees. We implemented our approach and tested it in different environments and under varying conditions. We provide empirical results illustrating the advantages of this approach.