Learning Recognizers from Experience
Published: February 2007
Master of Science thesis
Man-Machine Interaction Group
Delft University of Technology
PDF (3817 KB)
Learning through interaction is a foundational form of learning. To construct
powerful autonomous agents, we need to ground their knowledge in (verifiable) observations
and in predictions over these observations. We call these observations experience.
Reinforcement learning is a computational approach that defines problems and solutions
for learning through interaction by goal-oriented agents. We provide a survey of
this field as well as of the mathematical foundation of reinforcement learning, Markov
decision processes. A key step towards making reinforcement learning easier to use
in real-world applications is the ability to use data efficiently, as well as to use data
sets generated off-line in the learning process. Off-policy learning algorithms aim to
achieve this goal.
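To make the off-policy idea concrete, the following is a minimal sketch of ordinary importance sampling on a one-step problem: data generated under a behavior policy is re-weighted to estimate the expected reward of a different target policy. The two-action setup, probabilities, and rewards are illustrative assumptions, not taken from the thesis.

```python
import random

random.seed(0)

actions = [0, 1]
behavior = {0: 0.5, 1: 0.5}   # mu: the policy that generated the data (assumed)
target = {0: 0.9, 1: 0.1}     # pi: the policy we want to evaluate (assumed)
reward = {0: 1.0, 1: 0.0}     # deterministic reward per action (assumed)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[behavior[a] for a in actions])[0]
        for _ in range(100_000)]

# Re-weight each sample by pi(a)/mu(a) to estimate E_pi[R] from mu's data.
estimate = sum(target[a] / behavior[a] * reward[a] for a in data) / len(data)
true_value = sum(target[a] * reward[a] for a in actions)
print(estimate, true_value)
```

The estimate converges to the target policy's expected reward even though no data was collected under that policy, which is exactly the property that makes off-line data sets usable.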
Recently, [Precup et al., 2005] proposed a framework called recognizers, which facilitates off-policy learning. Recognizers can be viewed as filters on actions, which can be applied to a data set in order to learn expected returns for different policies. In the work so far, recognizers were hand-specified. In this thesis, we present the first results on learning such recognizers from data. The main idea of the approach is to eliminate from the recognition function those actions that are deemed suboptimal.
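The "filter on actions" view can be sketched as follows: a binary recognizer c marks actions as recognized or not, the behavior policy's data is filtered accordingly, and the recognized samples are re-weighted by c(a)/mu(c), where mu(c) is the probability that the behavior policy takes a recognized action. The bandit setup below, its numbers, and the choice of which action is "deemed suboptimal" are all illustrative assumptions, not the thesis's experiments.

```python
import random

random.seed(1)

actions = [0, 1, 2]
mu = {0: 0.4, 1: 0.4, 2: 0.2}      # behavior policy that produced the data (assumed)
reward = {0: 0.2, 1: 1.0, 2: 0.5}  # deterministic rewards (assumed)
c = {0: 0, 1: 1, 2: 1}             # recognizer: action 0 deemed suboptimal, filtered out

# mu(c): probability that the behavior policy takes a recognized action.
mu_c = sum(mu[b] * c[b] for b in actions)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[mu[a] for a in actions])[0]
        for _ in range(100_000)]

# Correction factor rho(a) = c(a) / mu(c): unrecognized actions get weight
# zero, so their samples are simply filtered out of the estimate.
estimate = sum(c[a] / mu_c * reward[a] for a in data) / len(data)

# The recognizer induces a target policy pi(a) = c(a) * mu(a) / mu(c),
# i.e. the behavior policy restricted to recognized actions.
pi = {a: c[a] * mu[a] / mu_c for a in actions}
true_value = sum(pi[a] * reward[a] for a in actions)
print(estimate, true_value)
```

Note that the recognizer never needs to specify full action probabilities: it only accepts or rejects actions, and the induced target policy inherits its shape from the behavior policy on the recognized set. Learning a recognizer then amounts to deciding, from data, which actions to reject as suboptimal.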
We provide theoretical results regarding convergence in the limit to the optimal policy, as well as PAC-style convergence guarantees. We implemented our approach and tested it in different environments and under varying conditions. We provide empirical results illustrating the advantages of this approach.