Learning Recognizers from Experience
Published: February 2007
Master of Science thesis
Man-Machine Interaction Group
Delft University of Technology
PDF (3817 KB)
Learning through interaction is a foundational form of learning. To construct
powerful autonomous agents, we need to ground their knowledge in (verifiable) observations
and in predictions over these observations. We call these observations experience.
Reinforcement learning is a computational approach that defines problems and solutions
for learning through interaction by goal-oriented agents. We provide a survey of
this field as well as of the mathematical foundation of reinforcement learning, Markov
decision processes. A key step towards making reinforcement learning easier to use
in real-world applications is the ability to use data efficiently, as well as to use data
sets generated off-line in the learning process. Off-policy learning algorithms aim to
achieve this goal.
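To make the off-policy idea concrete, the following is a minimal sketch of ordinary importance sampling on a one-step problem: data generated under a behavior policy is re-weighted to estimate the expected reward of a different target policy. The two-action setup, probabilities, and rewards are illustrative assumptions, not taken from the thesis.

```python
import random

random.seed(0)

actions = [0, 1]
behavior = {0: 0.5, 1: 0.5}   # mu: the policy that generated the data (assumed)
target = {0: 0.9, 1: 0.1}     # pi: the policy we want to evaluate (assumed)
reward = {0: 1.0, 1: 0.0}     # deterministic reward per action (assumed)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[behavior[a] for a in actions])[0]
        for _ in range(100_000)]

# Re-weight each sample by pi(a)/mu(a) to estimate E_pi[R] from mu's data.
estimate = sum(target[a] / behavior[a] * reward[a] for a in data) / len(data)
true_value = sum(target[a] * reward[a] for a in actions)
print(estimate, true_value)
```

The estimate converges to the target policy's expected reward even though no data was collected under that policy, which is exactly the property that makes off-line data sets usable.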
Recently, [Precup et al., 2005] proposed a framework called recognizers, which facilitates off-policy learning. Recognizers can be viewed as filters on actions, which can be applied to a data set in order to learn expected returns for different policies. In the work so far, recognizers were hand-specified. In this thesis, we present the first results on learning such recognizers from data. The main idea of the approach is to eliminate from the recognition function those actions that are deemed suboptimal.
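The "filter on actions" view can be sketched as follows: a binary recognizer c marks actions as recognized or not, the behavior policy's data is filtered accordingly, and the recognized samples are re-weighted by c(a)/mu(c), where mu(c) is the probability that the behavior policy takes a recognized action. The bandit setup below, its numbers, and the choice of which action is "deemed suboptimal" are all illustrative assumptions, not the thesis's experiments.

```python
import random

random.seed(1)

actions = [0, 1, 2]
mu = {0: 0.4, 1: 0.4, 2: 0.2}      # behavior policy that produced the data (assumed)
reward = {0: 0.2, 1: 1.0, 2: 0.5}  # deterministic rewards (assumed)
c = {0: 0, 1: 1, 2: 1}             # recognizer: action 0 deemed suboptimal, filtered out

# mu(c): probability that the behavior policy takes a recognized action.
mu_c = sum(mu[b] * c[b] for b in actions)

# An off-line data set of actions sampled under the behavior policy.
data = [random.choices(actions, weights=[mu[a] for a in actions])[0]
        for _ in range(100_000)]

# Correction factor rho(a) = c(a) / mu(c): unrecognized actions get weight
# zero, so their samples are simply filtered out of the estimate.
estimate = sum(c[a] / mu_c * reward[a] for a in data) / len(data)

# The recognizer induces a target policy pi(a) = c(a) * mu(a) / mu(c),
# i.e. the behavior policy restricted to recognized actions.
pi = {a: c[a] * mu[a] / mu_c for a in actions}
true_value = sum(pi[a] * reward[a] for a in actions)
print(estimate, true_value)
```

Note that the recognizer never needs to specify full action probabilities: it only accepts or rejects actions, and the induced target policy inherits its shape from the behavior policy on the recognized set. Learning a recognizer then amounts to deciding, from data, which actions to reject as suboptimal.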
We provide theoretical results regarding convergence in the limit to the optimal policy, as well as PAC-style convergence guarantees. We implemented our approach and tested it in different environments and under varying conditions. We provide empirical results illustrating the advantages of this approach.