My research in reinforcement learning

This page is subject to ongoing changes.

These days, I am studying the following points in reinforcement learning:

Very brief background on reinforcement learning

In the reinforcement learning problem, an agent learns how to behave in its environment. This is an extremely difficult problem, far from being solved in general.
Reinforcement learning takes its roots in the study of the dynamics of the behavior of living beings. In 1898, Thorndike formulated the law of effect, which basically states that, in any animal, the probability of emitting a behavior increases when that behavior is followed by favorable consequences and, conversely, decreases when it is followed by unfavorable consequences. This principle seems obvious to most of us; however, when its various consequences are followed through carefully, many of us end up disagreeing with them. Anyway, this may be the topic of excellent conversations but, as a researcher in computer science, it goes beyond my expected professional abilities.

As a computer scientist, my interest is that this very simple law of the dynamics of behavior gives some insight into how one might try to solve the reinforcement learning problem.
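To make this concrete, here is a toy sketch (my own illustration, not Thorndike's formulation nor any published model) of a law-of-effect style learner: three possible behaviors, and the probability of emitting a behavior is nudged upwards whenever it is followed by a favorable consequence. The payoff probabilities and the learning rate are invented for the example, and the update rule is the classical linear reward-inaction scheme from the learning automata literature.

```python
import numpy as np

# Toy illustration of the law of effect (invented example): the probability
# of emitting a behavior is raised whenever that behavior is followed by a
# favorable consequence; it is left unchanged otherwise ("reward-inaction").

rng = np.random.default_rng(0)
probs = np.ones(3) / 3       # three possible behaviors, equally likely at first
payoff = [0.2, 0.5, 0.8]     # chance that each behavior has a favorable consequence (assumed)
lr = 0.01                    # learning rate (assumed)

for _ in range(20000):
    a = rng.choice(3, p=probs)
    if rng.random() < payoff[a]:   # favorable consequence
        probs *= (1 - lr)          # shrink every probability a little...
        probs[a] += lr             # ...and boost the behavior that was just emitted
    # unfavorable consequence: leave the probabilities unchanged

print(probs)  # most of the mass ends up on the behavior with the best consequences
```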

Much more background information may be found on the web, in particular on Rich Sutton's team website.

How I got involved in reinforcement learning

Actually, I got interested in reinforcement learning thanks to Samuel Delepoulle, whom I had the pleasure of advising during the completion of his PhD in psychology. He had the excellent idea of using a reinforcement learning algorithm (Q-learning, to be precise) to model an arm acquiring a reaching movement (see this paper and my publication page for more details on this work).
To be a little more precise, the arm is made of two segments, and each joint (the shoulder and the elbow) is controlled by a pair of antagonistic muscles. Each muscle is itself controlled by a Q-learner. This set of Q-learners has to learn to behave so that the hand reaches a given target. The nice thing is that there is absolutely no overall control of the four muscles and, still, the task gets performed.
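Below is a minimal sketch of this kind of architecture, written for illustration only (it is not the code used in the paper): four independent tabular Q-learners, one per muscle, that share nothing but the reward signal. The angle discretization, the forward kinematics, the reward shape, the target position and the learning parameters are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch (not the original code): four independent tabular
# Q-learners, one per muscle, sharing nothing but the reward signal.
# Discretization, kinematics, reward, target and parameters are invented.

N_ANGLES = 12                  # discretized positions per joint (assumed)
N_STATES = N_ANGLES ** 2       # (shoulder, elbow) pairs
N_ACTIONS = 2                  # each muscle either relaxes (0) or contracts (1)

class QLearner:
    """One tabular Q-learner; it observes the joint angles and picks its own action."""
    def __init__(self, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = np.zeros((N_STATES, N_ACTIONS))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        if np.random.rand() < self.eps:
            return np.random.randint(N_ACTIONS)
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s2):
        self.q[s, a] += self.alpha * (r + self.gamma * self.q[s2].max() - self.q[s, a])

def hand(shoulder, elbow):
    # Forward kinematics of a two-segment arm with unit-length segments.
    a1 = 2 * np.pi * shoulder / N_ANGLES
    a2 = 2 * np.pi * elbow / N_ANGLES
    return np.array([np.cos(a1) + np.cos(a1 + a2),
                     np.sin(a1) + np.sin(a1 + a2)])

target = np.array([1.0, 1.0])                 # where the hand should end up (assumed)
muscles = [QLearner() for _ in range(4)]      # flexor/extensor pair for each joint

for episode in range(3000):
    shoulder, elbow = 0, 0
    for step in range(50):
        s = shoulder * N_ANGLES + elbow
        a = [m.act(s) for m in muscles]
        # Antagonistic pairs: each joint moves with the net contraction of its pair.
        shoulder = (shoulder + a[0] - a[1]) % N_ANGLES
        elbow = (elbow + a[2] - a[3]) % N_ANGLES
        reached = np.linalg.norm(hand(shoulder, elbow) - target) < 0.2
        r = 1.0 if reached else -0.01         # the same reward goes to every learner
        s2 = shoulder * N_ANGLES + elbow
        for learner, action in zip(muscles, a):
            learner.update(s, action, r, s2)
        if reached:
            break
```

The point of the sketch is that nowhere is there a controller that sees all four muscles at once; each learner runs its own epsilon-greedy Q-learning update, and only the common reward couples them.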
We then tried to simulate more elaborate things, such as two arms connected to a trunk (10 Q-learners in this case). Again, we succeeded, and another nice thing here is that, despite this 5-fold increase in the number of agents, the (wall-clock) time for learning the task was approximately the same, which means that each agent learnt its task 5 times faster than the agents of the single arm!
We are continuing this line of work, aiming at simulating eye tracking (see the DYNAPP project). In this ongoing work, we use function approximators rather than crude tabular algorithms.
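For concreteness, here is what replacing the Q-table by a function approximator looks like in its simplest, linear form. This is a generic sketch, not the DYNAPP code; the features, the dynamics and the reward are placeholders invented for the example.

```python
import numpy as np

# Generic sketch of Q-learning with a linear function approximator instead of
# a table (not the DYNAPP code; features, dynamics and reward are placeholders).

rng = np.random.default_rng(0)
N_FEATURES, N_ACTIONS = 8, 4
w = np.zeros((N_ACTIONS, N_FEATURES))   # one weight vector per action
alpha, gamma, eps = 0.01, 0.95, 0.1     # assumed parameters

def features(state):
    # Placeholder featurization of the continuous state.
    return np.tanh(state)

def q_values(phi):
    return w @ phi                      # Q(s, a) = w_a . phi(s)

def env_step(state, action):
    # Placeholder dynamics and reward: the action nudges one component of the
    # state, noise drifts it, and the reward penalizes the distance to the origin.
    push = np.zeros(N_FEATURES)
    push[action % N_FEATURES] = 0.1
    next_state = 0.9 * state + push + 0.05 * rng.standard_normal(N_FEATURES)
    return next_state, -float(np.linalg.norm(next_state))

state = rng.standard_normal(N_FEATURES)
for t in range(10000):
    phi = features(state)
    q = q_values(phi)
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(q))
    next_state, r = env_step(state, a)
    td_error = r + gamma * q_values(features(next_state)).max() - q[a]
    w[a] += alpha * td_error * phi      # semi-gradient Q-learning update
    state = next_state
```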

What I like in reinforcement learning (and in machine learning in general) is that things may be quite nicely formalized. It often becomes very technical very quickly, mathematically speaking, but still, intellectually, that's nice. Having been trained as a mere computer scientist, these mathematical developments generally lead me straight to things I was never taught during my studies. But at least, this formal aspect provides a path along which we are guided towards good practices and good ways of thinking (whether we also get trapped in these ways of thinking... I do not know).
For quite some time now (and this effort still goes on), I have thus been doing my best to understand some of the mathematics underlying machine learning in general, and reinforcement learning in particular. Things are made quite a bit more complicated (but intellectually more beautiful) by the fact that different branches of mathematics are required: statistics, stochastic processes, and functional analysis, to name the most important. This effort has led me to go beyond the superficialities of so-called neural networks and other exotic things of computer science, and to concentrate on what these things really are, that is, what they are as computing machineries: to understand why they are prone to do things that differ from what we want them to do, and how to remedy these apparent deficiencies (as I keep telling my students: a computer is always, and only, doing what you ask it to do).
