On Robot Learning

Posts

Policy Gradients

May 22, 2018

As I mentioned before, we want to use policy gradient methods for two reasons: (1) we're interested in controlling robot systems that live in a continuous world, and (2) model-based policy gradient methods have the potential to be data-efficient.

Within policy gradient methods there are two main categories: model-free and model-based. The main difference between the two is the way the expectation over trajectories is evaluated. Model-free methods do not make any assumptions about the dynamics $P(s' | s, a)$ other than the ones made by the MDP framework. The most popular model-free methods are based on the score-function or likelihood-ratio trick (see this blog entry by Shakir Mohamed for more).

Taking the gradient of $V(\pi_\theta)$ (following the notation from this paper by Jie Tang): \[ \begin{align*} \nabla_\theta V(\pi_\theta) &= \nabla_\theta \mathbb{E}_{P(\tau|\theta)}\left\lbrace R(\tau) \right\rbrace \\ &= \int \nabla_\theta P(\tau|\theta) R(\tau) \mathrm...
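To make the score-function (likelihood-ratio) trick mentioned in this excerpt concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and does not come from the post: the "trajectory" is a single action drawn from a Gaussian policy with mean `theta`, the reward is a hypothetical quadratic, and the function names are made up. The estimator averages $\nabla_\theta \log P(a|\theta) R(a)$ over samples, which for this toy case should approach the analytic gradient $-2(\theta - \text{target})$.

```python
import random


def reward(action, target=1.0):
    # Hypothetical reward: highest when the action hits the target.
    return -(action - target) ** 2


def score_function_gradient(theta, sigma=0.5, target=1.0,
                            num_samples=20000, seed=0):
    """Monte Carlo estimate of d/dtheta E[R(a)] with a ~ N(theta, sigma^2).

    Uses the likelihood-ratio identity
        grad E[R] = E[ grad log p(a | theta) * R(a) ],
    so no derivative of the reward (or of any dynamics) is needed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        a = rng.gauss(theta, sigma)
        # Score of a Gaussian with respect to its mean.
        score = (a - theta) / sigma ** 2
        total += score * reward(a, target)
    return total / num_samples


if __name__ == "__main__":
    estimate = score_function_gradient(theta=0.0)
    analytic = -2.0 * (0.0 - 1.0)
    print("estimate:", estimate, "analytic:", analytic)
```

The estimate is noisy for small sample counts, which is one common motivation for looking at alternatives to purely model-free estimators.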

Some formalities

May 22, 2018

The usual formalization used in RL is to treat tasks as Markov Decision Processes (MDPs). The main ingredients of an MDP are:

- A set of states $s \in \mathcal{S}$ that describe the possible configurations of the environment.
- A set of actions $a \in \mathcal{A}$ that can be applied at each state.
- Transition dynamics $P(s' | s, a)$ that establish how actions change the state of the environment.
- An instantaneous reward function $P(r|s)$ for evaluating states; e.g. desired states have high rewards.
- An initial state distribution $P(s_0)$, which tells us in which states the agent is likely to start.
- A time horizon $H$ during which the behaviour of the agent is going to be evaluated.

Time is assumed to evolve in discrete steps. The objective of the agent is to maximize the reward accumulated over the horizon. Solutions to the problem determine which actions should be applied at any given state. This mapping from states to actions is called a policy, usually wri...
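The MDP ingredients listed in this excerpt can be sketched in code as follows. This is only an illustration under stated assumptions: the `MDP` container, the 1-D toy task, and both policies are hypothetical and not taken from the post, and the choice to collect rewards at states $s_0$ through $s_{H-1}$ is one convention among several.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class MDP:
    sample_initial_state: Callable  # rng -> s_0, draws from P(s_0)
    sample_next_state: Callable     # (s, a, rng) -> s', draws from P(s'|s,a)
    sample_reward: Callable         # (s, rng) -> r, draws from P(r|s)
    horizon: int                    # H, number of discrete time steps


def rollout(mdp, policy, rng):
    """Run one episode and return the reward accumulated over the horizon."""
    s = mdp.sample_initial_state(rng)
    total = 0.0
    for _ in range(mdp.horizon):
        total += mdp.sample_reward(s, rng)
        a = policy(s)  # the policy maps states to actions
        s = mdp.sample_next_state(s, a, rng)
    return total


# Hypothetical toy task: push a 1-D point towards the origin.
toy = MDP(
    sample_initial_state=lambda rng: rng.uniform(-1.0, 1.0),
    sample_next_state=lambda s, a, rng: s + a + rng.gauss(0.0, 0.01),
    sample_reward=lambda s, rng: -s * s,  # states near 0 have high reward
    horizon=10,
)


def proportional_policy(s, gain=0.5):
    return -gain * s


def do_nothing_policy(s):
    return 0.0


def average_return(mdp, policy, episodes=200, seed=0):
    rng = random.Random(seed)
    return sum(rollout(mdp, policy, rng) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    print("proportional:", average_return(toy, proportional_policy))
    print("do nothing:  ", average_return(toy, do_nothing_policy))
```

Comparing the two averages shows what "maximizing the reward accumulated over the horizon" means in practice: the policy that steers the state towards the high-reward region collects a larger total.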

Introduction to Model Based RL for robotics

May 22, 2018

It's been a while since I first wanted to start a blog about the stuff I've been working on for the past few years. Today, I found the opportunity to do so: I want to rewrite and simplify some of my research code base to enable some of the experiments I'd like to do. The work I've been doing is on the application of Reinforcement Learning to robotics. Reinforcement Learning (RL) has been shown to produce computer programs that beat experts in video games and board games, that control complex robotic systems, or that produce believable physics-based simulations of articulated characters. RL can be seen as a meta-programming paradigm where computer software changes itself as it interacts with the world, via trial and error. Under this paradigm, a computer programmer writes code that encodes the way the software should change according to its experience; i.e. its learning rules and the objective it is supposed to achieve. The software agent is allowed to measure the state of...
