Although the mechanics of robot A and robot B are completely different, the input-output (state-action) interfaces of the controllers are the same.
In this demonstration, we use the same RL algorithm for both robots in order to show that RL can save programming cost.
Although the RL algorithm is the same, the obtained control rules are different, and each robot learns an appropriate behavior.
We adopt an actor-critic algorithm because both the state and the action are continuous.
The state input is two-dimensional, representing the joint angles (theta 1 and theta 2),
and the action output is also two-dimensional, giving the destination angles of the servo motors.
However, because these robots are very cheap (there are no sensors that measure the actual joint angles), we use the action executed in the previous step
as a substitute for the current state input.
Therefore, a hidden-state problem occurs when the servo motors fail to move the joints to the destination angles in time.
The decision interval is about 0.2 seconds.
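
To make the setting concrete, here is a minimal actor-critic sketch (in Python/NumPy) that matches the interface described above: a two-dimensional continuous state (the previously commanded angles), a two-dimensional continuous action (the destination angles), a Gaussian policy as the actor, and a linear value function as the critic, both updated from the one-step TD error. It is only an illustrative sketch, not the stochastic-gradient algorithm of the paper cited below; the feature function, learning rates, reward signal, and the robot interface send_to_servos() are placeholder assumptions.

# Minimal actor-critic sketch for this setting (illustrative only).
import numpy as np

ALPHA_ACTOR, ALPHA_CRITIC, GAMMA, SIGMA = 0.01, 0.05, 0.95, 0.1
DT = 0.2  # decision interval in seconds


def features(state):
    # Simple feature vector: the two joint angles plus a bias term (an assumption).
    return np.array([state[0], state[1], 1.0])


class ActorCritic:
    def __init__(self, n_features=3, n_actions=2):
        self.W = np.zeros((n_actions, n_features))  # actor: mean of the Gaussian policy
        self.v = np.zeros(n_features)               # critic: linear value function

    def act(self, state):
        phi = features(state)
        mean = self.W @ phi
        # Exploration: Gaussian noise around the deterministic mean action.
        return mean + SIGMA * np.random.randn(len(mean)), mean, phi

    def update(self, phi, action, mean, reward, phi_next):
        # One-step TD error drives both the critic and the actor updates.
        td_error = reward + GAMMA * (self.v @ phi_next) - (self.v @ phi)
        self.v += ALPHA_CRITIC * td_error * phi
        # Policy-gradient step for a fixed-variance Gaussian policy:
        # grad log pi = (action - mean) / SIGMA^2, outer product with the features.
        self.W += ALPHA_ACTOR * td_error * np.outer((action - mean) / SIGMA**2, phi)


# Hypothetical control loop: the previously commanded angles stand in for the state,
# because the real robots have no joint-angle sensors.
agent = ActorCritic()
state = np.zeros(2)  # previous action (commanded angles), used as the state
for step in range(10000):
    action, mean, phi = agent.act(state)
    # send_to_servos(action); time.sleep(DT)  # placeholder robot interface
    reward = 0.0  # placeholder: e.g. forward distance moved during DT
    next_state = action  # the executed action becomes the next "state"
    agent.update(phi, action, mean, reward, features(next_state))
    state = next_state
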
Examples of the obtained behaviors. (ROBOTSM.jpg, 640x792, 119 KB)
The behavior of ROBOT-A.
Small size: ROBOT_As.mpg (176x112 MPEG1 movie, 4.82 MB)
Large size: ROBOT_A.mpg (352x240 MPEG1 movie, 22.2 MB)
The behavior of ROBOT-B. The learning parameters are the same as for ROBOT-A.
Small size: ROBOT_Bs.mpg (176x112 MPEG1 movie, 2.2 MB)
Large size: ROBOT_B.mpg (352x240 MPEG1 movie, 9.92 MB)
Animated GIF (65.7 KB)
Hajime Kimura, Shigenobu Kobayashi:
Reinforcement Learning using Stochastic Gradient Algorithm and its Application to Robots,
The Transactions of the Institute of Electrical Engineers of Japan, Vol. 119, No. 8 (1999)
(in Japanese).
Although this paper is written in Japanese, you may easily follow the mathematics.
4 pages, PostScript file: denki99.ps (Japanese fonts are embedded in the file.)
PDF file: denki99.pdf