Direct policy search: intrinsic vs. extrinsic perturbations
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Direct policy search: intrinsic vs. extrinsic perturbations. / Heidrich-Meisner, V.; Igel, Christian.
Workshop New Challenges in Neural Computation. ed. / B. Hammer; T. Villmann. 2010. p. 33-39 (Machine Learning Reports, Vol. 04/2010).
RIS
TY - GEN
T1 - Direct policy search: intrinsic vs. extrinsic perturbations
AU - Heidrich-Meisner, V.
AU - Igel, Christian
PY - 2010
Y1 - 2010
N2 - Reinforcement learning (RL) is a biologically inspired learning paradigm based on trial-and-error learning. A successful RL algorithm has to balance exploration of new behavioral strategies against exploitation of already acquired knowledge. In the initial learning phase, exploration is the dominant process. Exploration is realized by stochastic perturbations, which can be applied at different levels. When considering direct policy search in the space of neural network policies, exploration can be applied on the level of synaptic weights or on the level of neuronal activity. We propose neuroevolution strategies (NeuroESs) for direct policy search in RL. Learning using NeuroESs can be interpreted as modelling extrinsic perturbations on the level of synaptic weights. In contrast, policy gradient methods (PGMs) can be regarded as applying intrinsic perturbations to neuronal activity. We compare these two approaches conceptually and experimentally.
M3 - Article in proceedings
T3 - Machine Learning Reports
SP - 33
EP - 39
BT - Workshop New Challenges in Neural Computation
A2 - Hammer, B.
A2 - Villmann, T.
Y2 - 21 September 2010 through 21 September 2010
ER -
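
The abstract contrasts two points at which exploration noise can enter a neural-network policy: the synaptic weights (extrinsic, the NeuroES view) and the neuronal activity (intrinsic, the PGM view). The following minimal Python sketch illustrates that contrast using a toy quadratic stand-in for the episodic return and the simplest representative of each family, a (1+1)-ES and REINFORCE with a running baseline; the objective, constants, and policy shape below are illustrative assumptions, not the paper's experimental setup.

# Toy contrast between extrinsic (weight-level) and intrinsic (activity-level)
# exploration. Objective and step sizes are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([1.0, -1.0])

def episode_return(output):
    """Stand-in for a rollout: return peaks when the policy output hits TARGET."""
    return -float(np.sum((output - TARGET) ** 2))

# Extrinsic perturbation, ES-style: Gaussian noise is added to the synaptic
# weights themselves; selection is rank-based. Here the (1+1)-ES special case:
# keep the mutant iff it is at least as good as the parent.
w = np.zeros(2)
sigma_w = 0.5
for _ in range(300):
    mutant = w + sigma_w * rng.standard_normal(2)
    if episode_return(mutant) >= episode_return(w):
        w = mutant

# Intrinsic perturbation, PGM-style: the weights stay deterministic and the
# noise enters the neuronal activity / action; a REINFORCE likelihood-ratio
# update with a running-average baseline follows the stochastic gradient.
theta = np.zeros(2)
sigma_a, alpha, baseline = 0.5, 0.02, 0.0
for _ in range(3000):
    action = theta + sigma_a * rng.standard_normal(2)   # noisy activity
    ret = episode_return(action)
    # grad_theta log N(action | theta, sigma_a^2 I) = (action - theta) / sigma_a^2
    theta += alpha * (ret - baseline) * (action - theta) / sigma_a**2
    baseline += 0.05 * (ret - baseline)                 # variance reduction

print("ES weights :", np.round(w, 2))      # both should approach [ 1. -1.]
print("PGM weights:", np.round(theta, 2))

The structural difference the paper draws out is visible in the sketch: the ES branch injects noise once per candidate in weight space and uses only the ranking of returns, while the PGM branch injects noise into every action and follows a likelihood-ratio gradient estimate. Practical NeuroESs (e.g. CMA-ES-based variants) additionally adapt the mutation distribution online rather than using a fixed sigma_w.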