# Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

@article{Houthooft2016CuriositydrivenEI, title={Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks}, author={Rein Houthooft and Xi Chen and Yan Duan and John Schulman and Filip De Turck and P. Abbeel}, journal={ArXiv}, year={2016}, volume={abs/1605.09674} }

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). [...] We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous…
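The abstract states only that VIME modifies the MDP reward function. As a rough illustration, the sketch below assumes the common form in which an intrinsic bonus is added to the extrinsic reward, with the bonus taken as the KL divergence between a diagonal-Gaussian weight posterior after and before an update. The function names, the diagonal-Gaussian parameterisation, and the scaling factor `eta` are assumptions made for this example, not details taken from the text above.

```python
import math


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL( N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2) ) for diagonal Gaussians.

    All arguments are equal-length sequences of floats; sigmas must be positive.
    """
    total = 0.0
    for m_n, s_n, m_o, s_o in zip(mu_new, sigma_new, mu_old, sigma_old):
        total += (
            math.log(s_o / s_n)
            + (s_n ** 2 + (m_n - m_o) ** 2) / (2.0 * s_o ** 2)
            - 0.5
        )
    return total


def augmented_reward(extrinsic_reward, info_gain, eta=0.1):
    """Hypothetical modified reward: extrinsic reward plus a scaled curiosity bonus."""
    return extrinsic_reward + eta * info_gain


# Example: the posterior mean of one weight shifts by 1.0 after seeing a transition.
bonus = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
reward = augmented_reward(1.0, bonus, eta=0.1)
```

Because the bonus only changes the reward signal, a modified reward of this shape can in principle be passed to any underlying RL algorithm, which is consistent with the abstract's remark that VIME can be applied with several different ones.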

#### 52 Citations

Learning-Driven Exploration for Reinforcement Learning

- Computer Science, Mathematics
- ArXiv
- 2019

Deep reinforcement learning algorithms have been shown to learn complex skills using only high-dimensional observations and scalar reward. Effective and intelligent exploration still remains an…

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

- Computer Science, Mathematics
- ICML
- 2018

This paper presents the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark.

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

- Computer Science, Mathematics
- 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
- 2018

A curiosity-driven agent is extended to use these predictions of the value function of the next state at any point in time directly for training, and is able to learn significantly faster than the baselines.

Deep Active Inference as Variational Policy Gradients

- Computer Science
- ArXiv
- 2019

This paper proposes a novel deep Active Inference algorithm which approximates key densities using deep neural networks as flexible function approximators, which enables Active Inference to scale to significantly larger and more complex tasks.

Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs

- Computer Science
- 2016

This thesis proposes a practical method called trust region policy optimization (TRPO), which performs well on two challenging tasks: simulated robotic locomotion, and playing Atari games using screen images as input.

Model-Based Action Exploration

- Computer Science
- ArXiv
- 2018

This method greatly reduces the space of exploratory actions, increasing learning speed and enabling higher-quality solutions to difficult problems, such as robotic locomotion.

Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking

- Computer Science
- ArXiv
- 2016

This work introduces an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-backprop neural network, demonstrating marked improvement over common approaches such as $\epsilon$-greedy and Boltzmann exploration.

Dynamics-Aware Unsupervised Skill Discovery

- Computer Science
- ICLR 2020
- 2020

This work proposes an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics, and demonstrates that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.

A Bandit Framework for Optimal Selection of Reinforcement Learning Agents

- Mathematics, Computer Science
- ArXiv
- 2019

A multi-armed bandit framework that selects from a set of different reinforcement learning agents to choose the one with the best inductive bias. It is able to consistently select the optimal agent after a finite number of steps, while collecting more cumulative reward compared to selecting a sub-optimal architecture or uniformly alternating between different agents.

Learning Actionable Representations with Goal-Conditioned Policies

- Computer Science, Mathematics
- ICLR
- 2019

Representation learning is a central challenge across a range of machine learning areas. In reinforcement learning, effective and functional representations have the potential to tremendously…

#### References

SHOWING 1-10 OF 45 REFERENCES

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

- Computer Science, Mathematics
- ArXiv
- 2015

This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

An information-theoretic approach to curiosity-driven reinforcement learning

- Computer Science, Medicine
- Theory in Biosciences
- 2011

It is shown that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy.

Near-Bayesian exploration in polynomial time

- Mathematics, Computer Science
- ICML '09
- 2009

A simple algorithm is presented, and it is proved that with high probability it is able to perform ε-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps.

Generalization and Exploration via Randomized Value Functions

- Mathematics, Computer Science
- ICML
- 2016

The results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

Intrinsically motivated model learning for developing curious robots

- Computer Science
- Artif. Intell.
- 2017

Experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone, and the applicability of this approach to learning on robots is presented.

Deep Exploration via Bootstrapped DQN

- Computer Science, Mathematics
- NIPS
- 2016

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and…

Exploration from Demonstration for Interactive Reinforcement Learning

- Computer Science
- AAMAS
- 2016

This work presents a model-free policy-based approach called Exploration from Demonstration (EfD) that uses human demonstrations to guide search space exploration and shows how EfD scales to large problems and provides convergence speed-ups over traditional exploration and interactive learning methods.

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

- Computer Science
- NIPS
- 2012

This work provides a "sanity check" theoretical analysis, and provides experimental studies demonstrating the robustness of these exploration measures in cases of non-stationary environments or where original approaches are misled by wrong domain assumptions.

Benchmarking Deep Reinforcement Learning for Continuous Control

- Computer Science, Mathematics
- ICML
- 2016

This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.

Human-level control through deep reinforcement learning

- Computer Science, Medicine
- Nature
- 2015

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.