DDPG

Deep Deterministic Policy Gradient (DDPG) is an off-policy actor-critic algorithm for continuous action spaces. Computing the maximum over actions in the Q-learning target is difficult when actions are continuous, so DDPG instead uses a deterministic policy network to select the action at which the target is evaluated. This implementation provides DDPG and supports the following extensions:

  • Experience replay: ✔️

  • Target network: ✔️

  • Gradient clipping: ✔️

  • Reward clipping: ❌

  • Prioritized Experience Replay (PER): ✔️

  • Ornstein–Uhlenbeck noise: ✔️
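As a rough illustration of how a policy network stands in for the maximum over actions, and of what the target network in the list above is used for, here is a minimal framework-free sketch. It is not ElegantRL's implementation: the names `ddpg_target`, `soft_update`, `actor_target`, `critic_target`, `gamma`, and `tau` are hypothetical, and the "networks" are toy scalar functions chosen only so the arithmetic is easy to follow.

```python
# Illustrative sketch only; names and defaults are assumptions, not ElegantRL's API.

def ddpg_target(reward, done, next_state, actor_target, critic_target, gamma=0.99):
    """Return y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    # The target actor picks the next action, replacing the max over actions.
    next_action = actor_target(next_state)
    # The target critic evaluates that state-action pair.
    next_q = critic_target(next_state, next_action)
    # No bootstrapping past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy stand-ins for the target networks (scalar state and action).
actor_target = lambda s: 0.5 * s
critic_target = lambda s, a: s + a

y = ddpg_target(reward=1.0, done=False, next_state=2.0,
                actor_target=actor_target, critic_target=critic_target, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In a real agent the two callables would be neural networks and the parameters would be tensors, but the target formula and the averaging rule keep the same shape.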

Warning

In the DDPG paper, the authors add time-correlated noise from an Ornstein–Uhlenbeck process to the action output. However, later work has shown that the Ornstein–Uhlenbeck process is an overcomplication: it has no noticeable effect on performance compared with uncorrelated Gaussian noise.
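To make the difference between the two noise types concrete, the following sketch implements both with the Python standard library and compares their lag-1 autocorrelation. It assumes a scalar action and a common discretization of the Ornstein–Uhlenbeck process; the class and function names, and the default values of `theta`, `sigma`, and `dt`, are illustrative choices, not ElegantRL's API or settings.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Time-correlated noise: x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1e-2, seed=None):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.rng = random.Random(seed)
        self.x = mu  # the process starts at its long-run mean

    def sample(self):
        # Each sample depends on the previous one, which makes the noise correlated.
        self.x += (self.theta * (self.mu - self.x) * self.dt
                   + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0))
        return self.x


def gaussian_noise(rng, sigma=0.1):
    """Uncorrelated noise: every sample is drawn independently."""
    return rng.gauss(0.0, sigma)


def add_noise(action, noise, low=-1.0, high=1.0):
    """Perturb an action and clip it back into the valid range."""
    return max(low, min(high, action + noise))


def lag1_autocorrelation(xs):
    """Sample correlation between consecutive elements of xs."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


ou = OrnsteinUhlenbeckNoise(seed=0)
rng = random.Random(0)
ou_samples = [ou.sample() for _ in range(5000)]
gauss_samples = [gaussian_noise(rng) for _ in range(5000)]

ou_corr = lag1_autocorrelation(ou_samples)        # close to 1: strongly correlated
gauss_corr = lag1_autocorrelation(gauss_samples)  # close to 0: uncorrelated
noisy_action = add_noise(0.9, 0.5)                # clipped to the upper bound
```

Either noise source can be passed to `add_noise` during exploration; the Gaussian variant has no internal state to reset between episodes, which is part of why it is the simpler option.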

Code Snippet

import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentDDPG import AgentDDPG

# train and save
args = Arguments(env=build_env('Pendulum-v0'), agent=AgentDDPG())
args.cwd = 'demo_Pendulum_DDPG'
args.env.target_return = -200
args.reward_scale = 2 ** -2
train_and_evaluate(args)

# test: rebuild the agent and load the weights saved during training
agent = AgentDDPG()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)  # if_save=False loads instead of saving

# roll out a single episode of at most 2 ** 10 steps
env = build_env('Pendulum-v0')
state = env.reset()
episode_reward = 0
for i in range(2 ** 10):
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward  # accumulate the undiscounted return
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()

Parameters

class elegantrl.agents.AgentDDPG.AgentDDPG(net_dims: [int], state_dim: int, action_dim: int, gpu_id: int = 0, args: Config = Config())

DDPG (Deep Deterministic Policy Gradient). “Continuous control with deep reinforcement learning”, T. Lillicrap et al., 2015.

  • net_dims: the dimensions of the middle (hidden) layers of the MLP (multilayer perceptron)

  • state_dim: the dimension of the state vector

  • action_dim: the dimension of the action (or the number of discrete actions)

  • gpu_id: the GPU ID of the training device; the CPU is used when CUDA is not available

  • args: the arguments for agent training, e.g. args = Config()

Networks

class elegantrl.agents.net.Actor(*args: Any, **kwargs: Any)
class elegantrl.agents.net.Critic(*args: Any, **kwargs: Any)