Double DQN

Double Deep Q-Network (Double DQN) is one of the most important extensions of vanilla DQN. It mitigates the overestimation bias of Q-learning with a simple trick: decoupling the max operation in the target into action selection and action evaluation.

Without introducing any additional networks, the online Q-network selects the greedy next action and the target network evaluates its Q-value (a minimal sketch of the resulting target follows the extension list below). This implementation supports the following extensions:

  • Experience replay: ✔️

  • Target network: ✔️

  • Gradient clipping: ✔️

  • Reward clipping: ❌

  • Prioritized Experience Replay (PER): ✔️

  • Dueling network architecture: ✔️
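
As a minimal sketch of that target, assuming PyTorch networks q_online and q_target that map a batch of states to per-action Q values, and column tensors reward and done (the helper name is illustrative, not the ElegantRL API):

import torch

def double_dqn_target(q_online, q_target, reward, done, next_state, gamma=0.99):
    """Illustrative Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # action selection: the online network picks the greedy next action
        next_action = q_online(next_state).argmax(dim=1, keepdim=True)
        # action evaluation: the target network scores the selected action
        next_q = q_target(next_state).gather(1, next_action)
        # bootstrap only on non-terminal transitions
        return reward + gamma * (1.0 - done) * next_q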

Code Snippet

import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentDoubleDQN import AgentDoubleDQN

# train and save
args = Arguments(env=build_env('CartPole-v0'), agent=AgentDoubleDQN())
args.cwd = 'demo_CartPole_DoubleDQN'
args.target_return = 195
train_and_evaluate(args)

# test
agent = AgentDoubleDQN()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)

env = build_env('CartPole-v0')
state = env.reset()
episode_reward = 0
for i in range(2 ** 10):  # roll out one evaluation episode (at most 1024 steps)
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()

Parameters

class elegantrl.agents.AgentDoubleDQN.AgentDoubleDQN(net_dim: int, state_dim: int, action_dim: int, gpu_id: int = 0, args: Optional[Arguments] = None)[source]

Double Deep Q-Network algorithm. “Deep Reinforcement Learning with Double Q-learning”, H. van Hasselt et al., 2015.

Parameters
  • net_dim – the dimension of the networks (the width of the hidden layers)

  • state_dim – the dimension of the state (the length of the state vector)

  • action_dim – the dimension of the action space (the number of discrete actions)

  • gpu_id – the ID of the GPU used as the training device; the CPU is used when CUDA is not available.

  • args – the arguments for agent training, e.g., args = Arguments()

get_obj_critic_per(buffer: ReplayBuffer, batch_size: int)[source]

Calculate the critic loss and the predicted Q values using batches drawn with Prioritized Experience Replay (PER).

Parameters
  • buffer – the ReplayBuffer instance that stores the trajectories.

  • batch_size – the size of batch data for Stochastic Gradient Descent (SGD).

Returns

the loss of the network and Q values.
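
A hedged sketch of what such a PER objective typically computes (the buffer methods sample_for_per and td_error_update, the tensor layout, and the Huber criterion are assumptions, not the exact ElegantRL signatures; it reuses the double_dqn_target helper from the sketch above): the TD errors are weighted by the importance-sampling weights returned by the buffer, and the per-sample errors are written back to refresh the priorities.

import torch

# Illustrative PER objective; buffer.sample_for_per / buffer.td_error_update are assumed names.
def obj_critic_per_sketch(q_online, q_target, buffer, batch_size, gamma=0.99):
    criterion = torch.nn.SmoothL1Loss(reduction='none')
    state, action, reward, done, next_state, is_weight = buffer.sample_for_per(batch_size)
    q_value = q_online(state).gather(1, action.long())        # Q(s, a) of the taken actions
    q_label = double_dqn_target(q_online, q_target, reward, done, next_state, gamma)
    td_error = criterion(q_value, q_label)                     # per-sample Huber error
    obj_critic = (td_error * is_weight).mean()                 # importance-sampling weighting
    buffer.td_error_update(td_error.detach())                  # refresh the sampling priorities
    return obj_critic, q_value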

get_obj_critic_raw(buffer: ReplayBuffer, batch_size: int)[source]

Calculate the critic loss and the predicted Q values using uniformly sampled batches.

Parameters
  • buffer – the ReplayBuffer instance that stores the trajectories.

  • batch_size – the size of batch data for Stochastic Gradient Descent (SGD).

Returns

the loss of the network and Q values.
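
A matching sketch for the uniform-sampling variant (buffer.sample_batch and the tensor layout are again assumptions, and double_dqn_target is the helper from the earlier sketch): without priorities there are no importance weights, so the loss is a plain mean over the batch.

import torch

# Illustrative uniform-sampling objective; buffer.sample_batch is an assumed name.
def obj_critic_raw_sketch(q_online, q_target, buffer, batch_size, gamma=0.99):
    criterion = torch.nn.SmoothL1Loss()                        # mean reduction over the batch
    state, action, reward, done, next_state = buffer.sample_batch(batch_size)
    q_value = q_online(state).gather(1, action.long())         # Q(s, a) of the taken actions
    q_label = double_dqn_target(q_online, q_target, reward, done, next_state, gamma)
    obj_critic = criterion(q_value, q_label)                   # unweighted Huber loss
    return obj_critic, q_value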

Networks

class elegantrl.agents.net.QNetTwin(*args: Any, **kwargs: Any)[source]
class elegantrl.agents.net.QNetTwinDuel(*args: Any, **kwargs: Any)[source]

Critic classes for Double DQN: QNetTwin uses twin Q-networks, and QNetTwinDuel additionally uses a dueling architecture.

Parameters
  • mid_dim[int] – the middle dimension of the networks (the width of the hidden layers)

  • state_dim[int] – the dimension of the state (the length of the state vector)

  • action_dim[int] – the dimension of the action space (the number of discrete actions)

forward(state)[source]

The forward function for Dueling Double DQN: maps a batch of states to the corresponding Q values.

Parameters

state – [tensor] the input state.

Returns

the Q-value tensor for the input state.

get_q1_q2(state)[source]

Return the Q-value tensors from both critic heads for the input state.
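
For intuition, here is a minimal twin dueling Q-network sketch (the class name, layer sizes, and layout are illustrative, not the exact QNetTwinDuel implementation): a shared encoder feeds two heads, each head combines a state-value stream and an advantage stream into Q values, and a get_q1_q2-style method simply returns both heads.

import torch
import torch.nn as nn

class TwinDuelQNetSketch(nn.Module):
    """Illustrative twin dueling critic, not the ElegantRL QNetTwinDuel class."""
    def __init__(self, mid_dim, state_dim, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, mid_dim), nn.ReLU())
        # two independent dueling heads: value stream V(s) and advantage stream A(s, a)
        self.value1 = nn.Linear(mid_dim, 1)
        self.adv1 = nn.Linear(mid_dim, action_dim)
        self.value2 = nn.Linear(mid_dim, 1)
        self.adv2 = nn.Linear(mid_dim, action_dim)

    def forward(self, state):
        # return the Q values of the first head: Q = V + (A - mean(A))
        h = self.encoder(state)
        adv = self.adv1(h)
        return self.value1(h) + adv - adv.mean(dim=1, keepdim=True)

    def get_q1_q2(self, state):
        # return both heads' Q values, e.g. for taking the minimum of the two estimates
        h = self.encoder(state)
        adv1, adv2 = self.adv1(h), self.adv2(h)
        q1 = self.value1(h) + adv1 - adv1.mean(dim=1, keepdim=True)
        q2 = self.value2(h) + adv2 - adv2.mean(dim=1, keepdim=True)
        return q1, q2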