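Multi-Agent TD3 (MATD3) gives each agent a pair of centralized critics and uses the smaller of their two value estimates to reduce overestimation bias in multi-agent environments. It combines the improvements of TD3 (clipped double Q-learning, delayed policy updates, and target policy smoothing) with the centralized-training, decentralized-execution framework of MADDPG.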
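
As a rough illustration of how two critics curb overestimation, the sketch below computes a clipped double-Q target for a single transition using plain Python floats. The function names, the default `gamma`, and the noise settings are assumptions made for this example; they are not taken from the library.

```python
import random


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to a target action.

    All default values here are illustrative assumptions.
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))  # clip the noise
    return max(low, min(high, action + noise))        # clip to the action range


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD target built from the smaller of two target-critic estimates.

    q1_next and q2_next stand for the two centralized target critics'
    values of the next joint observation and joint (smoothed) action.
    """
    q_next = min(q1_next, q2_next)  # pessimistic estimate limits overestimation
    return reward + gamma * (1.0 - done) * q_next
```

Taking the minimum means a single critic that overestimates a value cannot inflate the target on its own. In MATD3 both critics are centralized, so in a full implementation they would receive the observations and actions of all agents rather than scalar placeholders.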

Code Snippet

def update_net(self, buffer, batch_size, repeat_times, soft_update_tau):
    """
    Update the neural networks by sampling batch data from ``ReplayBuffer``.

    :param buffer: the ReplayBuffer instance that stores the trajectories.
    :param batch_size: the size of batch data for Stochastic Gradient Descent (SGD).
    :param repeat_times: the number of times each trajectory is reused (not used in this snippet).
    :param soft_update_tau: the soft update parameter.
    :return: None
    """
    self.batch_size = batch_size
    self.update_tau = soft_update_tau
    # Sample one batch of joint transitions shared by all agents.
    rewards, dones, actions, observations, next_obs = buffer.sample_batch(self.batch_size)
    # Update each agent's networks in turn on the same batch.
    for index in range(self.n_agents):
        self.update_agent(rewards, dones, actions, observations, next_obs, index)

    # Move every agent's target critic and target actor toward the current networks.
    for agent in self.agents:
        self.soft_update(agent.cri_target, agent.cri, self.update_tau)
        self.soft_update(agent.act_target, agent.act, self.update_tau)
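
The snippet above calls `self.soft_update` without showing its body. A minimal sketch of what a soft (Polyak) update typically does is given below, assuming parameters are represented as plain lists of floats; this standalone helper is hypothetical and returns a new list, whereas the library method most likely operates on network parameters in place.

```python
def soft_update(target_params, current_params, tau):
    """Return target parameters moved a fraction tau toward the current ones.

    Hypothetical standalone helper for illustration only.
    """
    return [
        tau * cur + (1.0 - tau) * tar
        for tar, cur in zip(target_params, current_params)
    ]


# With a small tau the target network changes slowly between updates.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

A small `tau` keeps the target networks close to their previous values, which stabilizes the TD targets that the critics are trained against.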



