Replay Buffer:

ElegantRL provides ReplayBuffer to store sampled transitions.

In ElegantRL, a Worker handles exploration (data sampling) and a Learner handles exploitation (model learning). We view this relationship as a "producer-consumer" model: the worker produces transitions, the learner consumes them to update its networks, and the learner periodically sends the updated actor network back to the worker so it can produce new transitions. The ReplayBuffer is the storage that connects the worker and the learner.
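The producer-consumer relationship can be sketched with two threads sharing a queue. This is a minimal illustration with hypothetical names (`worker`, `learner`, `transition_queue`), not ElegantRL's actual classes:

```python
import queue
import threading

# A bounded queue stands in for the replay buffer connecting the two roles.
transition_queue = queue.Queue(maxsize=100)

def worker(num_steps):
    # Producer: samples dummy transitions and pushes them into the buffer.
    for step in range(num_steps):
        transition = (f"state_{step}", (1.0, False, 0))  # (state, (reward, done, action))
        transition_queue.put(transition)

def learner(num_steps, consumed):
    # Consumer: pops transitions from the buffer (would train on them).
    for _ in range(num_steps):
        consumed.append(transition_queue.get())

consumed = []
w = threading.Thread(target=worker, args=(8,))
l = threading.Thread(target=learner, args=(8, consumed))
w.start(); l.start()
w.join(); l.join()
```

In ElegantRL the two roles typically run as separate processes and exchange batches of transitions rather than single tuples; the queue above only illustrates the data flow.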

Each transition is stored in the format (state, (reward, done, action)).
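A small sketch of packing and unpacking this nested-tuple format (the list-backed buffer and `store_transition` helper are hypothetical, not ElegantRL's API):

```python
# Transitions packed as (state, (reward, done, action)).
buffer = []

def store_transition(state, reward, done, action):
    # Group reward, done, and action into one inner tuple next to the state.
    buffer.append((state, (reward, done, action)))

store_transition(state=[0.1, -0.2], reward=1.0, done=False, action=0)

# Unpack a stored transition with nested tuple assignment.
state, (reward, done, action) = buffer[0]
```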


We allocate the ReplayBuffer in contiguous memory for high-performance training. Because the collected transitions are packed sequentially, addressing is much faster when the learner randomly samples a batch of transitions.
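The idea can be sketched as a ring buffer backed by preallocated NumPy arrays (ElegantRL itself uses PyTorch tensors; the class name and methods below are hypothetical). Random batch sampling becomes a single fancy-indexing gather over contiguous storage:

```python
import numpy as np

class ContiguousBuffer:
    def __init__(self, max_size, state_dim, action_dim):
        # Preallocate contiguous arrays once; writes never allocate memory.
        self.states = np.empty((max_size, state_dim), dtype=np.float32)
        # One row per transition: [reward, done, action...].
        self.others = np.empty((max_size, 2 + action_dim), dtype=np.float32)
        self.max_size = max_size
        self.ptr = 0   # next write position (wraps around)
        self.size = 0  # number of valid transitions

    def add(self, state, reward, done, action):
        self.states[self.ptr] = state
        self.others[self.ptr, 0] = reward
        self.others[self.ptr, 1] = float(done)
        self.others[self.ptr, 2:] = action
        self.ptr = (self.ptr + 1) % self.max_size  # ring-buffer wraparound
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size):
        # Fancy indexing gathers the whole batch from contiguous arrays.
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.states[idx], self.others[idx]

buf = ContiguousBuffer(max_size=4, state_dim=2, action_dim=1)
for i in range(5):  # the 5th write wraps around and overwrites slot 0
    buf.add(state=[float(i), 0.0], reward=1.0, done=False, action=[0.0])
states, others = buf.sample(batch_size=2)
```

When full, new transitions overwrite the oldest ones, so the buffer always holds the most recent `max_size` transitions.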


class elegantrl.train.replay_buffer.ReplayBuffer(max_size: int, state_dim: int, action_dim: int, gpu_id: int = 0, num_envs: int = 1, if_use_per: bool = False, args: elegantrl.train.config.Config = Config())

PER: Prioritized Experience Replay (Schaul et al., 2016). Section 4 of the paper uses alpha, beta = 0.7, 0.5 for the rank-based variant and alpha, beta = 0.6, 0.4 for the proportional variant.
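A hedged sketch of the proportional variant with those hyperparameters (the `per_sample` helper is illustrative, not ElegantRL's implementation): each transition is drawn with probability proportional to its priority raised to alpha, and the bias this introduces is corrected with importance-sampling weights controlled by beta:

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, seed=0):
    # Sampling probabilities: P(i) = p_i^alpha / sum_k p_k^alpha.
    rng = np.random.default_rng(seed)
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance-sampling weights: w_i = (N * P(i))^-beta,
    # normalized by the max weight so updates are only scaled down.
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

idx, weights = per_sample([1.0, 2.0, 3.0, 4.0], batch_size=3)
```

In practice, priorities are usually set from the absolute TD error of each transition, and production implementations use a sum-tree so sampling and priority updates run in O(log N).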