Replay Buffer:

ElegantRL provides ReplayBuffer to store sampled transitions.

In ElegantRL, a Worker handles exploration (data sampling) and a Learner handles exploitation (model learning). We view this relationship as a "producer-consumer" model: the worker produces transitions, the learner consumes them, and the learner periodically updates the worker's actor network so that it produces new transitions. The ReplayBuffer is the shared storage that connects the worker and the learner.
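The producer-consumer relationship above can be sketched as two threads sharing a bounded buffer. This is a minimal, hypothetical stand-in: the real ElegantRL worker samples from an environment and the shared storage is the ReplayBuffer, but a `queue.Queue` illustrates the same flow.

```python
import queue
import random
import threading

# A bounded queue stands in for the ReplayBuffer connecting worker and learner.
transitions = queue.Queue(maxsize=64)
consumed = []

def worker(n_steps):
    # Producer: sample transitions from a (fake) environment.
    state = 0.0
    for _ in range(n_steps):
        action = random.random()
        reward, done = action, False
        # Each transition follows the documented format: (state, (reward, done, action)).
        transitions.put((state, (reward, done, action)))
        state += 1.0

def learner(n_steps):
    # Consumer: pop transitions and (pretend to) update the model.
    for _ in range(n_steps):
        consumed.append(transitions.get())

w = threading.Thread(target=worker, args=(100,))
lrn = threading.Thread(target=learner, args=(100,))
w.start(); lrn.start()
w.join(); lrn.join()
```

Because the queue is bounded, a fast worker blocks until the learner catches up, which is the same back-pressure a fixed-capacity replay buffer imposes.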

Each transition is stored in the format (state, (reward, done, action)).


We allocate the ReplayBuffer in contiguous memory (RAM) for high-performance training. Because the collected transitions are packed in sequence, addressing speed improves dramatically when a learner randomly samples a batch of transitions.
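The idea can be sketched as a preallocated ring buffer: slots are filled in sequence and overwritten once capacity is reached, and batches are drawn uniformly at random. This is a hypothetical, list-backed illustration; ElegantRL's actual ReplayBuffer stores transitions in preallocated tensors.

```python
import random

class SimpleReplayBuffer:
    """Minimal sketch of a fixed-capacity replay buffer.

    Slots are preallocated and filled in sequence (contiguous layout),
    wrapping around once full, so old transitions are overwritten.
    """

    def __init__(self, max_capacity):
        self.max_capacity = max_capacity
        self.data = [None] * max_capacity  # preallocated storage
        self.next_idx = 0                  # next slot to write
        self.size = 0                      # number of filled slots

    def append(self, state, reward, done, action):
        # Pack the transition as (state, (reward, done, action)).
        self.data[self.next_idx] = (state, (reward, done, action))
        self.next_idx = (self.next_idx + 1) % self.max_capacity
        self.size = min(self.size + 1, self.max_capacity)

    def sample(self, batch_size):
        # Uniform random batch over the filled region.
        idxs = [random.randrange(self.size) for _ in range(batch_size)]
        return [self.data[i] for i in idxs]
```

Usage: `append` is called by the worker after each environment step, and `sample` by the learner before each gradient update.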


class elegantrl.train.replay_buffer.ReplayBuffer(max_capacity: int, state_dim: int, action_dim: int, gpu_id=0, if_use_per=False)




class elegantrl.train.replay_buffer.BinarySearchTree(memo_len)

Binary Search Tree for PER (Prioritized Experience Replay).

Contributors: GyChou (GitHub), mississippiu (GitHub)
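PER samples transitions with probability proportional to their priority, which is commonly implemented with a sum tree: leaves hold per-transition priorities and each internal node holds the sum of its children, so both priority updates and proportional sampling take O(log n). The sketch below is a generic illustration of that structure, not ElegantRL's actual BinarySearchTree implementation; `memo_len` mirrors the constructor argument above.

```python
class SumTree:
    """Minimal sum-tree sketch for prioritized sampling.

    The tree is stored flat in a list: indices [0, memo_len - 2] are
    internal sum nodes, indices [memo_len - 1, 2 * memo_len - 2] are leaves.
    """

    def __init__(self, memo_len):
        self.memo_len = memo_len                    # number of buffer slots
        self.tree = [0.0] * (2 * memo_len - 1)      # internal nodes + leaves

    def update(self, leaf_idx, priority):
        # Set a leaf's priority, then propagate the change up to the root.
        tree_idx = leaf_idx + self.memo_len - 1
        delta = priority - self.tree[tree_idx]
        self.tree[tree_idx] += delta
        while tree_idx > 0:
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += delta

    def sample(self, value):
        # Descend from the root with value in [0, total priority):
        # go left if the value fits in the left subtree, else subtract
        # the left sum and go right. Returns the chosen leaf index.
        idx = 0
        while idx < self.memo_len - 1:              # until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.memo_len - 1)
```

A leaf with priority 3.0 is then sampled three times as often as a leaf with priority 1.0, and `tree[0]` always holds the total priority.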