Replay Buffer: replay_buffer.py

ElegantRL provides a ReplayBuffer class to store sampled transitions.

In ElegantRL, we use a Worker for exploration (data sampling) and a Learner for exploitation (model learning). We view this relationship as a “producer-consumer” model: a worker produces transitions, a learner consumes them, and the learner updates the worker’s actor net so that it produces new transitions. In this setup, the ReplayBuffer is the storage buffer that connects the worker and the learner.

Each transition is stored in the format (state, (reward, done, action)).
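As a concrete illustration, a transition for a 3-dimensional state and a 1-dimensional continuous action could be packed as follows (the values here are hypothetical; ElegantRL stores the (reward, done, action) part as a single concatenated tensor):

```python
import numpy as np

state = np.array([0.1, -0.2, 0.3], dtype=np.float32)  # state_dim = 3
reward = 1.0
done = 0.0                                            # done flag stored as a float mask
action = np.array([0.5], dtype=np.float32)            # action_dim = 1

transition = (state, (reward, done, action))
```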

Note

We allocate the ReplayBuffer in contiguous memory for high-performance training. Since the collected transitions are packed in sequence, memory access is much faster when a learner randomly samples a batch of transitions.
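The idea of a contiguous, preallocated buffer can be sketched as below. This is a simplified NumPy version (the class name and method signatures are illustrative, not ElegantRL's exact API, which uses PyTorch tensors and supports GPU placement):

```python
import numpy as np

class SimpleReplayBuffer:
    def __init__(self, max_capacity: int, state_dim: int, action_dim: int):
        # Preallocate contiguous arrays so random batch reads stay cache-friendly.
        self.states = np.empty((max_capacity, state_dim), dtype=np.float32)
        # Each row packs (reward, done, action) together.
        self.others = np.empty((max_capacity, 2 + action_dim), dtype=np.float32)
        self.max_capacity = max_capacity
        self.ptr = 0   # next write position
        self.size = 0  # number of stored transitions

    def append(self, state, reward, done, action):
        self.states[self.ptr] = state
        self.others[self.ptr] = np.concatenate(([reward, float(done)], action))
        self.ptr = (self.ptr + 1) % self.max_capacity  # overwrite oldest when full
        self.size = min(self.size + 1, self.max_capacity)

    def sample(self, batch_size: int):
        # Uniform random sampling over the filled portion of the buffer.
        idx = np.random.randint(0, self.size, size=batch_size)
        other = self.others[idx]
        return self.states[idx], other[:, 0], other[:, 1], other[:, 2:]
```

A worker would call `append` after every environment step, while the learner calls `sample` to draw training batches.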

Implementations

class elegantrl.train.replay_buffer.ReplayBuffer(max_capacity: int, state_dim: int, action_dim: int, gpu_id=0, if_use_per=False)[source]

Multiprocessing

Initialization

Utils

class elegantrl.train.replay_buffer.BinarySearchTree(memo_len)[source]

Binary Search Tree for PER (Prioritized Experience Replay).

Contributors: GyChou (GitHub), mississippiu (GitHub)

References:
- https://github.com/kaixindelele/DRLib/tree/main/algos/pytorch/td3_sp
- https://github.com/jaromiru/AI-blog/blob/master/SumTree.py
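The data structure behind PER is a sum tree: each leaf holds a transition's priority, and each internal node holds the sum of its children, so sampling proportional to priority takes O(log n). A minimal sketch (hypothetical class and method names, simplified relative to ElegantRL's BinarySearchTree):

```python
import numpy as np

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        # A complete binary tree stored in an array: internal nodes first, then leaves.
        self.tree = np.zeros(2 * capacity - 1)
        self.ptr = 0  # next leaf to write (wraps around like the replay buffer)

    def add(self, priority: float):
        leaf = self.ptr + self.capacity - 1
        self.update(leaf, priority)
        self.ptr = (self.ptr + 1) % self.capacity

    def update(self, leaf: int, priority: float):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:  # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value: float) -> int:
        # Descend from the root: go left if value fits in the left subtree,
        # otherwise subtract the left sum and go right.
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)  # convert leaf index to data index
```

To sample a batch, draw `value` uniformly from `[0, tree[0])` (the total priority) and call `get(value)`; transitions with larger priorities then occupy larger intervals and are sampled more often.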