Replay Buffer: replay_buffer.py

ElegantRL provides a ReplayBuffer class to store sampled transitions.

In ElegantRL, we use a Worker for exploration (data sampling) and a Learner for exploitation (model learning). We view this relationship as a "producer-consumer" model: the worker produces transitions, the learner consumes them, and the learner then updates the actor network on the worker so that it can produce new transitions. The ReplayBuffer is the storage buffer that connects the worker and the learner.
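The sketch below illustrates this producer-consumer split with a plain Python list standing in for the ReplayBuffer and dummy tensors standing in for the environment and actor; it is not ElegantRL's actual Worker/Learner code.

# Minimal, runnable sketch of the producer-consumer pattern (illustrative only).
import random
import torch

state_dim, action_dim = 4, 2
buffer = []  # shared storage connecting the worker (producer) and the learner (consumer)

def worker_produce(num_steps):
    """Worker: roll out a (dummy) actor and push transitions into the buffer."""
    for _ in range(num_steps):
        state = torch.randn(state_dim)
        action = torch.randn(action_dim)          # would come from the actor network
        reward = torch.rand(1)
        done = torch.zeros(1)
        buffer.append((state, (reward, done, action)))

def learner_consume(batch_size=32):
    """Learner: sample a random batch and (here, trivially) 'update' the networks."""
    batch = random.sample(buffer, min(batch_size, len(buffer)))
    states = torch.stack([s for s, _ in batch])
    return states.mean()                          # placeholder for an actual gradient step

worker_produce(128)
loss = learner_consume()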

Each transition is stored in the format (state, (reward, done, action)).
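For instance, with vectorized sampling over num_envs parallel environments, a single transition could be assembled as follows (the per-field shapes here are an illustrative assumption, not a guaranteed layout):

import torch

num_envs, state_dim, action_dim = 4, 17, 6   # example dimensions

state  = torch.rand(num_envs, state_dim)
reward = torch.rand(num_envs, 1)
done   = torch.zeros(num_envs, 1)
action = torch.rand(num_envs, action_dim)

transition = (state, (reward, done, action))  # matches the (state, (reward, done, action)) format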

Note

We allocate the ReplayBuffer in a contiguous block of RAM for high-performance training. Because the collected transitions are packed sequentially, addressing is much faster when the learner randomly samples a batch of transitions.
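The idea can be sketched as follows: preallocate one contiguous tensor per field, write transitions sequentially, and gather a random batch with a single indexing call. This is a simplified illustration, not the class's actual implementation.

import torch

max_size, state_dim, action_dim = 2 ** 16, 17, 6
device = torch.device("cpu")   # or f"cuda:{gpu_id}" when sampling on GPU

# Preallocate contiguous storage once; transitions are written sequentially.
states  = torch.empty((max_size, state_dim), dtype=torch.float32, device=device)
rewards = torch.empty((max_size, 1), dtype=torch.float32, device=device)
dones   = torch.empty((max_size, 1), dtype=torch.float32, device=device)
actions = torch.empty((max_size, action_dim), dtype=torch.float32, device=device)

# Write a block of (fake) transitions in sequence, as a worker would.
cur_size = 10_000
states[:cur_size]  = torch.rand(cur_size, state_dim)
rewards[:cur_size] = torch.rand(cur_size, 1)
dones[:cur_size]   = torch.zeros(cur_size, 1)
actions[:cur_size] = torch.rand(cur_size, action_dim)

# Random batch sampling is a single gather over contiguous memory.
batch_size = 256
idx = torch.randint(cur_size, size=(batch_size,), device=device)
batch = (states[idx], rewards[idx], dones[idx], actions[idx])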

Implementations

class elegantrl.train.replay_buffer.ReplayBuffer(max_size: int, state_dim: int, action_dim: int, gpu_id: int = 0, num_envs: int = 1, if_use_per: bool = False, args: Config = Config())
per_beta

The beta exponent for Prioritized Experience Replay (PER). Following Section 4 of the PER paper: alpha, beta = 0.7, 0.5 for the rank-based variant, and alpha, beta = 0.6, 0.4 for the proportional variant.
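For reference, beta enters the importance-sampling correction of PER as w_i = (N * P(i))^(-beta), normalized by the largest weight. The snippet below is a generic illustration of that formula, not necessarily how ElegantRL applies per_beta internally.

import torch

per_beta = 0.4                                     # proportional variant (alpha=0.6, beta=0.4)
priorities = torch.tensor([0.5, 1.0, 2.0, 4.0])    # example TD-error-based priorities
probs = priorities / priorities.sum()              # sampling probabilities P(i)

n = len(priorities)
weights = (n * probs) ** (-per_beta)               # importance-sampling correction
weights = weights / weights.max()                  # normalize so the largest weight is 1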

Multiprocessing

Initialization

Utils