Replay Buffer: replay_buffer.py¶
ReplayBuffer stores sampled transitions.

In ElegantRL, a Worker handles exploration (data sampling) and a Learner handles exploitation (model learning). We view this relationship as a “producer-consumer” model: a Worker produces transitions, a Learner consumes them, and the Learner updates the actor network at the Worker so that it produces new transitions. The
ReplayBuffer is the storage buffer that connects the Worker and the Learner.
Each transition is stored in the format (state, (reward, done, action)).
We allocate the
ReplayBuffer in contiguous memory for high-performance training. Since the collected transitions are packed in sequence, addressing speed increases dramatically when a learner randomly samples a batch of transitions.
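The idea above can be sketched with a minimal circular buffer backed by preallocated contiguous NumPy arrays. This is an illustrative simplification, not ElegantRL's actual ReplayBuffer class; the names SimpleReplayBuffer, add, and sample are assumptions for this sketch.

```python
import numpy as np

class SimpleReplayBuffer:
    """Circular buffer over contiguous, preallocated arrays (sketch only)."""

    def __init__(self, max_size: int, state_dim: int, action_dim: int):
        self.max_size = max_size
        self.ptr = 0    # next write position (wraps around when full)
        self.size = 0   # number of transitions currently stored
        # contiguous storage: states, and the packed (reward, done, action) part
        self.states = np.empty((max_size, state_dim), dtype=np.float32)
        self.others = np.empty((max_size, 2 + action_dim), dtype=np.float32)

    def add(self, state, reward, done, action):
        self.states[self.ptr] = state
        self.others[self.ptr] = np.hstack(([reward, float(done)], action))
        self.ptr = (self.ptr + 1) % self.max_size  # overwrite oldest when full
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size: int):
        # random indices into packed memory: fast because storage is contiguous
        idx = np.random.randint(0, self.size, size=batch_size)
        batch = self.others[idx]
        return (self.states[idx],  # state
                batch[:, 0],       # reward
                batch[:, 1],       # done flag
                batch[:, 2:])      # action
```

Because the arrays are allocated once up front, adding a transition is a single row write and sampling a batch is a single fancy-indexing read, with no per-transition Python objects.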
- class elegantrl.train.replay_buffer.ReplayBuffer(max_size: int, state_dim: int, action_dim: int, gpu_id: int = 0, num_envs: int = 1, if_use_per: bool = False, args: Config = Config())¶
PER: Prioritized Experience Replay. Following Section 4 of the PER paper, alpha, beta = 0.7, 0.5 for the rank-based variant and alpha, beta = 0.6, 0.4 for the proportional variant.
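The proportional variant mentioned above can be sketched as follows: priorities are powers of the absolute TD errors, sampling probabilities are normalized priorities, and importance-sampling weights correct the resulting bias. This is a hedged sketch of the general PER technique (Schaul et al., 2015) with the alpha = 0.6, beta = 0.4 values quoted above, not ElegantRL's implementation; the function name per_sample and the eps parameter are assumptions.

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Proportional prioritized sampling (sketch).

    td_errors: 1-D array of TD errors for all stored transitions.
    Returns sampled indices and their importance-sampling weights.
    """
    # priority p_i = (|delta_i| + eps) ** alpha; eps keeps every p_i > 0
    prios = (np.abs(td_errors) + eps) ** alpha
    probs = prios / prios.sum()  # sampling distribution P(i)
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # importance weight w_i = (N * P(i)) ** -beta, normalized by the max
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```

Transitions with large TD errors are drawn more often (controlled by alpha), while the weights down-scale their gradient contribution (controlled by beta) so the update remains approximately unbiased.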