Replay Buffer: replay_buffer.py¶
ReplayBuffer stores sampled transitions.

In ElegantRL, a Worker handles exploration (data sampling) and a Learner handles exploitation (model learning). We view this relationship as a “producer-consumer” model: a Worker produces transitions, a Learner consumes them, and the Learner updates the actor network at the Worker so that it produces new transitions. The
ReplayBuffer is the storage buffer that connects the Worker and the Learner.
Each transition is stored in the format (state, (reward, done, action)).
We allocate the
ReplayBuffer in contiguous memory for high-performance training. Since the collected transitions are packed in sequence, addressing speed increases dramatically when a learner randomly samples a batch of transitions.
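The idea above can be sketched with a minimal circular buffer backed by preallocated contiguous NumPy arrays. This is an illustrative simplification, not ElegantRL's actual ReplayBuffer class; the names SimpleReplayBuffer, add, and sample are assumptions for this sketch.

```python
import numpy as np

class SimpleReplayBuffer:
    """Circular buffer over contiguous, preallocated arrays (sketch only)."""

    def __init__(self, max_size: int, state_dim: int, action_dim: int):
        self.max_size = max_size
        self.ptr = 0    # next write position (wraps around when full)
        self.size = 0   # number of transitions currently stored
        # contiguous storage: states, and the packed (reward, done, action) part
        self.states = np.empty((max_size, state_dim), dtype=np.float32)
        self.others = np.empty((max_size, 2 + action_dim), dtype=np.float32)

    def add(self, state, reward, done, action):
        self.states[self.ptr] = state
        self.others[self.ptr] = np.hstack(([reward, float(done)], action))
        self.ptr = (self.ptr + 1) % self.max_size  # overwrite oldest when full
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size: int):
        # random indices into packed memory: fast because storage is contiguous
        idx = np.random.randint(0, self.size, size=batch_size)
        batch = self.others[idx]
        return (self.states[idx],  # state
                batch[:, 0],       # reward
                batch[:, 1],       # done flag
                batch[:, 2:])      # action
```

Because the arrays are allocated once up front, adding a transition is a single row write and sampling a batch is a single fancy-indexing read, with no per-transition Python objects.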
- class elegantrl.train.replay_buffer.ReplayBuffer(max_size: int, state_dim: int, action_dim: int, gpu_id: int = 0, num_envs: int = 1, if_use_per: bool = False, args: Config = Config())¶
PER: Prioritized Experience Replay. Following Section 4 of the PER paper, alpha, beta = 0.7, 0.5 for the rank-based variant and alpha, beta = 0.6, 0.4 for the proportional variant.
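The proportional variant mentioned above can be sketched as follows: priorities are powers of the absolute TD errors, sampling probabilities are normalized priorities, and importance-sampling weights correct the resulting bias. This is a hedged sketch of the general PER technique (Schaul et al., 2015) with the alpha = 0.6, beta = 0.4 values quoted above, not ElegantRL's implementation; the function name per_sample and the eps parameter are assumptions.

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Proportional prioritized sampling (sketch).

    td_errors: 1-D array of TD errors for all stored transitions.
    Returns sampled indices and their importance-sampling weights.
    """
    # priority p_i = (|delta_i| + eps) ** alpha; eps keeps every p_i > 0
    prios = (np.abs(td_errors) + eps) ** alpha
    probs = prios / prios.sum()  # sampling distribution P(i)
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # importance weight w_i = (N * P(i)) ** -beta, normalized by the max
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```

Transitions with large TD errors are drawn more often (controlled by alpha), while the weights down-scale their gradient contribution (controlled by beta) so the update remains approximately unbiased.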