Example 2: BipedalWalker-v3

BipedalWalker-v3 is a classic robotics task that exercises a fundamental skill: moving forward as fast as possible. The goal is to get a 2D bipedal walker to walk across rough terrain. BipedalWalker is considered a difficult task in the continuous action space, and only a few RL implementations reach the target reward. Our Python code is available here.

When a biped walker takes random actions:

../_images/BipedalWalker-v3_1.gif
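
For reference, the random behavior above can be reproduced with a few lines of the classic Gym API (a minimal sketch; newer Gym/Gymnasium versions change the reset/step signatures):

import gym

env = gym.make('BipedalWalker-v3')
state = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # uniform random action in [-1, 1]^4
    state, reward, done, info = env.step(action)
    env.render()
    if done:  # under random actions the walker falls almost immediately
        state = env.reset()
env.close()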

Step 1: Install ElegantRL

pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git

Step 2: Import packages

  • ElegantRL

  • OpenAI Gym: a toolkit for developing and comparing reinforcement learning algorithms (collections of environments).

from elegantrl.run import *
import gym

gym.logger.set_level(40)  # suppress Gym warnings (level 40 = ERROR)

Step 3: Get environment information

get_gym_env_args(gym.make('BipedalWalker-v3'), if_print=True)

Output:

env_args = {
    'env_num': 1,
    'env_name': 'BipedalWalker-v3',
    'max_step': 1600,
    'state_dim': 24,
    'action_dim': 4,
    'if_discrete': False,
    'target_return': 300,
}
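
Each of these fields mirrors information the underlying Gym environment already exposes. Roughly (a simplification, not ElegantRL's exact implementation), they come from:

import gym

env = gym.make('BipedalWalker-v3')
print(env.observation_space.shape[0])                     # state_dim: 24
print(env.action_space.shape[0])                          # action_dim: 4
print(isinstance(env.action_space, gym.spaces.Discrete))  # if_discrete: False
print(env._max_episode_steps)                             # max_step: 1600
print(env.spec.reward_threshold)                          # target_return: 300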

Step 4: Initialize agent and environment

  • agent: the agent (DRL algorithm) to use, chosen from the set of agents in the agent directory.

  • env_func: the function used to create the environment; here we use gym.make to create BipedalWalker-v3.

  • env_args: the environment information.

env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'BipedalWalker-v3',
    'max_step': 1600,
    'state_dim': 24,
    'action_dim': 4,
    'if_discrete': False,
    'target_return': 300,
    'id': 'BipedalWalker-v3',
}

args = Arguments(AgentPPO, env_func=env_func, env_args=env_args)
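
Passing the constructor and its arguments, rather than a live environment object, lets the trainer rebuild fresh environment instances inside worker processes. Conceptually (a sketch, not ElegantRL's exact code), the trainer does something like:

import gym

def build_env(env_func, env_args):
    # 'id' is the keyword gym.make expects; the other fields describe the env
    return env_func(id=env_args['id'])

env = build_env(gym.make, env_args)
assert env.observation_space.shape[0] == env_args['state_dim']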

Step 5: Specify hyper-parameters

A list of hyper-parameters is available here.

args.target_step = args.max_step * 4
args.gamma = 0.98
args.eval_times = 2 ** 4
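
To put these numbers in perspective (plain arithmetic, no extra API):

max_step = 1600
target_step = max_step * 4           # 6400 env steps collected per training iteration
gamma = 0.98
effective_horizon = 1 / (1 - gamma)  # ~50 steps: roughly how far ahead rewards still matter
eval_times = 2 ** 4                  # 16 evaluation episodes, averaged for a stable score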

Step 6: Train your agent

In this tutorial, we provide four different modes to train an agent:

  • Single-process: utilizes one GPU for single-process training. No parallelism.

  • Multi-process: utilizes one GPU for multi-process training, with worker and learner parallelism.

  • Multi-GPU: utilizes multiple GPUs to train an agent through model fusion. Specify the GPU ids you want to use.

  • Tournament-based ensemble training: utilizes multiple GPUs to run tournament-based ensemble training.

flag = 'SingleProcess'

if flag == 'SingleProcess':
    args.learner_gpus = 0
    train_and_evaluate(args)

elif flag == 'MultiProcess':
    args.learner_gpus = 0
    train_and_evaluate_mp(args)

elif flag == 'MultiGPU':
    args.learner_gpus = [0, 1, 2, 3]
    train_and_evaluate_mp(args)

elif flag == 'Tournament-based':
    args.learner_gpus = [[i, ] for i in range(4)]  # e.g. [[0,], [1,], [2,], [3,]] or [[0, 1], [2, 3]]
    python_path = '.../bin/python3'
    train_and_evaluate_mp(args, python_path)

else:
    raise ValueError(f"Unknown flag: {flag}")
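
Once training finishes, you can watch the agent act in the environment. The snippet below is a minimal rollout sketch: the saved-actor filename and directory are assumptions that vary across ElegantRL versions, and it assumes a PPO actor that maps a state tensor to an action in [-1, 1]:

import gym
import torch

def demo_rollout(actor, env_name='BipedalWalker-v3'):
    env = gym.make(env_name)
    state = env.reset()
    episode_return, done = 0.0, False
    while not done:
        tensor_state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        action = actor(tensor_state).detach().numpy()[0]  # deterministic action
        state, reward, done, _ = env.step(action)
        episode_return += reward
        env.render()
    env.close()
    print(f'episode return: {episode_return:.1f}')

# hypothetical path; check your run's working directory for the saved actor
# actor = torch.load('./BipedalWalker-v3_PPO_0/actor.pth')
# demo_rollout(actor)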

Try it yourself in this Colab!

Performance of a trained agent:

../_images/BipedalWalker-v3_2.gif

Check out our video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.