Example 1: LunarLanderContinuous-v2

LunarLanderContinuous-v2 is a robotic control task. The goal is to bring a lander to rest on the landing pad. If the lander moves away from the landing pad, it loses the reward it gained by approaching. An episode ends when the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively. A detailed description of the task can be found at OpenAI Gym. Our Python code is available here.

When a Lander takes random actions:
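Such a random-action baseline can be sketched as follows. This is a minimal sketch, not code from ElegantRL: `run_random_episode` is a hypothetical helper name, and it only assumes a Gym-style environment with `reset()`, `step()` and `action_space.sample()`. It accepts both the older 4-tuple and the newer 5-tuple return value of `step()`.

```python
def run_random_episode(env, max_step=1000):
    """Hypothetical helper: run one episode with random actions, return the total reward."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_step):
        action = env.action_space.sample()  # uniformly sampled action
        step_out = env.step(action)
        if len(step_out) == 5:
            # Newer Gym API: (obs, reward, terminated, truncated, info)
            _, reward, terminated, truncated, _ = step_out
            done = terminated or truncated
        else:
            # Older Gym API: (obs, reward, done, info)
            _, reward, done, _ = step_out
        total_reward += float(reward)
        if done:
            break
    return total_reward


# Usage (assumes gym with the Box2D extras is installed):
# import gym
# env = gym.make('LunarLanderContinuous-v2')
# print(run_random_episode(env))
```

A random policy typically ends in a crash, so the -100 terminal penalty dominates its episode return.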

Step 1: Install ElegantRL

pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git

Step 2: Import packages

ElegantRL

OpenAI Gym: a toolkit for developing and comparing reinforcement learning algorithms (collections of environments).

import gym
from elegantrl.run import *

gym.logger.set_level(40)  # 40 = ERROR level, so warnings are suppressed

Step 3: Get environment information

get_gym_env_args(gym.make('LunarLanderContinuous-v2'), if_print=True)

Output:

env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}
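To make the fields above less opaque, here is a minimal sketch of how the dimension entries could be derived from Gym-style spaces. It is not ElegantRL's implementation of get_gym_env_args: `sketch_env_args` is a hypothetical helper, and the two stand-in objects only mimic the attributes a Box-like (continuous) and a Discrete-like action space expose.

```python
from types import SimpleNamespace


def sketch_env_args(env, env_name, max_step, target_return):
    """Hypothetical helper: build an env_args-style dict from Gym-style spaces."""
    # Assumes a flat (1-D) observation vector.
    state_dim = env.observation_space.shape[0]
    # Discrete action spaces expose `n`; continuous Box spaces expose `shape`.
    if_discrete = hasattr(env.action_space, 'n')
    if if_discrete:
        action_dim = env.action_space.n
    else:
        action_dim = env.action_space.shape[0]
    return {
        'env_num': 1,
        'env_name': env_name,
        'max_step': max_step,
        'state_dim': state_dim,
        'action_dim': action_dim,
        'if_discrete': if_discrete,
        'target_return': target_return,
    }


# Stand-in objects (not real Gym environments) with the attributes used above.
box_like = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(8,)),
    action_space=SimpleNamespace(shape=(2,)),
)
discrete_like = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(8,)),
    action_space=SimpleNamespace(n=4, shape=()),
)

print(sketch_env_args(box_like, 'BoxLikeEnv', 1000, 200))
print(sketch_env_args(discrete_like, 'DiscreteLikeEnv', 1000, 200))
```

The point of the sketch is that if_discrete and action_dim follow from the type of the action space: a continuous-control environment reports if_discrete as False and action_dim as the length of its action vector.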

Step 4: Initialize agent and environment

agent: chooses an agent (DRL algorithm) from the set of agents in the directory.

env_func: the function that creates an environment; in this case, we use gym.make to create LunarLanderContinuous-v2.

env_args: the environment information.

env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}

args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)

Step 5: Specify hyper-parameters

A list of hyper-parameters is available here.

args.target_step = args.max_step
args.gamma = 0.99
args.eval_times = 2 ** 5
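To give a feel for what gamma = 0.99 means, the sketch below computes a discounted return in plain Python. It is illustrative only and does not use ElegantRL: `discounted_return` is a hypothetical helper, and the reward list is made up.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


gamma = 0.99
rewards = [1.0, 1.0, 1.0]  # made-up per-step rewards

print(discounted_return(rewards, gamma))  # 1 + 0.99 + 0.99**2
print(1.0 / (1.0 - gamma))                # rough effective horizon, in steps
print(gamma ** 1000)                      # weight of a reward 1000 steps ahead
```

With gamma = 0.99 the effective horizon is on the order of 100 steps, so rewards near the end of a 1000-step episode contribute very little to the return seen from the first step.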

Step 6: Train your agent

In this tutorial, we provide a single-process demo to train an agent:

train_and_evaluate(args)

Try it yourself in this Colab!

Performance of a trained agent: