Example 1: LunarLanderContinuous-v2¶
LunarLanderContinuous-v2 is a classic control task. The goal is to bring the lander to rest on the landing pad. If the lander moves away from the landing pad, it loses reward. An episode finishes when the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively. A detailed description of the task can be found at OpenAI Gym. Our Python code is available here.
When the lander takes random actions:
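Below is a minimal sketch that reproduces this random behavior with plain OpenAI Gym (it assumes the classic Gym API, where env.step returns a 4-tuple):

import gym

env = gym.make('LunarLanderContinuous-v2')
state = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # sample a uniformly random action
    state, reward, done, info = env.step(action)
    episode_return += reward
print(f'Random-policy episode return: {episode_return:.1f}')
env.close()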

Step 1: Install ElegantRL¶
pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git
Step 2: Import packages¶
ElegantRL: the DRL library used in this example.
OpenAI Gym: a toolkit for developing and comparing reinforcement learning algorithms (collections of environments).
import gym

from elegantrl.run import *

gym.logger.set_level(40)  # block Gym warnings
Step 3: Get environment information¶
get_gym_env_args(gym.make('LunarLanderContinuous-v2'), if_print=True)
Output:
env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}
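Each of these values can also be read directly from the raw Gym spaces. The sketch below illustrates roughly where they come from (an illustration, not ElegantRL's exact implementation):

import gym

env = gym.make('LunarLanderContinuous-v2')
print(env.observation_space.shape[0])                     # state_dim: 8
print(env.action_space.shape[0])                          # action_dim: 2
print(isinstance(env.action_space, gym.spaces.Discrete))  # if_discrete: False
print(env.spec.max_episode_steps)                         # max_step: 1000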
Step 4: Initialize agent and environment¶
agent: choose an agent (a DRL algorithm) from the set of agents in the directory; here we use AgentModSAC (an alternative is sketched after the code below).
env_func: the function that creates the environment; in this case, we use gym.make to create LunarLanderContinuous-v2.
env_args: the environment information.
env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}
args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)
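Any other agent in the directory can be swapped in the same way. A hedged sketch (the module path elegantrl.agent is an assumption and may differ across ElegantRL versions):

from elegantrl.agent import AgentTD3  # assumed import path for an alternative agent

args = Arguments(AgentTD3, env_func=env_func, env_args=env_args)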
Step 5: Specify hyper-parameters¶
A list of hyper-parameters is available here.
args.target_step = args.max_step  # collect this many environment steps per training round
args.gamma = 0.99                 # discount factor of future rewards
args.eval_times = 2 ** 5          # number of episodes per evaluation
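Other Arguments fields can be tuned the same way. A small hedged sketch (the attribute names below are assumptions based on common ElegantRL versions and may differ in yours):

args.net_dim = 2 ** 8     # width of the hidden layers (assumed attribute name)
args.batch_size = 2 ** 7  # mini-batch size per network update (assumed attribute name)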
Step 6: Train your agent¶
In this tutorial, we provide a single-process demo to train an agent:
train_and_evaluate(args)
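Once training finishes, the saved actor can be replayed locally. A hedged sketch (the checkpoint path f'{args.cwd}/actor.pth', and that the whole actor module is saved there, are assumptions that may vary across ElegantRL versions):

import torch

actor = torch.load(f'{args.cwd}/actor.pth', map_location='cpu')  # assumed save path
env = gym.make('LunarLanderContinuous-v2')
state = env.reset()
done = False
while not done:
    tensor_state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    action = actor(tensor_state)[0].detach().numpy()  # actor outputs the action
    state, reward, done, _ = env.step(action)
    env.render()
env.close()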
Try it yourself through this Colab!
Performance of a trained agent:
