Baseline models Gym (for MuJoCo environments)

This project contains the code for training baseline models for the tasks in the MuJoCo group of Gym environments, including "Ant-v2", "HalfCheetah-v2", "Hopper-v2", "Humanoid-v2", "InvertedDoublePendulum-v2", "InvertedPendulum-v2", "Reacher-v2", "Swimmer-v2", and "Walker2d-v2".

We provide experiment configs for training a lightweight model, MemorylessActorCritic, which uses separate MLPs for the actor and the critic and samples actions from a Gaussian distribution. The configs cover all continuous-control environments in the MuJoCo group of Gym environments.
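To make the Gaussian action sampling concrete, here is a minimal, dependency-free sketch of drawing one continuous action from a diagonal Gaussian and computing its log-probability. It is not the MemorylessActorCritic code: the function name `sample_gaussian_action`, the use of per-dimension log standard deviations, and the 8-dimensional example values are all assumptions made for illustration.

```python
import math
import random


def sample_gaussian_action(means, log_stds, rng):
    """Sample one action from a diagonal Gaussian; return (action, log_prob)."""
    action = []
    log_prob = 0.0
    for mean, log_std in zip(means, log_stds):
        std = math.exp(log_std)
        a = rng.gauss(mean, std)
        action.append(a)
        # Log-density of a 1-D Gaussian; dimensions are independent, so they add up.
        log_prob += (
            -0.5 * ((a - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
        )
    return action, log_prob


rng = random.Random(0)
# Hypothetical actor outputs for an 8-dimensional action (values chosen for illustration).
means = [0.0] * 8
log_stds = [-0.5] * 8
action, log_prob = sample_gaussian_action(means, log_stds, rng)
print(action, log_prob)
```

In an actor-critic setup, the means would come from the actor MLP, while a separate critic MLP would estimate the state value used to compute advantages.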

The experiments are set up to train models using the DD-PPO reinforcement learning algorithm.
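DD-PPO is a distributed variant of PPO, so the quantity being optimized per sample is the PPO clipped surrogate objective. The sketch below shows that objective for a single sample as it is commonly written; it is not taken from the AllenAct source, and the function name and the default `clip_eps=0.2` are illustrative assumptions rather than settings from these experiment configs.

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio: new-policy probability divided by old-policy probability of the action.
    advantage: estimated advantage of the action.
    clip_eps: clipping range (assumed value, for illustration only).
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped by the clipped term.
print(ppo_clipped_objective(1.5, 1.0))
# A small ratio with a negative advantage is also bounded by the clipped term.
print(ppo_clipped_objective(0.5, -1.0))
```

Training maximizes the average of this quantity over a batch, usually together with a value-function loss and an entropy bonus.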

To train an experiment, run the following command from the allenact root directory:

python main.py <PATH_TO_EXPERIMENT_CONFIG> -o <PATH_TO_OUTPUT>

Here, <PATH_TO_OUTPUT> is the path of the directory where the model weights and logs will be stored, and <PATH_TO_EXPERIMENT_CONFIG> is the path to the Python file containing the experiment configuration. An example usage of this command would be:

python main.py projects/gym_baselines/experiments/mujoco/gym_mujoco_ant_ddppo.py -o /YOUR/DESIRED/MUJOCO/OUTPUT/SAVE/PATH/gym_mujoco_ant_ddppo

This trains the lightweight model (separate MLPs for the actor and the critic, with actions sampled from a Gaussian distribution) in the "Ant-v2" environment, and stores the model weights and logs in /YOUR/DESIRED/MUJOCO/OUTPUT/SAVE/PATH/gym_mujoco_ant_ddppo.
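If you prefer to launch training from a script, the command above can be assembled programmatically. The following is a small sketch that only reuses the `main.py` entry point and the `-o` flag shown above; the helper name `build_train_command` is hypothetical, and the output path is the same placeholder used in the example.

```python
import shlex
import sys


def build_train_command(config_path, output_dir, python_exe=sys.executable):
    """Return the training command as an argument list (hypothetical helper)."""
    return [python_exe, "main.py", config_path, "-o", output_dir]


cmd = build_train_command(
    "projects/gym_baselines/experiments/mujoco/gym_mujoco_ant_ddppo.py",
    "/YOUR/DESIRED/MUJOCO/OUTPUT/SAVE/PATH/gym_mujoco_ant_ddppo",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))

# To actually start training, run the command from the allenact root directory, e.g.:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems when paths contain spaces.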

Results

In our experiments, the rewards we obtained on the MuJoCo environments after training with PPO are similar to those reported by OpenAI Gym Baselines (1M steps). The Humanoid environment is instead compared with the original PPO paper, where PPO was trained for 50M steps. Due to time constraints, we have only tested our baselines across two seeds so far.

Environment	Gym Baseline Reward	Our Reward
Ant-v2	1083.2	1098.6 (reached 4719 in 25M steps)
HalfCheetah-v2	1795.43	1741 (reached 4019 in 18M steps)
Hopper-v2	2316.16	2266
Humanoid-v2	4000+	4500+ (reached 6500 in 70M steps)
InvertedPendulum-v2	809.43	1000
Reacher-v2	-6.71	-7.045
Swimmer-v2	111.19	124.7
Walker2d-v2	3424.95	2723 (in 10M steps)
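To quantify how close the two columns are, the short sketch below computes the relative difference between our reward and the Gym baseline reward for the rows of the table above that report exact values at the shared 1M-step budget. The Humanoid row (reported only as "4000+" and "4500+") and the Walker2d row (reported at 10M steps) are left out because they are not directly comparable; the helper name `relative_difference` is an assumption for illustration.

```python
# (Gym baseline reward, our reward), copied from the table above.
results = {
    "Ant-v2": (1083.2, 1098.6),
    "HalfCheetah-v2": (1795.43, 1741.0),
    "Hopper-v2": (2316.16, 2266.0),
    "InvertedPendulum-v2": (809.43, 1000.0),
    "Reacher-v2": (-6.71, -7.045),
    "Swimmer-v2": (111.19, 124.7),
}


def relative_difference(baseline, ours):
    """Signed difference relative to the baseline magnitude (positive means ours is higher)."""
    return (ours - baseline) / abs(baseline)


rel_diff = {env: relative_difference(b, o) for env, (b, o) in results.items()}
for env, diff in rel_diff.items():
    print(f"{env}: {diff:+.1%}")
```

Dividing by the absolute value of the baseline keeps the sign meaningful for Reacher-v2, where both rewards are negative.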