The Colosseum Benchmark
The default \(\texttt{Colosseum}\) benchmark targets the four most widely studied settings of reinforcement learning: episodic ergodic, episodic communicating, continuous ergodic, and continuous communicating. Note that the continuous setting is also known as the infinite-horizon setting. The selection of environments is the result of the theoretical analysis and empirical validation presented in the paper.
The default benchmark environments
The tables below report the environments in the benchmark along with their parameters. We briefly describe the parameters that are common to all environments below, and we refer to the API of the environment classes for the meaning of the class-specific parameters.
The size parameter controls the number of states, the make_reward_stochastic parameter controls whether the rewards are stochastic, the p_lazy parameter is the probability that the MDP stays in the same state instead of executing the action selected by the agent, and the p_rand parameter is the probability that the MDP executes a random action instead of the action selected by the agent.
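As an illustration, the following sketch instantiates one of the benchmark environments with these parameters. The import path and the seed argument are assumptions; consult the API reference of the environment classes for the exact signature.
from colosseum.mdp.river_swim import RiverSwimEpisodic  # assumed import path

# A small RiverSwim chain with stochastic rewards and noisy dynamics (hypothetical usage).
mdp = RiverSwimEpisodic(
    seed=42,                      # assumed: Colosseum MDPs are seeded at construction
    size=5,                       # number of states in the chain
    make_reward_stochastic=True,  # sample rewards from distributions rather than fixed values
    p_lazy=0.1,                   # probability of staying in the same state, ignoring the action
    p_rand=0.01,                  # probability of executing a random action instead of the agent's
)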
Episodic ergodic
DeepSea(size=10, p_rand=0.4, make_reward_stochastic=False)
DeepSea(size=13, p_rand=0.3, make_reward_stochastic=True)
FrozenLake(size=4, make_reward_stochastic=True, p_lazy=0.03, p_frozen=0.98, p_rand=0.001)
MiniGridEmpty(size=10, make_reward_stochastic=False, p_lazy=None, p_rand=0.05, n_starting_states=3)
MiniGridEmpty(size=10, make_reward_stochastic=True, p_lazy=None, p_rand=0.3, n_starting_states=3)
MiniGridEmpty(size=10, make_reward_stochastic=False, p_lazy=0.05, p_rand=0.2, n_starting_states=3)
MiniGridEmpty(size=8, make_reward_stochastic=False, p_lazy=None, p_rand=0.3, n_starting_states=3)
MiniGridEmpty(size=8, make_reward_stochastic=True, p_lazy=0.02, p_rand=0.4, n_starting_states=3)
MiniGridEmpty(size=6, make_reward_stochastic=True, p_lazy=0.1, p_rand=0.05, n_starting_states=3)
MiniGridEmpty(size=4, make_reward_stochastic=False, p_lazy=0.1, p_rand=0.4, n_starting_states=3)
MiniGridRooms(make_reward_stochastic=True, room_size=3, n_rooms=4, p_rand=0.001, p_lazy=None)
MiniGridRooms(make_reward_stochastic=True, room_size=3, n_rooms=9, p_rand=0.1, p_lazy=0.01)
MiniGridRooms(make_reward_stochastic=True, room_size=4, n_rooms=4, p_rand=0.1, p_lazy=0.01)
RiverSwim(make_reward_stochastic=True, size=5, p_lazy=0.1, p_rand=0.01, sub_optimal_distribution=("beta",(2.4, 24.0)), optimal_distribution=("beta",(0.01, 0.11)), other_distribution=("beta",(2.4, 249.0)))
RiverSwim(make_reward_stochastic=True, size=30, p_lazy=0.01, p_rand=0.01, sub_optimal_distribution=("beta",(14.9, 149.0)), optimal_distribution=("beta",(0.01, 0.11)), other_distribution=("beta",(14.9, 1499.0)))
SimpleGrid(reward_type=3, size=10, make_reward_stochastic=True, p_lazy=0.2, p_rand=0.01)
SimpleGrid(reward_type=3, size=16, make_reward_stochastic=True, p_lazy=0.2, p_rand=0.2)
SimpleGrid(reward_type=3, size=16, make_reward_stochastic=False, p_lazy=0.01, p_rand=0.2)
Taxi(make_reward_stochastic=True, p_lazy=0.001, size=4, length=1, width=1, space=1, n_locations=3, p_rand=0.01, default_r=("beta",(0.8, 20.0)), successfully_delivery_r=("beta",(1.0, 0.1)), failure_delivery_r=("beta",(0.8, 50.0)))
Taxi(make_reward_stochastic=True, p_lazy=0.01, size=4, length=1, width=1, space=1, n_locations=3, p_rand=0.2, default_r=("beta",(0.8, 6.0)), successfully_delivery_r=("beta",(1.0, 0.1)), failure_delivery_r=("beta",(0.8, 40.0)))
Episodic communicating
DeepSea(size=5, p_rand=None, make_reward_stochastic=True)
DeepSea(size=25, p_rand=None, make_reward_stochastic=True)
FrozenLake(size=3, make_reward_stochastic=True, p_lazy=None, p_frozen=0.9, p_rand=None)
MiniGridEmpty(size=6, make_reward_stochastic=True, p_lazy=None, p_rand=None, n_starting_states=3)
MiniGridEmpty(size=6, make_reward_stochastic=True, p_lazy=0.35, p_rand=None, n_starting_states=5)
MiniGridEmpty(size=6, make_reward_stochastic=True, p_lazy=0.25, p_rand=0.1, n_starting_states=3)
MiniGridEmpty(size=10, make_reward_stochastic=False, p_lazy=0.1, p_rand=None, n_starting_states=5)
MiniGridEmpty(size=10, make_reward_stochastic=False, p_lazy=0.15, p_rand=0.1, n_starting_states=3)
MiniGridRooms(make_reward_stochastic=True, room_size=4, n_rooms=4, p_rand=None, p_lazy=None)
MiniGridRooms(make_reward_stochastic=False, room_size=3, n_rooms=9, p_rand=None, p_lazy=0.05)
MiniGridRooms(make_reward_stochastic=False, room_size=3, n_rooms=9, p_rand=None, p_lazy=0.1)
MiniGridRooms(make_reward_stochastic=False, room_size=3, n_rooms=4, p_rand=None, p_lazy=0.15)
RiverSwim(make_reward_stochastic=True, size=25, p_lazy=0.05, p_rand=None, sub_optimal_distribution=("beta",(12.4, 124.0)), optimal_distribution=("beta",(0.01, 0.11)), other_distribution=("beta",(12.4, 1249.0)))
RiverSwim(make_reward_stochastic=False, size=40, p_lazy=None, p_rand=None, sub_optimal_distribution=None, optimal_distribution=None, other_distribution=None)
SimpleGrid(size=10, make_reward_stochastic=True, p_lazy=0.5)
SimpleGrid(size=13, make_reward_stochastic=True, p_lazy=0.4)
SimpleGrid(size=13, make_reward_stochastic=False, p_lazy=None)
SimpleGrid(size=20, make_reward_stochastic=True, p_lazy=0.4)
Taxi(make_reward_stochastic=True, p_lazy=0.01, size=4, length=1, width=1, space=1, n_locations=3, p_rand=None, default_r=("beta",(0.7, 30.0)), successfully_delivery_r=("beta",(0.4, 0.1)), failure_delivery_r=("beta",(0.8, 50.0)))
Taxi(make_reward_stochastic=True, p_lazy=0.2, size=5, length=1, width=1, space=1, n_locations=3, p_rand=None, default_r=("beta",(0.7, 30.0)), successfully_delivery_r=("beta",(0.4, 0.1)), failure_delivery_r=("beta",(0.8, 50.0)))
Continuous ergodic
DeepSea(size=20, p_rand=0.1, make_reward_stochastic=False)
FrozenLake(size=5, make_reward_stochastic=True, p_lazy=0.01, p_frozen=0.95, p_rand=0.05)
MiniGridEmpty(size=12, make_reward_stochastic=True, p_lazy=0.05, p_rand=0.495, n_starting_states=3)
MiniGridEmpty(size=12, make_reward_stochastic=True, p_lazy=0.1, p_rand=0.395, n_starting_states=3)
MiniGridEmpty(size=10, make_reward_stochastic=False, p_lazy=0.02, p_rand=0.7, n_starting_states=3)
MiniGridEmpty(size=14, make_reward_stochastic=True, p_lazy=0.02, p_rand=0.6, n_starting_states=3)
MiniGridEmpty(size=10, make_reward_stochastic=True, p_lazy=0.1, p_rand=0.21, n_starting_states=3, optimal_distribution=("beta",(1.0, 0.11)), other_distribution=("beta",(1.0, 4.0)))
MiniGridEmpty(size=14, make_reward_stochastic=True, p_lazy=0.02, p_rand=0.4, n_starting_states=3, optimal_distribution=("beta",(0.5, 0.11)), other_distribution=("beta",(1.5, 4.0)))
MiniGridEmpty(size=14, make_reward_stochastic=True, p_lazy=0.05, p_rand=0.31, n_starting_states=3, optimal_distribution=("beta",(1.0, 0.11)), other_distribution=("beta",(1.0, 4.0)))
MiniGridEmpty(size=14, make_reward_stochastic=True, p_lazy=0.1, p_rand=0.6, n_starting_states=3, optimal_distribution=("beta",(0.3, 0.11)), other_distribution=("beta",(2.0, 4.0)))
MiniGridRooms(make_reward_stochastic=True, room_size=3, n_rooms=16, p_rand=0.1, p_lazy=0.4)
MiniGridRooms(make_reward_stochastic=False, room_size=5, n_rooms=9, p_rand=0.3, p_lazy=0.4)
RiverSwim(make_reward_stochastic=False, size=30, p_lazy=0.1, p_rand=0.2)
RiverSwim(make_reward_stochastic=True, size=50, p_lazy=0.1, p_rand=0.1)
RiverSwim(make_reward_stochastic=True, size=80, p_lazy=0.001, p_rand=0.2)
RiverSwim(make_reward_stochastic=False, size=80, p_lazy=0.1, p_rand=0.01)
SimpleGrid(size=15, make_reward_stochastic=True, p_lazy=0.4, p_rand=0.4)
SimpleGrid(size=10, make_reward_stochastic=False, p_lazy=0.2, sub_optimal_distribution=None, optimal_distribution=None, other_distribution=None, p_rand=0.01)
SimpleGrid(size=20, make_reward_stochastic=False, p_lazy=0.1, sub_optimal_distribution=None, optimal_distribution=None, other_distribution=None, p_rand=0.1)
Taxi(make_reward_stochastic=True, p_lazy=0.01, size=4, length=1, width=1, space=1, n_locations=3, p_rand=0.1, default_r=("beta",(0.8, 20.0)), successfully_delivery_r=("beta",(1.0, 0.1)), failure_delivery_r=("beta",(0.8, 50.0)))
Continuous communicating
DeepSea(size=40, p_rand=None, make_reward_stochastic=True)
DeepSea(size=40, p_rand=None, make_reward_stochastic=False)
DeepSea(size=35, p_rand=None, make_reward_stochastic=True)
FrozenLake(size=4, make_reward_stochastic=True, p_lazy=0.01, p_frozen=0.95, p_rand=None)
FrozenLake(size=5, make_reward_stochastic=True, p_lazy=0.35, p_frozen=0.9, p_rand=None)
MiniGridEmpty(size=12, make_reward_stochastic=True, p_lazy=0.25, p_rand=None, n_starting_states=3, optimal_distribution=("beta",(1.0, 0.11)), other_distribution=("beta",(1.0, 4.0)))
MiniGridEmpty(size=12, make_reward_stochastic=False, p_lazy=0.3, p_rand=None, n_starting_states=3, optimal_distribution=None, other_distribution=None)
MiniGridEmpty(size=8, make_reward_stochastic=False, p_lazy=0.3, p_rand=None, n_starting_states=3, optimal_distribution=None, other_distribution=None)
MiniGridEmpty(size=8, make_reward_stochastic=True, p_lazy=0.7, p_rand=None, n_starting_states=3, optimal_distribution=("beta",(1.0, 0.11)), other_distribution=("beta",(1.0, 4.0)))
MiniGridEmpty(size=12, make_reward_stochastic=True, p_lazy=0.7, p_rand=None, n_starting_states=3, optimal_distribution=("beta",(1.0, 0.11)), other_distribution=("beta",(1.0, 4.0)))
MiniGridRooms(make_reward_stochastic=True, room_size=5, n_rooms=9, p_rand=None, p_lazy=0.3)
MiniGridRooms(make_reward_stochastic=True, room_size=3, n_rooms=9, p_rand=None, p_lazy=0.5)
MiniGridRooms(make_reward_stochastic=True, room_size=5, n_rooms=9, p_rand=None, p_lazy=0.5)
RiverSwim(make_reward_stochastic=True, size=25, p_lazy=0.1, p_rand=None)
RiverSwim(make_reward_stochastic=True, size=90, p_lazy=0.03, p_rand=None)
SimpleGrid(size=15, make_reward_stochastic=True, p_lazy=0.2, p_rand=None, sub_optimal_distribution=("beta",(0.3, 49.0)), optimal_distribution=("beta",(2.0, 0.11)), other_distribution=("beta",(0.3, 4.0)))
SimpleGrid(size=16, make_reward_stochastic=False, p_lazy=0.065, p_rand=None, sub_optimal_distribution=None, optimal_distribution=None, other_distribution=None)
SimpleGrid(size=25, make_reward_stochastic=True, p_lazy=0.1, p_rand=None, sub_optimal_distribution=("beta",(0.3, 49.0)), optimal_distribution=("beta",(2.0, 0.11)), other_distribution=("beta",(0.3, 4.0)))
SimpleGrid(size=25, make_reward_stochastic=False, p_lazy=0.3, p_rand=None, sub_optimal_distribution=None, optimal_distribution=None, other_distribution=None)
Taxi(make_reward_stochastic=True, p_lazy=0.01, size=4, length=1, width=1, space=1, n_locations=3, p_rand=None, default_r=("beta",(0.7, 30.0)), successfully_delivery_r=("beta",(0.4, 0.1)), failure_delivery_r=("beta",(0.8, 50.0)))
Instantiate the default benchmark
A benchmark in \(\texttt{Colosseum}\) can be instantiated using the ColosseumBenchmark class. A ColosseumBenchmark object contains the parameters of the MDPs and an ExperimentConfig, which regulates the agent/MDP interactions. The default benchmark can be accessed through the ColosseumDefaultBenchmark enumeration, which also allows retrieving the benchmark as shown below.
# Imports for the snippet below. The exact Colosseum module paths are an
# assumption; consult the API reference of your installed version.
from glob import glob

import seedir
from colosseum.benchmark import ColosseumDefaultBenchmark, instantiate_benchmark_folder
from colosseum.experiment import print_benchmark_configurations

# Locally instantiate the episodic ergodic benchmark with folder name "benchmark_er"
instantiate_benchmark_folder(
    ColosseumDefaultBenchmark.EPISODIC_ERGODIC.get_benchmark(), "benchmark_er"
)

# Locally instantiate the episodic communicating benchmark with folder name "benchmark_ec"
instantiate_benchmark_folder(
    ColosseumDefaultBenchmark.EPISODIC_COMMUNICATING.get_benchmark(), "benchmark_ec"
)

# Locally instantiate the continuous ergodic benchmark with folder name "benchmark_ce"
instantiate_benchmark_folder(
    ColosseumDefaultBenchmark.CONTINUOUS_ERGODIC.get_benchmark(), "benchmark_ce"
)

# Locally instantiate the continuous communicating benchmark with folder name "benchmark_cc"
instantiate_benchmark_folder(
    ColosseumDefaultBenchmark.CONTINUOUS_COMMUNICATING.get_benchmark(), "benchmark_cc"
)

# Print the folder structure of the benchmarks
for bdir in glob("benchmark_*"):
    seedir.seedir(bdir, style="emoji")
    print("-" * 70)

# Print the benchmark configurations
print_benchmark_configurations()
📁 benchmark_er/
├─📁 mdp_configs/
│ ├─📄 MiniGridRoomsEpisodic.gin
│ ├─📄 DeepSeaEpisodic.gin
│ ├─📄 SimpleGridEpisodic.gin
│ ├─📄 MiniGridEmptyEpisodic.gin
│ ├─📄 TaxiEpisodic.gin
│ ├─📄 RiverSwimEpisodic.gin
│ └─📄 FrozenLakeEpisodic.gin
└─📄 experiment_config.yml
📁 benchmark_ec/
├─📁 mdp_configs/
│ ├─📄 MiniGridRoomsEpisodic.gin
│ ├─📄 DeepSeaEpisodic.gin
│ ├─📄 SimpleGridEpisodic.gin
│ ├─📄 MiniGridEmptyEpisodic.gin
│ ├─📄 TaxiEpisodic.gin
│ ├─📄 RiverSwimEpisodic.gin
│ └─📄 FrozenLakeEpisodic.gin
└─📄 experiment_config.yml
📁 benchmark_ce/
├─📁 mdp_configs/
│ ├─📄 MiniGridEmptyContinuous.gin
│ ├─📄 TaxiContinuous.gin
│ ├─📄 DeepSeaContinuous.gin
│ ├─📄 MiniGridRoomsContinuous.gin
│ ├─📄 RiverSwimContinuous.gin
│ ├─📄 SimpleGridContinuous.gin
│ └─📄 FrozenLakeContinuous.gin
└─📄 experiment_config.yml
📁 benchmark_cc/
├─📁 mdp_configs/
│ ├─📄 MiniGridEmptyContinuous.gin
│ ├─📄 TaxiContinuous.gin
│ ├─📄 DeepSeaContinuous.gin
│ ├─📄 MiniGridRoomsContinuous.gin
│ ├─📄 RiverSwimContinuous.gin
│ ├─📄 SimpleGridContinuous.gin
│ └─📄 FrozenLakeContinuous.gin
└─📄 experiment_config.yml
----------------------------------------------------------------------
Default experiment configuration in the 'experiment_config.yml' files.
n_seeds: 20
n_steps: 500000
max_interaction_time_s: 600
log_performance_indicators_every: 100
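The same configuration can also be built programmatically. The following is a hedged sketch, assuming ExperimentConfig can be imported from colosseum.experiment and accepts the four fields printed above as keyword arguments; check the API reference for the exact signature.
from colosseum.experiment import ExperimentConfig  # assumed import path

# Reconstruct the default experiment configuration shown above (hypothetical usage).
config = ExperimentConfig(
    n_seeds=20,                            # number of seeds each agent/MDP interaction is repeated for
    n_steps=500_000,                       # total number of agent/MDP interaction steps
    max_interaction_time_s=600,            # wall-clock time budget per interaction, in seconds
    log_performance_indicators_every=100,  # logging frequency for the performance indicators, in steps
)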
The mdp_configs folders for the different settings contain the Gin files with the configurations of the MDPs presented above. The experiment_config.yml file contains the default ExperimentConfig.
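The generated Gin files can be inspected directly to check the configuration of a given MDP family, for example using one of the file names from the folder structure printed above.
# Print the Gin configuration of the RiverSwim instances
# in the episodic ergodic benchmark.
with open("benchmark_er/mdp_configs/RiverSwimEpisodic.gin") as f:
    print(f.read())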