# Benchmarking Agents

This tutorial shows how to benchmark agents in Colosseum. The workflow has four steps: define an experiment configuration, instantiate the default benchmarks for the episodic and continuous settings, randomly sample agent configurations for each setting, and run the resulting experiment instances.
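
Throughout, we assume the following imports and a fixed seed. The import paths below follow the layout of the `colosseum` package, but verify them against your installed version.

```python
from colosseum.agent.agents.episodic import PSRLEpisodic, QLearningEpisodic
from colosseum.agent.agents.infinite_horizon import QLearningContinuous, UCRL2Continuous
from colosseum.agent.utils import sample_agent_gin_configs_file
from colosseum.benchmark.benchmark import ColosseumDefaultBenchmark
from colosseum.benchmark.utils import (
    instantiate_and_get_exp_instances_from_agents_and_benchmarks,
)
from colosseum.experiment import ExperimentConfig
from colosseum.experiment.experiment_instances import run_experiment_instances

# Seed used when sampling the agent configurations below
seed = 42
```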

```python
# Define a small-scale experiment configuration
experiment_config = ExperimentConfig(
    n_seeds=1,                              # number of seeds per agent/MDP pair
    n_steps=5_000,                          # total number of agent/MDP interactions
    max_interaction_time_s=1 * 30,          # computation time budget (in seconds)
    log_performance_indicators_every=1000,  # logging frequency of the indicators
)
```

```python
# Take the default Colosseum quick-test benchmarks for the episodic ergodic
# and the continuous communicating settings, and assign them the experiment
# configuration defined above
b_e = ColosseumDefaultBenchmark.EPISODIC_QUICK_TEST.get_benchmark()
b_e.experiment_config = experiment_config
b_c = ColosseumDefaultBenchmark.CONTINUOUS_QUICK_TEST.get_benchmark()
b_c.experiment_config = experiment_config
```
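
The quick-test benchmarks are small versions meant for trying out the pipeline. For a full-scale evaluation, the same pattern applies to the complete default benchmarks; the enum members below are an assumption based on the four Colosseum settings, so verify the exact names in `ColosseumDefaultBenchmark`:

```python
# Assumption: the full default benchmarks are exposed as enum members named
# after the corresponding settings; check the names in ColosseumDefaultBenchmark.
b_e_full = ColosseumDefaultBenchmark.EPISODIC_ERGODIC.get_benchmark()
b_c_full = ColosseumDefaultBenchmark.CONTINUOUS_COMMUNICATING.get_benchmark()
```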

```python
# Randomly sample a configuration for each episodic agent class
agents_configs_e = {
    PSRLEpisodic: sample_agent_gin_configs_file(PSRLEpisodic, n=1, seed=seed),
    QLearningEpisodic: sample_agent_gin_configs_file(QLearningEpisodic, n=1, seed=seed),
}
```

```python
# Randomly sample a configuration for each continuous agent class
agents_configs_c = {
    QLearningContinuous: sample_agent_gin_configs_file(QLearningContinuous, n=1, seed=seed),
    UCRL2Continuous: sample_agent_gin_configs_file(UCRL2Continuous, n=1, seed=seed),
}
```

```python
# Pair the agent configurations with the corresponding benchmark and obtain
# the experiment instances
agents_and_benchmarks = [
    (agents_configs_e, b_e),
    (agents_configs_c, b_c),
]
experiment_instances = instantiate_and_get_exp_instances_from_agents_and_benchmarks(
    agents_and_benchmarks
)
```

```python
# Run the experiment instances; note that, if multiprocessing is enabled,
# Colosseum takes advantage of it
run_experiment_instances(experiment_instances)
```
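
Multiprocessing is disabled by default. As a hedged sketch of how to switch it on (the `set_available_cores` helper in `colosseum.config` is an assumption here; check the configuration tutorial of your installed version), set the core count before calling `run_experiment_instances`:

```python
from colosseum import config

# Assumption: colosseum.config exposes set_available_cores to enable
# multiprocessing; verify this against your installed version.
config.set_available_cores(4)  # allow Colosseum to use four cores
```

With this set, the experiment instances above are distributed across the available cores instead of running sequentially.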