Hyperparameters Optimization

The sampling spaces defined in the agent classes are used by the hyperparameters optimization procedure, which aims to minimize the cumulative regret across a set of randomly sampled environments.
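To make the objective concrete, the cumulative regret is the sum of per-step gaps between the reward of an optimal policy and the reward the agent actually obtained. The sketch below is purely illustrative; the function names are hypothetical and not part of the package's API.

```python
# Hypothetical sketch of the cumulative-regret objective that the
# hyperparameters optimization minimizes; names are illustrative only.
def cumulative_regret(optimal_rewards, agent_rewards):
    """Sum of per-step gaps between the optimal and the achieved reward."""
    return sum(o - a for o, a in zip(optimal_rewards, agent_rewards))

def average_regret(regrets_per_mdp):
    """The search scores a hyperparameter sample by averaging the regret
    over the randomly sampled environments (and seeds)."""
    return sum(regrets_per_mdp) / len(regrets_per_mdp)
```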

Hyperparameters optimization configurations

The HyperOptConfig class controls the parameters of the hyperparameters optimization procedure. Four default configurations are available in the package:

- DEFAULT_HYPEROPT_CONF: the default configuration for tabular agents.
- SMALL_HYPEROPT_CONF: a reduced configuration for tabular agents, suitable for quick testing.
- DEFAULT_HYPEROPT_CONF_NONTABULAR: the default configuration for non-tabular agents.
- SMALL_HYPEROPT_CONF_NONTABULAR: a reduced configuration for non-tabular agents, suitable for quick testing.

| Parameter | Default tabular | Small tabular | Default non-tabular | Small non-tabular |
|---|---|---|---|---|
| seed | 42 | 42 | 42 | 42 |
| n_timesteps | 250000 | 30000 | 250000 | 50000 |
| max_interaction_time_s | 300 | 120 | 600 | 60 |
| n_samples_agents | 50 | 2 | 50 | 2 |
| n_samples_mdps | 5 | 2 | 5 | 2 |
| log_every | 100000 | 10000 | 50000 | 10000 |
| emission_map | None | None | StateInfo | StateInfo |
| mdp_classes | None | None | None | None |
| n_seeds | 3 | 1 | 3 | 1 |
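As a reading aid for the table above, the "Default tabular" column can be pictured as a set of default field values, with the "Small tabular" column overriding a few of them. The dataclass below is a hypothetical mirror of these fields, not the package's actual HyperOptConfig definition, which may differ in signature.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical mirror of the HyperOptConfig fields from the table above;
# defaults follow the "Default tabular" column.
@dataclass
class HyperOptConfigSketch:
    seed: int = 42
    n_timesteps: int = 250_000
    max_interaction_time_s: float = 300
    n_samples_agents: int = 50
    n_samples_mdps: int = 5
    log_every: int = 100_000
    emission_map: Optional[Any] = None
    mdp_classes: Optional[list] = None
    n_seeds: int = 3

# The "Small tabular" column expressed as overrides of the defaults.
small_tabular = HyperOptConfigSketch(
    n_timesteps=30_000,
    max_interaction_time_s=120,
    n_samples_agents=2,
    n_samples_mdps=2,
    log_every=10_000,
    n_seeds=1,
)
```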

Hyperparameters optimization

Running the hyperparameters optimization procedure is very similar to running a benchmark. The only difference is that the benchmark environments are automatically sampled.

# Define a custom small scale hyperparameters optimization procedure
hpoc = HyperOptConfig(
    seed=42,
    n_timesteps=20_000,
    max_interaction_time_s=40,
    n_samples_agents=1,
    n_samples_mdps=1,
    log_every=500,
    n_seeds=1,
)

# Take the Q-learning agents as a running example
agent_cls = [QLearningContinuous, QLearningEpisodic]

# Create the benchmarks for the given agent classes and hyperparameters optimization configuration
hyperopt_agents_and_benchmarks = sample_agent_configs_and_benchmarks_for_hyperopt(agent_cls, hpoc)

# Obtain the instances and run them locally
hp_exp_instances = instantiate_and_get_exp_instances_from_agents_and_benchmarks_for_hyperopt(
    hyperopt_agents_and_benchmarks
)
run_experiment_instances(hp_exp_instances)

# Compute the best hyperparameters, which, by default, minimize the average normalized cumulative regret
optimal_agent_configs = retrieve_best_agent_config_from_hp_folder()

The retrieved configurations look like the following:

prms_0/QLearningEpisodic.p = 0.05
prms_0/QLearningEpisodic.UCB_type = "bernstein"
prms_0/QLearningEpisodic.c_1 = 0.4126
prms_0/QLearningEpisodic.c_2 = 1.0458
prms_0/QLearningEpisodic.min_at = 0.1467

prms_0/QLearningContinuous.h_weight = 0.4126
prms_0/QLearningContinuous.span_approx_weight = 1.0458
prms_0/QLearningContinuous.min_at = 0.1467
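The selection step, "minimize the average normalized cumulative regret", can be sketched as follows. This is a hypothetical illustration of the criterion, not the package's implementation; the data and normalization constants are made up.

```python
# Hypothetical sketch of best-config selection: for each hyperparameter
# sample, average the normalized cumulative regret over the sampled MDPs,
# then keep the sample with the smallest average.
def normalized_regret(regret, normalizer):
    # Normalization puts regrets from different MDPs on a comparable scale.
    return regret / normalizer if normalizer else 0.0

def best_config(results):
    # results: {config_name: [(regret, normalizer), ...]}, one pair per MDP.
    def avg_norm(samples):
        return sum(normalized_regret(r, n) for r, n in samples) / len(samples)
    return min(results, key=lambda cfg: avg_norm(results[cfg]))

best = best_config({
    "prms_0": [(10.0, 100.0), (5.0, 50.0)],   # average 0.1
    "prms_1": [(40.0, 100.0), (20.0, 50.0)],  # average 0.4
})
```

Under this criterion, `prms_0` would be selected in the toy data above, mirroring how a single best parameter set is reported per agent class.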