Hyperparameters Optimization

The sampling spaces defined in the agent classes are used by the hyperparameters optimization procedure, which aims to minimize the cumulative regret across a set of randomly sampled environments.
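To make the objective concrete, the cumulative regret is the sum of per-step gaps between the reward of an optimal policy and the reward the agent actually obtained. The sketch below is purely illustrative; the function names are hypothetical and not part of the package's API.

```python
# Hypothetical sketch of the cumulative-regret objective that the
# hyperparameters optimization minimizes; names are illustrative only.
def cumulative_regret(optimal_rewards, agent_rewards):
    """Sum of per-step gaps between the optimal and the achieved reward."""
    return sum(o - a for o, a in zip(optimal_rewards, agent_rewards))

def average_regret(regrets_per_mdp):
    """The search scores a hyperparameter sample by averaging the regret
    over the randomly sampled environments (and seeds)."""
    return sum(regrets_per_mdp) / len(regrets_per_mdp)
```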

Hyperparameters optimization configurations

The HyperOptConfig class controls the parameters of the hyperparameters optimization procedure. Four default configurations are available in the package:

- DEFAULT_HYPEROPT_CONF: the default configuration for tabular agents.
- SMALL_HYPEROPT_CONF: a reduced configuration for tabular agents, suitable for quick testing.
- DEFAULT_HYPEROPT_CONF_NONTABULAR: the default configuration for non-tabular agents.
- SMALL_HYPEROPT_CONF_NONTABULAR: a reduced configuration for non-tabular agents, suitable for quick testing.

| Parameter | Default tabular | Small tabular | Default non-tabular | Small non-tabular |
|---|---|---|---|---|
| seed | 42 | 42 | 42 | 42 |
| n_timesteps | 250000 | 30000 | 250000 | 50000 |
| max_interaction_time_s | 300 | 120 | 600 | 60 |
| n_samples_agents | 50 | 2 | 50 | 2 |
| n_samples_mdps | 5 | 2 | 5 | 2 |
| log_every | 100000 | 10000 | 50000 | 10000 |
| emission_map | None | None | StateInfo | StateInfo |
| mdp_classes | None | None | None | None |
| n_seeds | 3 | 1 | 3 | 1 |
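As a reading aid for the table above, the "Default tabular" column can be pictured as a set of default field values, with the "Small tabular" column overriding a few of them. The dataclass below is a hypothetical mirror of these fields, not the package's actual HyperOptConfig definition, which may differ in signature.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical mirror of the HyperOptConfig fields from the table above;
# defaults follow the "Default tabular" column.
@dataclass
class HyperOptConfigSketch:
    seed: int = 42
    n_timesteps: int = 250_000
    max_interaction_time_s: float = 300
    n_samples_agents: int = 50
    n_samples_mdps: int = 5
    log_every: int = 100_000
    emission_map: Optional[Any] = None
    mdp_classes: Optional[list] = None
    n_seeds: int = 3

# The "Small tabular" column expressed as overrides of the defaults.
small_tabular = HyperOptConfigSketch(
    n_timesteps=30_000,
    max_interaction_time_s=120,
    n_samples_agents=2,
    n_samples_mdps=2,
    log_every=10_000,
    n_seeds=1,
)
```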

Hyperparameters optimization

Running the hyperparameters optimization procedure is very similar to running a benchmark. The only difference is that the benchmark environments are automatically sampled.

# Define a custom small scale hyperparameters optimization procedure
hpoc = HyperOptConfig(
    seed=42,
    n_timesteps=20_000,
    max_interaction_time_s=40,
    n_samples_agents=1,
    n_samples_mdps=1,
    log_every=500,
    n_seeds=1,
)

# Take the Q-learning agents as a running example
agent_cls = [QLearningContinuous, QLearningEpisodic]

# Create the benchmarks for the given agent classes and hyperparameters optimization configuration
hyperopt_agents_and_benchmarks = sample_agent_configs_and_benchmarks_for_hyperopt(agent_cls, hpoc)

# Obtain the instances and run them locally
hp_exp_instances = instantiate_and_get_exp_instances_from_agents_and_benchmarks_for_hyperopt(
    hyperopt_agents_and_benchmarks
)
run_experiment_instances(hp_exp_instances)

# Compute the best hyperparameters, which, by default, minimize the average normalized cumulative regret
optimal_agent_configs = retrieve_best_agent_config_from_hp_folder()

The retrieved configurations look like the following:

prms_0/QLearningEpisodic.p = 0.05
prms_0/QLearningEpisodic.UCB_type = "bernstein"
prms_0/QLearningEpisodic.c_1 = 0.4126
prms_0/QLearningEpisodic.c_2 = 1.0458
prms_0/QLearningEpisodic.min_at = 0.1467

prms_0/QLearningContinuous.h_weight = 0.4126
prms_0/QLearningContinuous.span_approx_weight = 1.0458
prms_0/QLearningContinuous.min_at = 0.1467
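The selection step, "minimize the average normalized cumulative regret", can be sketched as follows. This is a hypothetical illustration of the criterion, not the package's implementation; the data and normalization constants are made up.

```python
# Hypothetical sketch of best-config selection: for each hyperparameter
# sample, average the normalized cumulative regret over the sampled MDPs,
# then keep the sample with the smallest average.
def normalized_regret(regret, normalizer):
    # Normalization puts regrets from different MDPs on a comparable scale.
    return regret / normalizer if normalizer else 0.0

def best_config(results):
    # results: {config_name: [(regret, normalizer), ...]}, one pair per MDP.
    def avg_norm(samples):
        return sum(normalized_regret(r, n) for r, n in samples) / len(samples)
    return min(results, key=lambda cfg: avg_norm(results[cfg]))

best = best_config({
    "prms_0": [(10.0, 100.0), (5.0, 50.0)],   # average 0.1
    "prms_1": [(40.0, 100.0), (20.0, 50.0)],  # average 0.4
})
```

Under this criterion, `prms_0` would be selected in the toy data above, mirroring how a single best parameter set is reported per agent class.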