Hyperparameters Optimization
The sampling spaces defined in the agent classes are used by the hyperparameters optimization procedure, which aims to minimize the cumulative regret across a set of randomly sampled environments.
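As a self-contained sketch of the objective being minimized (the names below are illustrative, not the package's API): each candidate hyperparameter sample is scored by its cumulative regret, normalized and averaged over the sampled environments, and the sample with the lowest average score wins.

```python
import numpy as np

def normalized_cumulative_regret(regret_curve, worst_case_regret):
    """Final cumulative regret, scaled by a worst-case reference value."""
    return regret_curve[-1] / worst_case_regret

# regrets[i][j]: cumulative-regret curve of hyperparameter sample i on
# sampled environment j (toy numbers, for illustration only).
regrets = {
    "sample_0": [np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0])],
    "sample_1": [np.array([1.0, 1.5, 2.0]), np.array([1.0, 2.0, 2.5])],
}
worst_case = 10.0

# Average the normalized cumulative regret over the sampled environments.
scores = {
    sample: np.mean(
        [normalized_cumulative_regret(curve, worst_case) for curve in curves]
    )
    for sample, curves in regrets.items()
}

# The best hyperparameter sample minimizes the average normalized regret.
best = min(scores, key=scores.get)
```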
Hyperparameters optimization configurations
The HyperOptConfig class controls the parameters of the hyperparameters optimization procedure.
There are four default hyperparameters optimization configurations available in the package:

- DEFAULT_HYPEROPT_CONF: the default configuration for tabular agents.
- SMALL_HYPEROPT_CONF: a quick configuration for tabular agents, suitable for quick testing.
- DEFAULT_HYPEROPT_CONF_NONTABULAR: the default configuration for non-tabular agents.
- SMALL_HYPEROPT_CONF_NONTABULAR: a quick configuration for non-tabular agents, suitable for quick testing.
| Parameter | Default tabular | Small tabular | Default non-tabular | Small non-tabular |
|---|---|---|---|---|
| seed | 42 | 42 | 42 | 42 |
| n_timesteps | 250000 | 30000 | 250000 | 50000 |
| max_interaction_time_s | 300 | 120 | 600 | 60 |
| n_samples_agents | 50 | 2 | 50 | 2 |
| n_samples_mdps | 5 | 2 | 5 | 2 |
| log_every | 100000 | 10000 | 50000 | 10000 |
| emission_map | None | None | StateInfo | StateInfo |
| mdp_classes | None | None | None | None |
| n_seeds | 3 | 1 | 3 | 1 |
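To make the table concrete, here is a small sketch using a stand-in dataclass (it only mirrors the fields listed above; it is not the package's actual HyperOptConfig class). The defaults follow the "Default tabular" column, and the "Small tabular" column overrides only the budget-related fields:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HyperOptConfigSketch:
    # Stand-in mirroring the table's fields; defaults = "Default tabular".
    seed: int = 42
    n_timesteps: int = 250_000
    max_interaction_time_s: float = 300
    n_samples_agents: int = 50
    n_samples_mdps: int = 5
    log_every: int = 100_000
    emission_map: Optional[str] = None
    mdp_classes: Optional[List[type]] = None
    n_seeds: int = 3

# The "Small tabular" column shrinks the computational budget for quick tests.
small_tabular = HyperOptConfigSketch(
    n_timesteps=30_000,
    max_interaction_time_s=120,
    n_samples_agents=2,
    n_samples_mdps=2,
    log_every=10_000,
    n_seeds=1,
)
```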
Hyperparameters optimization
Running the hyperparameters optimization procedure is very similar to running a benchmark. The only difference is that the benchmark environments are automatically sampled.
```python
# Define a custom small-scale hyperparameters optimization procedure
hpoc = HyperOptConfig(
    seed=42,
    n_timesteps=20_000,
    max_interaction_time_s=40,
    n_samples_agents=1,
    n_samples_mdps=1,
    log_every=500,
    n_seeds=1,
)
# Take the Q-learning agents as running example
agent_cls = [QLearningContinuous, QLearningEpisodic]
# Create the benchmarks for the given agent classes and hyperparameters optimization configuration
hyperopt_agents_and_benchmarks = sample_agent_configs_and_benchmarks_for_hyperopt(agent_cls, hpoc)
# Obtain the instances and run them locally
hp_exp_instances = instantiate_and_get_exp_instances_from_agents_and_benchmarks_for_hyperopt(
    hyperopt_agents_and_benchmarks
)
run_experiment_instances(hp_exp_instances)
# Compute the best hyperparameters, which, by default, minimize the average normalized cumulative regret
optimal_agent_configs = retrieve_best_agent_config_from_hp_folder()
```
```
prms_0/QLearningEpisodic.p = 0.05
prms_0/QLearningEpisodic.UCB_type = "bernstein"
prms_0/QLearningEpisodic.c_1 = 0.4126
prms_0/QLearningEpisodic.c_2 = 1.0458
prms_0/QLearningEpisodic.min_at = 0.1467
prms_0/QLearningContinuous.h_weight = 0.4126
prms_0/QLearningContinuous.span_approx_weight = 1.0458
prms_0/QLearningContinuous.min_at = 0.1467
```
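The retrieved bindings are plain strings. As a small sketch (assuming the `scope/Class.param = value` layout shown above, and using only the standard library), they can be parsed into a dictionary for inspection:

```python
import ast

# Example bindings in the format produced above (values are illustrative).
raw = '''
prms_0/QLearningEpisodic.p = 0.05
prms_0/QLearningEpisodic.UCB_type = "bernstein"
prms_0/QLearningEpisodic.c_1 = 0.4126
'''

bindings = {}
for line in raw.strip().splitlines():
    # Split on the first "=" only, so quoted values may contain "=" safely.
    key, value = (part.strip() for part in line.split("=", 1))
    # literal_eval turns "0.05" into a float and '"bernstein"' into a str.
    bindings[key] = ast.literal_eval(value)
```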