Analyse Benchmarking Results

We’ll reproduce the analysis from the accompanying paper of the benchmark results for the tabular agents in the continuous communicating setting.

Visualization tools

Two types of visualizations are available: tables and plots. The \(\LaTeX\) code for the tables is generated automatically.

Tables

Summary table

The get_latex_table_of_average_indicator function produces a table that summarises the agents’ performances in terms of a single indicator. In the printed table, the lowest value in each row is typeset in bold, with ties bolded together.

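# Import path follows the installed package layout (colosseum/analysis/tables.py)
from colosseum.analysis.tables import get_latex_table_of_average_indicator

# tex holds the LaTeX-ready table, whereas pd_table is the corresponding pandas table
tex, pd_table = get_latex_table_of_average_indicator(
    benchmark_log_folder,
    "normalized_cumulative_regret",
    print_table=True,
    return_table=True,
)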
                                    PSRL              Q-learning                   UCRL2
                                 prms_41                 prms_49                 prms_49
MDP                                                                                     
DeepSea           $\mathbf{0.78}\pm0.05$  $\mathbf{0.78}\pm0.00$           $0.90\pm0.01$
DeepSea           $\mathbf{0.99}\pm0.00$  $\mathbf{0.99}\pm0.00$  $\mathbf{0.99}\pm0.00$
DeepSea           $\mathbf{0.79}\pm0.04$  $\mathbf{0.79}\pm0.00$           $0.92\pm0.01$
FrozenLake        $\mathbf{0.01}\pm0.04$           $0.77\pm0.04$  $\mathbf{0.01}\pm0.01$
FrozenLake        $\mathbf{0.01}\pm0.02$           $0.84\pm0.04$           $0.04\pm0.06$
MiniGridEmpty              $0.95\pm0.22$           $0.51\pm0.23$  $\mathbf{0.02}\pm0.00$
MiniGridEmpty              $1.00\pm0.00$  $\mathbf{0.01}\pm0.00$           $0.02\pm0.00$
MiniGridEmpty              $0.60\pm0.50$  $\mathbf{0.00}\pm0.00$           $0.01\pm0.00$
MiniGridEmpty              $1.00\pm0.00$           $0.35\pm0.17$  $\mathbf{0.01}\pm0.00$
MiniGridEmpty              $1.00\pm0.00$           $0.75\pm0.21$  $\mathbf{0.08}\pm0.20$
MiniGridRooms              $1.00\pm0.00$  $\mathbf{0.01}\pm0.01$           $0.78\pm0.40$
MiniGridRooms              $1.00\pm0.00$  $\mathbf{0.01}\pm0.01$           $0.02\pm0.01$
MiniGridRooms              $1.00\pm0.00$  $\mathbf{0.02}\pm0.02$           $0.66\pm0.47$
RiverSwim         $\mathbf{0.00}\pm0.01$           $0.16\pm0.03$  $\mathbf{0.00}\pm0.00$
RiverSwim         $\mathbf{0.01}\pm0.00$           $0.34\pm0.14$           $0.02\pm0.01$
SimpleGrid                 $0.93\pm0.00$           $0.11\pm0.01$  $\mathbf{0.01}\pm0.00$
SimpleGrid                 $0.45\pm0.15$  $\mathbf{0.01}\pm0.00$  $\mathbf{0.01}\pm0.00$
SimpleGrid                 $0.93\pm0.00$  $\mathbf{0.15}\pm0.01$           $0.70\pm0.40$
SimpleGrid                 $0.50\pm0.00$  $\mathbf{0.01}\pm0.00$           $0.33\pm0.24$
Taxi                       $0.94\pm0.04$           $0.95\pm0.00$  $\mathbf{0.12}\pm0.01$
\textit{Average}           $0.69\pm0.38$           $0.38\pm0.37$  $\mathbf{0.28}\pm0.37$

The \(\LaTeX\) summary table is provided below.

\begin{tabular}{lccc}
\toprule
{} &                    PSRL &              Q-learning &                   UCRL2 \\
\midrule
DeepSea          &  $\mathbf{0.78}\pm0.05$ &  $\mathbf{0.78}\pm0.00$ &           $0.90\pm0.01$ \\
                 &  $\mathbf{0.99}\pm0.00$ &  $\mathbf{0.99}\pm0.00$ &  $\mathbf{0.99}\pm0.00$ \\
                 &  $\mathbf{0.79}\pm0.04$ &  $\mathbf{0.79}\pm0.00$ &           $0.92\pm0.01$ \\
\arrayrulecolor{black!15}\midrule%
FrozenLake       &  $\mathbf{0.01}\pm0.04$ &           $0.77\pm0.04$ &  $\mathbf{0.01}\pm0.01$ \\
                 &  $\mathbf{0.01}\pm0.02$ &           $0.84\pm0.04$ &           $0.04\pm0.06$ \\
\arrayrulecolor{black!15}\midrule%
MG-Empty    &           $0.95\pm0.22$ &           $0.51\pm0.23$ &  $\mathbf{0.02}\pm0.00$ \\
                 &           $1.00\pm0.00$ &  $\mathbf{0.01}\pm0.00$ &           $0.02\pm0.00$ \\
                 &           $0.60\pm0.50$ &  $\mathbf{0.00}\pm0.00$ &           $0.01\pm0.00$ \\
                 &           $1.00\pm0.00$ &           $0.35\pm0.17$ &  $\mathbf{0.01}\pm0.00$ \\
                 &           $1.00\pm0.00$ &           $0.75\pm0.21$ &  $\mathbf{0.08}\pm0.20$ \\
\arrayrulecolor{black!15}\midrule%
MG-Rooms    &           $1.00\pm0.00$ &  $\mathbf{0.01}\pm0.01$ &           $0.78\pm0.40$ \\
                 &           $1.00\pm0.00$ &  $\mathbf{0.01}\pm0.01$ &           $0.02\pm0.01$ \\
                 &           $1.00\pm0.00$ &  $\mathbf{0.02}\pm0.02$ &           $0.66\pm0.47$ \\
\arrayrulecolor{black!15}\midrule%
RiverSwim        &  $\mathbf{0.00}\pm0.01$ &           $0.16\pm0.03$ &  $\mathbf{0.00}\pm0.00$ \\
                 &  $\mathbf{0.01}\pm0.00$ &           $0.34\pm0.14$ &           $0.02\pm0.01$ \\
\arrayrulecolor{black!15}\midrule%
SimpleGrid       &           $0.93\pm0.00$ &           $0.11\pm0.01$ &  $\mathbf{0.01}\pm0.00$ \\
                 &           $0.45\pm0.15$ &  $\mathbf{0.01}\pm0.00$ &  $\mathbf{0.01}\pm0.00$ \\
                 &           $0.93\pm0.00$ &  $\mathbf{0.15}\pm0.01$ &           $0.70\pm0.40$ \\
                 &           $0.50\pm0.00$ &  $\mathbf{0.01}\pm0.00$ &           $0.33\pm0.24$ \\
\arrayrulecolor{black!15}\midrule%
Taxi             &           $0.94\pm0.04$ &           $0.95\pm0.00$ &  $\mathbf{0.12}\pm0.01$ \\
\arrayrulecolor{black!30}\midrule%
\textit{Average} &           $0.69\pm0.38$ &           $0.38\pm0.37$ &  $\mathbf{0.28}\pm0.37$ \\
\arrayrulecolor{black!15}\midrule%
\end{tabular}
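
The generated table uses `\toprule` and `\midrule`, which come from the `booktabs` package, and `\arrayrulecolor{black!15}`, which needs `colortbl` together with `xcolor`. As a minimal sketch, a string such as `tex` could be wrapped in a small document that loads these packages. The helper `wrap_latex_table` is hypothetical and not part of Colosseum, and the preamble is only one plausible setup.

```python
def wrap_latex_table(tex: str) -> str:
    """Wrap a tabular snippet in a minimal LaTeX document.

    Hypothetical helper (not part of Colosseum). The preamble loads the
    packages that define the commands appearing in the generated table.
    """
    preamble = "\n".join(
        [
            r"\documentclass{article}",
            r"\usepackage{booktabs}       % \toprule, \midrule, \cmidrule, \bottomrule",
            r"\usepackage[table]{xcolor}  % loads colortbl: \arrayrulecolor, black!15",
            r"\begin{document}",
        ]
    )
    return f"{preamble}\n{tex.strip()}\n" + r"\end{document}" + "\n"


# Illustrative input: a tiny table in the same style as the generated one.
example = r"""
\begin{tabular}{lc}
\toprule
{} & PSRL \\
\midrule
Taxi & $0.94\pm0.04$ \\
\arrayrulecolor{black!15}\midrule%
\end{tabular}
"""
document = wrap_latex_table(example)
```

The resulting string can be written to a `.tex` file and compiled with a standard LaTeX toolchain.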

Indicators table

The get_latex_table_of_indicators function produces a large table that can include multiple indicators. It also reports the number of seeds that each agent was able to complete within the given training time limit.

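# Import path follows the installed package layout (colosseum/analysis/tables.py)
from colosseum.analysis.tables import get_latex_table_of_indicators

# tex holds the LaTeX-ready table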
tex = get_latex_table_of_indicators(
    benchmark_log_folder,
    ["normalized_cumulative_regret", "steps_per_second"],
    show_prm_mdp=True,
    print_table=True,
)
                          Norm. cumulative regret Steps per second \# completed seeds
MDP            Agent                                                                 
DeepSea (1)    PSRL                 $0.78\pm0.05$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.78\pm0.00$    $0.00\pm0.00$            $20/20$
               UCRL2                $0.90\pm0.01$    $0.00\pm0.00$             $0/20$
DeepSea (2)    PSRL                 $0.99\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.99\pm0.00$    $0.00\pm0.00$            $20/20$
               UCRL2                $0.99\pm0.00$    $0.00\pm0.00$             $0/20$
DeepSea (3)    PSRL                 $0.79\pm0.04$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.79\pm0.00$    $0.00\pm0.00$            $20/20$
               UCRL2                $0.92\pm0.01$    $0.00\pm0.00$             $0/20$
FrozenLake (1) PSRL                 $0.01\pm0.03$    $0.02\pm0.00$            $20/20$
               Q-learning           $0.77\pm0.04$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.01\pm0.01$    $0.02\pm0.01$            $20/20$
FrozenLake (2) PSRL                 $0.01\pm0.02$    $0.01\pm0.00$            $20/20$
               Q-learning           $0.84\pm0.04$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.04\pm0.06$    $0.02\pm0.01$            $20/20$
MG-Empty (1)   PSRL                 $0.95\pm0.22$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.51\pm0.22$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.02\pm0.00$    $0.00\pm0.00$             $7/20$
MG-Empty (2)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.01\pm0.00$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.02\pm0.00$    $0.00\pm0.00$             $9/20$
MG-Empty (3)   PSRL                 $0.60\pm0.49$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.00\pm0.00$    $0.02\pm0.01$            $20/20$
               UCRL2                $0.01\pm0.00$    $0.01\pm0.00$            $20/20$
MG-Empty (4)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.35\pm0.17$    $0.02\pm0.01$            $20/20$
               UCRL2                $0.01\pm0.00$    $0.01\pm0.00$            $20/20$
MG-Empty (5)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.75\pm0.20$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.08\pm0.20$    $0.00\pm0.00$             $0/20$
MG-Rooms (1)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.01\pm0.01$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.78\pm0.39$    $0.00\pm0.00$             $0/20$
MG-Rooms (2)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.01\pm0.01$    $0.02\pm0.01$            $20/20$
               UCRL2                $0.02\pm0.01$    $0.00\pm0.00$            $18/20$
MG-Rooms (3)   PSRL                 $1.00\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.02\pm0.02$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.66\pm0.46$    $0.00\pm0.00$             $0/20$
RiverSwim (1)  PSRL                 $0.00\pm0.01$    $0.01\pm0.00$            $20/20$
               Q-learning           $0.16\pm0.03$    $0.03\pm0.01$            $20/20$
               UCRL2                $0.00\pm0.00$    $0.02\pm0.00$            $20/20$
RiverSwim (2)  PSRL                 $0.01\pm0.00$    $0.01\pm0.00$            $20/20$
               Q-learning           $0.34\pm0.13$    $0.02\pm0.01$            $20/20$
               UCRL2                $0.02\pm0.01$    $0.01\pm0.00$            $20/20$
SimpleGrid (1) PSRL                 $0.93\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.11\pm0.01$    $0.02\pm0.00$            $20/20$
               UCRL2                $0.01\pm0.00$    $0.00\pm0.00$            $20/20$
SimpleGrid (2) PSRL                 $0.45\pm0.15$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.01\pm0.00$    $0.02\pm0.00$            $20/20$
               UCRL2                $0.01\pm0.00$    $0.00\pm0.00$            $13/20$
SimpleGrid (3) PSRL                 $0.93\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.15\pm0.01$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.70\pm0.39$    $0.00\pm0.00$             $0/20$
SimpleGrid (4) PSRL                 $0.50\pm0.00$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.01\pm0.00$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.33\pm0.23$    $0.00\pm0.00$             $0/20$
Taxi (1)       PSRL                 $0.94\pm0.03$    $0.00\pm0.00$             $0/20$
               Q-learning           $0.95\pm0.00$    $0.01\pm0.00$            $20/20$
               UCRL2                $0.12\pm0.01$    $0.01\pm0.00$            $20/20$

The \(\LaTeX\) indicators table is provided below.

\begin{tabular}{lllll}
\toprule
         &       & Norm. cumulative regret & Steps per second & \# completed seeds \\
MDP & Agent &                         &                  &                    \\
\midrule
DeepSea (1) & PSRL &           $0.78\pm0.05$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.78\pm0.00$ &    $0.00\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.90\pm0.01$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
DeepSea (2) & PSRL &           $0.99\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.99\pm0.00$ &    $0.00\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.99\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
DeepSea (3) & PSRL &           $0.79\pm0.04$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.79\pm0.00$ &    $0.00\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.92\pm0.01$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
FrozenLake (1) & PSRL &           $0.01\pm0.03$ &    $0.02\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.77\pm0.04$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.01\pm0.01$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
FrozenLake (2) & PSRL &           $0.01\pm0.02$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.84\pm0.04$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.04\pm0.06$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Empty (1) & PSRL &           $0.95\pm0.22$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.51\pm0.22$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.02\pm0.00$ &    $0.00\pm0.00$ &             $7/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Empty (2) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.01\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.02\pm0.00$ &    $0.00\pm0.00$ &             $9/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Empty (3) & PSRL &           $0.60\pm0.49$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.00\pm0.00$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.01\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Empty (4) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.35\pm0.17$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.01\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Empty (5) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.75\pm0.20$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.08\pm0.20$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Rooms (1) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.01\pm0.01$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.78\pm0.39$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Rooms (2) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.01\pm0.01$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.02\pm0.01$ &    $0.00\pm0.00$ &            $18/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
MG-Rooms (3) & PSRL &           $1.00\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.02\pm0.02$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.66\pm0.46$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
RiverSwim (1) & PSRL &           $0.00\pm0.01$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.16\pm0.03$ &    $0.03\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.00\pm0.00$ &    $0.02\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
RiverSwim (2) & PSRL &           $0.01\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.34\pm0.13$ &    $0.02\pm0.01$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.02\pm0.01$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
SimpleGrid (1) & PSRL &           $0.93\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.11\pm0.01$ &    $0.02\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.01\pm0.00$ &    $0.00\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
SimpleGrid (2) & PSRL &           $0.45\pm0.15$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.01\pm0.00$ &    $0.02\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.01\pm0.00$ &    $0.00\pm0.00$ &            $13/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
SimpleGrid (3) & PSRL &           $0.93\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.15\pm0.01$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.70\pm0.39$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
SimpleGrid (4) & PSRL &           $0.50\pm0.00$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.01\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.33\pm0.23$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{1-4}
Taxi (1) & PSRL &           $0.94\pm0.03$ &    $0.00\pm0.00$ &             $0/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & Q-learning &           $0.95\pm0.00$ &    $0.01\pm0.00$ &            $20/20$ \\
\arrayrulecolor{black!15}\cmidrule{2-4}
         & UCRL2 &           $0.12\pm0.01$ &    $0.01\pm0.00$ &            $20/20$ \\
\bottomrule
\end{tabular}
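
The `\# completed seeds` column stores entries such as `$7/20$`. As a minimal sketch, assuming the cells keep this `$done/total$` form, they can be turned into completion fractions to find the agent/MDP pairs that ran out of time. The helper `completion_rate` is hypothetical and not part of Colosseum.

```python
import re


def completion_rate(cell: str) -> float:
    """Convert a cell such as '$7/20$' into the fraction of completed seeds."""
    match = re.fullmatch(r"\$(\d+)/(\d+)\$", cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    done, total = map(int, match.groups())
    return done / total


# Values copied from the MG-Empty (1) rows of the table above.
cells = {"PSRL": "$0/20$", "Q-learning": "$20/20$", "UCRL2": "$7/20$"}
rates = {agent: completion_rate(cell) for agent, cell in cells.items()}

# Agents that did not finish every seed within the time limit.
incomplete = [agent for agent, rate in rates.items() if rate < 1.0]
```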

Plots

Hardness space plot

The plot_indicator_in_hardness_space function places the average cumulative regret obtained by each agent in each benchmark MDP at the position given by that MDP’s diameter and environmental value norm. The plot helps to investigate which kind of complexity affects the agents’ performance the most.

fig = plot_indicator_in_hardness_space(benchmark_log_folder, fontsize=24, savefig_folder=None)
../_images/benchmark-analysis_12_0.png

Online agents’ performances

The agent_performances_per_mdp_plot function produces a plot that shows the values of a given indicator during the agent/MDP interactions. This plot makes it easy to compare the agents’ performances in the benchmark MDPs and gives an intuitive overview of the critical moments of the agent/MDP interaction, e.g., when an agent runs out of time or reaches the optimal policy.

fig = agent_performances_per_mdp_plot(
    benchmark_log_folder,
    "normalized_cumulative_regret",
    figsize_scale=5,
    standard_error=True,
    n_rows=7,
    savefig_folder=None,
)
../_images/benchmark-analysis_15_0.png
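
The `standard_error=True` argument suggests that variability across seeds is summarised by the standard error of the mean. The source does not define this, so the sketch below rests on that assumption. It shows how a mean and a standard error could be computed at each time step from per-seed curves. The regret values are made up for illustration and are not benchmark results.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for one time step."""
    mean = statistics.fmean(values)
    # Sample standard deviation divided by sqrt(n); needs at least two seeds.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up regret curves for three seeds (rows) over four time steps (columns).
curves = [
    [0.9, 0.6, 0.4, 0.3],
    [0.8, 0.5, 0.3, 0.2],
    [1.0, 0.7, 0.5, 0.4],
]

# Transpose so that each tuple holds the values of all seeds at one time step.
summary = [mean_and_standard_error(step) for step in zip(*curves)]
```

Each entry of `summary` gives the centre line and the half-width of the band that would be drawn at that time step under this assumption.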