Description
This time the maze is made more complex by adding a K(ey) and a D(oor).
The door only opens if the agent has previously picked up the key.
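The door rule can be sketched as a small transition function. This is a minimal sketch that assumes the maze is stored as a list of strings. The layout, the `#` (wall), `.` (floor) and `G` (goal) symbols, the action encoding and the `step` helper are all hypothetical and not taken from the original code:

```python
# Hypothetical maze layout (not the original): '#' wall, '.' floor,
# 'K' key, 'D' door, 'G' goal. The outer walls keep the agent in bounds.
MAZE = [
    "#######",
    "#K..#G#",
    "#.#.#D#",
    "#.....#",
    "#######",
]

# Hypothetical action encoding: 0 = up, 1 = down, 2 = left, 3 = right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(state, action):
    """Return the next (row, col, has_key) state for a move."""
    row, col, has_key = state
    dr, dc = ACTIONS[action]
    r, c = row + dr, col + dc
    cell = MAZE[r][c]
    # Walls always block; the door blocks only while the key is missing.
    if cell == "#" or (cell == "D" and not has_key):
        return state  # blocked: the agent stays where it is
    if cell == "K":
        has_key = True  # stepping onto the key picks it up
    return (r, c, has_key)
```

For example, `step((3, 5, False), 0)` leaves the agent below the door, while `step((3, 5, True), 0)` moves it onto the door cell.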
In terms of the Q-Table, this is realised by adding an additional dimension: instead of looking up the action values by position alone, we also index them by whether the key has been obtained or not.
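A minimal sketch of such a Q-Table, assuming a hypothetical 5 x 7 grid with four actions and plain nested lists. `Q`, `q_values` and `q_update` are illustrative names, not the original code. The update is the standard tabular Q-learning rule; the key flag is simply one more index into the table:

```python
# Hypothetical sizes: a 5 x 7 grid and four moves (up, down, left, right).
ROWS, COLS, N_ACTIONS = 5, 7, 4

# Q[row][col][has_key][action]: the has_key axis (0 or 1) is the extra dimension.
Q = [[[[0.0] * N_ACTIONS for _ in range(2)] for _ in range(COLS)] for _ in range(ROWS)]


def q_values(state):
    """Return the (mutable) list of action values for a (row, col, has_key) state."""
    row, col, has_key = state
    return Q[row][col][int(has_key)]


def q_update(state, action, reward, next_state, done=False, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update; terminal transitions do not bootstrap."""
    target = reward if done else reward + gamma * max(q_values(next_state))
    current = q_values(state)
    current[action] += alpha * (target - current[action])
```

Because the key flag is part of the index, the same grid cell can hold different action values before and after the key is collected, so the agent can learn to head for the key first and for the door afterwards.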
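This is a crucial point of Q-Learning: the size of the Q-Table grows directly with the complexity of the environment. A single binary key flag already doubles the number of states, and with it the number of table entries.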
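As a worked example, assume an illustrative grid of R rows and C columns with |A| possible actions (the concrete 5 x 7 grid with four actions below is hypothetical). Each table entry is one state-action pair, so adding the key flag multiplies the table size by two:

```latex
|Q_{\text{no key}}| = R \cdot C \cdot |A| = 5 \cdot 7 \cdot 4 = 140
|Q_{\text{key}}|    = R \cdot C \cdot 2 \cdot |A| = 5 \cdot 7 \cdot 2 \cdot 4 = 280
```

Every further binary item would double the table again, which is why tabular Q-Learning quickly becomes impractical as environments grow.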

The code can be found on GitHub.