Description
The agent gets a reward for reaching the red goal.
With each iteration, the reward for reaching the goal propagates further back through the Q-table.
The arrows indicate the action with the highest Q-value in each state.
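
The backward spread of the reward can be reproduced with a minimal sketch. This is not the project's code: it assumes a hypothetical five-cell corridor instead of the real grid, a learning rate of 1.0, a discount factor of 0.9, and an agent that always walks right so the run is deterministic. Only the standard Q-learning update is used.

```python
N_STATES = 5              # hypothetical corridor, cells 0..4; the goal is the last cell
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 1.0, 0.9   # assumed learning rate and discount factor


def step(state, action):
    """Move one cell left or right; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == LEFT else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def run_episode(q):
    """Walk from cell 0 to the goal, applying the Q-learning update at each step."""
    state, done = 0, False
    while not done:
        action = RIGHT  # fixed behaviour (assumption) to keep the demo deterministic
        nxt, reward, done = step(state, action)
        # Bootstrapped target: reward plus discounted best value of the next cell.
        target = reward if done else reward + GAMMA * max(q[nxt])
        q[state][action] += ALPHA * (target - q[state][action])
        state = nxt


def greedy_arrows(q):
    """Render the highest-valued action per cell; '.' means nothing learned yet."""
    cells = []
    for s in range(N_STATES):
        if s == GOAL:
            cells.append("G")
        elif not any(q[s]):
            cells.append(".")
        else:
            cells.append("<" if q[s][LEFT] > q[s][RIGHT] else ">")
    return "".join(cells)


q = [[0.0, 0.0] for _ in range(N_STATES)]
history = []
for episode in range(4):
    run_episode(q)
    history.append(greedy_arrows(q))
    print(episode + 1, history[-1])
```

Under these assumptions, each episode adds one arrow to the left of the previous ones: after the first episode only the cell next to the goal has a learned value, and after the fourth the whole corridor points toward the goal.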

The code can be found on GitHub.