I'm researching GridWorld from a Q-learning perspective, and I have a question about the following:
1) Suppose rewards are positive for reaching a goal, negative
for running into the edge of the world, and zero otherwise. Is it
the signs of the rewards that play a significant role, or the
intervals between them?
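
To make the setup concrete, here is a minimal sketch of the reward scheme I have in mind. The grid size, the goal cell, the specific values (+1, -1, 0), and the names `step`, `R_GOAL`, `R_EDGE`, and `R_STEP` are all my own assumptions for illustration, not taken from any particular GridWorld implementation.

```python
# Hypothetical GridWorld reward scheme (all constants are illustrative assumptions).
GRID_SIZE = 4          # assumed 4x4 grid
GOAL = (3, 3)          # assumed goal cell (row, col)
R_GOAL = 1.0           # positive reward for reaching the goal
R_EDGE = -1.0          # negative reward for running into the edge
R_STEP = 0.0           # zero reward otherwise

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for one move from `state`."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col

    # Moving off the grid: the agent stays where it is and is penalized.
    if not (0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE):
        return state, R_EDGE

    next_state = (new_row, new_col)
    if next_state == GOAL:
        return next_state, R_GOAL
    return next_state, R_STEP
```

The question is then whether changing `R_GOAL`, `R_EDGE`, and `R_STEP` so that their signs differ but the gaps between them stay the same (or the other way around) would change what Q-learning learns.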