
[BUG] Different final epsilon and evaluation epsilon for Atari implementations #429

pseudo-rnd-thoughts opened this issue Nov 15, 2023

Problem Description

Within the Q-learning implementations for Atari (DQN, C51, and QDagger DQN, in both the JAX and PyTorch versions), the final epsilon value used during training (e.g., 0.01) differs from the epsilon value used during evaluation at the end (e.g., 0.05).

I believe this results in unfair evaluations on the Atari environments compared to the agents' true performance.

I don't think this affects the training curves, as we mostly compare episodic rewards rather than evaluation results, but we should fix it for users who compare the evaluation results.

This bug appears to have been introduced when copying code from the DQN agent, where 0.05 is the final epsilon.
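
For illustration, a minimal self-contained sketch of the mismatch is shown below; the schedule function and example values mirror the description above, but the exact names and signatures in the affected scripts may differ:

```python
def linear_schedule(start_e: float, end_e: float, duration: int, t: int) -> float:
    """Linearly anneal epsilon from start_e down to end_e over `duration` steps."""
    slope = (end_e - start_e) / duration
    return max(slope * t + start_e, end_e)


total_timesteps = 10_000_000
exploration_fraction = 0.10
end_e = 0.01  # final training epsilon (example value from the description above)

# By the end of training the behaviour policy acts with epsilon == end_e == 0.01 ...
final_training_epsilon = linear_schedule(
    1.0, end_e, int(exploration_fraction * total_timesteps), total_timesteps
)
assert final_training_epsilon == end_e

# ... but the post-training evaluation uses a separately hard-coded epsilon of 0.05,
# so the evaluated policy explores more than the policy that was actually trained.
evaluation_epsilon = 0.05
assert evaluation_epsilon != final_training_epsilon
```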


Current Behavior

Agent policies are being evaluated with a different and higher epsilon than the final training epsilon.

Expected Behavior

Agent policies should be evaluated with the final epsilon used during training.

Possible Solution

Modify all Q-learning agents so that the evaluation epsilon equals the final training epsilon.
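
A minimal sketch of the intended behaviour, where evaluation reuses the final training epsilon instead of a separately hard-coded value (the helper and parameter names below are illustrative, not the exact functions in the affected scripts):

```python
import numpy as np


def epsilon_greedy_action(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Act randomly with probability `epsilon`, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def evaluate_policy(q_values_per_step, epsilon: float, seed: int = 0):
    """Run an epsilon-greedy evaluation with the SAME epsilon that training
    finished with (e.g. the script's end_e), rather than a hard-coded 0.05."""
    rng = np.random.default_rng(seed)
    return [epsilon_greedy_action(q, epsilon, rng) for q in q_values_per_step]


# Example: pass the final training epsilon (0.01 here) into evaluation.
final_training_epsilon = 0.01
actions = evaluate_policy([np.array([0.1, 0.9, 0.3, 0.2])], epsilon=final_training_epsilon)
```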
