Modification of reward signals and rl_trainer for SAC #2433
Conversation
…order to test them all appropriately
A couple of questions and minor suggestions, but mostly LGTM.
Review comments (resolved, now outdated) were left on:
- ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py
- ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py
Latest changes LGTM. Can you briefly summarize how you tested this PR?
Tested GAIL with Crawler and Curiosity + GAIL with Pyramids. Tested GAIL with GridWorld for visual obs.
Sounds good, approved!
Had to make additional changes to reward signals and rl_trainer to abstract to off-policy:
- Added evaluate_batch to reward signals. Evaluates on a minibatch rather than on BrainInfo.
- Added end_episode to rl_trainer.
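For context, below is a minimal sketch of what these two hooks could look like. It is an illustration only, not the actual ml-agents implementation: the RewardSignal/RLTrainer class shapes, the RewardSignalResult container, and the "environment_rewards" key in the minibatch dict are all assumptions made for this example.

```python
from typing import Dict
import numpy as np


class RewardSignalResult:
    """Assumed container for the scaled/unscaled rewards a signal produces."""
    def __init__(self, scaled_reward: np.ndarray, unscaled_reward: np.ndarray):
        self.scaled_reward = scaled_reward
        self.unscaled_reward = unscaled_reward


class RewardSignal:
    """Illustrative base class; the real ml-agents class differs."""
    def __init__(self, strength: float = 1.0):
        self.strength = strength

    def evaluate_batch(self, mini_batch: Dict[str, np.ndarray]) -> RewardSignalResult:
        """Evaluate the reward on an update minibatch (e.g. one sampled from a
        replay buffer for SAC) rather than on a per-step BrainInfo."""
        raise NotImplementedError


class ExtrinsicSignal(RewardSignal):
    def evaluate_batch(self, mini_batch: Dict[str, np.ndarray]) -> RewardSignalResult:
        # Assumes the environment reward is stored under this key in the minibatch.
        env_rews = np.asarray(mini_batch["environment_rewards"], dtype=np.float32)
        return RewardSignalResult(self.strength * env_rews, env_rews)


class RLTrainer:
    """Illustrative trainer base; only the end_episode hook is sketched."""
    def __init__(self):
        self.episode_steps: Dict[str, int] = {}
        self.collected_rewards: Dict[str, Dict[str, float]] = {"environment": {}}

    def end_episode(self) -> None:
        """Reset per-agent episode bookkeeping, shared by on- and off-policy trainers."""
        for agent_id in self.episode_steps:
            self.episode_steps[agent_id] = 0
        for rewards in self.collected_rewards.values():
            for agent_id in rewards:
                rewards[agent_id] = 0.0
```

In this sketch, an off-policy trainer such as SAC would call evaluate_batch on minibatches sampled from its replay buffer during updates, and end_episode when episodes terminate, so the same reward-signal and bookkeeping interfaces serve both on- and off-policy trainers.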