Modification of reward signals and rl_trainer for SAC #2433
Conversation
…order to test them all appropriately
A couple of questions and minor suggestions, but mostly LGTM.
Review comments (resolved, now outdated) were left on:
- ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py
- ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py
Latest changes LGTM. Can you briefly summarize how you tested this PR?
Tested GAIL with Crawler and Curiosity + GAIL with Pyramids. Tested GAIL with GridWorld for visual obs.
Sounds good, approved!
Had to make additional changes to reward signals and rl_trainer to abstract to off-policy:
- Added evaluate_batch to reward signals. Evaluates on a minibatch rather than on BrainInfo.
- Added end_episode to rl_trainer.
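For context, below is a minimal sketch of what these two hooks could look like. It is an illustration only, not the actual ml-agents implementation: the RewardSignal/RLTrainer class shapes, the RewardSignalResult container, and the "environment_rewards" key in the minibatch dict are all assumptions made for this example.

```python
from typing import Dict
import numpy as np


class RewardSignalResult:
    """Assumed container for the scaled/unscaled rewards a signal produces."""
    def __init__(self, scaled_reward: np.ndarray, unscaled_reward: np.ndarray):
        self.scaled_reward = scaled_reward
        self.unscaled_reward = unscaled_reward


class RewardSignal:
    """Illustrative base class; the real ml-agents class differs."""
    def __init__(self, strength: float = 1.0):
        self.strength = strength

    def evaluate_batch(self, mini_batch: Dict[str, np.ndarray]) -> RewardSignalResult:
        """Evaluate the reward on an update minibatch (e.g. one sampled from a
        replay buffer for SAC) rather than on a per-step BrainInfo."""
        raise NotImplementedError


class ExtrinsicSignal(RewardSignal):
    def evaluate_batch(self, mini_batch: Dict[str, np.ndarray]) -> RewardSignalResult:
        # Assumes the environment reward is stored under this key in the minibatch.
        env_rews = np.asarray(mini_batch["environment_rewards"], dtype=np.float32)
        return RewardSignalResult(self.strength * env_rews, env_rews)


class RLTrainer:
    """Illustrative trainer base; only the end_episode hook is sketched."""
    def __init__(self):
        self.episode_steps: Dict[str, int] = {}
        self.collected_rewards: Dict[str, Dict[str, float]] = {"environment": {}}

    def end_episode(self) -> None:
        """Reset per-agent episode bookkeeping, shared by on- and off-policy trainers."""
        for agent_id in self.episode_steps:
            self.episode_steps[agent_id] = 0
        for rewards in self.collected_rewards.values():
            for agent_id in rewards:
                rewards[agent_id] = 0.0
```

In this sketch, an off-policy trainer such as SAC would call evaluate_batch on minibatches sampled from its replay buffer during updates, and end_episode when episodes terminate, so the same reward-signal and bookkeeping interfaces serve both on- and off-policy trainers.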