Improvements for GAIL #2296
Merged
160 commits
eb4abf2
New version of GAIL
awjuliani d0852ac
Move Curiosity to separate class
awjuliani 4b15b80
Curiosity fully working under new system
awjuliani ad9381b
Begin implementing GAIL
awjuliani 8bf8302
fix discrete curiosity
vincentpierre d3e244e
Add expert demonstration
awjuliani a5b95f7
Remove notebook
awjuliani dc2fcaa
Record intrinsic rewards properly
awjuliani 49cff40
Add gail model updating
awjuliani 48d3769
Code cleanup
awjuliani 6eeb565
Nested structure for intrinsic rewards
awjuliani 8ca7728
Rename files
awjuliani 226b5c7
Update models so files
awjuliani 3386aa7
fix typo
awjuliani 6799756
Add reward strength parameter
awjuliani 468c407
Use dictionary of reward signals
awjuliani 519e2d3
Remove reward manager
awjuliani 7df1a69
Extrinsic reward just another type
awjuliani 99237cd
Clean up imports
awjuliani 9fa51c1
All reward signals use strength to scale output
awjuliani 7f24677
produce scaled and unscaled reward
awjuliani 4a714d0
Remove unused dictionary
awjuliani 3e2671d
Current trainer config
awjuliani 77211d8
Add discrete control and pyramid experimentation
awjuliani 2334de8
Minor changes to GAIL
awjuliani 439387e
Add relevant strength parameters
awjuliani ba793a3
Replace string
awjuliani a52ba0b
Add support for visual observations w/ GAIL
awjuliani 5b2ef22
Finish implementing visual obs for GAIL
awjuliani 13542b4
Include demo files
awjuliani ae7a8b0
Fix for RNN w/ GAIL
awjuliani bf89082
Keep track of reward streams separately
awjuliani 360482b
Bootstrap value estimates separately
awjuliani c78639d
Add value head
awjuliani 3b2485d
Use separate value streams for each reward
awjuliani 40bc9ba
Add VAIL
awjuliani c6e1504
Use adaptive B
awjuliani 60d9ff7
Comments improvements
vincentpierre 49ec682
Added comments and refactored a piece of the code
vincentpierre d9847e0
Added Comments
vincentpierre dc7620b
Fix on Curiosity
vincentpierre 28e0bd5
Fixed typo
vincentpierre 0257d2b
Added a forgotten comment
vincentpierre fd55c00
Stabilized Vail learning. Still no learning for Walker
vincentpierre 2343b3f
Fixing typo on curiosity when using visual input
vincentpierre c74ad19
Added some comments
vincentpierre 2dd7c61
modified the hyperparameters
vincentpierre 42429a5
Fixed some of the tests, will need to refactor the reward signals in …
vincentpierre ec0e106
Putting the has_updated flags inside each reward signal
vincentpierre 6ae1c2f
Added comments for the GAIL update method
vincentpierre ef65bc2
initial commit
vincentpierre 8cbdbf4
No more normalization after pre-training
vincentpierre 3f35d45
Fixed large bug in Vail
vincentpierre 3be9be7
BUG FIX VAIL : The noise dimension was wrong and the discriminator sc…
vincentpierre 9e9b4ff
implemented discrete control pretraining
vincentpierre d537a6b
bug fixing
vincentpierre 713263c
Bug fix, still not tested for recurrent
vincentpierre ca5b948
Fixing beta in GAIL so it will change properly
vincentpierre 671629e
Allow for not specifying an extrinsic reward
a31c8a5
Rough implementation of annealed BC
93cb4ff
Fixes for rebase onto v0.8
6534291
Moved BC trainer out of reward_signals and code cleanup
700b478
Rename folder to "components"
71eedf5
Fix renaming in Curiosity
83b4603
Remove demo_aided as a required param
9e4b4e2
Make old BC compatible
f814432
Fix visual obs for curiosity
e10194f
Tweaks all around
fdcfb30
Add reward normalization and bug fix
cb5e927
Load multiple .demo files. Fix bug with csv nans
2c5c853
Remove reward normalization
e66a343
Rename demo_aided to pretraining
0a98289
Fix bc configs
cd6e498
Increase small val to prevent NaNs
d23f6f3
Fix init in components
d93e36e
Merge remote-tracking branch 'origin/develop' into develop-irl-ervin
1bf68c7
Fix PPO tests
9da6e6c
Refactor components into common location
4a57a32
Minor code cleanup
11cc6f9
Preliminary RNN support
e66a6f7
Revert regression with NaNs for LSTMs
bea2bc7
Better LSTM support for BC
6302a55
Code cleanup and black reformat
d1cded9
Remove demo_helper and reformat signal
2b98f3b
Tests for GAIL and curiosity
440146b
Fix Black again...
98f9160
Tests for BCModule and visual tests for RewardSignals
5c923cb
Refactor to new structure and use class generator
e7ce888
Generalize reward_signal interface and stats
858194f
Fix incorrect environment reward reporting
28bceba
Rename reward signals for consistency. clean up comments
248cae4
Default trainer config (for cloud testing)
744df94
Remove "curiosity_enc_size" from the regular params
31dabfc
Fix PushBlock config
a557f84
Revert Pyramids environment
d4dbddb
Fix indexing issue with add_experiences
ddb673b
Fix tests
975e05b
Change to BCModule
a83fd5d
Merge branch 'develop' into develop-irl-ervin
fae7646
Remove the bools for reward signals
5cf98ac
Make update take in a mini buffer rather than the
d1afc9b
Always reference reward signals name and not index
80f2c75
More code cleanup
394b25a
Clean up reward_signal abstract class
a9724a3
Fix issue with recording values
66fef61
Add use_actions to GAIL
0e3be1d
Add documentation for Reward Signals
015f50d
Add documentation for GAIL
7c3059b
Remove unused variables in BCModel
16c3c06
Remove Entropy Reward Signal
1fbfa5d
Change tests to use safe_load
f9a3808
Don't use mutable default
ce551bf
Set defaults in parent __init__ (Reward Signals)
3e7ea5b
Remove unnecessary lines
eda6993
Merge branch 'develop' into develop-irl-ervin
cace2e6
Make some files same as develop
3f161fc
Add demos for example envs
2794c75
Update docs
48b7b43
Fix tests, imports, cleanup code
f47b173
Make pretrainer stats similar to reward signal
1e257d4
Merge branch 'develop' of github.com:Unity-Technologies/ml-agents int…
a8b5d09
Fixes after merge develop
fb3d5ae
Additional tests, bugfix for LSTM+BC+Visual
7e0a677
GAIL code cleanup
1953233
Add types to BCModel
593f819
Fix bugs with incorrect return values
98b7732
Change tests to use RewardSignalResult
6ee0c63
Add docs for pretraining and plot for all three
6d37be2
Fix bug with demo loading directories, add test
c672ad9
Add typing to BCModule, GAIL, and demo loader
61e84c6
Fix black
9d43336
Fix mypy issues
99a2a3c
Codacy cleanup
cbb1af3
Doc fixes
736c807
More sophisticated tests for reward signals
04e22fd
Fix bug in GAIL when num_sequences is 1
8ead02e
Clean up use_vail and feed_dicts
71f85e1
Change to swish from learningmodel
5537e60
Make variables more readable
73d20cb
Code and comment cleanup
f4950b4
Not all should be swish
6784ee6
Remove prints
2704e62
Doc updates
1206a89
Make VAIL default false, improve logging
2407a5a
Fix tests for sequences
4aa033b
Change max_batches and set VAIL to default to false
f0d7368
Minor code refactor
cfb88a1
Merge branch 'develop' of github.com:Unity-Technologies/ml-agents int…
0887859
Add gradient penalty to GAIL (GAIL-GP)
e346e6d
Fix NaNs in gradient penalty
43f8602
Only terminate value estimate for extrinsic signal
5eaaa76
Update imitation learning docs
e1ef0ed
Update docstring
576fbc4
Update GAIL config
4bbeb91
Update comments and variable/method names
14ef4b1
Flip flag to use_terminal_states
410eb00
Don't create gradient magnitude if not necessary
32c0e63
Remove gamma where not needed
4c7b547
Fix GridWorld gamma
481d8e4
Merge branch 'develop' into develop-irl-ervin
Viewing changes from commit: Update comments and variable/method names
How does norm result in NaN? I could see that happening if there was overflow, but in that case adding an epsilon isn't going to help.
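Not the norm, it's the gradient of the norm. At 0, the derivative of sqrt() is 1/(2·sqrt(0)), which is unbounded (the tangent is vertical, not horizontal), so for the L2 norm the gradient x / ||x|| evaluates to 0/0 = NaN when x is the zero vector. I'll update the comment to reflect this.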
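A minimal NumPy sketch of the point made in this exchange. It is an illustration, not the PR's code: the PR's trainer code is not shown on this page, the function names `norm_grad` and `safe_norm_grad` are hypothetical, and placing the epsilon inside the square root is an assumption about how the fix was applied. It computes the analytic gradient of ||x|| = sqrt(sum(x**2)), which is x / ||x||, and shows that it is NaN at the zero vector, while adding a small epsilon under the square root keeps it finite.

```python
import numpy as np


def norm_grad(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of ||x||_2 = sqrt(sum(x**2)), i.e. x / ||x||."""
    norm = np.sqrt(np.sum(x ** 2))
    # At x == 0 this is 0 / 0, which IEEE-754 float math evaluates to NaN.
    # errstate only silences NumPy's RuntimeWarning; the result is still NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return x / norm


def safe_norm_grad(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Gradient of sqrt(sum(x**2) + eps), i.e. x / sqrt(sum(x**2) + eps).

    Hypothetical helper: the epsilon keeps the denominator strictly positive,
    so the gradient is exactly 0 at the zero vector and nearly unchanged
    elsewhere.
    """
    return x / np.sqrt(np.sum(x ** 2) + eps)


zero = np.zeros(3)
print(norm_grad(zero))       # [nan nan nan]
print(safe_norm_grad(zero))  # [0. 0. 0.]
```

This also answers the overflow concern raised above. The epsilon only addresses the 0/0 case at the zero vector. It does not help when the inputs overflow, because a tiny additive constant cannot bring an infinite sum of squares back into range.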