Improvements for GAIL #2296
Conversation
# Conflicts:
#   ml-agents/mlagents/trainers/models.py

…o develop-irl-ervin
# Conflicts:
#   ml-agents/mlagents/trainers/components/bc/model.py
#   ml-agents/mlagents/trainers/components/bc/module.py
#   ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py
#   ml-agents/mlagents/trainers/components/reward_signals/gail/model.py
#   ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py
@@ -261,5 +292,8 @@ def create_loss(self, learning_rate: float) -> None:
            )
        else:
            self.loss = self.discriminator_loss

        self.loss = self.loss + self.gradient_penalty * self.compute_gradient_penalty()
I think we need better var names here. From the current names I would expect self.compute_gradient_penalty() to return the gradient penalty, but it returns the magnitude of the gradient.
Changed this to create_gradient_magnitude and the weight to gradient_penalty_weight.
        grad = tf.gradients(grad_estimate, [grad_input])[0]

        # Norm, like log, can return NaN. Use our own safe_norm
        safe_norm = tf.sqrt(tf.reduce_sum(grad ** 2, axis=-1) + EPSILON)
How does norm result in NaN? I could see that happening if there was overflow, but in that case adding an epsilon isn't going to help.
Not the norm itself, it's the gradient of the norm. At 0 the derivative of sqrt() is infinite (1/(2*sqrt(x)) blows up), so backprop through the norm produces NaN. I'll update the comment to reflect this.
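To make this concrete, here is a minimal standalone sketch (TF 1.x API; the tensors are illustrative, not the PR's actual graph) showing that the gradient of a plain L2 norm at exactly zero comes out as NaN, while the epsilon-stabilized version backpropagates a finite zero:

import tensorflow as tf

EPSILON = 1e-7

x = tf.constant([[0.0, 0.0]])                                  # all-zero input
plain_norm = tf.norm(x, axis=-1)                               # gradient is x / norm = 0/0 at x = 0
safe_norm = tf.sqrt(tf.reduce_sum(x ** 2, axis=-1) + EPSILON)  # gradient is x / sqrt(eps) = 0

grad_plain = tf.gradients(plain_norm, [x])[0]
grad_safe = tf.gradients(safe_norm, [x])[0]

with tf.Session() as sess:
    print(sess.run(grad_plain))  # [[nan nan]]
    print(sess.run(grad_safe))   # [[0. 0.]]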
-        self.intrinsic_reward = -tf.log(1.0 - self.discriminator_score + 1e-7)
+        self.intrinsic_reward = -tf.log(1.0 - self.discriminator_score + EPSILON)

    def compute_gradient_penalty(self) -> tf.Tensor:
What is the performance improvement from this? Faster/more stable convergence?
Faster convergence, especially for Crawler. I'm seeing about 25% fewer steps required with PPO. TBH, a large motivation for this is SAC: without GP the discriminator overfits very quickly, and GAIL + SAC doesn't work at all without it.
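For context, this is roughly the shape of the technique under discussion: a WGAN-GP-style penalty built from the discriminator's gradient magnitude on points interpolated between expert and policy batches. The names below (discriminator, expert_input, policy_input, gradient_magnitude) are illustrative placeholders, not the PR's actual attributes:

import tensorflow as tf

EPSILON = 1e-7

def gradient_magnitude(discriminator, expert_input, policy_input):
    # Blend expert and policy samples with a random interpolation factor.
    alpha = tf.random_uniform(tf.shape(expert_input), 0.0, 1.0)
    interp = alpha * expert_input + (1.0 - alpha) * policy_input

    # Gradient of the discriminator output w.r.t. the interpolated input.
    d_interp = discriminator(interp)
    grad = tf.gradients(d_interp, [interp])[0]

    # The gradient of a plain norm is NaN at 0, so use an epsilon-safe version.
    safe_norm = tf.sqrt(tf.reduce_sum(grad ** 2, axis=-1) + EPSILON)
    return tf.reduce_mean(safe_norm)

Scaled by a weight and added to the discriminator loss, this keeps the discriminator from becoming too sharp too quickly, which is what helps GAIL + SAC.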
The squiggly line does not lie.
Overall looks good. It would be nice to see some rough numbers showing better stability/performance with the gradient penalty as compared to without.
@@ -32,6 +32,9 @@ def __init__(self, policy: TFPolicy, strength: float, gamma: float):
        short_name = class_name.replace("RewardSignal", "")
        self.stat_name = f"Policy/{short_name} Reward"
        self.value_name = f"Policy/{short_name} Value Estimate"
        # Don't terminate discounted reward computation at Done. Useful for eliminating positive bias in rewards with
        # no natural end, e.g. GAIL or Curiosity
        self.ignore_terminal_states = False
nit (feel free to ignore): could you name this in terms of what it does instead of what it doesn't do? I find double-negatives like this a little harder to understand when trying to reason about the code.
It was confusing for me too while writing it - I reversed it
Cool, not sure if it's on any official naming scheme guides, but I generally try to avoid variables with negative names for this reason.
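To make the reversed flag's effect concrete, here is a minimal sketch of a discounted-return computation where a positively named flag controls whether a Done step cuts off the discounted sum. The name use_terminal_states is illustrative; the PR's actual attribute may differ:

import numpy as np

def discounted_returns(rewards, dones, gamma, value_next, use_terminal_states=True):
    # Discounted returns bootstrapped from value_next. When use_terminal_states
    # is False (e.g. for GAIL or Curiosity rewards with no natural end), a Done
    # step does not zero out the discounted tail.
    returns = np.zeros_like(rewards, dtype=np.float32)
    running = value_next
    for t in reversed(range(len(rewards))):
        if use_terminal_states and dones[t]:
            running = 0.0  # episode ends here: nothing beyond this step counts
        returns[t] = rewards[t] + gamma * running
        running = returns[t]
    return returns

# A Done in the middle only truncates the return when the flag is on.
r = np.array([1.0, 1.0, 1.0], dtype=np.float32)
d = np.array([False, True, False])
print(discounted_returns(r, d, 0.99, value_next=5.0, use_terminal_states=True))
print(discounted_returns(r, d, 0.99, value_next=5.0, use_terminal_states=False))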
Maybe don't use self.compute_gradient_penalty() if self.gradient_penalty == 0.0 (to allow turning it off if you really don't want it for some reason)?
Made it only happen if the penalty is greater than 0
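In other words, something along these lines (a fragment mirroring the diff above; the attribute names follow the renames mentioned earlier and are illustrative, not necessarily the PR's final code):

        # Only build and apply the penalty when it is actually enabled.
        if self.gradient_penalty_weight > 0.0:
            self.loss = (
                self.loss
                + self.gradient_penalty_weight * self.create_gradient_magnitude()
            )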
gail_config.yaml with GAIL examples
trainer_config.yaml and unnecessary gammas