Add Seeding, MaxStepReached, and Bootstrapping fix #303

awjuliani · 2018-02-02T00:50:34Z

Add ability to seed learning (numpy, tensorflow, and Unity) with --seed flag.
Add maxStepReached flag to Agents and Academy.
Change the way value bootstrapping works in PPO to take advantage of timeouts.
Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.

eshvk · 2018-02-02T18:57:20Z

python/learn.py

    fast_simulation = not bool(options['--slow'])

-    env = UnityEnvironment(file_name=env_name, worker_id=worker_id, curriculum=curriculum_file)
+    if seed != -1:


Can you explain why this check exists? Why is the default seed for the pseudo-random generators explicitly set to -1?
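For illustration, one way a `-1` default can act as a "no seed supplied" sentinel is sketched below. This is a minimal sketch, not the code in this PR: `NO_SEED`, `normalize_seed`, and `seed_python_rng` are hypothetical names, and only Python's built-in `random` module is seeded so that the snippet stays self-contained. In the actual trainer, numpy and TensorFlow would presumably be seeded at the same point.

```python
import random
from typing import Optional

NO_SEED = -1  # assumed sentinel, mirroring the `seed != -1` check in the diff


def normalize_seed(seed: Optional[int]) -> Optional[int]:
    """Map the -1 sentinel (or None) to None; pass real seeds through."""
    if seed is None or seed == NO_SEED:
        return None
    return seed


def seed_python_rng(seed: Optional[int]) -> bool:
    """Seed Python's RNG only when a real seed was given.

    Returns True if seeding happened, False if the sentinel was passed.
    """
    real_seed = normalize_seed(seed)
    if real_seed is None:
        return False
    random.seed(real_seed)
    return True
```

With this shape, every caller can share one check instead of repeating `seed != -1` in each place that sets a seed.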

eshvk · 2018-02-02T18:58:59Z

python/learn.py

      --lesson=<n>               Start learning from this lesson [default: 0].
      --load                     Whether to load the model or randomly initialize [default: False].
-      --run-path=<path>          The sub-directory name for model and summary statistics [default: ppo]. 
+      --run-id=<path>            The sub-directory name for model and summary statistics [default: ppo]. 


If this branches from dev-0.3, I think this is a typo and should be --run-path. model_path below uses --run-path anyway.

eshvk · 2018-02-02T19:00:42Z

python/trainer_configurations.yaml

-    beta: 2.5e-3
-    buffer_size: 5000
+    batch_size: 32
+    beta: 5.0e-3


I would prefer a single-line comment explaining the change in default parameters, perhaps referencing a trial run.

eshvk · 2018-02-02T19:05:32Z

python/trainers/ppo_trainer.py


        self.variable_scope = trainer_parameters['graph_scope']
        with tf.variable_scope(self.variable_scope):
+            tf.set_random_seed(seed)


In the main learn.py you check that seed != -1 before setting the seed for tf. However, PPOTrainer is then called with that same seed, which may still be -1, so here we could end up calling tf.set_random_seed(-1)?

eshvk · 2018-02-02T19:09:02Z

unity-environment/Assets/ML-Agents/Scripts/ExternalCommunicator.cs

    List<float> concatenatedRewards = new List<float>(32);
    List<float> concatenatedMemories = new List<float>(1024);
    List<bool> concatenatedDones = new List<bool>(32);
+    List<bool> concatenatedMaxes = new List<bool>(32);


These are set as sane defaults. Perhaps we could define a public const int DEFAULT_NUM_AGENTS = 32 beforehand?

vincentpierre · 2018-02-02T19:29:20Z

python/trainers/ppo_trainer.py

        info = info[self.brain_name]
        for l in range(len(info.agents)):
            agent_actions = self.training_buffer[info.agents[l]]['actions']
            if ((info.local_done[l] or len(agent_actions) > self.trainer_parameters['time_horizon'])


Don't you need to check info.max_reached[l] on this line?
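To make the distinction concrete, here is a minimal sketch of how a timeout flag could change the bootstrap value used when computing returns. It is not the PR's implementation: `bootstrap_value` and `discounted_returns` are hypothetical helpers, and the assumption is that an episode cut off by the step limit should bootstrap from the value estimate, while a true terminal state should bootstrap from zero.

```python
from typing import List


def bootstrap_value(done: bool, max_reached: bool, value_estimate: float) -> float:
    """Pick the value used to bootstrap the tail of a trajectory.

    Assumption: a true terminal state (done and not a timeout) has no
    future reward, so its bootstrap value is 0. A timeout or a
    time-horizon cut still has a future, so the estimate is used.
    """
    if done and not max_reached:
        return 0.0
    return value_estimate


def discounted_returns(rewards: List[float], gamma: float,
                       value_next: float = 0.0) -> List[float]:
    """Compute discounted returns backwards, starting from value_next."""
    returns = [0.0] * len(rewards)
    running = value_next
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with rewards `[1.0, 1.0, 1.0]`, `gamma=0.5`, and a value estimate of `4.0`, a true terminal state gives returns `[1.75, 1.5, 1.0]`, while a timeout gives `[2.25, 2.5, 3.0]`.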

vincentpierre · 2018-02-02T19:33:41Z

python/trainers/imitation_trainer.py

    """The ImitationTrainer is an implementation of the imitation learning."""
-    def __init__(self, sess, env, brain_name, trainer_parameters, training):
+    def __init__(self, sess, env, brain_name, trainer_parameters, training, seed):
        """


You need to make use of this seed.

vincentpierre · 2018-02-02T19:37:38Z

python/unityagents/environment.py

                [launch_string,
-                 '--port', str(self.port)])
+                 '--port', str(self.port),
+                 '--seed', str(seed)])


Do not send the seed if the seed is -1 or None.
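A minimal sketch of that suggestion, assuming the launch arguments are built as a plain list before being passed to the process launcher. `build_launch_args` is a hypothetical helper, not a function in `environment.py`; the `--port` and `--seed` flags are taken from the diff above.

```python
from typing import List, Optional


def build_launch_args(launch_string: str, port: int,
                      seed: Optional[int] = None) -> List[str]:
    """Build the executable's argument list, omitting --seed when unset."""
    args = [launch_string, '--port', str(port)]
    # Treat both None and the -1 sentinel as "no seed supplied".
    if seed is not None and seed != -1:
        args += ['--seed', str(seed)]
    return args
```

The resulting list could then be handed to the existing launcher call in place of the inline list literal.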

awjuliani added 5 commits January 25, 2018 13:02

Initial commit to add max step reached flag

86c4b21

Merge branch 'development-0.3' into dev-bootstrapping

f0487b9

Add random seed

5003c93

Enforce ELU activation for conv2d layers

458fdbc

Add hyper parameters for GridWorld 5

e13d4f3

awjuliani requested a review from vincentpierre February 2, 2018 00:50

eshvk reviewed Feb 2, 2018

vincentpierre reviewed Feb 2, 2018

Address comments

127ed60

awjuliani merged commit c87f700 into development-0.3 Feb 2, 2018

awjuliani deleted the dev-bootstrapping branch February 2, 2018 21:01

github-actions bot locked as resolved and limited conversation to collaborators May 20, 2021
