-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Add Seeding, MaxStepReached, and Bootstrapping fix #303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/learn.py
Outdated
| fast_simulation = not bool(options['--slow']) | ||
|
|
||
| env = UnityEnvironment(file_name=env_name, worker_id=worker_id, curriculum=curriculum_file) | ||
| if seed != -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a bit why this check exists? Why is the default pseudo randomizer explicitly to -1?
python/learn.py
Outdated
| --lesson=<n> Start learning from this lesson [default: 0]. | ||
| --load Whether to load the model or randomly initialize [default: False]. | ||
| --run-path=<path> The sub-directory name for model and summary statistics [default: ppo]. | ||
| --run-id=<path> The sub-directory name for model and summary statistics [default: ppo]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is branching from dev-0.3, I think this is a typo and should be --run-path. model_path below uses --run-path anyways.
| beta: 2.5e-3 | ||
| buffer_size: 5000 | ||
| batch_size: 32 | ||
| beta: 5.0e-3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer adding a single line comment articulating the change in default parameters. Perhaps referencing a trial run?
|
|
||
| self.variable_scope = trainer_parameters['graph_scope'] | ||
| with tf.variable_scope(self.variable_scope): | ||
| tf.set_random_seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the master learn.py you check if seed != -1 before setting the seed for tf. However PPOTrainer is again called with the seed which may potentially be -1 which means that here we could end up setting tf.set_random_seed(-1) ?
| List<float> concatenatedRewards = new List<float>(32); | ||
| List<float> concatenatedMemories = new List<float>(1024); | ||
| List<bool> concatenatedDones = new List<bool>(32); | ||
| List<bool> concatenatedMaxes = new List<bool>(32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This are set as sane defaults. Perhaps we can do a public const int DEFAULT_NUM_AGENTS = 32 before hand?
| info = info[self.brain_name] | ||
| for l in range(len(info.agents)): | ||
| agent_actions = self.training_buffer[info.agents[l]]['actions'] | ||
| if ((info.local_done[l] or len(agent_actions) > self.trainer_parameters['time_horizon']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check if info.max_reached[l] in this line ?
| """The ImitationTrainer is an implementation of the imitation learning.""" | ||
| def __init__(self, sess, env, brain_name, trainer_parameters, training): | ||
| def __init__(self, sess, env, brain_name, trainer_parameters, training, seed): | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to make use of this seed.
| [launch_string, | ||
| '--port', str(self.port)]) | ||
| '--port', str(self.port), | ||
| '--seed', str(seed)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not send the seed if the seed is -1 or None
--seedflag.maxStepReachedflag to Agents and Academy.