Remove env creation logic from TrainerController #1562
Conversation
Apologies for the size of the PR; though most of it was moving code, it still grew a bit bigger than I'd like.
```diff
@@ -222,7 +223,7 @@ def __str__(self):
             for k in self._resetParameters])) + '\n' + \
             '\n'.join([str(self._brains[b]) for b in self._brains])

     def reset(self, config=None, train_mode=True) -> AllBrainInfo:
```
Does this mean it will no longer be possible to switch between training and inference configurations at reset?
It does. Is this a case that's important for us to support? This seems to me like an uncommon enough case that it would be fine to create a new environment, but I'm not sure I fully understand it.
I am adding @awjuliani to this conversation because he is the one who asked for this feature. I personally think the user should be able to change the configuration (training vs inference) at each step and not only at reset.
Is it possible to not remove this feature?
I agree with Vince. No reason to remove this feature, which is useful for those using ML-Agents in the context of Jupyter notebooks or other interactive tutorials.
Sure thing, will fix
```python
if trainer_parameters_dict[brain_name]['trainer'] == 'offline_bc':
    self.trainers[brain_name] = OfflineBCTrainer(
        self.env.brains[brain_name],
```
Make sure BC works; I think there might be a bug here. `self.env.brains` contains all the brains, not just the external ones.
This is getting the `brain_name` keys from `self.external_brains` and just looking them up in `self.env.brains`, so I think it should be fine. I've double-checked and it works as expected for PushBlockIL.
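For what it's worth, the invariant being relied on can be shown with a small self-contained sketch (toy dicts stand in for the real `env.brains` / `external_brains`, and the trainer is stubbed out):

```python
# Toy sketch of the lookup pattern discussed above. external_brains is
# assumed to hold only the external brain names, as a subset of
# env_brains, so iterating over it can never construct a trainer for a
# non-external brain, even though env_brains contains all brains.
env_brains = {'ExternalBrain': 'params_a', 'InternalBrain': 'params_b'}
external_brains = {'ExternalBrain': 'params_a'}

trainers = {}
for brain_name in external_brains:
    trainers[brain_name] = ('OfflineBCTrainer-stub', env_brains[brain_name])

assert 'InternalBrain' not in trainers  # internal brains are never trained
print(trainers)
```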
Looks good to me. All that needs changing is to re-implement the inference/training mode switch in `reset()`.
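One possible shape for that fix, sketched with stand-in names (this illustrates the idea, not the PR's eventual code): keep `train_mode` as state set at construction, but let `reset()` optionally override it.

```python
# Hypothetical sketch: train_mode defaults to the value given at
# construction, but reset() may override it per call, restoring the
# ability to flip between training and inference configurations.
class EnvSketch:
    def __init__(self, train_mode: bool = True):
        self.train_mode = train_mode

    def reset(self, config=None, train_mode: bool = None):
        if train_mode is not None:
            self.train_mode = train_mode  # remember the override
        # A real implementation would send a reset message built from
        # self.train_mode and config to the Unity process here.
        return {'train_mode': self.train_mode, 'config': config or {}}

env = EnvSketch()
env.reset(train_mode=False)  # switch to inference at reset
assert env.train_mode is False
```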
```diff
@@ -244,7 +245,7 @@ def reset(self, config=None, train_mode=True) -> AllBrainInfo:

         if self._loaded:
             outputs = self.communicator.exchange(
-                self._generate_reset_input(train_mode, config)
+                self._generate_reset_input(self.train_mode, config)
```
Can we just change the signature of this method to use `train_mode`, which is a member of the UnityEnvironment class? I.e. `self._generate_reset_input(config)`?
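Sketching the suggestion (stand-in class, purely illustrative):

```python
# Hypothetical sketch of the suggested signature change: since
# train_mode is already stored on the environment, the method reads it
# from self rather than taking it as a parameter.
class EnvSketch:
    def __init__(self, train_mode: bool = True):
        self.train_mode = train_mode

    def _generate_reset_input(self, config):
        # train_mode comes from the instance, not from the caller
        return {'is_training': self.train_mode, 'parameters': config or {}}

env = EnvSketch(train_mode=False)
print(env._generate_reset_input({'gravity': -9.81}))
```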
```diff
-def run_training(sub_id, run_seed, run_options, process_queue):
+def run_training(sub_id: int, run_seed: int, run_options, process_queue):
```
+1 on type hints :)
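For illustration, a fully annotated version might look like the sketch below; the annotations for `run_options` and `process_queue` are assumptions made here, not what the PR actually uses:

```python
# Hypothetical, fully annotated variant of the signature above. The
# types for run_options and process_queue are guesses for illustration.
from multiprocessing import Queue
from typing import Any, Dict

def run_training(sub_id: int, run_seed: int,
                 run_options: Dict[str, Any],
                 process_queue: Queue) -> None:
    """Body elided in this sketch."""
```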
Currently TrainerController includes logic related to creating the UnityEnvironment, which causes poor separation of concerns between the learn.py application script, TrainerController, and UnityEnvironment:

* TrainerController must know about the proper way to instantiate the UnityEnvironment, which may differ from application to application. This also makes mocking or subclassing UnityEnvironment more difficult.
* Many arguments are passed by learn.py to TrainerController and passed along to UnityEnvironment.

This change moves environment construction logic into learn.py, as part of the greater refactor to separate trainer logic from actor / environment.
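In rough outline, the separation described above looks like this (a self-contained sketch with a fake environment; not the PR's exact code):

```python
# Sketch: learn.py owns environment construction and injects the result
# into TrainerController, which no longer needs construction arguments
# and can be tested with a mock environment.
class FakeEnvironment:
    """Stands in for UnityEnvironment in this sketch."""
    def reset(self):
        return {}

class TrainerControllerSketch:
    def __init__(self, env):
        self.env = env  # injected, not constructed here

    def start_learning(self):
        self.env.reset()

# learn.py side: env-specific arguments (file_name, seed, ...) live here.
controller = TrainerControllerSketch(FakeEnvironment())
controller.start_learning()
```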
* Move load_config before environment creation
* Extract curriculum loading logic into its own method
Force-pushed from bfde780 to 9a323f5.