Remove env creation logic from TrainerController #1562

Merged: 4 commits merged into develop, Jan 24, 2019

Conversation

@harperj (Contributor) commented Jan 3, 2019

Currently TrainerController includes logic related to creating the
UnityEnvironment, which causes poor separation of concerns between
the learn.py application script, TrainerController and UnityEnvironment:

  • TrainerController must know about the proper way to instantiate the
    UnityEnvironment, which may differ from application to application.
    This also makes mocking or subclassing UnityEnvironment more
    difficult.
  • Many arguments are passed by learn.py to TrainerController and passed
    along to UnityEnvironment.

This change moves environment construction logic into learn.py, as part
of the greater refactor to separate trainer logic from actor / environment.
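
To make the intended split concrete, here is a minimal sketch of the pattern the PR moves toward. The helper name init_environment and the one-argument TrainerController are simplified stand-ins, not the exact code in this PR, and the constructor arguments reflect the mlagents.envs API of roughly this era:

    from mlagents.envs import UnityEnvironment  # import path circa ML-Agents 0.6

    def init_environment(env_path, worker_id, seed):
        # learn.py now owns application-specific construction details
        # (executable path, worker id, seed), so they never reach the trainer.
        return UnityEnvironment(
            file_name=env_path,
            worker_id=worker_id,
            seed=seed,
        )

    class TrainerController:
        # Simplified stand-in: the controller receives a ready environment
        # instead of knowing how to build one, which also makes it easy to
        # hand it a mock or a UnityEnvironment subclass in tests.
        def __init__(self, env):
            self.env = env

    env = init_environment("3DBall", worker_id=0, seed=42)  # hypothetical values
    tc = TrainerController(env)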

@harperj requested review from eshvk and vincentpierre January 3, 2019 23:00
@harperj (Author) commented Jan 3, 2019

Apologies for the size of the PR; though most of it was moving code, it still grew a bit bigger than I'd like.

@@ -222,7 +223,7 @@ def __str__(self):
                for k in self._resetParameters])) + '\n' + \
            '\n'.join([str(self._brains[b]) for b in self._brains])

    def reset(self, config=None, train_mode=True) -> AllBrainInfo:
Contributor:

Does this mean it will no longer be possible to switch between training and inference configurations at reset?

Contributor Author:

It does. Is this a case that's important for us to support? It seems to me like an uncommon enough case that creating a new environment would be fine, but I'm not sure I fully understand it.

Contributor:

I am adding @awjuliani to this conversation because he is the one who asked for this feature. I personally think the user should be able to change the configuration (training vs inference) at each step and not only at reset.

Contributor:

Is it possible to not remove this feature?

Contributor:

I agree with Vince. No reason to remove this feature, which is useful for those using ML-Agents in the context of Jupyter notebooks or other interactive tutorials.

Contributor Author:

Sure thing, will fix
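
For reference, one way to keep the switch is to make the reset-time argument optional and fall back to the mode chosen at construction. A sketch of that idea, as it would sit inside UnityEnvironment (an assumption, not necessarily the fix that landed):

    def reset(self, config=None, train_mode=None) -> AllBrainInfo:
        # Sketch: None means "keep the mode chosen at construction"; an explicit
        # True/False restores the per-reset switch used from notebooks.
        if train_mode is not None:
            self.train_mode = train_mode
        if self._loaded:
            outputs = self.communicator.exchange(
                self._generate_reset_input(self.train_mode, config)
            )
            # The real method converts `outputs` into an AllBrainInfo here.
            return outputs
        raise UnityEnvironmentException("No Unity environment is loaded.")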

if trainer_parameters_dict[brain_name]['trainer'] == 'offline_bc':
    self.trainers[brain_name] = OfflineBCTrainer(
        self.env.brains[brain_name],
Contributor:

Make sure BC works; I think there might be a bug here. self.env.brains contains all the brains, not just the external ones.

Contributor Author:

This is getting brain_name from self.external_brains and just looking it up in self.env.brains, so I think it should be fine.

I've double-checked and it works as expected for PushBlockIL.
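
A standalone toy of the point being made here, with hypothetical brain names loosely modeled on the imitation-learning scenes:

    # env_brains stands in for self.env.brains: it maps *every* brain name
    # (internal and external) to its parameters.
    env_brains = {
        "StudentBrain": "external brain parameters",   # external (trainable)
        "TeacherBrain": "internal brain parameters",   # internal (player-controlled)
    }
    external_brains = ["StudentBrain"]  # the loop iterates only these names

    trainers = {}
    for brain_name in external_brains:
        # brain_name always comes from external_brains, so internal brains
        # are never used as lookup keys even though env_brains contains them.
        trainers[brain_name] = env_brains[brain_name]

    assert "TeacherBrain" not in trainers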

@Unity-Technologies deleted a comment from cratkid Jan 7, 2019
@awjuliani (Contributor) left a comment:

Looks good to me. All that needs changing is to re-implement the inference/training mode switch in reset().

@@ -244,7 +245,7 @@ def reset(self, config=None, train_mode=True) -> AllBrainInfo:

     if self._loaded:
         outputs = self.communicator.exchange(
-            self._generate_reset_input(train_mode, config)
+            self._generate_reset_input(self.train_mode, config)
Contributor:

Can we just change the signature of this method to use train_mode, which is a member of the UnityEnvironment class? I.e., self._generate_reset_input(config)?
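
An illustration of the suggestion, using only names that appear in the diff above:

    # Current: the caller passes the member explicitly.
    outputs = self.communicator.exchange(
        self._generate_reset_input(self.train_mode, config)
    )

    # Suggested: _generate_reset_input reads self.train_mode internally,
    # so every call site shrinks to:
    outputs = self.communicator.exchange(
        self._generate_reset_input(config)
    )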



-def run_training(sub_id, run_seed, run_options, process_queue):
+def run_training(sub_id: int, run_seed: int, run_options, process_queue):
Contributor:

+1 on type hints :)
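
For illustration, the same signature with the remaining parameters annotated as well; the types for run_options and process_queue are assumptions (a docopt-style options dict and a multiprocessing queue), not part of the diff above:

    import multiprocessing
    from typing import Any, Dict

    def run_training(sub_id: int, run_seed: int,
                     run_options: Dict[str, Any],
                     process_queue: multiprocessing.Queue) -> None:
        ...  # body unchanged; only the annotations are the point here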

Additional commits in this PR:
* Move load_config before environment creation
* Extract curriculum loading logic into its own method
@eshvk (Contributor) commented Jan 24, 2019

:shipit:

@harperj force-pushed the develop-jh-distributed branch from bfde780 to 9a323f5 January 24, 2019 01:43
@harperj merged commit 553c6b7 into develop Jan 24, 2019
mantasp added a commit that referenced this pull request Jan 28, 2019
…agents into develop-barracuda

* 'develop-barracuda' of github.com:Unity-Technologies/ml-agents:
  deleted dead meta file and added a note on the OpenGLCore Graphics API
  Barracuda : Updating the documentation (#1607)
  Remove env creation logic from TrainerController (#1562)
  Fix In editor Docker training (#1582)
  Only using multiprocess when --num-runs>1 (#1583)
  Replace AddVectorObs(float[]) and AddVectorObs(List<float>) with a more generic AddVectorObs(IEnumerable<float>) (#1540)
  fixed the windows ctrl-c bug (#1558)
  Improve Gym wrapper compatibility and add Dopamine documentation (#1541)
  Fix typo in documentation (#1516)
  Update curricula brain names for 0.6
  Addressing #1537
  Fix for divide-by-zero error with Discrete Actions  (#1520)
  Documentation tweaks and updates (#1479)
mantasp added a commit that referenced this pull request Jan 29, 2019
* develop-barracuda:
  Backup and restore fixedDeltaTime and maximumDeltaTime on Academy init / shutdown
  Restore global gravity value when Academy gets destroyed
  deleted dead meta file and added a note on the OpenGLCore Graphics API
  Barracuda : Updating the documentation (#1607)
  Remove env creation logic from TrainerController (#1562)
  Fix In editor Docker training (#1582)
  Only using multiprocess when --num-runs>1 (#1583)
  Replace AddVectorObs(float[]) and AddVectorObs(List<float>) with a more generic AddVectorObs(IEnumerable<float>) (#1540)
  fixed the windows ctrl-c bug (#1558)
  Improve Gym wrapper compatibility and add Dopamine documentation (#1541)
  Fix typo in documentation (#1516)
  Update curricula brain names for 0.6
  Addressing #1537
  Fix for divide-by-zero error with Discrete Actions  (#1520)
  Documentation tweaks and updates (#1479)
harperj pushed a commit that referenced this pull request Feb 20, 2019
@awjuliani deleted the develop-jh-distributed branch July 23, 2019 20:17
LeSphax pushed a commit to LeSphax/ml-agents-1 that referenced this pull request May 3, 2020
@github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021