@ervteng ervteng commented Jun 23, 2021

Proposed change(s)

This PR adds a feature inspired by the approach DeepMind used in this paper. When multiple policies are specified as "priors" in the YAML file (each one given as the directory of the corresponding run ID), the code loads those policies and uses them as regularization priors for the learning policy. This has proven effective in Dodgeball, where training with two priors (shooting and flag-getting) leads to a skilled policy in roughly 1/3 of the time. See the ELO plot below.

[Image: ELO plot]
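
As a rough illustration of what "regularization priors" can mean, here is a minimal sketch for discrete action distributions. It is not the code in this PR: the function names, the `beta` weight, the direction of the KL term, and the averaging over priors are all assumptions made for the example.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete action distributions (in nats)."""
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def prior_regularized_loss(
    policy_loss: float,
    policy_probs: Sequence[float],
    prior_probs_list: Sequence[Sequence[float]],
    beta: float = 0.1,  # hypothetical regularization strength
) -> float:
    """Add the mean KL from the learning policy to each prior, scaled by beta."""
    if not prior_probs_list:
        return policy_loss
    penalty = sum(
        kl_divergence(policy_probs, prior) for prior in prior_probs_list
    ) / len(prior_probs_list)
    return policy_loss + beta * penalty


# Two hypothetical priors over the same two actions.
shoot_prior = [0.9, 0.1]
flag_prior = [0.1, 0.9]
policy = [0.5, 0.5]
loss = prior_regularized_loss(1.0, policy, [shoot_prior, flag_prior], beta=0.1)
```

In this sketch the penalty is zero only when the policy matches every prior, so the priors pull the learning policy toward their behavior without fixing it; `beta` controls how hard they pull.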

This PR also contains a version of WallJump that can be broken down into subtasks.

TODO

Ideally, for this to be a feature, we'd want to add these components:

  • Add to PPO and SAC (currently only in POCA)
  • Allow checkpoints with different network architectures (e.g. different num_layers) to be handled properly (I don't see how we could change the obs space and action space, though).
  • Solve the entropy issue (entropy seems to increase when more than one prior is specified, probably because it's trying to learn a multimodal policy).
  • Documentation and tests
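
One possible reading of the entropy item above, sketched under assumptions: if the regularizer pulls the policy toward a blend of several peaked priors, the blend is necessarily less peaked than the priors themselves, because entropy is concave. The example below uses a uniform mixture of two hypothetical priors; whether the loss in this PR actually drives the policy toward such a mixture has not been verified.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixture(priors: Sequence[Sequence[float]]) -> List[float]:
    """Uniform mixture of several distributions over the same actions."""
    n = len(priors)
    return [sum(column) / n for column in zip(*priors)]


# Two hypothetical priors that each strongly prefer a different action.
shoot_prior = [0.9, 0.1]
flag_prior = [0.1, 0.9]
blended = mixture([shoot_prior, flag_prior])

prior_entropies = [entropy(shoot_prior), entropy(flag_prior)]
blended_entropy = entropy(blended)
```

Here `blended_entropy` exceeds both values in `prior_entropies`, which is consistent with the observation that entropy rises once more than one prior is specified.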

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@miguelalonsojr miguelalonsojr self-requested a review January 18, 2022 22:53
@miguelalonsojr miguelalonsojr self-assigned this Jan 18, 2022
@miguelalonsojr miguelalonsojr requested review from jrupert-unity and removed request for jrupert-unity and miguelalonsojr January 21, 2022 18:35
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ervin Teng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
