[DRAFT][DON'T MERGE] Enable learning from multiple priors #5439

ervteng · 2021-06-23T23:16:22Z

Proposed change(s)

This PR adds a feature inspired by the approach DeepMind used in this paper. You specify multiple policies as "priors" in the YAML file by giving the directory of each prior's run ID; the code then loads those policies and uses them as regularization priors for the learning policy. This has proven effective in Dodgeball, where training with two priors (shooting and flag-getting) leads to a skilled policy in roughly 1/3 of the time. See ELO plot below.
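To make the idea of "regularization priors" concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function names, the `prior_strength` coefficient, the choice of KL(prior || policy) as the penalty, and the averaging over priors are all assumptions made for illustration. It only shows the general shape of adding a prior term to an existing policy loss and reporting that term separately.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete action distributions (lists of probabilities)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def prior_regularized_loss(policy_loss, policy_probs, prior_probs_list, prior_strength=0.1):
    """Return (total_loss, prior_loss).

    prior_loss is the mean KL from each frozen prior to the learning policy.
    All names and the KL direction are hypothetical, not taken from the PR.
    """
    if not prior_probs_list:
        # No priors configured: the loss is unchanged.
        return policy_loss, 0.0
    prior_loss = sum(
        kl_divergence(prior, policy_probs) for prior in prior_probs_list
    ) / len(prior_probs_list)
    return policy_loss + prior_strength * prior_loss, prior_loss

# Hypothetical 3-action example: one prior favors action 0, the other action 1.
shooting_prior = [0.8, 0.1, 0.1]
flag_prior = [0.1, 0.8, 0.1]
policy = [0.4, 0.4, 0.2]
total, prior_loss = prior_regularized_loss(1.0, policy, [shooting_prior, flag_prior])
```

In a real trainer the prior distributions would come from the loaded checkpoints evaluated without gradients, and `prior_loss` is the kind of quantity one would log alongside the other training stats.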

This PR also contains a version of WallJump that can be broken down into subtasks.

TODO

Ideally, for this to become a feature, we'd want to add these components:

Add to PPO and SAC (currently only in POCA)
Allow checkpoints with different network architectures (e.g. different num_layers) to be handled properly (I don't see how we could change the obs space and action space, though).
Solve the entropy issue (entropy seems to increase when more than one prior is specified, probably because it's trying to learn a multimodal policy).
Documentation and tests
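The entropy item above can be illustrated with a small worked example. This is a sketch under assumptions, not a diagnosis of the PR's code: it assumes the prior penalty is the mean of KL(prior_i || policy) over discrete actions, in which case the penalty alone is minimized by the equal-weight mixture of the priors. Two sharply peaked but different priors then pull the policy toward a two-mode distribution whose entropy is higher than that of either prior. The distributions are made up.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mixture(dists):
    """Equal-weight mixture of several distributions over the same actions."""
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]

# Two hypothetical priors, each confident about a different action.
prior_a = [0.9, 0.05, 0.05]
prior_b = [0.05, 0.9, 0.05]

# Under the assumed penalty, this mixture is what the prior term alone favors.
blend = mixture([prior_a, prior_b])

entropy_a = entropy(prior_a)
entropy_b = entropy(prior_b)
entropy_blend = entropy(blend)
```

Here `entropy_blend` exceeds both `entropy_a` and `entropy_b`, which is consistent with the hypothesis that the policy is being pushed toward a multimodal distribution.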

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

CLAassistant · 2024-04-01T01:35:53Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Ervin Teng does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Ervin Teng added 11 commits June 15, 2021 18:49

Add feature to learn with priors

7e933bb

Move priors to hyperparameters

1309f14

remove unneeded flatten

1f80b5f

Some cleanup

73d7bc8

Some cleanup

897acda

no grad for prior policies

29a3cc6

Add prior loss to stats

759a1a9

Fix dimension bug

fb5b856

Edit some stuff for clarity

48084f1

Multi-mode WallJump

2bc070c

Fix missing script in multi-mode walljump

4a26df4

miguelalonsojr self-requested a review January 18, 2022 22:53

miguelalonsojr self-assigned this Jan 18, 2022

miguelalonsojr requested review from jrupert-unity and removed request for jrupert-unity and miguelalonsojr January 21, 2022 18:35

miguelalonsojr assigned jrupert-unity and unassigned miguelalonsojr Jan 21, 2022


5 participants
