-
Notifications
You must be signed in to change notification settings - Fork 4.3k
A couple fixes for recording demonstrations #1999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1011,6 +1021,8 @@ void ResetIfDone() | |||
// as it is done | |||
_AgentReset(); | |||
hasAlreadyReset = true; | |||
requestDecision = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we make sure this does not break stuff ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do any of our environments (aside from Snoopy Pop, where this does work) use On Demand Decisions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bouncer uses ODD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Training Bouncer, seems to work (1.0 reward after 50k steps and climbing)
@@ -14,6 +14,7 @@ public class DemonstrationRecorder : MonoBehaviour | |||
private Agent recordingAgent; | |||
private string filePath; | |||
private DemonstrationStore demoStore; | |||
private int maxNameLength = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a const and capitalize variable name
@@ -584,7 +586,15 @@ void SendInfoToBrain() | |||
} | |||
|
|||
info.memories = action.memories; | |||
info.storedVectorActions = action.vectorActions; | |||
if(done) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fail to see how this solves the problem, could you add some comments explaining what problem it solves and how ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a short comment - my initial description on the pull request itself was wrong. Since Reset() is called before SendInfoToBrain(), when Reset happens, action.vectorActions is zeroed out, then SendInfoToBrain() is called and the 0's are stored in storedVectorActions. This is OK for training since the Python code doesn't use the storedVectorActions, but when recording a demo, it saves the 0's to the file. (now that I think about it, we might need to do this for the textActions as well.)
I don't neccessarily like this fix as it isn't very clean. But it seems like the only thing possible without reordering the Academy step loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not zero out action.vectorActions at reset ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we do this? Maybe we can initialize them to 0 on creation but not zero them out? I guess it's problematic since 0 isn't null, it has an actual meaning when it comes to actions
Uh oh!
There was an error while loading. Please reload this page.