Skip to content

API for sending custom protobuf messages to and from Unity. #1595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 28, 2019

Conversation

malmaud
Copy link
Contributor

@malmaud malmaud commented Jan 12, 2019

Implements #1549.

This allows users to customize two protobuffer messages:

  • They can add fields to CustomParameters and pass an instance of their custom parameters to UnityEnvironment.reset with the new custom_config keyword argument. This is useful if the scene configuration is complex and hard to encode as a few scalar parameters, as the academy is currently set up to support.

  • They can also add fields to CustomOutput message and set the value of those fields in AgentAction, which will in turn be available in Python as part of the agent info returned from UnityEnvironment.step. They call GetCustomOutput in AgentAction or CollectObservations to get an instance of the CustomOutput object and set the value of its fields.

This enables Unity to return more than a single scalar reward to Python, which is useful for eg sending back the complex effects of an action and having the reward calculation logic happen in Python, or for debugging or computing diagnostics in Python.

I'm happy to write up documentation once (if) an API is agreed upon.

@awjuliani awjuliani changed the base branch from master to develop January 13, 2019 18:37
@@ -541,6 +548,7 @@ void ResetData()
* param.numStackedVectorObservations]);

info.visualObservations = new List<Texture2D>();
info.customOutput = new CustomOutput();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me how the user is supposed to pass a customOutput to the agentInfo. Here you create a new one and the GetCustomOutput() gives you the customOutput in the AgentInfo. I would rather have at this line info.customOutput = this.customOutput and have getter and setter on Agent.customOuptut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was the user does something like

void AgentAction(...) {
  var myOutput = GetCustomOutput();
  myOutput.x = 5;
}

This was to get around the user having to create their own instance of CustomOutput , since then the example becomes

void AgentAction(...) {
  var myOutput = new CommunicatorObjects.CustomOutput();
  myOutput.x = 5;
  customOutput = myOutput;

That seems rather ugly, although it become betters if CustomOutput is imported into MLAgents namespace.

Are you proposing simply

void AgentAction(...) {
  customOutput.x = 5;
}

with customOutput a getter on an internal attribute that is still created automatically by Agent instead of by the user?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid creating a new instance of CustomOutput at every step when it is not needed. I am not sure what the API should look like here. To be consistent with AddObservation, we could have a AddCustomOutput method.
Now that I think about it, having an Agent.customOutput might be confusing. The user might try to modify this value from another script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about SetCustomOutput to mirror SetReward? AddObservation seems to imply that there can be multiple instances of a CustomOutput outputted by an agent on each step, which isn't true at the moment.

@@ -3,6 +3,9 @@ syntax = "proto3";
option csharp_namespace = "MLAgents.CommunicatorObjects";
package communicator_objects;

message CustomOutput {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both CustomOutput and CustomParameters should live in separate files. Since these are supposed to be modified by the user, they should be separated from the rest of the proto definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them into their own files. I could also move them into a separate folder and a separate C#/proto namespace if you think that's useful.

/// User-customizable object for sending structured output from Unity to Python in response
/// to an action in addition to a scalar reward.
/// </summary>
public CustomOutput customOutput;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to investigate but I am not sure this is ever used.

/// Creates a new instance of a custom output object.
/// </summary>
/// <returns>A reference to a new CustomOutput object.</returns>
static public CustomOutput CreateCustomOutput()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to let the developer instantiate his own CustomOutput. This method feels redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that could be problematic since the user has to then know about the CommunicatorObjects namespace, where the CustomOutput class lives. If the C# namespace of the custom_output.proto file ever changes due to some kind of internal refactoring, users would have to adopt their code.

But sure, I'll make this change.

@vincentpierre
Copy link
Contributor

Hi @malmaud
I talked with the rest of the team and we are very impressed by your work.
Here is the gathered feedback on the naming for the API:

  • Rename CustomParameters to CustomResetParameters
  • Rename CustomOutput to CustomObservation
  • Change custom_config in the reset method to custom_reset_parameters

Do you have an example use case ? This would help us define better what the API should look like.
Would it be possible to have CustomAction as well in order to close the loop ?
I think the API would be good enough with those changes and we can start thinking about documentation and tests next.

@malmaud
Copy link
Contributor Author

malmaud commented Jan 23, 2019

I've very grateful for the team's support. Thank you!

I'll make all those requests changes in the next few days. I do have a use case since this was motivated by one of my PhD projects. I can boil that down to the minimum to show off the API and post a link for discussion.

@malmaud
Copy link
Contributor Author

malmaud commented Jan 25, 2019

WRT to renaming CustomOutput to CustomObservation, I realize there are two places you may want custom objects outputted from an agent and I had been conflating them:

  • In CollectObservations, so the external brain can choose an action based on a custom structure that summarizes the relevant observations. Here the name CustomObservation makes total sense.

  • In AgentAction, so an external brain can use a custom summary of the result of an action to compute rewards. This is nice if you want the reward computation to be split between Unity and Python - eg, you use Unity's physics to compute the positions of objects in response to an action, but you prefer the reward calculation based on those positions to happen in your Python script.

I was thinking of adding a new custom message to AgentInfoProto for this case and calling it something like CustomActionResult.

@vincentpierre
Copy link
Contributor

Unfortunately, there is no way to send data from Unity to Python at the AgentAction level. The second scenario you are proposing would require information to be sent from Unity to Python twice per step (If I understood correctly). I do not think the second scenario you mentioned would fit in the ML-Agents toolkit.

@vincentpierre
Copy link
Contributor

I am unclear regarding what CustomActionResult is and what it does. It seems to me to be a duplicate of CustomObservation (The two are always next to each other) so I wonder what it adds.

@malmaud
Copy link
Contributor Author

malmaud commented Feb 7, 2019

Here was my thinking. I'm totally happy to just get rid of CustomActionResult though if this doesn't make sense to you.

The basic flow I imagined was:

  1. Python sends Unity an action for an agent with step.
  2. AgentAction is called on the user's agent. It creates a CustomActionResult that includes metadata about the reward or sufficient information about the environment to calculate reward.
  3. CollectObservations is called. It creates a CustomObservation.
  4. The custom action result and custom observation are sent back to Python as part of the return value of step . Python calculates a reward from the custom action result and then chooses a new action based on the custom observation object.

In principle, the Unity agent could just store whatever information it was going to put in the custom action result during AgentAction into the custom observation instance instead. I just thought that it might be a simpler interface to separate them since their role is often different - one contains information relevant to calculating a reward in response to an action and the other is relevant to choosing the next action based on properties of the environment.

@vincentpierre
Copy link
Contributor

I see what you mean, CustomObservation is not only an observation, it is an output (hence why it was named CustomOutput). It could be CustomObservationAndReward but I think this will be too confusing. I think as a first step we should only include CustomObservations. Users will still be able to have custom results for their actions if they integrate them into the CustomObservation. The Capabilities would be the same, it might just feel a little hacky to put a reward (or similar) inside the observations.
Thank you.

@malmaud
Copy link
Contributor Author

malmaud commented Feb 8, 2019

Alright. How about I add a getter for the CustomObservation too, to make it easier for the user to set fields on it from both CollectObservations and AgentAction?

@vincentpierre
Copy link
Contributor

I think the PR looks good. I think we would need to make sure pytest still work, maybe even make new unit tests. The API looks good, I think you can start making documentation update for it. Let us know if you need help.

@vincentpierre vincentpierre self-assigned this Feb 20, 2019
@malmaud
Copy link
Contributor Author

malmaud commented Feb 21, 2019

Ok, great. It will be a week or so until I can get back to this.

@ervteng ervteng mentioned this pull request Mar 5, 2019
@vincentpierre
Copy link
Contributor

@malmaud Did you have time to revisit this PR ?

@malmaud
Copy link
Contributor Author

malmaud commented Mar 12, 2019 via email

@malmaud
Copy link
Contributor Author

malmaud commented Mar 20, 2019

OK, ready to work on this again. So it sounds like what's still needed is docs and tests.

Looks like for docs, I'd add sections to the docs folder.

Can you guide me a bit on how to get started with adding tests? It looks like ml-agents/tests is the directory I'd want to be checking out.

@vincentpierre
Copy link
Contributor

Yes, all the documentation is present in the docs folder. Since this is not a change to the trainers, I would say the tests will go into the envs tests. Not sure what tests we should do because the custom proto objects are empty, so hard to assign values to them. I would say what replaces the reset parameters can be tested at least.

@malmaud
Copy link
Contributor Author

malmaud commented Mar 22, 2019

OK, I added some documentation. It's primarily in a new Custom-Protos.md file in docs. Feel free to directly push edits to it on my branch if there are small tweaks you'd like to see.

I implemented all the examples I mention in that file to double-check that they work.

Still have to work on tests.

@Unity-Technologies Unity-Technologies deleted a comment Mar 22, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 22, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 22, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 22, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 27, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 27, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 27, 2019
@Unity-Technologies Unity-Technologies deleted a comment Mar 27, 2019
@vincentpierre vincentpierre merged commit c4c35c6 into Unity-Technologies:develop Mar 28, 2019
@ervteng ervteng self-requested a review March 28, 2019 22:27
Copy link
Contributor

@ervteng ervteng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

@malmaud
Copy link
Contributor Author

malmaud commented Mar 28, 2019

Thanks everyone!

LeSphax pushed a commit to LeSphax/ml-agents-1 that referenced this pull request May 3, 2020
…chnologies#1595)

* API for sending custom protobuf messages to and from Unity.

* Rename custom_output to custom_outputs.

* Move custom protos to their own files.

* Add SetCustomOutput method.

* Add docstrings.

* Various adjustments.

* Rename CustomParameters -> CustomResetParameters
* Rename CustomOutput -> CUstomObservation
* Add CustomAction
* Add CustomActionResult

* Remove custom action result.

* Remove custom action result from Python API

* Start new documentation.

* Add some docstrings

* Expand documentation.

* Typos

* Tweak doc.

Also eliminate GetCustomObservation.

* Fix typo.

* Clarify docs.

* Remove trailing whitspace
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants