API for sending custom protobuf messages to and from Unity. #1595

malmaud · 2019-01-12T03:12:38Z

Implements #1549.

This allows users to customize two protobuffer messages:

They can add fields to CustomParameters and pass an instance of their custom parameters to UnityEnvironment.reset with the new custom_config keyword argument. This is useful if the scene configuration is complex and hard to encode as a few scalar parameters, as the academy is currently set up to support.
They can also add fields to CustomOutput message and set the value of those fields in AgentAction, which will in turn be available in Python as part of the agent info returned from UnityEnvironment.step. They call GetCustomOutput in AgentAction or CollectObservations to get an instance of the CustomOutput object and set the value of its fields.

This enables Unity to return more than a single scalar reward to Python, which is useful for eg sending back the complex effects of an action and having the reward calculation logic happen in Python, or for debugging or computing diagnostics in Python.

I'm happy to write up documentation once (if) an API is agreed upon.

…tfix-0.6.0a Hotfix 0.6.0a to master

vincentpierre · 2019-01-14T19:14:21Z

UnitySDK/Assets/ML-Agents/Scripts/Agent.cs

@@ -541,6 +548,7 @@ void ResetData()
                          * param.numStackedVectorObservations]);

            info.visualObservations = new List<Texture2D>();
+            info.customOutput = new CustomOutput();


It is not clear to me how the user is supposed to pass a customOutput to the agentInfo. Here you create a new one and the GetCustomOutput() gives you the customOutput in the AgentInfo. I would rather have at this line info.customOutput = this.customOutput and have getter and setter on Agent.customOuptut.

My idea was the user does something like

void AgentAction(...) { var myOutput = GetCustomOutput(); myOutput.x = 5; }

This was to get around the user having to create their own instance of CustomOutput , since then the example becomes

void AgentAction(...) { var myOutput = new CommunicatorObjects.CustomOutput(); myOutput.x = 5; customOutput = myOutput;

That seems rather ugly, although it become betters if CustomOutput is imported into MLAgents namespace.

Are you proposing simply

void AgentAction(...) { customOutput.x = 5; }

with customOutput a getter on an internal attribute that is still created automatically by Agent instead of by the user?

I would like to avoid creating a new instance of CustomOutput at every step when it is not needed. I am not sure what the API should look like here. To be consistent with AddObservation, we could have a AddCustomOutput method.
Now that I think about it, having an Agent.customOutput might be confusing. The user might try to modify this value from another script.

What about SetCustomOutput to mirror SetReward? AddObservation seems to imply that there can be multiple instances of a CustomOutput outputted by an agent on each step, which isn't true at the moment.

ml-agents/mlagents/envs/brain.py

vincentpierre · 2019-01-14T19:21:21Z

protobuf-definitions/proto/mlagents/envs/communicator_objects/agent_info_proto.proto

@@ -3,6 +3,9 @@ syntax = "proto3";
 option csharp_namespace = "MLAgents.CommunicatorObjects";
 package communicator_objects;

+message CustomOutput {


I think both CustomOutput and CustomParameters should live in separate files. Since these are supposed to be modified by the user, they should be separated from the rest of the proto definitions.

I moved them into their own files. I could also move them into a separate folder and a separate C#/proto namespace if you think that's useful.

vincentpierre · 2019-01-22T21:36:34Z

UnitySDK/Assets/ML-Agents/Scripts/Agent.cs

+        /// User-customizable object for sending structured output from Unity to Python in response
+        /// to an action in addition to a scalar reward.
+        /// </summary>
+        public CustomOutput customOutput;


I would need to investigate but I am not sure this is ever used.

vincentpierre · 2019-01-22T21:37:18Z

UnitySDK/Assets/ML-Agents/Scripts/Agent.cs

+        /// Creates a new instance of a custom output object.
+        /// </summary>
+        /// <returns>A reference to a new CustomOutput object.</returns>
+        static public CustomOutput CreateCustomOutput()


I think it is fine to let the developer instantiate his own CustomOutput. This method feels redundant

My thinking was that could be problematic since the user has to then know about the CommunicatorObjects namespace, where the CustomOutput class lives. If the C# namespace of the custom_output.proto file ever changes due to some kind of internal refactoring, users would have to adopt their code.

But sure, I'll make this change.

vincentpierre · 2019-01-23T18:28:10Z

Hi @malmaud
I talked with the rest of the team and we are very impressed by your work.
Here is the gathered feedback on the naming for the API:

Rename CustomParameters to CustomResetParameters
Rename CustomOutput to CustomObservation
Change custom_config in the reset method to custom_reset_parameters

Do you have an example use case ? This would help us define better what the API should look like.
Would it be possible to have CustomAction as well in order to close the loop ?
I think the API would be good enough with those changes and we can start thinking about documentation and tests next.

malmaud · 2019-01-23T21:13:09Z

I've very grateful for the team's support. Thank you!

I'll make all those requests changes in the next few days. I do have a use case since this was motivated by one of my PhD projects. I can boil that down to the minimum to show off the API and post a link for discussion.

malmaud · 2019-01-25T18:58:48Z

WRT to renaming CustomOutput to CustomObservation, I realize there are two places you may want custom objects outputted from an agent and I had been conflating them:

In CollectObservations, so the external brain can choose an action based on a custom structure that summarizes the relevant observations. Here the name CustomObservation makes total sense.
In AgentAction, so an external brain can use a custom summary of the result of an action to compute rewards. This is nice if you want the reward computation to be split between Unity and Python - eg, you use Unity's physics to compute the positions of objects in response to an action, but you prefer the reward calculation based on those positions to happen in your Python script.

I was thinking of adding a new custom message to AgentInfoProto for this case and calling it something like CustomActionResult.

vincentpierre · 2019-01-25T19:32:49Z

Unfortunately, there is no way to send data from Unity to Python at the AgentAction level. The second scenario you are proposing would require information to be sent from Unity to Python twice per step (If I understood correctly). I do not think the second scenario you mentioned would fit in the ML-Agents toolkit.

vincentpierre · 2019-02-05T00:50:40Z

I am unclear regarding what CustomActionResult is and what it does. It seems to me to be a duplicate of CustomObservation (The two are always next to each other) so I wonder what it adds.

malmaud · 2019-02-07T18:27:12Z

Here was my thinking. I'm totally happy to just get rid of CustomActionResult though if this doesn't make sense to you.

The basic flow I imagined was:

Python sends Unity an action for an agent with step.
AgentAction is called on the user's agent. It creates a CustomActionResult that includes metadata about the reward or sufficient information about the environment to calculate reward.
CollectObservations is called. It creates a CustomObservation.
The custom action result and custom observation are sent back to Python as part of the return value of step . Python calculates a reward from the custom action result and then chooses a new action based on the custom observation object.

In principle, the Unity agent could just store whatever information it was going to put in the custom action result during AgentAction into the custom observation instance instead. I just thought that it might be a simpler interface to separate them since their role is often different - one contains information relevant to calculating a reward in response to an action and the other is relevant to choosing the next action based on properties of the environment.

vincentpierre · 2019-02-07T18:37:41Z

I see what you mean, CustomObservation is not only an observation, it is an output (hence why it was named CustomOutput). It could be CustomObservationAndReward but I think this will be too confusing. I think as a first step we should only include CustomObservations. Users will still be able to have custom results for their actions if they integrate them into the CustomObservation. The Capabilities would be the same, it might just feel a little hacky to put a reward (or similar) inside the observations.
Thank you.

malmaud · 2019-02-08T20:13:48Z

Alright. How about I add a getter for the CustomObservation too, to make it easier for the user to set fields on it from both CollectObservations and AgentAction?

vincentpierre · 2019-02-12T21:40:40Z

I think the PR looks good. I think we would need to make sure pytest still work, maybe even make new unit tests. The API looks good, I think you can start making documentation update for it. Let us know if you need help.

malmaud · 2019-02-21T02:41:41Z

Ok, great. It will be a week or so until I can get back to this.

…lease-v0.7 Release v0.7 into master

vincentpierre · 2019-03-12T17:21:01Z

@malmaud Did you have time to revisit this PR ?

malmaud · 2019-03-12T20:54:47Z

I can work on it later this week.

…

On Tue, Mar 12, 2019 at 1:21 PM Vincent-Pierre BERGES < ***@***.***> wrote: @malmaud <https://github.com/malmaud> Did you have time to revisit this PR ? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1595 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA8SveML4T6HGkmP7ZEve0mUT9cFPvM6ks5vV-IKgaJpZM4Z8dWU> .

malmaud · 2019-03-20T19:05:23Z

OK, ready to work on this again. So it sounds like what's still needed is docs and tests.

Looks like for docs, I'd add sections to the docs folder.

Can you guide me a bit on how to get started with adding tests? It looks like ml-agents/tests is the directory I'd want to be checking out.

vincentpierre · 2019-03-20T20:50:21Z

Yes, all the documentation is present in the docs folder. Since this is not a change to the trainers, I would say the tests will go into the envs tests. Not sure what tests we should do because the custom proto objects are empty, so hard to assign values to them. I would say what replaces the reset parameters can be tested at least.

malmaud · 2019-03-22T02:38:31Z

OK, I added some documentation. It's primarily in a new Custom-Protos.md file in docs. Feel free to directly push edits to it on my branch if there are small tweaks you'd like to see.

I implemented all the examples I mention in that file to double-check that they work.

Still have to work on tests.

docs/Custom-Protos.md

* Rename CustomParameters -> CustomResetParameters * Rename CustomOutput -> CUstomObservation * Add CustomAction * Add CustomActionResult

Also eliminate GetCustomObservation.

ervteng

Approved!

malmaud · 2019-03-28T22:56:37Z

Thanks everyone!

…chnologies#1595) * API for sending custom protobuf messages to and from Unity. * Rename custom_output to custom_outputs. * Move custom protos to their own files. * Add SetCustomOutput method. * Add docstrings. * Various adjustments. * Rename CustomParameters -> CustomResetParameters * Rename CustomOutput -> CUstomObservation * Add CustomAction * Add CustomActionResult * Remove custom action result. * Remove custom action result from Python API * Start new documentation. * Add some docstrings * Expand documentation. * Typos * Tweak doc. Also eliminate GetCustomObservation. * Fix typo. * Clarify docs. * Remove trailing whitspace

Merge pull request Unity-Technologies#1588 from Unity-Technologies/ho…

d3f9fd6

…tfix-0.6.0a Hotfix 0.6.0a to master

awjuliani requested a review from vincentpierre January 13, 2019 18:37

awjuliani changed the base branch from master to develop January 13, 2019 18:37

vincentpierre reviewed Jan 14, 2019

View reviewed changes

ml-agents/mlagents/envs/brain.py Outdated Show resolved Hide resolved

vincentpierre reviewed Jan 14, 2019

View reviewed changes

vincentpierre reviewed Jan 22, 2019

View reviewed changes

vincentpierre self-assigned this Feb 20, 2019

Merge pull request Unity-Technologies#1764 from Unity-Technologies/re…

e3b86a2

…lease-v0.7 Release v0.7 into master

ervteng mentioned this pull request Mar 5, 2019

loss and reward #1677

Closed

Unity-Technologies deleted a comment Mar 22, 2019

ervteng reviewed Mar 27, 2019

View reviewed changes

docs/Custom-Protos.md Show resolved Hide resolved

ervteng reviewed Mar 27, 2019

View reviewed changes

docs/Custom-Protos.md Outdated Show resolved Hide resolved

ervteng reviewed Mar 27, 2019

View reviewed changes

docs/Custom-Protos.md Outdated Show resolved Hide resolved

Unity-Technologies deleted a comment Mar 27, 2019

malmaud added 15 commits March 27, 2019 14:05

API for sending custom protobuf messages to and from Unity.

b2000da

Rename custom_output to custom_outputs.

17b680c

Move custom protos to their own files.

daebc0d

Add SetCustomOutput method.

32c9945

Add docstrings.

12c99c2

Various adjustments.

b6383eb

* Rename CustomParameters -> CustomResetParameters * Rename CustomOutput -> CUstomObservation * Add CustomAction * Add CustomActionResult

Remove custom action result.

60ca7bb

Remove custom action result from Python API

515a69a

Start new documentation.

1405af4

Add some docstrings

85a073f

Expand documentation.

e691701

Typos

6f4a3ad

Tweak doc.

3cf307a

Also eliminate GetCustomObservation.

Fix typo.

36e933e

Clarify docs.

d9d987a

malmaud force-pushed the custom_protos branch from 445617d to d9d987a Compare March 27, 2019 18:07

Remove trailing whitspace

d0c137b

Unity-Technologies deleted a comment Mar 27, 2019

vincentpierre merged commit c4c35c6 into Unity-Technologies:develop Mar 28, 2019

ervteng self-requested a review March 28, 2019 22:27

ervteng reviewed Mar 28, 2019

View reviewed changes

github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021

API for sending custom protobuf messages to and from Unity. #1595

API for sending custom protobuf messages to and from Unity. #1595

Uh oh!

Conversation

malmaud commented Jan 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vincentpierre commented Jan 23, 2019

Uh oh!

malmaud commented Jan 23, 2019

Uh oh!

malmaud commented Jan 25, 2019

Uh oh!

vincentpierre commented Jan 25, 2019

Uh oh!

vincentpierre commented Feb 5, 2019

Uh oh!

malmaud commented Feb 7, 2019

Uh oh!

vincentpierre commented Feb 7, 2019

Uh oh!

malmaud commented Feb 8, 2019

Uh oh!

vincentpierre commented Feb 12, 2019

Uh oh!

malmaud commented Feb 21, 2019

Uh oh!

vincentpierre commented Mar 12, 2019

Uh oh!

malmaud commented Mar 12, 2019 via email

Uh oh!

malmaud commented Mar 20, 2019

Uh oh!

vincentpierre commented Mar 20, 2019

Uh oh!

malmaud commented Mar 22, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ervteng left a comment

Choose a reason for hiding this comment

Uh oh!

malmaud commented Mar 28, 2019

Uh oh!

Uh oh!

malmaud commented Jan 12, 2019 •

edited

Loading