

@wiresharkyyh
Contributor

No description provided.

@guyharris
Member

The UserData field of an EVENT_RECORD structure is just a pointer; the Microsoft Docs page that describes the EVENT_RECORD structure says

UserData

Event specific data. To parse this data, see Retrieving Event Data Using TDH. If the Flags member of EVENT_HEADER contains EVENT_HEADER_FLAG_STRING_ONLY, the data is a null-terminated Unicode string that you do not need TDH to parse.

and the Retrieving Event Data Using TDH page says

To consume event specific data, the consumer must know the format of the event data. If the provider used a manifest, MOF, or TMF files to publish the format of the event data, you can use the trace data helper (TDH) functions to parse the event data.

So, in order to parse that data, would the program 1) have to be running on Windows, 2) have a manifest, MOF, or TMF file that defines the event format, and 3) use those APIs? If so, then those APIs appear to take a PEVENT_RECORD argument, which means that the EVENT_RECORD structure would have to be reconstructed from the EVENT_HEADER and ETW_BUFFER_CONTEXT fields. Is that the case? If so, then presumably ExtendedDataCount would be set to 0, ExtendedData would be set to NULL, and UserContext would be set to NULL; is that correct?

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 6, 2020

Let me update it a little. UserData is in fact the binary content that the pointer refers to. Technically, it doesn't have to be identical to the content of the UserData field of EVENT_RECORD. I will update the description to:

UserData is the provider's event-specific data; its format is defined by the provider.

It is created mainly for Windows; it saves the data that has already been parsed on Windows, so no Windows API needs to be called to parse it again. So technically, the binary format works on Linux as well.

https://gitlab.com/wireshark/wireshark/-/merge_requests/697, this PR in Wireshark dissects this binary format and is OS independent

@guyharris
Member

https://gitlab.com/wireshark/wireshark/-/merge_requests/697, this PR in Wireshark dissects this binary format and is OS independent

...but the only user data it dissects appears to be data from the Mobile Broadband Interface Model provider.

I assume the provider GUID, plus the Flags and EventProperty fields, indicates how to parse the UserData:

  • if EVENT_HEADER_FLAG_STRING_ONLY is set in Flags, then the event is just a little-endian UTF-16 string;
  • otherwise:
    -- if EventProperty is EVENT_HEADER_PROPERTY_FORWARDED_XML, the event "contains within itself a fully-rendered XML description of the data";
    -- if EventProperty is EVENT_HEADER_PROPERTY_XML, you need a manifest to parse the data;
    -- if EventProperty is EVENT_HEADER_PROPERTY_LEGACY_EVENTLOG, you need a WMI MOF class to parse the event data.
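
The decision tree above could be sketched roughly as follows. This is only an illustration: `describe_user_data` and the labels it returns are hypothetical names, the numeric values are the ones from Microsoft's header quoted later in this thread, and treating EventProperty as a bit mask (rather than an exact value) is an assumption.

```python
# Values as listed later in this thread from Microsoft's SDK header.
EVENT_HEADER_FLAG_STRING_ONLY = 0x0004
EVENT_HEADER_PROPERTY_XML = 0x0001
EVENT_HEADER_PROPERTY_FORWARDED_XML = 0x0002
EVENT_HEADER_PROPERTY_LEGACY_EVENTLOG = 0x0004


def describe_user_data(flags: int, event_property: int) -> str:
    """Return a label saying what describes UserData (hypothetical helper).

    Assumption: EventProperty is tested as a bit mask, in the same order
    as the list above.
    """
    if flags & EVENT_HEADER_FLAG_STRING_ONLY:
        return "utf16le-string"   # UserData is just a UTF-16LE string
    if event_property & EVENT_HEADER_PROPERTY_FORWARDED_XML:
        return "forwarded-xml"    # event carries its own XML description
    if event_property & EVENT_HEADER_PROPERTY_XML:
        return "manifest"         # an XML manifest is needed
    if event_property & EVENT_HEADER_PROPERTY_LEGACY_EVENTLOG:
        return "mof"              # a WMI MOF class is needed
    return "unknown"
```

A dissector could use the returned label, together with the provider GUID, to pick a provider-specific decoder.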

Manifests appear to be XML files, so code to parse UserData for those events could either read such an XML file and use it to interpret the data, or could be generated from that XML file. The manifest includes a GUID that's presumably the provider GUID, so code to parse these could look for a manifest with that GUID and use it.

"MOF" is presumably the Managed Object Format, and the Common Information Model to which they refer is presumably this Common Information Model, so code to parse UserData for those events could either read such an MOF file and use it to interpret the data, or could be generated from that MOF file. I'm not sure how MOF files are associated with provider GUIDs. I'm also assuming that the Managed Object Format is this Managed Object Format.

I'm guessing that if EVENT_HEADER_FLAG_TRACE_MESSAGE is set in Flags, the UserData is described by a Trace Message Format (TMF) file, so code to parse UserData for those events could either read such a TMF file and use it to interpret the data, or could be generated from that TMF file. For those files, the file name is the GUID of the provider. (I'm assuming here that the Message GUID is the provider GUID.)

For this specification, it's probably best to indicate what fields indicate how to parse the UserData, and treat the ETW header as similar to the Ethernet header, to the extent that the meanings of all the fields in the Ethernet header are fixed, but the meaning of the payload depends on the value of the type/length header.

However, the Microsoft documentation does not indicate the numerical values of the bits in Flags or the values for EventProperty - you'd presumably have to get that from Microsoft's header files. This specification should probably give those numerical values. (Frankly, Microsoft's specification should do so, but I digress....)

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 7, 2020

For this specification, it's probably best to indicate what fields indicate how to parse the UserData, and treat the ETW header as similar to the Ethernet header, to the extent that the meanings of all the fields in the Ethernet header are fixed, but the meaning of the payload depends on the value of the type/length header.

As I mentioned before, this binary format (which includes UserData, Message, and Providername) doesn't require an extra manifest or MOF file to parse:

it saves the data that has already been parsed on Windows, so no Windows API needs to be called to parse it again

Are you OK with updating this with the sentence below?

EVENT_HEADER is an 80-byte data structure defined by Microsoft. It is a copy of the EVENT_HEADER from when the event was collected on Windows; it does not indicate how to parse UserData in the LINKTYPE_ETW format. UserData is entirely provider-specific: the provider of UserData should write the full UserData in the LINKTYPE_ETW format and be able to read and parse it without depending on any other file.

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 7, 2020

However, the Microsoft documentation does not indicate the numerical values of the bits in Flags or the values for EventProperty - you'd presumably have to get that from Microsoft's header files. This specification should probably give those numerical values. (Frankly, Microsoft's specification should do so, but I digress....)

Yes, it is easy to get the numerical values from evntcons.h, which is published in the SDK. I can list them in the spec if you think they would be of interest to readers of this specification.

#define EVENT_HEADER_PROPERTY_XML               0x0001
#define EVENT_HEADER_PROPERTY_FORWARDED_XML     0x0002
#define EVENT_HEADER_PROPERTY_LEGACY_EVENTLOG   0x0004

#define EVENT_HEADER_FLAG_EXTENDED_INFO         0x0001
#define EVENT_HEADER_FLAG_PRIVATE_SESSION       0x0002
#define EVENT_HEADER_FLAG_STRING_ONLY           0x0004
#define EVENT_HEADER_FLAG_TRACE_MESSAGE         0x0008
#define EVENT_HEADER_FLAG_NO_CPUTIME            0x0010
#define EVENT_HEADER_FLAG_32_BIT_HEADER         0x0020
#define EVENT_HEADER_FLAG_64_BIT_HEADER         0x0040
#define EVENT_HEADER_FLAG_CLASSIC_HEADER        0x0100
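
As a small illustration, a dissector could turn a Flags value into names using the definitions above. `flag_names` is a hypothetical helper, and bits missing from the list are simply reported as unknown.

```python
# Bit values copied from the #define list above.
EVENT_HEADER_FLAGS = {
    0x0001: "EVENT_HEADER_FLAG_EXTENDED_INFO",
    0x0002: "EVENT_HEADER_FLAG_PRIVATE_SESSION",
    0x0004: "EVENT_HEADER_FLAG_STRING_ONLY",
    0x0008: "EVENT_HEADER_FLAG_TRACE_MESSAGE",
    0x0010: "EVENT_HEADER_FLAG_NO_CPUTIME",
    0x0020: "EVENT_HEADER_FLAG_32_BIT_HEADER",
    0x0040: "EVENT_HEADER_FLAG_64_BIT_HEADER",
    0x0100: "EVENT_HEADER_FLAG_CLASSIC_HEADER",
}


def flag_names(flags: int) -> list:
    """Return the names of the bits set in flags (hypothetical helper)."""
    names = [name for bit, name in EVENT_HEADER_FLAGS.items() if flags & bit]
    # Report any bits that the list above does not cover.
    unknown = flags & ~sum(EVENT_HEADER_FLAGS)
    if unknown:
        names.append("unknown(0x%04x)" % unknown)
    return names
```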

@guyharris
Member

As I mentioned before, this binary format (which includes UserData, Message, and Providername) doesn't require an extra manifest or MOF file to parse

It requires knowledge of the format. One form that this knowledge takes is an XML manifest; another form is an MOF file. Perhaps others have formats described by TMF files, or by some other files. If EVENT_HEADER_FLAG_STRING_ONLY is set in Flags, it's just a UTF-16LE string.
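
The string-only case could be handled with something like the following sketch. `decode_string_only` is a hypothetical helper; it assumes the terminator, if present, is a 16-bit zero code unit aligned on a 2-byte boundary, and it tolerates a missing terminator.

```python
def decode_string_only(user_data: bytes) -> str:
    """Decode UserData as a null-terminated UTF-16LE string (sketch)."""
    # Ignore a trailing odd byte, which cannot be part of a code unit.
    end = len(user_data) - (len(user_data) % 2)
    # Walk whole 2-byte code units so that a zero byte inside a code
    # unit (such as the high byte of "A") is not taken as the terminator.
    for offset in range(0, end, 2):
        if user_data[offset:offset + 2] == b"\x00\x00":
            end = offset
            break
    return user_data[:end].decode("utf-16-le")
```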

If EventProperty is EVENT_HEADER_PROPERTY_FORWARDED_XML, the event "contains within itself a fully-rendered XML description of the data". Where is it documented what this means? Does it mean that the UserData is just XML, in some encoding such as UTF-16LE?

The other values of EventProperty indicate that the data isn't self-describing, and indicate in what form Microsoft supplies a description of the data (XML manifest or MOF file).

If EVENT_HEADER_FLAG_TRACE_MESSAGE is set in Flags, does that mean that Microsoft supplies a description of the data as a TMF file?

@guyharris
Member

What, if anything, do EVENT_HEADER_FLAG_32_BIT_HEADER, EVENT_HEADER_FLAG_64_BIT_HEADER, and EVENT_HEADER_FLAG_CLASSIC_HEADER indicate? What, if anything, would they change in the dissection?

@guyharris
Member

What about the "extended data" - EVENT_HEADER_FLAG_EXTENDED_INFO in Flags, and the ExtendedData member of EVENT_RECORD? Does this format support saving that:

ExtendedData

One or more extended data items that ETW collects. The extended data includes some items, such as the security identifier (SID) of the user that logged the event, only if the controller sets the EnableProperty parameter passed to the EnableTraceEx or EnableTraceEx2 function. The extended data includes other items, such as the related activity identifier and decoding information for trace logging, regardless whether the controller sets the EnableProperty parameter passed to EnableTraceEx or EnableTraceEx2. For details, see the EVENT_HEADER_EXTENDED_DATA_ITEM structure

Those look sort of like TLVs.

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 7, 2020

For details, see the EVENT_HEADER_EXTENDED_DATA_ITEM structure

Those look sort of like TLVs.

This specification will not include everything from MSDN; it would be super long if it did. This specification only defines the binary format of LINKTYPE_ETW; how the provider interprets and generates this binary format is up to the provider. Right?

As for "extended data": if a provider needs to persist the extended data from an event, the provider can choose to use UserData to accommodate it. In UserData, the provider can save whatever is of interest and decode it; it can be another struct specific to the provider.

@wiresharkyyh
Contributor Author

What, if anything, do EVENT_HEADER_FLAG_32_BIT_HEADER, EVENT_HEADER_FLAG_64_BIT_HEADER, and EVENT_HEADER_FLAG_CLASSIC_HEADER indicate? What, if anything, would they change in the dissection?

Again, they won't change anything; they are just copied from the EVENT_HEADER when the event is collected on Windows. The dissection just shows this number and doesn't need to do anything special with it.

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 7, 2020

It requires knowledge of the format. One form that this knowledge takes is an XML manifest; another form is an MOF file. Perhaps others have formats described by TMF files, or by some other files. If EVENT_HEADER_FLAG_STRING_ONLY is set in Flags, it's just a UTF-16LE string.

It is the provider's responsibility to understand those flags. When a provider decides to use LINKTYPE_ETW, the provider is responsible for understanding how to decode the data from the original ETL file and write it in the LINKTYPE_ETW format. Right?

For example, assume I am the Mobile Broadband Interface Model provider: when I write the LINKTYPE_ETW format, I know whether I need to check those flags and which bytes need to be written to UserData. I believe this PR, https://gitlab.com/wireshark/wireshark/-/merge_requests/468, is a good example of how to generate the DLT_ETW binary format from a Windows ETW source.

@mcr
Member

mcr commented Nov 7, 2020

So, in order to parse that data, would the program 1) have to be running on Windows, 2) have a manifest, MOF, or TMF file that defines the event format, and 3) use those APIs?

I sure agree that it would be great if we could parse the data elsewhere, and certainly anyone trying to make a tcpdump or wireshark dissector needs that.... but are we being too heavy-weighted on DLT values here?
So, let's get as many pointers to MSDN articles, etc. as we can, but let's not insist on every detail.
The key question is really: is MS going to change the format without telling anyone? That would render it useless for even people running on Windows. Is there a version number that we can rely upon?

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 9, 2020

So, let's get as many pointers to MSDN articles, etc. as we can, but let's not insist on every detail.

@mcr, it sounds like you agree to link to MSDN instead of explaining those fields again in this document. Is this document good enough to publish? Let me know if anything else needs to be updated.

I sure agree that it would be great if we could parse the data elsewhere, and certainly anyone trying to make a tcpdump or wireshark dissector needs that.... but are we being too heavy-weighted on DLT values here?

The DLT_ETW binary format itself can be parsed on platforms other than Windows. Windows is only needed when the DLT_ETW binary format is generated, since its source is the Windows EVENT_RECORD. After that, Wireshark, tcpdump, or any other application can parse it on any platform. So the answer is that it doesn't need to run on Windows, and it doesn't need a manifest, MOF, or TMF file to parse the data. See this Wireshark PR, https://gitlab.com/wireshark/wireshark/-/merge_requests/697, as an example that dissects this binary format; it works on Linux as well.

The key question is really: is MS going to change the format without telling anyone? That would render it useless for even people running on Windows. Is there a version number that we can rely upon?

First, I don't think MS is likely to change this format. In the worst case, if MS did change it, MS would definitely publish the change, because the data structure is published on MSDN.

@guyharris
Member

Pointing to the MSDN structures works, if eventually there are links for all of Microsoft's primitive types such as USHORT, ULONG, etc., so people know what values are 1-byte signed/unsigned values, what values are 2-byte signed/unsigned values, what values are 4-byte signed/unsigned values, what values are 8-byte signed/unsigned values, etc. ("long" doesn't necessarily mean "32-bit" in UN*Xland; it may also mean "64-bit".)
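
To make the point concrete, here is a sketch that pins the Windows typedefs used by these structures to explicit sizes. The table is an assumption based on Microsoft's documented Windows data types; Python's `struct` module with the `<` prefix uses fixed standard sizes, whereas the native `long` varies by platform.

```python
import struct

# Windows typedef -> struct format code (assumed mapping; with the "<"
# prefix, struct uses fixed standard sizes regardless of the host).
WINDOWS_TYPES = {
    "UCHAR": "B",          # 1 byte, unsigned
    "USHORT": "H",         # 2 bytes, unsigned
    "LONG": "i",           # 4 bytes, signed
    "ULONG": "I",          # 4 bytes, unsigned
    "ULONGLONG": "Q",      # 8 bytes, unsigned
    "LARGE_INTEGER": "q",  # 8 bytes, signed
}


def windows_type_size(name: str) -> int:
    """Return the size in bytes of a Windows typedef (hypothetical helper)."""
    return struct.calcsize("<" + WINDOWS_TYPES[name])


# The native C long, by contrast, depends on the platform: typically
# 4 bytes on Windows and 8 bytes on 64-bit UN*X systems.
native_long_size = struct.calcsize("@l")
```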

We should, however, provide the values for flag bits, unless Microsoft defines them in a document rather than in a header file.

@wiresharkyyh
Contributor Author

("long" doesn't necessarily mean "32-bit" in UN*Xland; it may also mean "64-bit".)

We should, however, provide the values for flag bits, unless Microsoft defines them in a document rather than in a header file.

@guyharris, here is the MSDN document for the sizes of the different types on Windows: https://docs.microsoft.com/en-us/cpp/cpp/data-type-ranges?view=msvc-160

I will include it in the document, along with the values of the flag bits.

I might not have answered every question, but hopefully I have clarified most of them. Thanks.

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 11, 2020

@guyharris, @mcr, is this good to merge? Two PRs in Wireshark are waiting for DLT_ETW in order to enable ETW support, so that Windows developers can benefit from this new functionality. Thanks very much.

@wiresharkyyh
Contributor Author

I have updated the document, @guyharris and @mcr, could you please take a look and merge it if it looks good? Thanks

@mcr
Member

mcr commented Nov 13, 2020

@guyharris , I haven't read this deeply, but I'm happy with this.

@mcr merged commit f10c232 into the-tcpdump-group:master Nov 13, 2020

@wiresharkyyh
Contributor Author

wiresharkyyh commented Nov 13, 2020

@mcr, thanks very much for merging it. Could you please help merge the-tcpdump-group/libpcap#978, which adds the DLT_ETW value to the code, so Wireshark can move on with the value?

@infrastation
Member

IMO, Guy was doing this review better than anyone else could.

@wiresharkyyh
Contributor Author

@infrastation, I'm not sure whether @guyharris has started his Thanksgiving vacation. He hasn't responded for a week.

Because the Wireshark PRs are waiting for the DLT_ETW value, could you please merge the-tcpdump-group/libpcap#978 so its value can be locked? I will update the specification if @guyharris has other feedback

guyharris added a commit that referenced this pull request Jan 27, 2021
Finishes the job started in pull request #18.
