@@ -8,7 +8,7 @@ It is inspired by, and is significantly based on,
88[DER ASCII](https://github.com/google/der-ascii), a similar tool for working with
99DER and BER, wire formats of ASN.1.
1010
11- Unlike most Protobuf tools, it is completely ignorant of schemata specified in `.proto`
11+ Unlike most Protobuf tools, it is normally ignorant of schemata specified in `.proto`
1212files; it has just enough knowledge of the wire format to provide primitives for
1313constructing messages (such as field tags, varints, and length prefixes). A disassembler
1414is included that uses heuristics to try to convert encoded Protobuf into Protoscope,
@@ -19,15 +19,158 @@ tool, which can be installed with the Go tool via
1919
2020 go install github.com/protocolbuffers/protoscope/cmd/protoscope...@latest
2121
22- These tools may be used to create test inputs by taking an existing proto,
23- dissassembling with ` protoscope ` , making edits, and then reassembling with
24- ` protoscope -s ` . This avoids having to manually fix up all the length prefixes.
25- They may also be used to inspect proto files (or things that look like them.)
26-
2722For the language specification and basic examples, see [language.txt](/language.txt).
2823Example disassembly can be found under [./testdata](/testdata).
2924
30- ## Backwards compatibility
25+ ## Cookbook
26+
27+ Protoscope can be used in a number of different ways to inspect or create binary Protobuf
28+ data. These aren't the full breadth of use cases, but they are the ones Protoscope
29+ (and its ancestor, DER ASCII) was designed for.
30+
31+ ### Exploring Binary Dumps
32+
33+ Sometimes, while working on a library that emits wire format, it may be necessary to debug
34+ the precise output of a test failure. If your test prints out a hex string, you can use
35+ the `xxd` command to turn it into raw binary data and pipe it into `protoscope`:
36+
37+ ``` sh
38+ $ cat hexdata.txt
39+ 0a400a26747970652e676f6f676c65617069732e636f6d2f70726f746f332e546573744d65737361676512161005420e65787065637465645f76616c756500000000
40+ $ xxd -r -ps hexdata.txt | protoscope
41+ 1: {
42+   1: {"type.googleapis.com/proto3.TestMessage"}
43+   2: {`1005420e65787065637465645f76616c756500000000`}
44+ }
45+ $ xxd -r -ps <<< "1005420e65787065637465645f76616c756500000000" | protoscope
46+ 2: 5
47+ 8: {"expected_value"}
48+ `00000000`
49+ ```
50+
51+ This reveals that four zero bytes sneaked into the output!
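To see why the trailing zero bytes stand out, it can help to walk the wire format by hand. The following is a minimal Python sketch, not Protoscope's actual disassembler: `read_varint` and `scan_fields` are hypothetical helper names, and only the VARINT and LEN wire types are handled. It decodes top-level fields until it meets a tag it cannot accept (here, field number 0) and returns the rest as leftover bytes.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def scan_fields(buf):
    """Return (fields, leftover): decoded top-level fields and the unparsed tail."""
    fields = []
    pos = 0
    while pos < len(buf):
        start = pos
        try:
            tag, pos = read_varint(buf, pos)
            number, wire_type = tag >> 3, tag & 0x7
            if number == 0:
                raise ValueError("field number 0 is not valid")
            if wire_type == 0:    # VARINT
                value, pos = read_varint(buf, pos)
            elif wire_type == 2:  # LEN
                length, pos = read_varint(buf, pos)
                if pos + length > len(buf):
                    raise ValueError("length prefix overruns buffer")
                value = buf[pos:pos + length]
                pos += length
            else:
                raise ValueError("wire type not handled in this sketch")
        except ValueError:
            # Stop at the first thing that does not parse; keep the tail raw.
            return fields, buf[start:]
        fields.append((number, value))
    return fields, b""


data = bytes.fromhex("1005420e65787065637465645f76616c756500000000")
fields, leftover = scan_fields(data)
print(fields)          # decoded (field number, value) pairs
print(leftover.hex())  # bytes that did not parse as fields
```

The two real fields decode cleanly, and the four zero bytes are left over because a tag byte of `00` would mean field number 0, which is not a valid field number.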
52+
53+ If your test failure output is made up of C-style escapes and text, the `printf` command
54+ can be used instead of `xxd`:
55+
56+ ``` sh
57+ $ printf '\x10\x05B\x0eexpected_value\x00\x00\x00\x00' | protoscope
58+ 2: 5
59+ 8: {"expected_value"}
60+ `00000000`
61+ ```
62+
63+ The `protoscope` command has many flags for refining the heuristic used to decode the
64+ binary.
65+
66+ If an encoded `FileDescriptorSet` proto is available that contains your message's type,
67+ you can use it to get schema-aware decoding:
68+
69+ ``` sh
70+ $ cat hexdata.txt
71+ 086510661867206828d20130d4013d6b000000416c000000000000004d6d000000516e000000000000005d0000de42610000000000005c40680172033131357a0331313683018801758401
72+ $ xxd -r -ps hexdata.txt | protoscope \
73+ -descriptor-set path/to/fds.pb -message-type unittest.TestAllTypes \
74+ -print-field-names
75+ 1: 101 # optional_int32
76+ 2: 102 # optional_int64
77+ 3: 103 # optional_uint32
78+ 4: 104 # optional_uint64
79+ 5: 105z # optional_sint32
80+ 6: 106z # optional_sint64
81+ 7: 107i32 # optional_fixed32
82+ 8: 108i64 # optional_fixed64
83+ 9: 109i32 # optional_sfixed32
84+ 10: 110i64 # optional_sfixed64
85+ 11: 111.0i32 # optional_float, 0x42de0000i32
86+ 12: 112.0 # optional_double, 0x405c000000000000i64
87+ 13: true # optional_bool
88+ 14: {"115"} # optional_string
89+ 15: {"116"} # optional_bytes
90+ 16: !{ # optionalgroup
91+   17: 117 # a
92+ }
93+ ```
94+
95+ You can get an encoded `FileDescriptorSet` by invoking:
96+
97+ ``` sh
98+ protoc -Ipath/to/imported/protos -o my_fds.pb my_proto.proto
99+ ```
100+
101+ ### Modifying Existing Files
102+
103+ Suppose that we have a proto file `foo.bin` of unknown schema:
104+
105+ ``` sh
106+ $ protoscope foo.bin
107+ 1: 42
108+ 2: {
109+   42: {"my awesome proto"}
110+ }
111+ ```
112+
113+ Modifying the embedded string with a hex editor is very painful, because it's possible that
114+ the length prefix needs to be updated, which can lead to the length prefix on outer messages
115+ needing to be changed as well. This is made worse by length prefixes being varints, which may
116+ grow or shrink and feed into further outer length prefix updates.
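The cascade can be made concrete with a small Python sketch. This is an illustration worked out from the wire format, not code from Protoscope; `encode_varint`, `len_field`, `varint_field`, and `build` are hypothetical helper names. It assembles the same message shape as `foo.bin` (field 1 as a varint, field 2 wrapping field 42) and shows that changing the inner string changes both length prefixes.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(number, wire_type):
    """Field tag: field number shifted left by 3, OR'd with the wire type."""
    return encode_varint((number << 3) | wire_type)


def varint_field(number, value):
    return tag(number, 0) + encode_varint(value)


def len_field(number, payload):
    """Length-delimited field: tag, varint length prefix, then the payload."""
    return tag(number, 2) + encode_varint(len(payload)) + payload


def build(first, text):
    inner = len_field(42, text.encode())
    return varint_field(1, first) + len_field(2, inner)


before = build(42, "my awesome proto")
after = build(43, "my even more awesome awesome proto")
print(before.hex())  # 082a1213d202106d7920617765736f6d652070726f746f
print(after.hex())

# A length prefix grows from one byte to two once the payload reaches 128 bytes.
print(len(encode_varint(127)), len(encode_varint(128)))
```

In the first message the inner prefix is `0x10` and the outer prefix is `0x13`; after the edit they become `0x22` and `0x25`. Once a payload crosses 127 bytes its prefix takes an extra byte, which in turn lengthens every enclosing message.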
117+
118+ But `protoscope` makes this into a simple disassemble, edit, reassemble loop:
119+
120+ ``` sh
121+ $ xxd foo.bin
122+ 00000000: 082a 1213 d202 106d 7920 6177 6573 6f6d .*.....my awesom
123+ 00000010: 6520 7072 6f74 6f e proto
124+
125+ $ protoscope foo.bin > foo.txt # Disassemble.
126+ $ cat foo.txt
127+ 1: 42
128+ 2: {
129+   42: {"my awesome proto"}
130+ }
131+
132+ $ vim foo.txt # Make some edits.
133+ $ cat foo.txt
134+ 1: 43
135+ 2: {
136+   42: {"my even more awesome awesome proto"}
137+ }
138+
139+ $ protoscope -s foo.txt > foo.bin # Reassemble.
140+ $ xxd foo.bin
141+ 00000000: 082b 1225 d202 226d 7920 6576 656e 206d .+.%.."my even m
142+ 00000010: 6f72 6520 6177 6573 6f6d 6520 6177 6573 ore awesome awes
143+ 00000020: 6f6d 6520 7072 6f74 6f ome proto
144+ ```
145+
146+ The `-message-type` option from above can be used when you know the schema to make it easier
147+ to find specific fields.
148+
149+ ### Describing Invalid Binaries
150+
151+ Because Protoscope has a very weak understanding of Protobuf, it can be used to create
152+ invalid encodings to verify that some invariant is actually checked by a production parser.
153+
154+ For example, the following Protoscope text can be used to create a test that ensures
155+ a too-long length prefix is rejected as invalid.
156+
157+ ```
158+ 1: {
159+   2:LEN 5 # Explicit length prefix.
160+   "oops" # One byte too short.
161+ }
162+ ```
163+
164+ This is more convenient than typing out bytes by hand, because Protoscope takes care of
165+ tedious details like length prefixes, varint encoding, float encoding, and other things
166+ not relevant to the test. It also permits comments, which can be used to specify why the
167+ Protoscope snippet produces a broken binary.
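As an illustration of the kind of check such a test exercises, here is a minimal Python sketch. The byte string is what the snippet above is assumed to assemble to, worked out by hand from the wire format rather than captured from `protoscope`, and `read_varint` and `read_len_field` are hypothetical helpers, not part of any Protobuf library.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_len_field(buf, pos):
    """Read one LEN field at pos; return (field_number, payload, next_pos)."""
    tag, pos = read_varint(buf, pos)
    if tag & 0x7 != 2:
        raise ValueError("not a LEN field")
    length, pos = read_varint(buf, pos)
    remaining = len(buf) - pos
    if length > remaining:
        raise ValueError(
            "length prefix %d exceeds %d remaining bytes" % (length, remaining)
        )
    return tag >> 3, buf[pos:pos + length], pos + length


# Assumed encoding: field 2 claims 5 bytes but only 4 follow.
inner = bytes([0x12, 0x05]) + b"oops"
# The outer length prefix is computed from the real inner size, so it is valid.
outer = bytes([0x0A, len(inner)]) + inner

number, payload, _ = read_len_field(outer, 0)  # outer field parses fine
try:
    read_len_field(payload, 0)                 # inner field overruns its parent
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

The outer message is well-formed, so the error only surfaces when the parser descends into field 2. That is the invariant a production parser is expected to enforce.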
168+
169+ Protoscope itself generates test data using Protoscope, which is then checked in. Other
170+ projects can either check in binary data directly, or use the build system to invoke
171+ `protoscope`, such as with a Bazel `genrule()`.
172+
173+ ## Backwards Compatibility
31174
32175The Protoscope language itself may be extended over time, but the intention is
33176for extensions to be backwards-compatible. Specifically: