Commit 7ea8143

Add usage examples to the README
1 parent 8b1d639 commit 7ea8143

1 file changed

+150
-7
lines changed

README.md

Lines changed: 150 additions & 7 deletions
@@ -8,7 +8,7 @@ It is inspired by, and is significantly based on,
88
[DER ASCII](https://github.com/google/der-ascii), a similar tool for working with
99
DER and BER, wire formats of ASN.1.
1010

11-
Unlike most Protobuf tools, it is completely ignorant of schemata specified in `.proto`
11+
Unlike most Protobuf tools, it is normally ignorant of schemata specified in `.proto`
1212
files; it has just enough knowledge of the wire format to provide primitives for
1313
constructing messages (such as field tags, varints, and length prefixes). A disassembler
1414
is included that uses heuristics to try to convert encoded Protobuf into Protoscope,
@@ -19,15 +19,158 @@ tool, which can be installed with the Go tool via
1919

2020
go install github.com/protocolbuffers/protoscope/cmd/protoscope...@latest
2121

22-
These tools may be used to create test inputs by taking an existing proto,
23-
disassembling with `protoscope`, making edits, and then reassembling with
24-
`protoscope -s`. This avoids having to manually fix up all the length prefixes.
25-
They may also be used to inspect proto files (or things that look like them.)
26-
2722
For the language specification and basic examples, see [language.txt](/language.txt).
2823
Example disassembly can be found under [./testdata](/testdata).
2924

30-
## Backwards compatibility
25+
## Cookbook
26+
27+
Protoscope can be used in a number of different ways to inspect or create binary Protobuf
28+
data. These aren't the full breadth of use cases, but they are the ones Protoscope
29+
(and its ancestor, DER ASCII) was designed for.
30+
31+
### Exploring Binary Dumps
32+
33+
Sometimes, while working on a library that emits wire format, it may be necessary to debug
34+
the precise output of a test failure. If your test prints out a hex string, you can use
35+
the `xxd` command to turn it into raw binary data and pipe it into `protoscope`:
36+
37+
```sh
38+
$ cat hexdata.txt
39+
0a400a26747970652e676f6f676c65617069732e636f6d2f70726f746f332e546573744d65737361676512161005420e65787065637465645f76616c756500000000
40+
$ xxd -r -ps hexdata.txt | protoscope
41+
1: {
42+
1: {"type.googleapis.com/proto3.TestMessage"}
43+
2: {`1005420e65787065637465645f76616c756500000000`}
44+
}
45+
$ xxd -r -ps <<< "1005420e65787065637465645f76616c756500000000" | protoscope
46+
2: 5
47+
8: {"expected_value"}
48+
`00000000`
49+
```
50+
51+
This reveals that four zero bytes sneaked into the output!
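
To see why `10` disassembles as `2:` and `42` as `8:`, it can help to split a tag byte by hand. This is a minimal sketch, assuming the standard Protobuf wire-format layout in which a tag is `(field_number << 3) | wire_type`. The `decode_tag` helper is a hypothetical name and is not part of `protoscope`.

```sh
# Split a one-byte Protobuf tag into "field_number wire_type".
# Hypothetical helper; only handles tags that fit in a single byte
# (field numbers 1 through 15).
decode_tag() {
  tag=$(($1))
  echo "$((tag >> 3)) $((tag & 7))"
}

decode_tag 0x10   # prints "2 0": field 2, wire type 0 (varint)
decode_tag 0x42   # prints "8 2": field 8, wire type 2 (length-delimited)
```

The two tags come from the dump above: `10 05` is field 2 holding the varint 5, and `42 0e` opens field 8 with a 14-byte payload.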
52+
53+
If your test failure output is made up of C-style escapes and text, the `printf` command
54+
can be used instead of `xxd`:
55+
56+
```sh
57+
$ printf '\x10\x05B\x0eexpected_value\x00\x00\x00\x00' | protoscope
58+
2: 5
59+
8: {"expected_value"}
60+
`00000000`
61+
```
62+
63+
The `protoscope` command has many flags for refining the heuristic used to decode the
64+
binary.
65+
66+
If an encoded `FileDescriptorSet` proto is available that contains your message's type,
67+
you can use it to get schema-aware decoding:
68+
69+
```sh
70+
$ cat hexdata.txt
71+
086510661867206828d20130d4013d6b000000416c000000000000004d6d000000516e000000000000005d0000de42610000000000005c40680172033131357a0331313683018801758401
72+
$ xxd -r -ps hexdata.txt | protoscope \
73+
-descriptor-set path/to/fds.pb -message-type unittest.TestAllTypes \
74+
-print-field-names
75+
1: 101 # optional_int32
76+
2: 102 # optional_int64
77+
3: 103 # optional_uint32
78+
4: 104 # optional_uint64
79+
5: 105z # optional_sint32
80+
6: 106z # optional_sint64
81+
7: 107i32 # optional_fixed32
82+
8: 108i64 # optional_fixed64
83+
9: 109i32 # optional_sfixed32
84+
10: 110i64 # optional_sfixed64
85+
11: 111.0i32 # optional_float, 0x42de0000i32
86+
12: 112.0 # optional_double, 0x405c000000000000i64
87+
13: true # optional_bool
88+
14: {"115"} # optional_string
89+
15: {"116"} # optional_bytes
90+
16: !{ # optionalgroup
91+
17: 117 # a
92+
}
93+
```
94+
95+
You can get an encoded `FileDescriptorSet` by invoking
96+
97+
```sh
98+
protoc -Ipath/to/imported/protos -o my_fds.pb my_proto.proto
99+
```
100+
101+
### Modifying Existing Files
102+
103+
Suppose that we have a proto file `foo.bin` of unknown schema:
104+
105+
```sh
106+
$ protoscope foo.bin
107+
1: 42
108+
2: {
109+
42: {"my awesome proto"}
110+
}
111+
```
112+
113+
Modifying the embedded string with a hex editor is very painful, because it's possible that
114+
the length prefix needs to be updated, which can lead to the length prefix on outer messages
115+
needing to be changed as well. This is made worse by length prefixes being varints, which may
116+
grow or shrink and feed into further outer length prefix updates.
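
The point at which a length prefix changes size can be sketched as follows, assuming the usual varint encoding of 7 payload bits per byte. `varint_len` is a hypothetical helper written for illustration only.

```sh
# Number of bytes a non-negative integer occupies as a Protobuf varint
# (7 payload bits per byte). Hypothetical helper for illustration.
varint_len() {
  n=$(($1))
  len=1
  while [ "$n" -ge 128 ]; do
    n=$((n >> 7))
    len=$((len + 1))
  done
  echo "$len"
}

varint_len 127   # prints 1
varint_len 128   # prints 2
```

Growing a 127-byte payload by a single byte therefore adds two bytes to the enclosing message, which may in turn push an outer prefix over the same boundary.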
117+
118+
But `protoscope` makes this into a simple disassemble, edit, reassemble loop:
119+
120+
```sh
121+
$ xxd foo.bin
122+
00000000: 082a 1213 d202 106d 7920 6177 6573 6f6d .*.....my awesom
123+
00000010: 6520 7072 6f74 6f e proto
124+
125+
$ protoscope foo.bin > foo.txt # Disassemble.
126+
$ cat foo.txt
127+
1: 42
128+
2: {
129+
42: {"my awesome proto"}
130+
}
131+
132+
$ vim foo.txt # Make some edits.
133+
$ cat foo.txt
134+
1: 43
135+
2: {
136+
42: {"my even more awesome awesome proto"}
137+
}
138+
139+
$ protoscope -s foo.txt > foo.bin # Reassemble.
140+
$ xxd foo.bin
141+
00000000: 082b 1225 d202 226d 7920 6576 656e 206d .+.%.."my even m
142+
00000010: 6f72 6520 6177 6573 6f6d 6520 6177 6573 ore awesome awes
143+
00000020: 6f6d 6520 7072 6f74 6f ome proto
144+
```
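
The outer length bytes in the two dumps (`13` before the edit, `25` after) can be checked by hand. This is a rough sketch, assuming the layout visible in the hex above: a two-byte tag for field 42 (`d2 02`), a one-byte length prefix, then the string. `outer_len` is a hypothetical helper, valid only while the string stays under 128 bytes.

```sh
# Hex length of the outer field 2 payload: 2 tag bytes (d2 02) for field 42,
# 1 length-prefix byte, then the string itself. Hypothetical helper; assumes
# the string is shorter than 128 bytes so its prefix fits in one byte.
outer_len() {
  printf '%02x\n' $((2 + 1 + ${#1}))
}

outer_len "my awesome proto"                     # prints 13
outer_len "my even more awesome awesome proto"   # prints 25
```

This is the bookkeeping that the reassembly step does for you.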
145+
146+
The `-message-type` option from above can be used when you know the schema to make it easier
147+
to find specific fields.
148+
149+
### Describing Invalid Binaries
150+
151+
Because Protoscope has a very weak understanding of Protobuf, it can be used to create
152+
invalid encodings to verify that some invariant is actually checked by a production parser.
153+
154+
For example, the following Protoscope text can be used to create a test that ensures
155+
a too-long length prefix is rejected as invalid.
156+
157+
```
158+
1: {
159+
2:LEN 5 # Explicit length prefix.
160+
"oops" # One byte too short.
161+
}
162+
```
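
For comparison, the same message can be written out by hand. The bytes below are derived manually from the wire-format rules, not captured from `protoscope` output, so treat them as an assumption about what the assembler would emit. `invalid_message` is a hypothetical helper name.

```sh
# Hand-built equivalent of the snippet above, using octal escapes:
#   \012 = 0a  tag for field 1, length-delimited
#   \006 = 06  outer length (tag + prefix + 4 payload bytes)
#   \022 = 12  tag for field 2, length-delimited
#   \005 = 05  declared inner length, one more than the 4 bytes that follow
invalid_message() {
  printf '\012\006\022\005oops'
}

invalid_message | od -An -tx1
```

Every byte here has to be counted manually, and none of it is self-documenting, which is the cost the Protoscope version avoids.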
163+
164+
This is more convenient than typing out bytes by hand, because Protoscope takes care of
165+
tedious details like length prefixes, varint encoding, float encoding, and other things
166+
not relevant to the test. It also permits comments, which can be used to specify why the
167+
Protoscope snippet produces a broken binary.
168+
169+
Protoscope itself generates test data using Protoscope, which is then checked in. Other
170+
projects can either check in binary data directly, or use the build system to invoke
171+
`protoscope`, such as with a Bazel `genrule()`.
172+
173+
## Backwards Compatibility
31174
32175
The Protoscope language itself may be extended over time, but the intention is
33176
for extensions to be backwards-compatible. Specifically:
