Commit 0a176c8

committed
docs: document Data and DataType
1 parent 99412b5 commit 0a176c8

6 files changed

+218
-40
lines changed

docs/data.md

Lines changed: 155 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,155 @@
1+
# Working with global data
2+
3+
Globally defined data is the second most important thing a reverse-engineer can find in a binary (the first
4+
most important is of course the code itself). That's why Ghidralib includes many helpful utilities
5+
to work with it. The most important Ghidralib wrappers for working with global data are:
6+
7+
* [Data](reference.md#ghidralib.Data) - represents a fragment of binary that is used to store a piece of data. Wraps `ghidra.program.model.listing.Data`.
8+
* [DataType](reference.md#ghidralib.DataType) - every data object has an assigned type that determines many things, including how it is displayed and decompiled. Wraps `ghidra.program.model.data.DataType`.
9+
10+
11+
### Defining data
12+
13+
When one runs auto-analysis, large chunks of the program are automatically analysed and marked as code
14+
or data. But sometimes, during analysis, we discover a new piece of data that was not previously
15+
defined. We may want to automate adding it. Ghidra's FlatProgramAPI is pretty good here - we
16+
have a lot of functions like `createByte`, `createChar`, `createDouble`, `createDWord`, etc.
17+
18+
But one very annoying problem with them is that they raise an exception when data is already defined at that address.
19+
For example, given:
20+
21+
```asm
22+
00457994 34 32 dw 3234h
23+
```
24+
25+
When we attempt to:
26+
27+
```python
28+
createByte(toAddr(0x0457994)) # remember that you need toAddr here
29+
```
30+
31+
We'll get a long exception about conflicting data types. With Ghidralib we can do it
32+
a bit more safely by leveraging `Program.create_data`:
33+
34+
```python
35+
data = Program.create_data(0x0457994, "byte")
36+
```
37+
38+
Or alternatively, using a DataType object:
39+
40+
```python
41+
data = DataType("byte").create_at(0x0457994)
42+
```
43+
44+
As usual, we can also access the existing defined data:
45+
46+
```python
47+
data = Data(0x0457994) # Get by address
48+
data = Data("DAT_00457994") # Get by name, if exists
49+
```
50+
51+
With a `Data` instance we can easily access a lot of information, but most importantly we can:
52+
53+
* Access its address, size, raw bytes, etc.
54+
55+
```python
56+
>>> Data(0x0400078).address
57+
4194424L
58+
>>> Data(0x0400078).length
59+
248
60+
```
61+
62+
As a fun exercise, like with everything that occupies bytes in the binary address space, we
63+
can also highlight it in the listing:
64+
65+
```python
66+
Data(0x0400078).highlight()
67+
```
68+
69+
* Get its type with `data_type` or `base_data_type`.
70+
71+
```python
72+
>>> Data(0x0457994).data_type
73+
word
74+
```
75+
76+
* Introspect it, for example `is_pointer`, `is_constant`, `is_writable`, `is_array`, `is_structure`, etc.
77+
78+
```python
79+
>>> Data(0x0400078).is_pointer
80+
False
81+
>>> Data(0x0400078).is_writable
82+
False
83+
```
84+
85+
* For primitive types, cast it to a Python type (when it makes sense):
86+
87+
```python
88+
>>> Data(0x0457994).value
89+
0x3234
90+
```
91+
92+
* For structures, access the nested fields with no boilerplate:
93+
94+
```python
95+
>>> Data(0x0400000).e_magic
96+
char[2] "MZ"
97+
>>> Data(0x0400000).e_magic.value
98+
'MZ'
99+
>>> Data(0x400078).OptionalHeader.DataDirectory[1].Size
100+
ddw 8Ch
101+
```
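
To see where the `0x3234` value above comes from, we can decode the two raw bytes from the listing (`34 32`) by hand. This is a minimal sketch that uses only Python's standard `struct` module and does not touch Ghidralib at all. It assumes a little-endian target, which matches the `dw 3234h` shown in the listing; the variable names are just for illustration.

```python
import struct

# Raw bytes as they appear in the listing at 00457994: 34 32
raw = b"\x34\x32"

# "<H" = little-endian unsigned 16-bit integer (a "word")
(word_value,) = struct.unpack("<H", raw)

# "<B" = unsigned 8-bit integer, i.e. what a "byte" defined
# at the same address would hold (only the first raw byte)
(byte_value,) = struct.unpack("<B", raw[:1])

print(hex(word_value))  # 0x3234
print(hex(byte_value))  # 0x34
```

This also shows why redefining the word as a `byte` changes the interpreted value: only the first of the two bytes is covered by the new type.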
102+
103+
### Data types
104+
105+
Every `Data` object has a type assigned. Types are represented by a
106+
[DataType](reference.md#ghidralib.DataType) object. It can be used to query information about how
107+
that data behaves.
108+
109+
It's possible to get the type by name, or to enumerate all data types:
110+
111+
```python
112+
>>> len(DataType.all())
113+
110528
114+
>>> DataType("IMAGE_OPTIONAL_HEADER32")
115+
/PE/IMAGE_OPTIONAL_HEADER32
116+
pack(disabled)
117+
Structure IMAGE_OPTIONAL_HEADER32 {
118+
0 word 2 Magic ""
119+
2 byte 1 MajorLinkerVersion ""
120+
3 byte 1 MinorLinkerVersion ""
121+
...
122+
```
123+
124+
Currently Ghidralib has limited support for data type introspection - it's
125+
possible to get the type name, size in bytes, and not much more. For more advanced operations,
126+
it may be necessary to use the raw Java object directly. For example:
127+
128+
```python
129+
>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getPathName()
130+
u'/PE/IMAGE_OPTIONAL_HEADER32'
131+
>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getDescription()
132+
u''
133+
>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getAlignment()
134+
1
135+
```
136+
137+
As usual, in the future missing wrappers may be added.
138+
139+
One interesting feature is C code parsing, for example:
140+
141+
```python
142+
>>> DataType.from_c('typedef void* HINTERNET;')
143+
HINTERNET
144+
>>> DataType.from_c("struct test { short a; short b; short c;};")
145+
pack()
146+
Structure test {
147+
0 short 2 a ""
148+
2 short 2 b ""
149+
4 short 2 c ""
150+
}
151+
Length: 6 Alignment: 2
152+
```
153+
154+
Adding a data type programmatically is sometimes much easier than doing it manually in the structure editor.
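
As a sanity check of the layout printed for `struct test` above, the same structure can be modelled with Python's standard `ctypes` module. This is only an illustrative sketch: it is meant to run in a regular CPython interpreter rather than inside Ghidra, and it assumes a platform where a C `short` is 2 bytes. The class name `Test` is hypothetical and not part of Ghidralib.

```python
import ctypes


class Test(ctypes.Structure):
    # Mirrors: struct test { short a; short b; short c; };
    _fields_ = [
        ("a", ctypes.c_short),
        ("b", ctypes.c_short),
        ("c", ctypes.c_short),
    ]


# Field offsets, comparable to the first column of the structure dump
offsets = [(name, getattr(Test, name).offset) for name, _ in Test._fields_]
size = ctypes.sizeof(Test)
alignment = ctypes.alignment(Test)

print(offsets, size, alignment)
```

Under that assumption the offsets are 0, 2 and 4, with a total length of 6 and an alignment of 2, the same values reported by `DataType.from_c`.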
155+

docs/emulator.md

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -395,7 +395,6 @@ This convenience wrapper is equivalent to the following code:
395395
Some other objects also provide helpers to do the obvious thing with emulator.
396396
For example, you can emulate a function call with:
397397

398-
399398
```python
400399
>>> emu = Function("test").emulate(10)
401400
>>> emu["EAX"]
@@ -418,7 +417,6 @@ library and `ghidralib` won't replace it. The only goal is really so Unicorn use
418417
can use familiar names if they forget ghidralib equivalents. If you are not
419418
a Unicorn user, don't use them.
420419

421-
422420
### Learn more
423421

424422
Check out relevant examples in the `examples` directory, especially:

docs/getting_started.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -374,9 +374,9 @@ I personally use is VS Code with Python extensions. If you install
374374
VsCode/VsCodium, a Python extension, and just drop ghidralib.py
375375
in the ghidra_scripts directory, then everything should "just work".
376376

377-
If for some reason your script lives in a different directory than
378-
ghidralib, override the PYTHONPATH so the typechecker knows how to
379-
import it:
377+
If ghidralib is not installed via pip, and your script lives in a
378+
different directory than ghidralib, override the PYTHONPATH so
379+
the typechecker knows how to import it:
380380

381381
```json
382382
{

docs/index.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -29,15 +29,16 @@ for a direct download link.
2929

3030
A tutorial describing specific features is in development. Finished chapters include:
3131

32-
* [Getting Started](./getting_started.md), with a brief description of useful API functions (recommended).
33-
* [Emulator](https://msm-code.github.io/ghidralib/emulator/)
32+
* [Getting Started](./getting_started.md) - a brief description of useful API functions (recommended).
33+
* [Emulator](./emulator.md) - detailed guide to using the emulator wrapper.
34+
* [Working with global data](./data.md) - basic information about working with global data.
3435

35-
If you prefer to learn by example, you can browse the [examples](https://github.com/msm-code/ghidralib/tree/master/examples).
36+
If you prefer to **learn by example**, you can browse the [examples](https://github.com/msm-code/ghidralib/tree/master/examples).
3637

37-
You can find the autogenerated API documentation [here](./reference.md).
38+
You can also read the **autogenerated API documentation** [here](./reference.md).
3839

3940
When in doubt, check out the source code at [Github](https://github.com/msm-code/ghidralib)
4041

4142
A fair warning: ghidralib is still actively developed and the API may change
42-
in the future. But this doesn't matter for your one-off scripts, does it?
43+
slightly in the future. But this doesn't matter for your one-off scripts, does it?
4344
Current compatibility status is documented [here](./compatibility.md).
