| 1 | +# Working with global data |
| 2 | + |
| 3 | +Globally defined data is the second most important thing a reverse engineer can find in a binary (the |
| 4 | +most important is, of course, the code itself). That's why Ghidralib includes many helpful utilities |
| 5 | +for working with it. The most important Ghidralib wrappers for global data are: |
| 6 | + |
| 7 | +* [Data](reference.md#ghidralib.Data) - represents a fragment of binary that is used to store a piece of data. Wraps `ghidra.program.model.listing.Data`. |
| 8 | +* [DataType](reference.md#ghidralib.DataType) - every data object has an assigned type, which determines, among other things, how it is displayed and decompiled. Wraps `ghidra.program.model.data.DataType`. |
| 9 | + |
| 10 | + |
| 11 | +### Defining data |
| 12 | + |
| 13 | +When one runs auto-analysis, large chunks of the program are automatically analysed and marked as code |
| 14 | +or data. But sometimes, during analysis, we discover a new piece of data that was not previously |
| 15 | +defined. We may want to automate adding it. Ghidra's FlatProgramAPI is pretty good here - we |
| 16 | +have a lot of functions like `createByte`, `createChar`, `createDouble`, `createDWord`, etc. |
| 17 | + |
| 18 | +But one very annoying problem with them is that they raise an exception when data is already defined at the target address. |
| 19 | +For example, given: |
| 20 | + |
| 21 | +```asm |
| 22 | +00457994 34 32 dw 3234h |
| 23 | +``` |
| 24 | + |
| 25 | +When we attempt to: |
| 26 | + |
| 27 | +```python |
| 28 | +createByte(toAddr(0x0457994)) # remember that you need toAddr here |
| 29 | +``` |
| 30 | + |
| 31 | +We'll get a long exception about conflicting data types. With Ghidralib we can do it |
| 32 | +a bit more safely by leveraging `Program.create_data`: |
| 33 | + |
| 34 | +```python |
| 35 | +data = Program.create_data(0x0457994, "byte") |
| 36 | +``` |
| 37 | + |
| 38 | +Or alternatively, using a DataType object: |
| 39 | + |
| 40 | +```python |
| 41 | +data = DataType("byte").create_at(0x0457994) |
| 42 | +``` |
| 43 | + |
| 44 | +As usual, we can also access the existing defined data: |
| 45 | + |
| 46 | +```python |
| 47 | +data = Data(0x0457994) # Get by address |
| 48 | +data = Data("DAT_00457994") # Get by name, if exists |
| 49 | +``` |
| 50 | + |
| 51 | +With a `Data` instance we can easily access a lot of information, but most importantly we can: |
| 52 | + |
| 53 | +* Access its address, size, raw bytes, etc. |
| 54 | + |
| 55 | +```python |
| 56 | +>>> Data(0x0400078).address |
| 57 | +4194424L |
| 58 | +>>> Data(0x0400078).length |
| 59 | +248 |
| 60 | +``` |
| 61 | + |
| 62 | +As a fun exercise, like with everything that occupies bytes in the binary address space, we |
| 63 | +can also highlight it in the listing: |
| 64 | + |
| 65 | +```python |
| 66 | +Data(0x0400078).highlight() |
| 67 | +``` |
| 68 | + |
| 69 | +* Get its type with `data_type` or `base_data_type`. |
| 70 | + |
| 71 | +```python |
| 72 | +>>> Data(0x0457994).data_type |
| 73 | +word |
| 74 | +``` |
| 75 | + |
| 76 | +* Introspect it, for example `is_pointer`, `is_constant`, `is_writable`, `is_array`, `is_structure`, etc. |
| 77 | + |
| 78 | +```python |
| 79 | +>>> Data(0x0400078).is_pointer |
| 80 | +False |
| 81 | +>>> Data(0x0400078).is_writable |
| 82 | +False |
| 83 | +``` |
| 84 | + |
| 85 | +* For primitive types, cast it to a Python type (when it makes sense): |
| 86 | + |
| 87 | +```python |
| 88 | +>>> Data(0x0457994).value |
| 89 | +0x3234 |
| 90 | +``` |
| 91 | + |
| 92 | +* For structures, access the nested fields with no boilerplate: |
| 93 | + |
| 94 | +```python |
| 95 | +>>> Data(0x0400000).e_magic |
| 96 | +char[2] "MZ" |
| 97 | +>>> Data(0x0400000).e_magic.value |
| 98 | +'MZ' |
| 99 | +>>> Data(0x400078).OptionalHeader.DataDirectory[1].Size |
| 100 | +ddw 8Ch |
| 101 | +``` |
| 102 | + |
| 103 | +### Data types |
| 104 | + |
| 105 | +Every `Data` object has a type assigned. Types are represented by a |
| 106 | +[DataType](reference.md#ghidralib.DataType) object. It can be used to query information about how |
| 107 | +that data behaves. |
| 108 | + |
| 109 | +It's possible to get a type by name, or to enumerate all data types: |
| 110 | + |
| 111 | +```python |
| 112 | +>>> len(DataType.all()) |
| 113 | +110528 |
| 114 | +>>> DataType("IMAGE_OPTIONAL_HEADER32") |
| 115 | +/PE/IMAGE_OPTIONAL_HEADER32 |
| 116 | +pack(disabled) |
| 117 | +Structure IMAGE_OPTIONAL_HEADER32 { |
| 118 | + 0 word 2 Magic "" |
| 119 | + 2 byte 1 MajorLinkerVersion "" |
| 120 | + 3 byte 1 MinorLinkerVersion "" |
| 121 | +... |
| 122 | +``` |
| 123 | + |
| 124 | +Currently Ghidralib has limited support for data type introspection - it's |
| 125 | +possible to get the type name, size in bytes, and not much more. For more advanced operations, |
| 126 | +it may be necessary to use the raw Java object directly. For example: |
| 127 | + |
| 128 | +```python |
| 129 | +>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getPathName() |
| 130 | +u'/PE/IMAGE_OPTIONAL_HEADER32' |
| 131 | +>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getDescription() |
| 132 | +u'' |
| 133 | +>>> DataType("IMAGE_OPTIONAL_HEADER32").raw.getAlignment() |
| 134 | +1 |
| 135 | +``` |
| 136 | + |
| 137 | +As usual, missing wrappers may be added in the future. |
| 138 | + |
| 139 | +One interesting feature is C code parsing, for example: |
| 140 | + |
| 141 | +```python |
| 142 | +>>> DataType.from_c('typedef void* HINTERNET;') |
| 143 | +HINTERNET |
| 144 | +>>> DataType.from_c("struct test { short a; short b; short c;};") |
| 145 | +pack() |
| 146 | +Structure test { |
| 147 | +0 short 2 a "" |
| 148 | +2 short 2 b "" |
| 149 | +4 short 2 c "" |
| 150 | +} |
| 151 | +Length: 6 Alignment: 2 |
| 152 | +``` |
| 153 | + |
| 154 | +Adding a data type programmatically is sometimes much easier than doing it manually in the structure editor. |
| 155 | + |
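The layout Ghidra reports for the `struct test` example above (offsets 0, 2, 4; length 6; alignment 2) can be cross-checked outside Ghidra. The following is a minimal sketch using the standard `ctypes` module in a regular CPython interpreter, not Ghidralib itself. The `Test` class and the `describe_layout` helper are hypothetical names introduced only for this illustration, and the sketch assumes the platform's C `short` is 2 bytes wide.

```python
import ctypes


class Test(ctypes.Structure):
    # Mirrors the C declaration: struct test { short a; short b; short c; };
    _fields_ = [
        ("a", ctypes.c_short),
        ("b", ctypes.c_short),
        ("c", ctypes.c_short),
    ]


def describe_layout(struct_type):
    """Return (offset, size, name) for each field of a ctypes structure."""
    rows = []
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)  # ctypes field descriptor
        rows.append((field.offset, field.size, name))
    return rows


layout = describe_layout(Test)
length = ctypes.sizeof(Test)        # total size of the structure in bytes
alignment = ctypes.alignment(Test)  # required alignment in bytes

for offset, size, name in layout:
    print(offset, "short", size, name)
print("Length:", length, "Alignment:", alignment)
```

If the values printed here differ from what `DataType.from_c` shows, the most likely cause is a difference in packing or in the size of `short` between the host compiler conventions and the data organization of the program loaded in Ghidra.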