Skip to content

Commit 9d8c51d

Browse files
ivggitoleg
andauthored
adds the symbolic executor (#1105)
* changes the trace representation and some initial design * implements deduplication and classification incidence * enables wide generators Generators now are able to produce objects of random width modulo that the random value underneath the hood is still an integer. TODO: drop the width parameter from the byte generator. * hardcodes generators for registers and memory in promiscuous mode * makes wide generators available through memory-allocate * disables alloc-on-free hack it produces lots of false positives due to unwanted aliasing. * fixes the memcheck rules for strn* family of functions * EXPERIMENTAL: implements errno support It uses malloc to allocate a place for the register and lisp parameters to store the location. We should also consider clearing errno as soon as possible as it is the source of aliasing and lots of UAF and DF false positives. * EXPERIMENTAL: enables buffer-on-creation overflow tracking basically, the idea is that each function that defines a buffer or a string (e.g., malloc, strdup, memcpy, etc) imposes constraints on the buffer that we may track. For example, if we break the bounds of the strdup generated string it could be a soft alarm. * revamps Primus Generators We're now using a much more efficient MSG algorithm with the implementation that uses the new Bitvec module, so that we can now generate arbitrary number of random bits and specify any range we like. * enables wide generators for memory * hushes ida error message it is just a warning, there is no need to overreact * adds the generated observations for undefined vars and pointers * the initial implementation of the Primus symbolic executor * adds the oasis file for the symbolic executor * makes z3 typechecker happy coerces boolean operations to bitvectors (we will keep booleans only for actual logic formulas, i.e., for meta logic - to answer questions whether a path is feasible and so on). * abstracts the input type (previously known as origin) * adds the solver * implements eq and neq using only only the BitVector logic * adds salt to generators Now each generator will generate its own sequence. Note, we still need to create a separate generator for each independent data location. * UNFINISHED: publishes the of_iterator function TODO: document it TODO: publish bitvector interfaces * adds the random plugin this plugin will control all the random generators... as much as they could be controlled * makes the generators syntax a little bit less ugly * implements memory randomization * registers symbolic executor components using the new interface only * splits promiscuous mode into several independent components * restructures the core system * adds the translation from formula values to words * don't resubscribe printers every time a new system is started * implements the master-forker process * fixes a few more z3/bil discrepancies * propagates inputs from the worker to master * fixes the worklist filter Now we won't solve branches that were already visited by some other process. * moves mark-visited to library since so many components anyway track the visited terms it is better to have it available for everyone, instead of reduplicating the work. * symbolic executor now uses the mark-visited component so that we can propagate visited terms between different runs. * WIP: enables run-until-completion mode in which we run until all terms are visited. * cleans up the symexec code * adds a few more lisp stubs * adds the [del] operation to Env and Memory This operation deletes the bindings of the variable or an address. * a new component for handling unresolved calls The component will prevent linker from raising the unresolved call exception by trapping it. Note, that it will conflict with the default trap that is unconditionally set in the primus.lisp file, which we will remove in the next commit. * removes the primus.lisp feature First of all this feature shouldn't be in the posix module, second it was used to install the handler. * assumes the default C prototype if no available * fixes the component name * adds constraints to the formula solver and computer * records the path constraint on each taken jump * adds assume and assert primitives * prefixes assert/assume * fixes ordering primitives they weren't using binops * fixes a bug with missing constraints * adds lisp primitives to the symbolic computation system * adds the z3 dependency * fixes more primitives * fixes the component name in the legacy-main system * fixes a typo in the abi specifications * don't hush errors from Primus * moves and fixes subroutine pertubator it is now in the primus-random plugin and it is possible to control each argument individually * adds the global seed parameter * documents and cleans up the primus-random plugin also adds the global seed parameter and an option to keep page protection. * enables the upper limit if the limiter is used It was quite unintuitive that to limit the maximum path you need to enable the limiter and then set the limit to some finite number as by default it is unlimited. This were making sense, however, in the old days, when we had only one system and all components were automatically included into it. Therefore, to prevent the limiter from jumping in into the old style bap:legacy-main and to enable seamless backward compatibility, we did the following: 1) When either --primus-limit-length or --primus-limit-max-depth are specified we add the component to the legacy-main system. 2) We also add this component when the promiscuous mode is enabled. * fixes error messages in the main library * publishes the ite operator in the interpreter interface * rewrites the malloc allocator * prettifies the error messages on the disassemble command * FIXP to malloc * reimplements errno without malloc * FIXUP on limiter * enables autocoercion in the Primus interpreter * evaluates conditions in Primus Lisp transparently * adds a few more primitives to the region library - region-lower and region-upper return the corresponding bounds of the region. - region-count counts the number of regions in the set of regions. * adds three more primitives to Lisp dictionaries - dict-first d - is the first key in the dictionary; - dict-next d k - is the next key after k - dict-length d - is the length The first two functions enables iteration over dictionaries. * a couple of bugfixes in the lisp code * adds prototypes to the ctypes functions * cleans up the error messages in the run plugin * adds a new eval-lisp command This command can be used to evaluate lisp code without having to provide a binary. * drops bogus stubs * implements support for modern C runtime In the modern C runtime the __libc_start_main function is called not via the PLT indirection, but through a .got.plt (or .got.plt, can't remember which is which) table which is zeroed by default (and it is expected that the loaded will fill in the values from the relocation table). Our existing support for C runtime was searching the libc_start_main location from the entry point - the first function called by the entry point is assumed to be th libc_start_main, and the first argument of that function is assumed to the main function. The problem is that since there is no longer the plt entry for libc_start_main we can't really link its stub, so the fix is simple, just insert the synthetic subroutine for it. And everything else works more or less fine. This commit also adds the stack canary provision code, for systems that require it. It allocates the space for the canary and initializes the FS_BASE variable. Sine FS_BASE is not added to the Primus environment by default we also modify the x86-primus-loader plugin and initialize all registers from x86 family (including the ymms registers). * makes branching and looping interpreted. * fixes the path condition computation in the symbolic executor the false conditions should be refuted * makes symbols interning global so that different machines have the same vision on symbols * opens the bap package in primus systems to enable writing the system name without the bap package * implements primus closure context this makes it easier to define several closures in the same module. * makes Env.mem public * makes addr_size and data_size public in the Descriptor module * implements the symbolic primitives now it is possible to create symbolic values and symbolic memories * switches to Z.t from int64 as the word representation in Lisp * simplifies the lisp primitives module collapses all primitives into a single module, less code and merlin is no that mad * adds the system name and its components to the lisp context * the initial implementation of the symbolic input/output * intialializes global and static variables in Primus Lisp We don't treat static any differently right now. Later we can enable their mangling so that the static variable is only available in the scope of its definition. * tweaks symbolic files to start with a small initial size this is a workaround of a bigger problem that we gonna handle later - how to deal with large memory mappings. Also relevant for malloc. * do not restart when no solution found * [RANDOM] fixes a bug in the remembering random values We were emitting the generated event for values read from static memory. * enables a deeper search of the execution tree Instead of investigating each linearly independent path we can re-execute the same path several times, depending on how much information it will give us. Still highly experimental. * disables handling of OCaml exceptions in the closure application We can't really catch them on that level, we need to modify the bind operator. And we don't want to do this (even if we can), because this will lead to stack overflows, as each binding will push an exception handler to the stack. * fixes the Lisp `msg` operator and tweaks the symbol printing The OCaml side-effectful computations were escaping from the monad, so the messages were sometimes clobbered. Also tweaked printing the symbols, if a value has an interned symbol we print it. There could be some false-positives, so probably we could also check the size of symbol or introduce tagging. * Lisp global variables should be properly initialized. * fixes symbolic stdin initialization also declares the symbolic context for all definitions in the file. * fixes a potential bug in symbol-of-string primitive do not use side-effects inside a monad. * tracks only symbolic values in the symbolic computer. There is no need to store numerals as we can always compute them on flight. We are also no longer creating tasks for non-symbolic path conditions. * switches to the partial model evaluation * chains tasks into trees Each task now puts the parent task into its constraints. Note, it would probably be more optimal to to store the models, but I am not sure that it would be correct. On the first thought, if we will inherit the model from the parent task we will significantly limit the solution space. * threads fgets via fgetc to enable symbolic fgets also fixes a bug in fgets that was never returning an EOF. * fixes symbolic memory generators They are generating random values of correct width * implements symbolic streams * adds the missing update_hash operation we were running all jobs under the same stack-hash. * fixes the Primus destructor behavior The destructor wasn't called when the exception was thrown from the start handler. Also adds the fini callback for finilization, symmetical to the init. (Honestly, not sure is it symmetrical, probably it should be run, _after_ the fini observation, then it will be symmetrical, will think about it later). * fixes parent refutation and keeps parents inputs First of all, parent task constraints weren't refuted and were added as if we were always taking the yes-branch. We hotfix it to the negation of the condition, when it was refuted, but in general we should rething this parent constraints. Next, we didn't consider the parent inputs so while the model included them they were never set, since the child thread wasn't considering them as inputs (as they were already set). * a massive cleanup of the SMT interface * disables input creation on symbolic-{value,memory} operations We do not create new inputs explicitly but let the generators to do their work. As a result if the input is a part of the model it will no longer be treated as symbolic. * adds symbolic getchar * stores the symbolic context in a separate component * stops spawning tasks before they are solved * switches to priority queue for the worklist right now the priority is just LIFO as it was before but we later may prioritize by the amount of information that task will provide * adds the model of fflush and flushes after each puts * adds the model of fdopen and fileno * fixes the fgets stub There were a few off-by-one, forgot-the-corner-case bugs that were revealed by the symbolic executor. Good boy, symbolic executor. * fixes tons of bugs in the symbolic stdio * implements the comparable interface for Formula * removes the dependency on 'a printer from Regular in Bap.Std it was impossible to use the printing functions without linking in the regular library. * adds the symexec tests * adds the progress report to the symbolic executor. * changes the cutoff algorithm * code cleanup * adds the fflush prototype * adds the only-queue option to the run-entry-points parameter * adds crc32 test to the testsuite forgot to commit it * adds symbols to the test binaries to enable test passing on Travis * moves the glibc runtime support code to a separate plugin Not only it is possible to enable/disable it, but it is also now possible to provide alternative competing runtime support without modifying the existing code. However, the main motivaiton to move it was the desire to get access to the project data structure that enables a much more robust runtime identification that works even in the absence of symbols. * fixes type errors in strcat and strncat fixes a type error in strcat fixes an error in strncat * fixes issues in the tests * uses core_kernel in primus-random to enable older ocaml versions * fixes the cache test * fixes few small bugs 1) the greedy-promiscuous-executor was still mentioned by other systems 2) yet another type error in strings * uses static binding for the errno-location instead of the parameters * renames symbolic-io to symbolic-stdio adds symbolic-stdlib The convention is for each concrete feature `xxx` that needs a symbolic counterpart to provide the `symbolic-xxx` feature. The stdlib is right now empty we will start to fill it in with the stuff after the release. * adds the new Memory.add_region function It is the same as the old `allocate` (which is still available, no worries), but takes two words instead of one word and int, which makes it possible to create regions whose size do not fit into OCaml int, which is quite possible during execution in a randomized mode. And makes the code much eaiser to write as now we don't need to deal with conversion errors. * switches to add_region in primus-loader * switches the memory-allocate primitive to add_region it will no longer fail on big numbers * primus random now also uses add_region so it is possible to map the whole 2^64 memory in one run * switches the symbolic memory to add_region * adds the documentation for the Generator.of_iterator function * drops a bogus observation Co-authored-by: oleg <[email protected]>
1 parent 53388e9 commit 9d8c51d

File tree

85 files changed

+4420
-1480
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

85 files changed

+4420
-1480
lines changed

OMakefile

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,8 @@ include _oasis_lib.om
2727
# At this point you may fine-tune the configuration, and modify defaults:
2828
#
2929
# OCAML_LIB_MODULE_SUFFIXES += .ml (install also .ml files with libs)
30-
# OCAMLFINDFLAGS += -verbose (be verbose)
31-
# OCAMLFLAGS += -bin-annot (create .cmt/.cmti files)
30+
# OCAMLFINDFLAGS += -thread
31+
OCAMLFLAGS += -bin-annot # (create .cmt/.cmti files)
3232

3333
OCAMLFLAGS_ANNOT = -annot -bin-annot
3434
OCAMLFLAGS += $(OCAMLFLAGS_ANNOT)

lib/bap/bap.mli

Lines changed: 28 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -680,7 +680,7 @@ module Std : sig
680680

681681
(** Lazy sequence *)
682682
module Seq : module type of Seq
683-
with type 'a t = 'a Sequence.t
683+
with type 'a t = 'a Base.Sequence.t
684684
(** type abbreviation for ['a Sequence.t] *)
685685
type 'a seq = 'a Seq.t [@@deriving bin_io, compare, sexp]
686686

@@ -776,7 +776,7 @@ module Std : sig
776776
777777
will create a printer for a [String.Trie] that is populated by
778778
integers. *)
779-
val pp : 'a printer -> 'a t printer
779+
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
780780
end
781781
end
782782
module V2 : sig
@@ -1339,7 +1339,7 @@ module Std : sig
13391339
Note, the [printf] function from examples refers to the
13401340
[Format.printf], thus it is assumed that the [Format] module
13411341
is open in the scope. *)
1342-
val pp : t printer
1342+
val pp : Format.formatter -> t -> unit
13431343

13441344
(** [printf "%a" pp_hex x] prints [x] in the hexadecimal format omitting
13451345
suffixes, and the prefix if it is not necessary.
@@ -1353,7 +1353,7 @@ module Std : sig
13531353
]}
13541354
@since 1.3
13551355
*)
1356-
val pp_hex : t printer
1356+
val pp_hex : Format.formatter -> t -> unit
13571357

13581358
(** [printf "%a" pp_dec x] prints [x] in the decimal format omitting
13591359
suffixes and prefixes.
@@ -1367,7 +1367,7 @@ module Std : sig
13671367
]}
13681368
@since 1.3
13691369
*)
1370-
val pp_dec : t printer
1370+
val pp_dec : Format.formatter -> t -> unit
13711371

13721372
(** [printf "%a" pp_oct x] prints [x] in the octal format omitting
13731373
suffixes, and the prefix if it is not necessary.
@@ -1379,7 +1379,7 @@ module Std : sig
13791379
# printf "%a\n" pp_oct (Word.of_int32 0x1);;
13801380
1
13811381
]} *)
1382-
val pp_oct : t printer
1382+
val pp_oct : Format.formatter -> t -> unit
13831383

13841384
(** [printf "%a" pp_bin x] prints [x] in the binary (0 and 1) format omitting
13851385
suffixes, and the prefix if it is not necessary.
@@ -1393,7 +1393,7 @@ module Std : sig
13931393
]}
13941394
@since 1.3
13951395
*)
1396-
val pp_bin : t printer
1396+
val pp_bin : Format.formatter -> t -> unit
13971397

13981398
(** [printf "%a" pp_hex x] prints [x] in the hexadecimal format omitting
13991399
suffixes, and the prefix if it is not necessary.
@@ -1405,7 +1405,7 @@ module Std : sig
14051405
# printf "%a\n" pp_hex (Word.of_int32 0x1);;
14061406
1
14071407
]} *)
1408-
val pp_hex : t printer
1408+
val pp_hex : Format.formatter -> t -> unit
14091409

14101410
(** [printf "%a" pp_dec x] prints [x] in the decimal format omitting
14111411
suffixes and prefixes.
@@ -1419,7 +1419,7 @@ module Std : sig
14191419
]}
14201420
@since 1.3
14211421
*)
1422-
val pp_dec : t printer
1422+
val pp_dec : Format.formatter -> t -> unit
14231423

14241424
(** [printf "%a" pp_oct x] prints [x] in the octal format omitting
14251425
suffixes, and the prefix if it is not necessary.
@@ -1433,7 +1433,7 @@ module Std : sig
14331433
]}
14341434
@since 1.3
14351435
*)
1436-
val pp_oct : t printer
1436+
val pp_oct : Format.formatter -> t -> unit
14371437

14381438
(** [printf "%a" pp_bin x] prints [x] in the binary (0 and 1)
14391439
format omitting suffixes, and the prefix if it is not necessary.
@@ -1447,7 +1447,7 @@ module Std : sig
14471447
]}
14481448
@since 1.3
14491449
*)
1450-
val pp_bin : t printer
1450+
val pp_bin : Format.formatter -> t -> unit
14511451

14521452
(** [printf "%a" pp_hex_full x] prints [x] in the hexadecimal format with
14531453
suffixes, and the prefix if it is necessary.
@@ -1459,7 +1459,7 @@ module Std : sig
14591459
# printf "%a\n" pp_hex_full (Word.of_int32 0x1);;
14601460
1:32u
14611461
]} *)
1462-
val pp_hex_full : t printer
1462+
val pp_hex_full : Format.formatter -> t -> unit
14631463

14641464
(** [printf "%a" pp_dec_full x] prints [x] in the decimal format with
14651465
suffixes and prefixes.
@@ -1473,7 +1473,7 @@ module Std : sig
14731473
]}
14741474
@since 1.3
14751475
*)
1476-
val pp_dec_full : t printer
1476+
val pp_dec_full : Format.formatter -> t -> unit
14771477

14781478
(** [printf "%a" pp_oct_full x] prints [x] in the octal format with
14791479
suffixes, and the prefix if it is necessary.
@@ -1485,7 +1485,7 @@ module Std : sig
14851485
# printf "%a\n" pp_oct_full (Word.of_int32 0x1);;
14861486
1:32u
14871487
]} *)
1488-
val pp_oct_full : t printer
1488+
val pp_oct_full : Format.formatter -> t -> unit
14891489

14901490
(** [printf "%a" pp_bin_full x] prints [x] in the binary (0 and 1)
14911491
format omitting suffixes, and the prefix if it is necessary.
@@ -1497,7 +1497,7 @@ module Std : sig
14971497
# printf "%a\n" pp_bin_full (Word.of_int32 0x1);;
14981498
1:32u
14991499
v} *)
1500-
val pp_bin_full : t printer
1500+
val pp_bin_full : Format.formatter -> t -> unit
15011501

15021502
(** [pp_generic ?case ?prefix ?suffix ?format ppf x] - a printer
15031503
to suit all tastes.
@@ -1534,7 +1534,7 @@ module Std : sig
15341534
?prefix:[`auto | `base | `none | `this of string ] ->
15351535
?suffix:[`none | `full | `size ] ->
15361536
?format:[`hex | `dec | `oct | `bin ] ->
1537-
t printer
1537+
Format.formatter -> t -> unit
15381538

15391539
(** [string_of_value ?hex x] returns a textual representation of
15401540
the [x] value, i.e., ignores size and signedness. If [hex] is
@@ -1969,15 +1969,15 @@ module Std : sig
19691969
val slot : (Theory.Program.Semantics.cls, stmt list) Knowledge.slot
19701970

19711971
(** [printf "%a" pp_binop op] prints a binary operation [op]. *)
1972-
val pp_binop : binop printer
1972+
val pp_binop : Format.formatter -> binop -> unit
19731973

19741974
(** [printf "%a" pp_unop op] prints an unary operation [op] *)
1975-
val pp_unop : unop printer
1975+
val pp_unop : Format.formatter -> unop -> unit
19761976

19771977
(** [printf "%a" pp_cast t] prints a cast type [t]
19781978
@since 1.3
19791979
*)
1980-
val pp_cast : cast printer
1980+
val pp_cast : Format.formatter -> cast -> unit
19811981

19821982
(** [string_of_binop op] is a textual representation of [op].
19831983
@since 1.3
@@ -3614,7 +3614,7 @@ module Std : sig
36143614
val eval : t -> Bil.value
36153615

36163616
include Regular.S with type t := t
3617-
val pp_adt : t printer
3617+
val pp_adt : Format.formatter -> t -> unit
36183618
end
36193619

36203620
(** [Regular] interface for BIL statements *)
@@ -3895,7 +3895,7 @@ module Std : sig
38953895

38963896
include Regular.S with type t := t
38973897

3898-
val pp_adt : t printer
3898+
val pp_adt : Format.formatter -> t -> unit
38993899
end
39003900

39013901
(** Architecture *)
@@ -4432,7 +4432,7 @@ module Std : sig
44324432

44334433
(** [pp pp_elem] creates a vector printer that uses [pp_elem] to
44344434
print elements. *)
4435-
val pp : 'a printer -> 'a t printer
4435+
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
44364436
end
44374437

44384438
(** BAP IR.
@@ -5214,7 +5214,7 @@ module Std : sig
52145214
val elements : ('a t -> 'a seq) ranged
52155215

52165216
(** [pp printer] - creates a printer for table from value printer *)
5217-
val pp : 'a printer -> 'a t printer
5217+
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
52185218
end
52195219

52205220
(** A locations of a chunk of memory *)
@@ -5687,7 +5687,7 @@ module Std : sig
56875687
include Container.S1 with type 'a t := 'a t
56885688

56895689
(** [pp pp_elem] constracts a printer for a memmap to the given element. *)
5690-
val pp : 'a printer -> 'a t printer
5690+
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
56915691
end
56925692

56935693
(** Symbolizer defines a method for assigning symbolic names to addresses *)
@@ -6321,7 +6321,7 @@ module Std : sig
63216321

63226322
(** [pp_adt] prints instruction in ADT format, suitable for reading
63236323
by evaluating in many languages, e.g. Python, Js, etc *)
6324-
val pp_adt : t printer
6324+
val pp_adt : Format.formatter -> t -> unit
63256325

63266326
(** {3 Prefix Tree}
63276327
This module provides a trie data structure where a sequence of
@@ -8206,10 +8206,10 @@ module Std : sig
82068206
include S with type ('a,'e) state = ('a,'e) Monad.State.t
82078207

82088208
(** print a set of taints *)
8209-
val pp_set : set printer
8209+
val pp_set : Format.formatter -> set -> unit
82108210

82118211
(** print a taint map *)
8212-
val pp_map : map printer
8212+
val pp_map : Format.formatter -> map -> unit
82138213

82148214
module Map : Regular.S with type t = map
82158215
end [@@deprecated "[since 2018-03] use the Bap Taint Framework instead"]
@@ -9052,7 +9052,7 @@ module Std : sig
90529052
value for the ['a] type. *)
90539053
type 'a converter
90549054

9055-
val converter : 'a parser -> 'a printer -> 'a -> 'a converter
9055+
val converter : 'a parser -> (Format.formatter -> 'a -> unit) -> 'a -> 'a converter
90569056

90579057
(** Default deprecation warning message, for easy deprecation of
90589058
parameters. *)

0 commit comments

Comments
 (0)