4_Fuzzing
Security Testing
especially
Fuzzing
Erik Poll
Security in the SDLC
Last week: static analysis aka code review tools (SAST)
Focus of this lecture – and group assignment – is on testing C(++) code for
memory corruption
2
Fuzzing group project
• Form a team of 4 students
• Try out the fuzzing tools (Radamsa, zzuf, and afl) with/without
instrumentation for additional checks on memory safety
(valgrind, ASan)
• Optional variations: report any bugs found, check against known CVEs,
test older vs newer release, try different settings or inputs for the tool,
try another fuzzing tool, …
3
Overview
• Testing basics
• Abuse cases & negative tests
• Fuzzing
– Dumb fuzzing
– Mutational Fuzzing
• example: OCPP
– Generational aka grammar-based fuzzing
• example: GSM
– Whitebox fuzzing with SAGE
• looking at symbolic execution of the code
– Evolutionary fuzzing with afl
• grey-box, observing execution of the (instrumented) code
4
Testing basics
5
SUT, test suite & test oracle
To test a SUT (System Under Test) we need two things
1. test suite: a set of test inputs
2. test oracle: a way to decide if the SUT's behaviour on these inputs is OK
6
Code coverage criteria
Code coverage criteria to measure how good a test suite is include
• statement coverage
• branch coverage
Statement coverage does not imply branch coverage; eg for

    void f (int x, int y) {
        if (x > 0) { y++; }
        y--;
    }

a single test with x > 0 executes every statement, but never takes the
implicit, empty else-branch.

Defensive code, eg an if-branch that ends in
    throw (SecurityException);
is a typical victim: if such defensive if- & catch-branches are hard to
trigger in test, programmers may be tempted (or forced?) to
remove this code to improve test coverage...
8
Abuse cases
&
Negative testing
10
Testing for functionality vs testing for security
• Normal testing will look at right, wanted behaviour for sensible
inputs (aka the happy flow), and some inputs on borderline
conditions
11
Security testing is HARD
[Figure: the space of all possible inputs; normal inputs form a small
region, while the few inputs that trigger a security bug are scattered
through the rest of the space]
12
Abuse cases & negative test cases
• Thinking about abuse cases is a useful way to come up with
security tests
13
Abuse cases – early in the SDLC
14
iOS goto fail SSL bug
...
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
err = sslRawVerify(...);
. . .

The duplicated goto fail always jumps to the fail label, skipping the
call to sslRawVerify with err still 0 (success): signature verification
is silently bypassed.
15
Negative test cases for flawed certificate chains
• David Wheeler's 'The Apple goto fail vulnerability: lessons
learned' gives a good discussion of this bug & ways to prevent
it, incl. the need for negative test cases
http://www.dwheeler.com/essays/apple-goto-fail.html
• Code coverage requirements on the test suite would also have helped.
16
Fuzzing
17
The idea
Suppose some C(++) binary asks for some input
Please enter your username
>
What would you try?
1. A very long input, to see if there is a buffer overflow
2. %x%x%x%x%x%x%x%x, to see if there is a format string vulnerability
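To see why %x%x... is a good probe: if a program passes user input directly as a printf-style format string, each %x makes it print values that were never passed as arguments. A sketch (render_bad/render_good are hypothetical names; writing to a buffer just for illustration):

```c
#include <stdio.h>

/* Vulnerable pattern: user input used as the format string, so an
   input like "%x%x%x" is interpreted as directives (and "%n" can
   even write to memory). */
int render_bad(char *out, size_t n, const char *user_input) {
    return snprintf(out, n, user_input);
}

/* Safe pattern: user input is treated as plain data. */
int render_good(char *out, size_t n, const char *user_input) {
    return snprintf(out, n, "%s", user_input);
}
```

Compilers flag the vulnerable pattern with -Wformat-security, for good reason.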
• The original form of fuzzing: generate very long inputs and see if
the system crashes with a segmentation fault.
19
Simple fuzzing ideas
What inputs would you use for fuzzing?
20
Pros & cons of fuzzing
Pros
•Very little effort:
– the test cases are automatically generated,
and test oracle is simply looking for
crashes
•Fuzzing of a C/C++ binary can quickly give a good picture of
robustness of the code
Cons
•Will not find all bugs
– For programs that take complex inputs, more work will be
needed to get good code coverage, and hit interesting test
cases. This has led to lots of work on 'smarter' fuzzers.
•Crashes may be hard to analyse; but a crash is a clear true positive
that something is wrong!
– unlike a complaint from a static analysis tool like PREfast
21
Improved crash/error detection
Making systems crash on errors is useful for fuzzing!
22
Types of fuzzers
1) Mutation-based: apply random mutations to set of valid inputs
• Eg observe network traffic, then replay it with some modifications
• More likely to produce interesting invalid inputs than just random input
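A mutation step can be as simple as flipping random bits in a captured valid message; a minimal sketch (mutate is a hypothetical helper, not from a specific tool):

```c
#include <stdlib.h>

/* Flip nflips random bits of a captured valid input, in place.
   Replaying many such slightly-corrupted messages is the core of
   mutation-based fuzzing. */
void mutate(unsigned char *buf, size_t len, int nflips) {
    for (int i = 0; i < nflips; i++) {
        size_t pos = (size_t)rand() % len;                /* random byte */
        buf[pos] ^= (unsigned char)(1u << (rand() % 8));  /* flip one bit */
    }
}
```

Because the input starts out valid, the result usually passes superficial checks and reaches deeper code than purely random data would.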
25
Example: Fuzzing OCPP [research internship Ivar
Derksen]
• OCPP is a protocol for charge points to talk to
a back-end server
For example (fragment of an OCPP message in JSON):
    "retries": 5,
    "retryInterval": 30,
    "startTime": "2018-10-27T19:10:11",
    "stopTime": "2018-10-27T22:10:11" }
26
Example: Fuzzing OCPP
Simple classification of messages into three types:
1. malformed JSON/XML
2. well-formed JSON/XML but malformed OCPP
3. well-formed OCPP
Note: this does not require any understanding of the protocol semantics yet!
Figuring out correct responses to type 3 would.
27
Test results with fuzzing OCPP server
• Mutation fuzzer generated 26,400 variants from 22 example OCPP
messages in JSON format
• Problems spotted by this simple test oracle:
– 945 malformed JSON requests (type 1) resulted in malformed JSON
response
Server should never emit malformed JSON!
– 75 malformed JSON requests (type 1) and 40 malformed OCPP
requests (type 2) result in a valid OCPP response that is not an error
message.
(Being liberal in what you accept is known as Postel's law, named after Jon Postel, who wrote early versions of the TCP spec.)
30
CVEs as inspiration for fuzzing file formats
• Microsoft Security Bulletin MS04-028
Buffer Overrun in JPEG Processing (GDI+) Could Allow Code
Execution
Impact of Vulnerability: Remote Code Execution
Maximum Severity Rating: Critical
Recommendation: Customers should apply the update immediately
31
Generation- aka model-based fuzzing
For a given file format or communication protocol, a generational
fuzzer tries to generate files or data packets that are slightly
malformed or hit corner cases in the spec.
Possible starting points:
– a grammar defining legal inputs
– or a data format specification
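A generational fuzzer thus starts from the spec rather than from captured inputs. A toy sketch for a made-up key=value format (the keys, grammar, and corner-case values are all invented for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy generational fuzzer for a hypothetical "key=integer" format.
   Most outputs are legal per the imagined spec; some hit corner
   cases such as empty, repeated, or hostile values. */
static const char *keys[]   = { "retries", "interval", "name" };
static const char *corner[] = { "", "=", "====", "%n%n%n",
                                "999999999999999999" };

void gen_line(char *out, size_t n) {
    if (rand() % 4 == 0) {
        /* corner case: malformed or hostile value */
        snprintf(out, n, "%s=%s", keys[rand() % 3], corner[rand() % 5]);
    } else {
        /* legal input per the (hypothetical) spec */
        snprintf(out, n, "%s=%d", keys[rand() % 3], rand() % 100);
    }
}
```

A real generational fuzzer does the same thing systematically for every field and production rule of the format.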
33
SMS message fields
Field                         Size
Message Type Indicator        2 bits
Reject Duplicates             1 bit
Validity Period Format        2 bits
User Data Header Indicator    1 bit
Reply Path                    1 bit
Message Reference             integer
Destination Address           2-12 bytes
Protocol Identifier           1 byte
Data Coding Scheme (DCS)      1 byte
Validity Period               1 byte/7 bytes
User Data Length (UDL)        integer
User Data                     depends on DCS and UDL
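For fuzzing, the fields in this table can be modelled as a struct that a generational fuzzer fills with legal and corner-case values. A sketch (in-memory model only, not the exact GSM 03.40 wire encoding; the struct name is made up):

```c
#include <stdint.h>

/* SMS header fields from the table; bit-field widths mirror the
   listed sizes, but layout here is illustrative, not the on-air
   format. */
typedef struct {
    uint8_t message_type_indicator : 2;
    uint8_t reject_duplicates      : 1;
    uint8_t validity_period_format : 2;
    uint8_t user_data_header_ind   : 1;
    uint8_t reply_path             : 1;
    uint8_t message_reference;           /* integer */
    uint8_t destination_address[12];     /* 2-12 bytes used */
    uint8_t protocol_identifier;         /* 1 byte */
    uint8_t data_coding_scheme;          /* DCS, 1 byte */
    uint8_t validity_period[7];          /* 1 or 7 bytes used */
    uint8_t user_data_length;            /* UDL */
    uint8_t user_data[140];              /* depends on DCS and UDL */
} sms_message_t;
```

Every field whose length or interpretation depends on another field (Validity Period, User Data) is a natural target for malformed values.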
34
Example: GSM protocol fuzzing
Lots of stuff to fuzz!
35
Example: GSM protocol fuzzing
Fuzzing SMS layer of GSM reveals weird functionality in GSM
standard and in phones
36
Our results with GSM fuzzing
• Lots of success in DoSing phones: phones crash, disconnect from
the network, or stop accepting calls
– eg requiring reboot or battery removal to restart, to accept calls
again, or to remove weird icons
– after reboot, the network might redeliver the SMS message, if no
acknowledgement was sent before crashing, re-crashing phone
But: not all these SMS messages could be sent over real network
• There is surprisingly little correlation between problems and
phone brands & firmware versions
– how many implementations of the GSM stack did Nokia have?
• The scary part: what would happen if we fuzz base stations?
[Fabian van den Broek, Brinio Hond and Arturo Cedillo Torres, Security Testing
of GSM Implementations, Essos 2014]
[Figure: example of a dangerous SMS text message]
40
Example: Fuzzing fonts
Google’s Project Zero found many Windows kernel vulnerabilities by
fuzzing fonts in the Windows kernel
https://googleprojectzero.blogspot.com/2017/04/notes-on-windows-uniscribe-fuzzing.html
41
Even handling simple input languages can go wrong!
Sending an extended length APDU can crash a contactless
payment terminal.
[Jordi van den Breekel, A security evaluation and proof-of-concept relay attack
on Dutch EMV contactless transactions, MSc thesis, 2014]
42
Whitebox fuzzing with SAGE
43
Whitebox fuzzing using symbolic execution
• The central problem with fuzzing:
how can we generate inputs that trigger interesting code
executions?
45
Symbolic execution
Symbolic execution tracks symbolic values of the variables:

    m(int x, int y) {     // suppose x = N and y = M
        x = x + y;        // x becomes N+M
        y = y - x;        // y becomes M-(N+M) = -N
        ...
    }

Branch conditions along an execution path then become constraints on
N and M. We can use an SMT solver (Yices, Z3, ...) aka constraint
solver for this: given a set of constraints, such a tool produces test
data that meets them, or proves that they are not satisfiable.
This generates test data (i) automatically and (ii) with good
coverage
• These tools can also be used in static analyses as in PREfast, or more
generally, for program verification
46
Symbolic execution for test generation
• Symbolic execution can be used to automatically generate test
cases with good coverage
48
SAGE example
Example program
SAGE executes the code for some concrete input, say 'good'
It then collects path constraints for an arbitrary symbolic input of
the form i0i1i2i3
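The example program on this slide is, roughly, the running example from Godefroid et al.'s SAGE paper; a sketch, with the error turned into a return value so it runs safely:

```c
/* Counts how many of the 4 "magic" characters match; the original
   example hits a bug (abort) when cnt >= 3. The concrete input
   "good" matches none, so this one run yields the path constraints
   i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'. */
int top(const char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    return cnt;
}
```

Negating subsets of these constraints and handing them to the SMT solver produces the 16 inputs of the next slide, including the crashing input "bad!".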
49
Search space for interesting inputs
Based on this one execution, combining the 4 constraints yields
2^4 = 16 test cases

i0 ≠ 'b'    i0 = 'b'
i1 ≠ 'a'    i1 = 'a'
i2 ≠ 'd'    i2 = 'd'
i3 ≠ '!'    i3 = '!'
Note: the initial execution with the input ‘good’ was not very
interesting, but some of these others are
50
SAGE success
• SAGE proved successful at uncovering security bugs, eg
51
Evolutionary Fuzzing with afl
(American Fuzzy Lop)
52
Evolutionary Fuzzing with afl
• Downside of generation-based fuzzing:
– lots of work to write code to do the fuzzing, even if you use
tools to generate this code based on some grammar
• Downside of mutation-based fuzzing:
– chance that random changes in inputs hits interesting cases is
small
• afl (American Fuzzy Lop) takes an evolutionary approach to learn
interesting mutations based on measuring code coverage
At every branch point, the instrumented code updates a coverage map,
roughly as follows:

    cur_location = <COMPILE_TIME_RANDOM>;
    shared_mem[cur_location ^ prev_location]++;
    prev_location = cur_location >> 1;

Intuition: for every jump from src to dest in the code, a different
byte in shared_mem is changed.
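A runnable sketch of this bookkeeping (map size and the prev_location shift follow afl's documented scheme; cov_log is a made-up name for the injected snippet):

```c
#include <stdint.h>

#define MAP_SIZE (1 << 16)      /* afl's 64 KB coverage map */
uint8_t shared_mem[MAP_SIZE];
static uint32_t prev_location;

/* Injected at every basic block; cur is that block's compile-time
   random id. The XOR makes the touched byte identify the (src, dest)
   edge; the shift keeps edge A->B distinct from B->A. */
void cov_log(uint32_t cur) {
    shared_mem[(cur ^ prev_location) & (MAP_SIZE - 1)]++;
    prev_location = cur >> 1;
}
```

The fuzzer keeps a mutated input whenever it touches a map byte that no earlier input touched: that input triggered a new edge, so it is worth mutating further.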
55
Cool example: learning the JPG file format
• Fuzzing a program that expects a JPG as input, starting with
'hello world' as initial test input, afl can learn to produce legal
JPG files
– along the way producing/discovering error messages such as
• Not a JPEG file: starts with 0x68 0x65
• Not a JPEG file: starts with 0xff 0x65
• Premature end of JPEG file
• Invalid JPEG file structure: two SOI markers
• Quantization table 0x0e was not defined
[Source http://lcamtuf.blogspot.nl/2014/11/pulling-jpegs-out-of-thin-air.html]
56
Vulnerabilities found with afl
IJG jpeg 1
libjpeg-turbo 1 2
libpng 1
libtiff 1 2 3 4 5
mozjpeg 1
PHP 1 2 3 4 5
Mozilla Firefox 1 2 3 4
Internet Explorer 1 2 3 4
Apple Safari 1
LibreOffice 1 2 3 4
poppler 1
freetype 1 2
GnuTLS 1
GnuPG 1 2 3 4
OpenSSH 1 2 3
PuTTY 1 2
ntpd 1 2
nginx 1 2 3
bash (post-Shellshock) 1 2
tcpdump 1 2 3 4 5 6 7 8 9
JavaScriptCore 1 2 3 4
pdfium 1 2
ffmpeg 1 2 3 4 5
libmatroska 1
BIND 1 2 3 ...
QEMU 1 2
lcms 1
Oracle BerkeleyDB 1 2
Android / libstagefright 1 2
iOS / ImageIO 1
Info-Zip unzip 1 2
libtasn1 1 2 ...
OpenBSD pfctl 1
NetBSD bpf 1
man & mandoc 1 2 3 4 5 ...
IDA Pro [reported by authors]
mutt 1
procmail 1
fontconfig 1
pdksh 1 2
Qt 1 2...
wavpack 1
redis / lua-cmsgpack 1
taglib 1 2 3
privoxy 1 2 3
perl 1 2 3 4 5 6 7...
libxmp radare2 1 2
SleuthKit 1
fwknop [reported by author]
X.Org 1 2
exifprobe 1
jhead [?]
capnproto 1
Xerces-C 1 2 3
metacam 1
djvulibre 1
57
Moral of the story
• If you ever produce code that handles some non-trivial input
format, run a tool like afl to look for bugs
58
Conclusions
• Fuzzing is a great technique to find (a certain kind of) security flaws!
• If you ever write or deploy C(++) code, you should fuzz it.
• The bottleneck: how to do smart fuzzing without too much effort
Successful approaches include
– White-box fuzzing based on symbolic execution with SAGE
– Evolutionary mutation-based fuzzing with afl
• A newer generation of tools not only tries to find security flaws,
but also to then build exploits, eg. angr
To read (see links on the course page)
• David Wheeler, The Apple goto fail vulnerability: lessons learned
• Patrice Godefroid et al., SAGE: whitebox fuzzing for security testing
• Mathias Payer, The Fuzzing Hype-Train: How Random Testing Triggers
Thousands of Crashes
59