
Software Security

Security Testing, especially Fuzzing

Erik Poll
Security in the SDLC
Last week: static analysis aka code review tools (SAST)

This week: security testing (DAST)

Security testing can be used to find many kinds of security flaws.

Focus of this lecture – and group assignment – is on testing C(++) code for
memory corruption

Fuzzing group project
• Form a team of 4 students

• Choose an open-source C(++) application that takes input from the command line in some complex file format
  – For instance, any graphics library for image manipulation
  – Check if the application is mentioned on http://lcamtuf.coredump.cx/ – if so, you may want to test an old version

• Try out the fuzzing tools (Radamsa, zzuf, and afl), with and without instrumentation for additional memory-safety checks (valgrind, ASan)

• Optional variations: report any bugs found, check against known CVEs, test an older vs a newer release, try different settings or inputs for the tool, try another fuzzing tool, …
Overview
• Testing basics
• Abuse cases & negative tests
• Fuzzing
– Dumb fuzzing
– Mutational Fuzzing
• example: OCPP
– Generational aka grammar-based fuzzing
• example: GSM
– Whitebox fuzzing with SAGE
• looking at symbolic execution of the code
– Evolutionary fuzzing with afl
• grey-box, observing execution of the (instrumented) code

Testing basics

SUT, test suite & test oracle
To test a SUT (System Under Test) we need two things:

1. a test suite, i.e. a collection of input data

2. a test oracle that decides whether a test passed OK or revealed an error
   – i.e. some way to decide if the SUT behaves as we want

Both defining test suites and defining test oracles can be a lot of work!
• In the worst case, a test oracle is a long list that specifies, for every individual test case, exactly what should happen
• A simple test oracle: just check whether the application crashes
Moral of the story: crashes are good! (for testing)
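
A crash-only oracle is easy to automate. A minimal sketch, assuming a POSIX system and a hypothetical ./sut binary that reads the given input file:

  #include <stdio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Run the SUT on one input file; report a failure only on a crash. */
  int crashed(const char *sut, const char *input) {
      pid_t pid = fork();
      if (pid == 0) {                 /* child: exec the SUT */
          execl(sut, sut, input, (char *)NULL);
          _exit(127);                 /* exec failed */
      }
      int status;
      waitpid(pid, &status, 0);
      return WIFSIGNALED(status);     /* killed by SIGSEGV, SIGABRT, ... */
  }

  int main(void) {
      if (crashed("./sut", "test001.bin"))
          printf("test001.bin: crash!\n");
      return 0;
  }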

Code coverage criteria
Code coverage criteria to measure how good a test suite is include

• statement coverage
• branch coverage
Statement coverage does not imply branch coverage; e.g. for

  void f(int x, int y) {
    if (x > 0) { y++; }
    y--;
  }

statement coverage needs 1 test case, branch coverage needs 2.
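
For instance (a minimal pair of test cases, reusing f above):

  f(1, 0);    /* x > 0: executes every statement, so statement coverage is met */
  f(-1, 0);   /* x <= 0: needed to also cover the false branch of the if */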

• More complex coverage criteria exist, e.g. MC/DC (Modified Condition/Decision Coverage), commonly used in avionics
  – How many of you are taking Jan Tretmans' Testing Techniques course?
Possible perverse effect of coverage criteria
High coverage requirements may discourage defensive programming, e.g.

  void m(File f) {
    if (<security check fails>) {
      log(...);
      throw new SecurityException();
    }
    try {
      <the main part of the method>
    } catch (SomeException e) {
      log(...);
      <some corrective action>;
      throw new SecurityException();
    }
  }

If this defensive code, i.e. the if- and catch-branches, is hard to trigger in tests, programmers may be tempted (or forced?) to remove it to improve test coverage...
Abuse cases & negative testing
Testing for functionality vs testing for security
• Normal testing will look at the right, wanted behaviour for sensible inputs (aka the happy flow), and at some inputs on borderline conditions

• Security testing also requires looking for the wrong, unwanted behaviour for really strange inputs

• Similarly, normal use of a system is more likely to reveal functional problems than security problems:
  – users will complain about functional problems; hackers won't complain about security problems
Security testing is HARD
[Diagram: the space of all possible inputs; normal inputs form a small cluster, while the rare inputs that trigger a security bug lie scattered across the rest of the space]
Abuse cases & negative test cases
• Thinking about abuse cases is a useful way to come up with security tests
  – what would an attacker try to do?
  – where could an implementation slip up?

• This gives rise to negative test cases, i.e. test cases which are supposed to fail, as opposed to positive test cases, which are meant to succeed
Abuse cases – early in the SDLC
iOS goto fail SSL bug
  ...
  if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      goto fail;
      goto fail;   /* the bug: this duplicated goto always executes,
                      jumping to fail with err == 0, so the final hash
                      and sslRawVerify are skipped */
  if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
      goto fail;
  err = sslRawVerify(...);
  ...
Negative test cases for flawed certificate chains
• David Wheeler's 'The Apple goto fail vulnerability: lessons learned' gives a good discussion of this bug & ways to prevent it, incl. the need for negative test cases
  http://www.dwheeler.com/essays/apple-goto-fail.html

• The FrankenCert test suite provides (broken) certificate chains to test for flaws in the program logic for checking certificates.
  [Brubaker et al., Using Frankencerts for Automated Adversarial Testing of Certificate Validation in SSL/TLS Implementations, Oakland 2014]

• Code coverage requirements on the test suite would also have helped.
Fuzzing

The idea
Suppose some C(++) binary asks for some input:

  Please enter your username
  >

What would you try?

1. A ridiculously long input, say a few MB: if there is a buffer overflow, a long input is likely to trigger a SEGFAULT

2. %x%x%x%x%x%x%x%x, to see if there is a format string vulnerability

3. Other malicious inputs, depending on the back-ends, technologies or APIs used, e.g. SQL, XML, … (out of scope for the project assignment)
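
A toy program vulnerable to the first two probes (a sketch for illustration; gets() performs no bounds check, and passing raw input as a printf format string is exactly what %x exploits):

  #include <stdio.h>

  int main(void) {
      char name[64];
      printf("Please enter your username\n> ");
      gets(name);     /* no bounds check: a few MB of input smashes the stack
                         (gets() was removed in C11 for exactly this reason) */
      printf(name);   /* format string bug: input %x%x... leaks stack memory */
      printf("\n");
      return 0;
  }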


Fuzzing
• Fuzzing aka fuzz testing is a highly effective, largely automated, security testing technique

• Basic idea: (semi-)automatically generate random inputs and see if an application crashes
  – So we are NOT testing functional correctness (compliance)

• The original form of fuzzing: generate very long inputs and see if the system crashes with a segmentation fault.
Simple fuzzing ideas
What inputs would you use for fuzzing?

• very long or completely blank strings

• min/max values of integers, or simply zero and negative values

• depending on what you are fuzzing, special values, characters or keywords likely to trigger bugs, e.g.
  – nulls, newlines, or end-of-file characters
  – format string characters %s %x %n
  – semicolons, slashes and backslashes, quotes
  – application-specific keywords: halt, DROP TABLES, ...
  – ....
Pros & cons of fuzzing
Pros
• Very little effort:
  – the test cases are automatically generated, and the test oracle is simply looking for crashes
• Fuzzing a C/C++ binary can quickly give a good picture of the robustness of the code
Cons
• Will not find all bugs
  – For programs that take complex inputs, more work is needed to get good code coverage and hit interesting test cases. This has led to lots of work on 'smarter' fuzzers.
• Crashes may be hard to analyse; but a crash is a clear true positive that something is wrong!
  – unlike a complaint from a static analysis tool like PREfast
Improved crash/error detection
Making systems crash on errors is useful for fuzzing!

So when fuzzing C(++) code, the memory safety checks listed in the SoK paper (discussed in weeks 2 & 3) can be deployed to make crashes in the event of memory corruption more likely
– e.g. using tools like
  • valgrind's Memcheck
  • AddressSanitizer (ASan)
– ideally for both spatial bugs (buffer overruns) and temporal bugs (malloc/free bugs)
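
For instance, this one-byte heap overflow may run without any visible effect in a normal build, but aborts with a precise report when compiled with ASan (e.g. clang -fsanitize=address -g):

  #include <stdlib.h>

  int main(void) {
      char *buf = malloc(8);
      buf[8] = 'x';    /* heap-buffer-overflow: one byte past the allocation */
      free(buf);
      return 0;
  }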

Types of fuzzers
1) Mutation-based: apply random mutations to a set of valid inputs
   • E.g. observe network traffic, then replay it with some modifications
   • More likely to produce interesting invalid inputs than purely random input (see the byte-flipping sketch after this list)

2) Generation-based aka grammar-based aka model-based: generate semi-well-formed inputs from scratch, based on knowledge of the file format or protocol
   • with a tailor-made fuzzer for a specific input format, or a generic fuzzer configured with a grammar
   • Downside? More work to construct this fuzzer or grammar

3) Evolutionary: observe execution to try to learn which mutations are interesting
   • For example afl, which uses a grey-box approach

4) Whitebox approaches: analyse source code to construct inputs
   • For example, SAGE
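
A minimal mutation-based fuzzer (a sketch, not a real tool: it reads a hypothetical seed.bin, flips a few random bytes, and writes numbered variants for a harness to run):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Read the seed file into memory; caller frees. */
  static unsigned char *slurp(const char *path, long *len) {
      FILE *f = fopen(path, "rb");
      if (!f) return NULL;
      fseek(f, 0, SEEK_END); *len = ftell(f); rewind(f);
      unsigned char *buf = malloc(*len);
      fread(buf, 1, *len, f);
      fclose(f);
      return buf;
  }

  int main(void) {
      long len;
      unsigned char *seed = slurp("seed.bin", &len);
      if (!seed || len == 0) return 1;
      srand(42);
      for (int i = 0; i < 100; i++) {          /* 100 mutated variants */
          unsigned char *mut = malloc(len);
          memcpy(mut, seed, len);
          for (int j = 0; j < 4; j++)          /* flip 4 random bits */
              mut[rand() % len] ^= 1 << (rand() % 8);
          char name[32];
          snprintf(name, sizeof name, "mut_%03d.bin", i);
          FILE *out = fopen(name, "wb");
          fwrite(mut, 1, len, out);
          fclose(out);
          free(mut);
      }
      free(seed);
      return 0;
  }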


Example mutational fuzzing

Example: fuzzing OCPP [research internship Ivar Derksen]
• OCPP is a protocol for charge points to talk to a back-end server

• OCPP can use XML or JSON messages

Example message in JSON format:

  { "location": "NijmegenMercator215672",
    "retries": 5,
    "retryInterval": 30,
    "startTime": "2018-10-27T19:10:11",
    "stopTime": "2018-10-27T22:10:11" }
Example: fuzzing OCPP
A simple classification of messages:
1. malformed JSON/XML (e.g. missing quote, bracket or comma)
2. well-formed JSON/XML, but not legal OCPP (e.g. with field names not in the OCPP specs)
3. well-formed OCPP

This classification can be used for a simple test oracle:

• Malformed messages (types 1 & 2) should generate a generic error response

• Well-formed messages (type 3) should not

• The application should never crash

Note: this does not require any understanding of the protocol semantics yet! Figuring out the correct responses to type 3 messages would. (A sketch of such an oracle follows below.)
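
A sketch of that oracle in C; the classification into the three types is assumed to be done beforehand by a real JSON/OCPP parser:

  #include <stdio.h>

  /* Message classes from the slide. */
  enum msg_class { MALFORMED = 1,   /* type 1: broken JSON/XML          */
                   ILLEGAL   = 2,   /* type 2: legal JSON, illegal OCPP */
                   VALID     = 3 }; /* type 3: well-formed OCPP         */

  /* The oracle: given the request's class and what the server did,
     decide whether the observed behaviour is acceptable. */
  const char *verdict(enum msg_class req, int crashed, int got_error_resp) {
      if (crashed)                         return "FAIL: crash";
      if (req != VALID && !got_error_resp) return "FAIL: malformed input accepted";
      if (req == VALID && got_error_resp)  return "FAIL: valid input rejected";
      return "ok";
  }

  int main(void) {
      /* e.g. a type-1 request that the server answered without an error: */
      puts(verdict(MALFORMED, 0, 0));   /* prints: FAIL: malformed input accepted */
      return 0;
  }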

Test results with fuzzing the OCPP server
• The mutation fuzzer generated 26,400 variants from 22 example OCPP messages in JSON format
• Problems spotted by this simple test oracle:
  – 945 malformed JSON requests (type 1) resulted in a malformed JSON response
    The server should never emit malformed JSON!
  – 75 malformed JSON requests (type 1) and 40 malformed OCPP requests (type 2) resulted in a valid OCPP response that is not an error message
    The server should not process malformed requests!

• One root cause of problems: Google's gson library for parsing JSON by default uses lenient mode rather than strict mode
  – Why does gson even have a lenient mode, let alone by default?

• Fortunately, gson is written in Java, not C(++), so these flaws do not result in exploitable buffer overflows
Postel's Law aka the Robustness Principle

"Be conservative in what you send, be liberal in what you accept"

Named after Jon Postel, who wrote early versions of the TCP spec.

Is this A) good or B) bad?

• Good for getting interoperable implementations up & running

• Bad for security, as lots of these implementations will have non-standard behaviour, deviating from the official specs in corner cases, which may lead to weird behaviour and bugs
Generational fuzzing aka grammar-based fuzzing
CVEs as inspiration for fuzzing file formats
• Microsoft Security Bulletin MS04-028
  Buffer Overrun in JPEG Processing (GDI+) Could Allow Code Execution
  Impact of Vulnerability: Remote Code Execution
  Maximum Severity Rating: Critical
  Recommendation: Customers should apply the update immediately
  Root cause: a zero-sized comment field, without content

• CVE-2007-0243
  Sun Java JRE GIF Image Processing Buffer Overflow Vulnerability
  Critical: Highly critical. Impact: System access. Where: From remote
  Description: A vulnerability has been reported in Sun Java Runtime Environment (JRE). … The vulnerability is caused due to an error when processing GIF images and can be exploited to cause a heap-based buffer overflow via a specially crafted GIF image with an image width of 0. Successful exploitation allows execution of arbitrary code.
  Note: a buffer overflow in (a native library of) a memory-safe language
Generation- aka model-based fuzzing
For a given file format or communication protocol, a generational fuzzer tries to generate files or data packets that are slightly malformed or hit corner cases in the spec.

Possible starting points: a grammar defining legal inputs, or a data format specification.

Typical things to fuzz:

• many/all possible values for specific fields, esp. undefined values, or values Reserved for Future Use (RFU)

• incorrect lengths, lengths that are zero, or payloads that are too short/long (see the sketch below)

Tools for building such fuzzers: SNOOZE, SPIKE, Peach, Sulley, antiparser, Netzob, ...
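
A toy generational fuzzer for a hypothetical length-prefixed (TLV-style) record format, emitting records whose length field disagrees with the payload actually present:

  #include <stdio.h>
  #include <string.h>

  static void emit(unsigned char tag, unsigned char claimed_len,
                   const char *payload, size_t real_len, FILE *out) {
      fputc(tag, out);
      fputc(claimed_len, out);             /* the length the parser will trust */
      fwrite(payload, 1, real_len, out);   /* the bytes actually present       */
  }

  int main(void) {
      const char *p = "AAAA";                       /* 4 real payload bytes */
      unsigned char lens[] = { 0, 3, 4, 5, 255 };   /* zero, short, ok, long, max */
      for (size_t i = 0; i < sizeof lens; i++) {
          char name[32];
          snprintf(name, sizeof name, "tlv_%03u.bin", (unsigned)lens[i]);
          FILE *out = fopen(name, "wb");
          if (!out) return 1;
          emit(0x01, lens[i], p, strlen(p), out);
          fclose(out);
      }
      return 0;
  }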
Example: generation-based fuzzing of GSM
[MSc theses of Brinio Hond and Arturo Cedillo Torres]

GSM is an extremely rich & complicated protocol
SMS message fields

Field                         Size
Message Type Indicator        2 bits
Reject Duplicates             1 bit
Validity Period Format        2 bits
User Data Header Indicator    1 bit
Reply Path                    1 bit
Message Reference             integer
Destination Address           2-12 bytes
Protocol Identifier           1 byte
Data Coding Scheme (DCS)      1 byte
Validity Period               1 byte / 7 bytes
User Data Length (UDL)        integer
User Data                     depends on DCS and UDL
Example: GSM protocol fuzzing
Lots of stuff to fuzz!

We can use a USRP with open-source cell tower software (OpenBTS) to fuzz any phone.
Example: GSM protocol fuzzing
Fuzzing the SMS layer of GSM reveals weird functionality in the GSM standard and in phones
– e.g. the possibility to receive faxes (!?): "you have a fax!"
  The only way to get rid of this icon: reboot the phone
Example: GSM protocol fuzzing
Malformed SMS text messages showing raw memory contents, rather than the content of the text message
Our results with GSM fuzzing
• Lots of success at DoSing phones: phones crash, disconnect from the network, or stop accepting calls
  – e.g. requiring a reboot or battery removal to restart, to accept calls again, or to remove weird icons
  – after a reboot, the network may redeliver the SMS message, if no acknowledgement was sent before crashing, re-crashing the phone
  But: not all these SMS messages could be sent over a real network
• There is surprisingly little correlation between problems and phone brands & firmware versions
  – how many implementations of the GSM stack did Nokia have?
• The scary part: what would happen if we fuzz base stations?

[Fabian van den Broek, Brinio Hond and Arturo Cedillo Torres, Security Testing of GSM Implementations, ESSoS 2014]

[Mulliner et al., SMS of Death, USENIX 2011]
Security problems with more complex input formats

[Screenshot: example of a dangerous SMS text message]

• This message can be sent over the network

• Different character sets or character encodings are a constant source of problems. Many input formats rely on an underlying notion of characters.
Example: fuzzing fonts
Google's Project Zero found many Windows kernel vulnerabilities by fuzzing fonts in the Windows kernel

https://googleprojectzero.blogspot.com/2017/04/notes-on-windows-uniscribe-fuzzing.html
Even handling simple input languages can go wrong!
Sending an extended-length APDU can crash a contactless payment terminal.

Found accidentally, without even trying to fuzz, when sending legal (albeit non-standard) messages.

[Jordi van den Breekel, A security evaluation and proof-of-concept relay attack on Dutch EMV contactless transactions, MSc thesis, 2014]
Whitebox fuzzing with SAGE

Whitebox fuzzing using symbolic execution
• The central problem with fuzzing: how can we generate inputs that trigger interesting code executions?
  – E.g. fuzzing the procedure below is unlikely to hit the error case

  int foo(int x) {
    int y = x + 3;
    if (y == 13) abort();   // error: only reached when x == 10
    return 0;
  }

• The idea behind whitebox fuzzing: if we know the code, then by analysing the code we can find interesting input values to try.

• SAGE (Scalable Automated Guided Execution) is a tool from Microsoft Research that uses symbolic execution of x86 binaries to generate test cases.
  m(int x, int y) {
    x = x + y;
    y = y - x;
    if (2*y > 8) { ...
    }
    else if (3*x < 10) { ...
    }
  }

Can you provide values for x and y that will trigger execution of the two if-branches?
Symbolic execution

  m(int x, int y) {        // suppose x = N and y = M
    x = x + y;             // x becomes N+M
    y = y - x;             // y becomes M-(N+M) = -N
    if (2*y > 8) {         // if-branch taken if -2N > 8, i.e. N < -4
      ...                  // (aka the path condition)
    }
    else if (3*x < 10) {   // else-if branch taken if N >= -4 and 3(M+N) < 10
      ...
    }
  }

We can use an SMT solver (Yices, Z3, ...) aka constraint solver for this: given a set of constraints, such a tool produces test data that meets them, or proves that they are not satisfiable. This generates test data (i) automatically and (ii) with good coverage. (A brute-force illustration follows below.)
• These tools can also be used in static analyses as in PREfast, or, more generally, for program verification
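
A brute-force check of the two derived path conditions (a sketch standing in for what an SMT solver does directly; it searches small N, M and prints the first witness for each branch):

  #include <stdio.h>

  int main(void) {
      int found1 = 0, found2 = 0;
      for (int N = -10; N <= 10; N++) {       /* input x = N */
          for (int M = -10; M <= 10; M++) {   /* input y = M */
              int x = N + M;   /* value of x after x = x + y */
              int y = -N;      /* value of y after y = y - x */
              if (2*y > 8) {
                  if (!found1) { printf("if-branch:      x=%d, y=%d\n", N, M); found1 = 1; }
              } else if (3*x < 10) {
                  if (!found2) { printf("else-if branch: x=%d, y=%d\n", N, M); found2 = 1; }
              }
          }
      }
      return 0;
  }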
Symbolic execution for test generation
• Symbolic execution can be used to automatically generate test cases with good coverage

• Basic idea of symbolic execution: instead of giving variables concrete values (say 42), variables are given symbolic values (say α or N), and the program is executed with these symbolic values to see when certain program points are reached

• Downside of symbolic execution?
  – It is very expensive (in time & space)
  – Things explode with loops
  – You cannot pass symbolic variables as input to some APIs, system calls, I/O peripherals, …
  SAGE mitigates this by using a single concrete execution to obtain symbolic constraints, from which it generates many test inputs for many execution paths
SAGE example
Example program:

  void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) crash();
  }

What would be interesting test cases?
Do you think a fuzzer could find them? How could you find them?
SAGE example
Example program, with the path constraints collected along the execution:

  void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;   // path constraint: i0 ≠ 'b'
    if (input[1] == 'a') cnt++;   // path constraint: i1 ≠ 'a'
    if (input[2] == 'd') cnt++;   // path constraint: i2 ≠ 'd'
    if (input[3] == '!') cnt++;   // path constraint: i3 ≠ '!'
    if (cnt >= 3) crash();
  }

SAGE executes the code for some concrete input, say 'good'. It then collects the path constraints for an arbitrary symbolic input of the form i0 i1 i2 i3. (A sketch of the resulting enumeration follows below.)
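
Flipping any subset of the four collected constraints yields 2^4 = 16 candidate inputs; a sketch of that enumeration (the crash cases are those where at least 3 characters match 'bad!'):

  #include <stdio.h>

  int main(void) {
      const char seed[]  = "good";   /* the concrete input SAGE started from */
      const char match[] = "bad!";   /* characters satisfying the flipped constraints */
      for (int mask = 0; mask < 16; mask++) {   /* 2^4 subsets of constraints */
          char input[5];
          int hits = 0;
          for (int i = 0; i < 4; i++) {
              input[i] = (mask & (1 << i)) ? match[i] : seed[i];
              hits += (mask >> i) & 1;
          }
          input[4] = '\0';
          printf("%s%s\n", input, hits >= 3 ? "   <-- would crash" : "");
      }
      return 0;
  }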

Search space for interesting inputs
Based on this one execution, combining the 4 constraints (each either kept or negated) yields 2^4 = 16 test cases.

[Diagram: a binary tree of the constraint choices i0 ≠ 'b' / i0 = 'b', then i1 ≠ 'a' / i1 = 'a', then i2 ≠ 'd' / i2 = 'd', then i3 ≠ '!' / i3 = '!']

Note: the initial execution with the input 'good' was not very interesting, but some of these others are.
SAGE success
• SAGE proved successful at uncovering security bugs, e.g.

  Microsoft Security Bulletin MS07-017 aka CVE-2007-0038: Critical
  Vulnerabilities in GDI Could Allow Remote Code Execution
  Stack-based buffer overflow in the animated cursor code in Microsoft Windows 2000 SP4 through Vista allows remote attackers to execute arbitrary code or cause a denial of service (persistent reboot) via a large length value in the second (or later) anih block of a RIFF .ANI, .cur, or .ico file, which results in memory corruption when processing cursors, animated cursors, and icons

  This is a security vulnerability in parsing the ANI/cur/ico formats. SAGE generated (semi-)well-formed input triggering the bug without knowing these formats.

• First experiments with SAGE also found bugs in handling a compressed file format and media file formats, and generated 43 test cases that crash Office 2007
Evolutionary Fuzzing with afl
(American Fuzzy Lop)

Evolutionary fuzzing with afl
• Downside of generation-based fuzzing:
  – lots of work to write code to do the fuzzing, even if you use tools to generate this code based on some grammar
• Downside of mutation-based fuzzing:
  – the chance that random changes in inputs hit interesting cases is small
• afl (American Fuzzy Lop) takes an evolutionary approach to learn interesting mutations, based on measuring code coverage
  – basic idea: if a mutation of the input triggers a new execution path through the code, then it is an interesting mutation and it is kept; if not, the mutation is discarded
  – by trying random mutations of the input and observing their effect on code coverage, afl can learn what interesting inputs are (see the toy sketch below)
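
A toy illustration of this feedback loop (a sketch, not how afl itself is built: the SUT is the top() function from the SAGE example, instrumented by hand with a crude coverage map; mutants are kept only if they reach a new branch):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static unsigned char seen[4];  /* crude coverage map: one flag per branch */
  static int new_cov;            /* did the last run hit a new branch?      */

  static void hit(int id) { if (!seen[id]) { seen[id] = 1; new_cov = 1; } }

  static int top(const char in[4]) {   /* the SUT, instrumented by hand */
      int cnt = 0;
      if (in[0] == 'b') { hit(0); cnt++; }
      if (in[1] == 'a') { hit(1); cnt++; }
      if (in[2] == 'd') { hit(2); cnt++; }
      if (in[3] == '!') { hit(3); cnt++; }
      return cnt >= 3;                 /* 1 means "crash" */
  }

  int main(void) {
      char best[5] = "good", cand[5];
      srand(1);
      for (long i = 1; i <= 1000000; i++) {
          memcpy(cand, best, 5);
          cand[rand() % 4] = (char)(' ' + rand() % 95);   /* mutate one byte */
          new_cov = 0;
          int crash = top(cand);
          if (new_cov) memcpy(best, cand, 5);   /* keep inputs reaching new code */
          if (crash) { printf("crash on '%s' after %ld runs\n", cand, i); return 0; }
      }
      printf("no crash found\n");
      return 0;
  }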
afl [http://lcamtuf.coredump.cx/afl]
• Supports programs written in C/C++/Objective-C, with variants for Python/Go/Rust/OCaml
• Code is instrumented to observe execution paths:
  – if source code is available, by using a modified compiler
  – if source code is not available, by running the code in an emulator
• Code coverage is represented as a 64 KB bitmap, where control-flow jumps are mapped to changes in this bitmap
  – different executions could result in the same bitmap, but the chance is small

• Mutation strategies include: bit flips, incrementing/decrementing integers, using pre-defined interesting integer values (e.g. 0, -1, MAX_INT, ...), deleting/combining/zeroing input blocks, ...

• The fuzzer forks the SUT to speed up the fuzzing

• Big win: no need to specify the input format!
afl's instrumentation of compiled code
Code is injected at every branch point in the code:

  cur_location = <SOME_RANDOM_NUMBER_FOR_THIS_CODE_BLOCK>;   /* compile-time random block ID */
  shared_mem[cur_location ^ prev_location]++;                /* bump the byte for this edge  */
  prev_location = cur_location >> 1;                         /* the shift keeps A->B and B->A distinguishable */

where shared_mem is a 64 KB memory region.

Intuition: for every jump from src to dest in the code, a different byte in shared_mem is changed. Which byte is determined by the compile-time randoms inserted at the source and destination of every jump.
Cool example: learning the JPG file format
• Fuzzing a program that expects a JPG as input, starting with 'hello world' as the initial test input, afl can learn to produce legal JPG files
  – along the way producing/discovering error messages such as
    • Not a JPEG file: starts with 0x68 0x65
    • Not a JPEG file: starts with 0xff 0x65
    • Premature end of JPEG file
    • Invalid JPEG file structure: two SOI markers
    • Quantization table 0x0e was not defined
  and eventually producing valid JPG images

[Source: http://lcamtuf.blogspot.nl/2014/11/pulling-jpegs-out-of-thin-air.html]
Vulnerabilities found with afl
IJG jpeg, libjpeg-turbo, libpng, libtiff, mozjpeg, PHP, Mozilla Firefox, Internet Explorer, Apple Safari, Adobe Flash / PCRE, sqlite, OpenSSL, LibreOffice, poppler, freetype, GnuTLS, GnuPG, OpenSSH, PuTTY, ntpd, nginx, bash (post-Shellshock), tcpdump, JavaScriptCore, pdfium, ffmpeg, libmatroska, BIND, QEMU, lcms, Oracle BerkeleyDB, Android / libstagefright, iOS / ImageIO, FLAC audio library, libsndfile, less / lesspipe, strings (+ related tools), file, dpkg, Info-Zip unzip, libtasn1, OpenBSD pfctl, NetBSD bpf, man & mandoc, IDA Pro [reported by authors], clang / llvm, nasm, ctags, mutt, procmail, fontconfig, pdksh, Qt, wavpack, redis / lua-cmsgpack, taglib, privoxy, perl, libxmp, radare2, SleuthKit, fwknop [reported by author], X.Org, exifprobe, jhead, capnproto, Xerces-C, metacam, djvulibre, …
Moral of the story
• If you ever produce code that handles some non-trivial input
format, run a tool like afl to look for bugs

Conclusions
• Fuzzing is a great technique to find (a certain kind of) security flaws!
• If you ever write or deploy C(++) code, you should fuzz it.
• The bottleneck: how to do smart fuzzing without too much effort. Successful approaches include
  – whitebox fuzzing based on symbolic execution, as in SAGE
  – evolutionary mutation-based fuzzing, as in afl
• A newer generation of tools tries not only to find security flaws, but also to then build exploits, e.g. angr

To read (see links on the course page)
• David Wheeler, The Apple goto fail vulnerability: lessons learned
• Patrice Godefroid et al., SAGE: whitebox fuzzing for security testing
• Mathias Payer, The Fuzzing Hype-Train: How Random Testing Triggers Thousands of Crashes
