CryptDB Mechanism On Graph Databases Nahla Naser Aburawi University of Liverpool

CryptDB Mechanism on Graph Databases

Thesis submitted in accordance with the requirements of the University of
Liverpool for the degree of Doctor in Philosophy by

Nahla Naser Aburawi

March 2020
Dedication

To the memory of my Father Naser Aburawi.

Glossary

Asymmetric encryption A form of cryptosystem in which encryption and
decryption are performed using two different keys.
Ciphertext The output of an encryption algorithm.
Cryptography The branch of cryptology dealing with the design of
algorithms for encryption and decryption.
Cypher A declarative graph query language that allows for
expressive and efficient querying of a property graph.
Decryption The translation of encrypted data into original data.
Encryption The conversion of plaintext or data into unintelligible
form based on a translation algorithm.
Hash function A function that maps a variable-length data block or
message into a fixed-length value called a hash code.
Initialization vector A random block of data that is used to begin the
encryption of multiple blocks of plaintext.
Message Digest Hash function.
Multiple encryption Repeated use of an encryption function, with different
keys.
Neo4j A graph database management system developed by
Neo4j, Inc.
Plaintext The input to an encryption function or the output of
a decryption function.
Secret key The key used in a symmetric encryption system.
Symmetric Encryption A form of cryptosystem in which encryption and de-
cryption are performed using the same key.

List of Abbreviations

ACID Atomicity, Consistency, Isolation, and Durability
AES Advanced Encryption Standard
API Application Programming Interface
CBC Cipher Block Chaining
CSV Comma-Separated Values
DBA Database Administrator
DBMS Database Management System
DES Data Encryption Standard
DET Deterministic
DSA Digital Signature Algorithm
ECB Electronic Codebook
HOM Homomorphic Encryption
IV Initialization Vector
INT Integer
JCA Java Cryptography Architecture
JDBC Java Database Connectivity
MS Millisecond
MK Master Key
OKM Onion Key Manager
OPE Order-Preserving Encryption
OPE-JOIN Order-Preserving Encryption-JOIN
PRP Pseudo-Random Permutation
QRE Query Rewriter Encryptor
RC4 Rivest Cipher 4
RD Result Decryptor
RND Random Encryption
RSA Rivest-Shamir-Adleman
SHA Secure Hash Algorithm
SQL Structured Query Language
TAEA Traversal-Aware Encryption Adjustment
UDF User Defined Functions

Acknowledgements

All praise is due to Allah, the Most Merciful and the Most Gracious, who gave me
the strength and power to overcome all the challenges that came to me during my
Ph.D. study.
I would like to express my sincere gratitude to my primary supervisor Dr. Alexei
Lisitsa for his continuous support of my Ph.D. study and related research, and for
his patience, motivation and constant encouragement throughout this journey. His
guidance helped me throughout the research and the writing of this thesis. He
always made time for discussions, which made the completion of my Ph.D.
possible. I could not have imagined having a better supervisor and mentor for my
Ph.D. study.
I would also like to express my gratitude to my second supervisor Prof. Frans
Coenen, for his suggestions, assistance, and valuable comments. I would like to
extend my profound gratitude to both advisors, Prof. Sven Schewe and Dr. Andre
Hernich, for providing me with constructive feedback and suggestions at various
times. I also thank all staff members and colleagues in the computer science
department at the University of Liverpool, who were helpful whenever necessary. My
gratitude also extends to the Libyan government/Ministry of Higher Education for
awarding me this opportunity so that my dream of getting a Ph.D. could come true.
My sincere thanks to my husband and my children for their prayers, support,
trust, and encouragement and I am profoundly sorry about what you experienced
while I was busy. My deepest thanks also go to my mother, brothers, and sister for
supporting me spiritually throughout writing this thesis and my life in general.


Abstract

The work presented in this thesis is concerned with aspects of database security.
In particular, we address the problem of querying encrypted data in graph
databases. The thesis considers two popular database security methods from the
literature: (i) multi-layered encryption and (ii) encryption adjustment.
Encryption is an effective way to protect sensitive data in a database from
various attacks. Querying encrypted data, however, poses a dilemma: either the
data must be decrypted before querying, leaving it vulnerable to server-side
attacks, or computationally expensive methods must be applied to query the
encrypted data directly. The approach presented in this thesis is inspired by the
CryptDB system for relational databases (R. A. Popa et al, 2010). Before
processing, a graph query is translated into an encrypted form, which is then
executed on a server without full decryption of the data; the encrypted results are
sent back to a client, where they are finally decrypted. In this way, data privacy is
protected at the server side. A flexible mechanism for executing queries over
encrypted graph databases, called CryptGraphDB, is proposed in this thesis. It
utilizes multi-layered encryption and encryption adjustment in order to provide a
reasonable trade-off between data security protection and data processing
efficiency. The thesis presents the design principles and the prototype
implementation, and reports the empirical data obtained by experimentation with the
prototype. The prototype was implemented for the Neo4j graph DBMS and the Cypher
query language in two different versions: (i) utilizing the Java API and (ii) using
user-defined functions (UDFs). The efficiency of query execution for various types
of queries on encrypted and non-encrypted Neo4j graph databases is reported. In
the context of the CryptGraphDB approach, two encryption adjustment policies are
considered: simple adjustment and traversal-aware adjustment. The first policy
assumes that all encryption level adjustments are performed statically before a
query execution, while in the second the encryption levels are updated dynamically.
We show that by dynamically adjusting encryption layers as query execution
progresses, we can correctly execute the query on the encrypted graph store while
revealing less information to the adversary than in the case of static adjustment
done prior to execution.

Contents

Dedication
Glossary
List of Abbreviations
Acknowledgements
Abstract
Contents
List of Figures
List of Tables

1 Introduction
1.1 Overview
1.2 Research Question
1.3 Research Contributions
1.4 Publications
1.5 Structure of Thesis

2 Background and Related Work
2.1 Introduction
2.2 An Introduction to Cryptography
2.3 CryptDB System
2.4 Design of CryptDB
2.4.1 SQL-aware Encryption Strategy
2.4.2 Adjustable Query-based Encryption
2.4.3 Query Execution for Read Queries
2.4.4 Query Execution for Write Queries
2.4.5 Computing Joins
2.5 Graph Databases
2.5.1 neo4j database
2.5.2 Cypher Language
2.6 Other Querying Encrypted Database Technologies
2.7 Summary

3 Querying Encrypted Graph Databases
3.1 Introduction
3.2 Proposed Design
3.2.1 Cypher-aware encryption strategy
3.2.2 Adjustable query-based encryption
3.2.3 Onion Layers of Encryption
3.2.4 Join Operator in Relational CryptDB
3.2.5 Information leaks in Relational CryptDB
3.2.6 CryptGraphDB principles
3.2.7 Case Study
3.3 Implemented Prototype
3.3.1 Experimental Setup
3.3.2 Queries
3.3.3 Results/ Analysis
3.4 Summary

4 Traversal-Aware Encryption Adjustment for Graph Databases
4.1 Introduction
4.2 Encryption Layers and Simple Adjustment
4.3 Traversal-Aware Encryption Adjustment TAEA
4.3.1 Case Study
4.3.2 Discussion
4.4 Summary

5 Towards Implementation of TAEA
5.1 Introduction
5.2 Implementation of Traversal-Aware Encryption Adjustment
5.2.1 Implementation
5.2.2 Case Study
5.3 Experiments and Performance Analysis
5.3.1 Datasets
5.3.2 Queries
5.3.3 Results
5.4 Evaluation
5.5 Summary

6 Implementation of the TAEA based on extended databases
6.1 Introduction
6.2 Experiments and Performance Analysis
6.2.1 Datasets
6.2.2 Queries
6.2.3 Results
6.3 Evaluation
6.4 Summary

7 Conclusions and Future Works
7.1 Summary
7.2 Main Findings and Contributions
7.3 Future work

References
List of Figures

2.1 CryptDB architecture (from [34])
2.2 CryptDB system overview (from [36])
2.3 Onion layers of encryption and the classes of computation they allow (from [34, 36]).
2.4 The connections [relationships] between different Employee (nodes).

3.1 The typical query flow in Graph CryptDB (adapted from [34])
3.2 Onion layers of encryption and the groups of computation they perform (adapted from [36]).
3.3 The CryptGraphDB Design.
3.4 Three nodes schema at server-side (part a) along with the corresponding plaintexts and data under CryptGraphDB (part b) along with the corresponding ciphertexts. Ciphertexts shown are not full-length.
3.5 Querying Non-Encrypted and Encrypted Graph Database with Java Interface and Neo4j Interface.

4.1 Data layout schema at the server of the graph database.
4.2 Encryption at the RND layer. Ciphertexts shown are not full-length.
4.3 Encryption at the DET layer. Ciphertexts shown are not full-length.
4.4 Adjustment using traversal-aware encryption adjustment. Ciphertexts shown are not full-length.
4.5 Comparison of applying the original CryptGraphDB adjustment with traversal-aware encryption adjustment. Ciphertexts shown are not full-length.

5.1 Example data layout schema at the server of the graph database.

6.1 Comparison between the proposed CryptGraphDB using simple adjustment and TAEA on different databases sizes.
List of Tables

2.1 The onion layers and operations they provide.
2.2 Data layout at the server. When the front end creates a table with the schema on the left, the table created at the DBMS server is the one shown on the right (adapted from [36]).

3.1 Data layout at the server when the frontend creates encrypted tables using the schema at the top; the tables created at the server are given at the bottom.
3.2 The structural queries on non-encrypted and encrypted databases with Java interface and Neo4j interfaces (time in milliseconds).
3.3 The data queries on non-encrypted and encrypted databases with Java interface and Neo4j interfaces (time in milliseconds).

4.1 Data layout at the server (left part), where the application table created at the server (middle and left parts).

5.1 Plain text data, encryption at the RND layer and encryption at the DET layer (Ciphertexts shown are not full-length).
5.2 Different graph database sizes.
5.3 Retrieval times of queries (Q1 − Q5) over non-encrypted data using different graph database sizes, queries (Q'1 − Q'5) over encrypted databases using simple encryption adjustment, and queries (Q''1 − Q''5) over encrypted databases using traversal-aware encryption adjustment (time in milliseconds).
5.4 Retrieval times of encryption adjustment using simple adjustment and traversal-aware encryption adjustment (time in milliseconds).

6.1 Retrieval times of queries (Q1 − Q5) over non-encrypted databases (upper part), queries (Q'1 − Q'5) over encrypted databases using simple encryption adjustment (middle part), and queries (Q''1 − Q''5) over encrypted databases using traversal-aware encryption adjustment (lower part) (time in milliseconds).
Chapter 1

Introduction

1.1 Overview

Database security attracts considerable interest due to the increasing use of
databases in organisations of all sizes, ranging from large commercial
organisations to small individual businesses. Along with this growing use, various
threats have emerged from unauthorized access to and/or modification of sensitive
data stored in databases. One possible way to cope with this hazardous situation
is to apply appropriate encryption algorithms. In this thesis, we consider database
security, paying particular attention to the use of encryption methods both for the
data stored in the database and for the way it is accessed. By the way it is
accessed, we mean how queries are processed to provide the required data to
users.
With the advent of new technologies, different types of database have evolved.
The best known are relational databases and non-relational databases, of which
graph databases are one type. A relational database stores data in tables based on
the relational model. This model organizes data into one or more tables of columns
and rows, with each row identified by a unique key. Rows are also referred to as
tuples or records, and columns as attributes [38]. A graph database, as its name
suggests, stores data in a graph, using a data model that represents data as a
structure of nodes and edges [39]. In addition, a relational database handles data
(for searching or storing) record by record, whereas a graph database handles it
by following paths.


In this thesis, we consider graph databases, as they are typically faster on
traversal queries: relationships are stored explicitly between nodes, so a
traversal can follow them directly instead of computing joins.
Many systems have been proposed to tackle the security issues of databases. For
instance, the CryptDB system, presented in [34], provides a powerful and elegant
mechanism for protecting relational databases against server threats. CryptDB
works as a proxy between the user and the database, encrypting or decrypting data
depending on the queries issued by the user. In CryptDB, the encrypted data stored
in the database never needs to be decrypted to plaintext on the server side, not
even during query execution. In this way, neither a curious database administrator
nor an attacker with full access to the server can learn sensitive data.
Furthermore, CryptDB utilizes multi-layered encryption and encryption adjustment
in order to provide a reasonable trade-off between data security protection and
data processing efficiency.
CryptDB uses several layers of encryption organised into encryption onions [36],
such that each layer supports a certain class of computation. Multi-layered
encryption allows control, to some extent, over the release of information about
the data elements required for query execution. The highest level of protection is
achieved by applying the random layer of encryption (RND), under which even
equal elements become different after encryption. However, if query execution
over encrypted data requires equality checks, these cannot be done at the RND
layer. In this case the encryption level must be adjusted, prior to query
execution, to the deterministic layer (DET), which allows equality checks but
reveals nothing beyond which values are equal.
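The difference between the RND and DET layers can be illustrated with a short Java sketch using the standard Java Cryptography Architecture (JCA). This is an illustration under stated assumptions, not the CryptDB or CryptGraphDB code: the class and method names (`OnionLayersDemo`, `rnd`, `det`, `rndDecrypt`) are hypothetical, RND is approximated by AES-CBC with a fresh random IV per value, and DET by AES-CBC with a fixed all-zero IV. The CryptDB papers describe a different construction for DET; plain CBC with a zero IV additionally reveals when two values share the same leading 16-byte blocks.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class OnionLayersDemo {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Fresh 128-bit AES key (demo only; real systems derive keys from a master key).
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-CBC helper: encrypts or decrypts `input` with the given IV.
    static byte[] aesCbc(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(mode, key, new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // RND-style (assumption): fresh random IV per value, stored in front of the
    // ciphertext so that the key holder can still decrypt it.
    static byte[] rnd(SecretKey key, String value) {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        byte[] ct = aesCbc(Cipher.ENCRYPT_MODE, key, iv, value.getBytes(StandardCharsets.UTF_8));
        byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    static String rndDecrypt(SecretKey key, byte[] stored) {
        byte[] iv = Arrays.copyOfRange(stored, 0, 16);
        byte[] ct = Arrays.copyOfRange(stored, 16, stored.length);
        return new String(aesCbc(Cipher.DECRYPT_MODE, key, iv, ct), StandardCharsets.UTF_8);
    }

    // DET-style (assumption): fixed all-zero IV, so equal plaintexts give equal ciphertexts.
    static byte[] det(SecretKey key, String value) {
        return aesCbc(Cipher.ENCRYPT_MODE, key, new byte[16], value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        // false, except with negligible probability
        System.out.println("RND equal: " + Arrays.equals(rnd(key, "Liverpool"), rnd(key, "Liverpool")));
        // true
        System.out.println("DET equal: " + Arrays.equals(det(key, "Liverpool"), det(key, "Liverpool")));
        // Liverpool
        System.out.println("RND round trip: " + rndDecrypt(key, rnd(key, "Liverpool")));
    }
}
```

The server can therefore test equality on DET values, which is exactly what an equality check in a query needs, but it cannot do so on RND values.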
The work presented in this thesis is concerned with the development of a
CryptDB-like mechanism for graph databases. Applying the CryptDB principles to
graph databases brings challenges with encryption layer adjustment similar to
those of the original CryptDB. The fundamental requirement of encryption
adjustment is the selection of an encryption layer that reveals the information
about the encrypted data needed to execute the query, but no more. For example,
for relational databases it was observed in [34] that unnecessary information
(cross-column equalities) can leak when the DET layer is used during execution of
the Join operator, and a new Join-aware encryption scheme was proposed to solve
this issue [34]. In graph databases there is no need to perform joins [39], and the
Cypher query language has no join operator [19]. Nevertheless, as noted in [4], the
issue of unnecessary leaks remains for graph databases as well.
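Why a DET layer shared across columns (or, in a graph, across properties) leaks cross-column equalities can be seen in a small Java sketch. It is a hypothetical illustration, not code from [34] or [4]: `CrossPropertyLeakDemo`, `demoKey` and `serverSeesMatch` are invented names, DET is again approximated by AES-CBC with a zero IV, and the keys are fixed demo bytes. If two properties are encrypted under the same DET key, the server can match equal values across them; with a separate key per property it cannot. Roughly, CryptDB's JOIN layer addresses this by encrypting each column under its own key and re-keying a column only when a join between two columns is actually requested.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

public class CrossPropertyLeakDemo {

    // Demo key built from a single seed byte; for illustration only, never for real data.
    static SecretKeySpec demoKey(int seed) {
        byte[] raw = new byte[16];
        Arrays.fill(raw, (byte) seed);
        return new SecretKeySpec(raw, "AES");
    }

    // DET-style encryption (AES-CBC, zero IV): same key + same value => same ciphertext.
    static byte[] det(SecretKeySpec key, String value) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(new byte[16]));
            return c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the server can observe: do the stored ciphertexts of two properties match?
    static boolean serverSeesMatch(SecretKeySpec keyA, SecretKeySpec keyB, String a, String b) {
        return Arrays.equals(det(keyA, a), det(keyB, b));
    }

    public static void main(String[] args) {
        // Hypothetical properties Employee.city and Branch.city both hold "Liverpool".
        SecretKeySpec shared = demoKey(1);
        System.out.println(serverSeesMatch(shared, shared, "Liverpool", "Liverpool"));         // true: equality leaks
        System.out.println(serverSeesMatch(demoKey(1), demoKey(2), "Liverpool", "Liverpool")); // false: equality hidden
    }
}
```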
The rest of this introductory chapter is organised as follows. The research
question and associated research issues are discussed in Section 1.2. Section
1.3 lists the contributions of the thesis. Section 1.4 presents details of the
published work produced as a result of the presented research, followed in Section
1.5 by an outline of the structure of the remainder of this thesis.

1.2 Research Question


Given the above motivation, the work presented in this thesis is directed at taking
the CryptDB principles, as implemented for relational databases, and applying them
to graph databases. We first elaborate the requirements for a CryptGraphDB
component for the Cypher query language and the Neo4j database. The overriding
research question is thus:

"How can an encrypted graph database be queried? More specifically, how do we
encrypt a graph database and execute encrypted queries on it using the
CryptGraphDB approach?"

The term CryptGraphDB used here refers to the prototype system that we have
developed. CryptGraphDB allows the graph DBMS server to execute Cypher queries
on encrypted data almost as if it were executing the same queries on plaintext
data.
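One way to picture executing "the same queries" on encrypted data is a client-side rewriting step that replaces property names by keyed aliases and literals by DET ciphertexts before the Cypher text reaches the server. The Java sketch below is hypothetical: `CypherRewriteSketch`, `alias`, `encryptLiteral`, `decryptLiteral` and `rewrite` are invented names and are not the thesis's QRE or RD components; it only handles a single-equality `MATCH ... WHERE ... RETURN` pattern, and it assumes the stored graph already holds its properties under the same aliases and DET ciphertexts. Property names are aliased with HMAC-SHA256, DET is approximated by AES-CBC with a zero IV, and labels are left in clear for readability.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

public class CypherRewriteSketch {

    private final SecretKeySpec detKey;   // hypothetical DET-layer key (16 bytes)
    private final SecretKeySpec nameKey;  // hypothetical key for aliasing property names

    public CypherRewriteSketch(byte[] detKeyBytes, byte[] nameKeyBytes) {
        this.detKey = new SecretKeySpec(detKeyBytes, "AES");
        this.nameKey = new SecretKeySpec(nameKeyBytes, "HmacSHA256");
    }

    // Replace a property name by a keyed, deterministic alias "p_" + 12 hex digits.
    public String alias(String property) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(nameKey);
            byte[] tag = mac.doFinal(property.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("p_");
            for (int i = 0; i < 6; i++) {
                sb.append(String.format("%02x", tag[i]));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // DET-style encryption of a literal, Base64-encoded so it fits in a Cypher string.
    public String encryptLiteral(String value) {
        return Base64.getEncoder().encodeToString(
                run(Cipher.ENCRYPT_MODE, value.getBytes(StandardCharsets.UTF_8)));
    }

    // Client-side decryption of a DET value returned by the server.
    public String decryptLiteral(String base64) {
        return new String(run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(base64)),
                StandardCharsets.UTF_8);
    }

    // Rewrite: MATCH (n:label) WHERE n.prop = 'value' RETURN n.ret
    public String rewrite(String label, String prop, String value, String ret) {
        return "MATCH (n:" + label + ") WHERE n." + alias(prop) + " = '"
                + encryptLiteral(value) + "' RETURN n." + alias(ret);
    }

    private byte[] run(int mode, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(mode, detKey, new IvParameterSpec(new byte[16]));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] k1 = new byte[16];
        byte[] k2 = new byte[32];
        for (int i = 0; i < k1.length; i++) k1[i] = (byte) i;          // demo keys only
        for (int i = 0; i < k2.length; i++) k2[i] = (byte) (i + 100);
        CypherRewriteSketch rewriter = new CypherRewriteSketch(k1, k2);
        System.out.println(rewriter.rewrite("Employee", "name", "Alice", "salary"));
        System.out.println(rewriter.decryptLiteral(rewriter.encryptLiteral("Alice"))); // Alice
    }
}
```

The printed query contains hexadecimal aliases and a Base64 literal, so the strings `name`, `salary` and `Alice` never reach the server. A production rewriter would more likely pass the ciphertext as a Cypher query parameter than splice it into the query text.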
To provide an answer to the above research question the following research sub-
questions need to be considered:

1. What are the features of CryptDB that can best be adapted for implementation
on graph databases?

2. What is the effect of adjusting the encryption layers dynamically, using the
traversal-aware encryption adjustment scheme?

3. Given a set of Cypher queries, how should these queries be implemented over
encrypted graph databases using the proposed approach?

1.3 Research Contributions

The main research contribution of the thesis is the development of the principles
and the design of the CryptGraphDB system for querying encrypted graph databases.
The system protects data confidentiality even against attackers who have access to
all server data. Data encryption is used to protect confidentiality, and one of the
most challenging and important aspects of data encryption is how to query the
encrypted data without fully decrypting it, while providing a reasonable trade-off
between data security protection and data processing efficiency.
Additionally, using multi-layered encryption and appropriate encryption adjustment
procedures is the key to finding the right balance between security and query
execution performance. In CryptGraphDB, the graph query is translated into an
encrypted form before processing, and the encryption layers of the data are
adjusted accordingly at the server side. The query is then executed on the server,
and the encrypted results are sent back to the user, where they are finally
decrypted.
In this thesis, a novel encryption adjustment scheme for graph databases, called
traversal-aware encryption adjustment (TAEA), is proposed. The proposed approach
reveals only the information required to execute the query. The scheme is dynamic:
the encryption layers are not adjusted before query execution, but progressively,
as the execution proceeds.
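The intuition behind the dynamic scheme can be shown with a deliberately simplified Java model that only counts how many nodes have a queried property lowered from RND to DET. It is a toy, not the TAEA algorithm of Chapter 4: the graph, the breadth-first traversal with a hop limit, and the names `AdjustmentToyModel`, `staticAdjust` and `traversalAwareAdjust` are assumptions made for illustration, and no actual encryption is performed. Static (simple) adjustment lowers the property on every node before the query runs; the traversal-aware variant lowers it only on nodes that the traversal actually reaches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AdjustmentToyModel {

    enum Layer { RND, DET }

    // Static adjustment: before the query runs, the queried property is lowered
    // to DET on every node, whether or not the query will touch it.
    static Map<Integer, Layer> staticAdjust(int nodeCount) {
        Map<Integer, Layer> layers = new HashMap<>();
        for (int n = 0; n < nodeCount; n++) {
            layers.put(n, Layer.DET);
        }
        return layers;
    }

    // Toy traversal-aware adjustment: every node starts at RND; a node is lowered
    // to DET only when a breadth-first traversal from `start` (at most `maxHops`
    // hops) reaches it and needs its property for an equality check.
    static Map<Integer, Layer> traversalAwareAdjust(Map<Integer, List<Integer>> adj,
                                                    int nodeCount, int start, int maxHops) {
        Map<Integer, Layer> layers = new HashMap<>();
        for (int n = 0; n < nodeCount; n++) {
            layers.put(n, Layer.RND);
        }
        Deque<int[]> queue = new ArrayDeque<>();   // entries are {node, depth}
        Set<Integer> seen = new HashSet<>();
        queue.add(new int[] {start, 0});
        seen.add(start);
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            layers.put(cur[0], Layer.DET);         // adjusted only when visited
            if (cur[1] == maxHops) {
                continue;
            }
            for (int next : adj.getOrDefault(cur[0], List.of())) {
                if (seen.add(next)) {
                    queue.add(new int[] {next, cur[1] + 1});
                }
            }
        }
        return layers;
    }

    static long countDet(Map<Integer, Layer> layers) {
        return layers.values().stream().filter(l -> l == Layer.DET).count();
    }

    public static void main(String[] args) {
        // Path 0 -> 1 -> 2 plus an unrelated component 3 -> 4.
        Map<Integer, List<Integer>> adj = Map.of(0, List.of(1), 1, List.of(2), 3, List.of(4));
        System.out.println("static: " + countDet(staticAdjust(5)));                             // static: 5
        System.out.println("traversal-aware: " + countDet(traversalAwareAdjust(adj, 5, 0, 1))); // traversal-aware: 2
    }
}
```

On this five-node example, static adjustment exposes all five values at DET, while a one-hop traversal from node 0 exposes only nodes 0 and 1; nodes in the unrelated component stay at RND.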
The contributions are presented in the thesis as outlined below.

1. Querying Encrypted Graph Databases (Chapter 3)

2. Traversal-Aware Encryption Adjustment for Graph Databases (Chapter 4)

3. Towards Implementation of TAEA (Chapter 5)

4. Implementation of the TAEA based on extended databases (Chapter 6)

As demonstrated later in the thesis, the proposed CryptGraphDB approach
outperformed the known CryptDB system (from the literature) in some cases. Note
also that the work in this thesis uses only variants of symmetric encryption, as
opposed to asymmetric (public key) encryption.

1.4 Publications
Three peer-reviewed publications have arisen out of the work presented in this
thesis. These are listed below together with a brief description of each.

1. Aburawi Nahla, Lisitsa Alexei and Coenen Frans (2018). "Querying encrypted
graph databases", In Proceedings of 4th International Conference on Information
Systems Security and Privacy (ICISSP 2018), SCITEPRESS-Science and Technology
Publications, pp. 447-451. This paper proposed the first of the approaches to
querying encrypted data in a graph database presented in this thesis. The main
concept of this approach is that, before processing, a graph query is translated
into an encrypted form, which is then executed on a server without decrypting any
data; the encrypted results are sent back to a client, where they are finally
decrypted. In this way data privacy is protected at the server side. The paper also
presented the design of the system and empirical data obtained by experimentation
with a prototype implemented for the Neo4j graph DBMS and the Cypher query
language, utilizing the Java API. The efficiency of query execution for various
types of queries on encrypted and non-encrypted Neo4j graph databases was
reported. The content of this paper is included in Chapter 3.

2. Aburawi Nahla, Coenen Frans and Lisitsa Alexei (2018). "Traversal-aware
Encryption Adjustment for Graph Databases", In Proceedings of 7th International
Conference on Data Science, Technology and Applications (DATA 2018),
SCITEPRESS-Science and Technology Publications, pp. 381-387. This paper
proposed a traversal-aware encryption adjustment (TAEA) approach for data
processing methods that allow encrypted data in graph databases to be queried.
In order to provide a sensible trade-off between data security protection and
data processing efficiency, TAEA utilizes multi-layered encryption and encryption
adjustment, which allow control, to some extent, over the release of information
about the data elements required for query execution. We show that by
dynamically adjusting encryption layers as query execution progresses, we can
correctly execute the query on the encrypted graph store while revealing less
information to the adversary than in the case of static adjustment done prior to
execution. The content of this paper is covered in the context of Chapter 4.

3. Aburawi Nahla, Coenen Frans and Lisitsa Alexei (2020). "Querying Encrypted
Data in Graph Databases". In: Khalaf M., Al-Jumeily D., Lisitsa A. (eds) Applied
Computing to Support Industry: Innovation and Technology. (ACRIT 2019).
Communications in Computer and Information Science, vol 1174, pp. 367-382.
Springer, Cham. This paper considered encryption as an effective way to protect
sensitive data in a database from various attacks. Querying encrypted data,
however, becomes a challenge: either the data must be decrypted before querying,
leaving it vulnerable to server-side attacks, or computationally expensive methods
must be applied to query the encrypted data. The paper presented a flexible
mechanism for executing queries over encrypted graph databases. Data privacy is
protected at the server side through the use of multi-layered encryption and
encryption layer adjustment performed dynamically during query execution. The
proposed scheme reveals less information to the adversary than static adjustment
done prior to execution. The scheme was implemented for a subset of Cypher graph
queries (graph traversal queries) running on the Neo4j graph database. The
experimental results show the efficiency of query execution for various types of
queries on encrypted data in graph database stores. The content of this paper is
covered in Chapter 5.

1.5 Structure of Thesis

The rest of this thesis is organized as follows:

• Chapter 2: Presents a review of the relevant literature, including an introduction
to cryptographic operations, classified by the number of keys used. The chapter
also reviews the CryptDB system and graph databases. Lastly, some technologies for
querying encrypted databases are presented.

• Chapter 3: Provides a detailed description of the proposed CryptGraphDB
approach, which allows encrypted graph databases to be queried.

• Chapter 4: Presents traversal-aware encryption adjustment (TAEA) for graph
databases, and compares the TAEA approach with the simple encryption
adjustment used in the original CryptGraphDB.

• Chapter 5: Considers the implementation of traversal-aware encryption
adjustment, and reports an evaluation using different types of Cypher query.

• Chapter 6: Presents an evaluation of using traversal-aware encryption
adjustment on extended datasets.

• Chapter 7: The concluding chapter of the thesis, which presents a summary of
the research presented, the main research findings and some suggested
directions for future work.
Chapter 2

Background and Related Work

2.1 Introduction

This chapter presents a review of the background and related work with respect to
the work described in this thesis. As mentioned in Chapter 1, the work presented
in this thesis is directed at exploring and proposing an approach to querying
encrypted data in graph databases. The chapter commences, in Section 2.2, with a
review of the background of cryptography. Section 2.3 then presents a review of
the background of the CryptDB system. The chapter continues, in Section 2.4,
with a more detailed look at the design of CryptDB and how it works. This is
followed by Section 2.5, which considers graph databases (Neo4j) and the query
language used (Cypher). Section 2.6 then analyses other technologies for querying
encrypted databases. The chapter ends with a summary in Section 2.7.

2.2 An Introduction to Cryptography

This section presents an overview of cryptography. Cryptography is a method of
protecting information by using mathematical functions so that only authorized
people can read and process it. Many applications use cryptography, for example
online banking, credit card payments, and computer passwords. Typically,
cryptography involves performing one of the following operations: encryption,
decryption, and hashing. Encryption is the operation of converting the original
text, called plaintext, into obscured text, called ciphertext. Decryption is the
reverse operation, converting the ciphertext back into the original text. Both
operations use a parameter known as a key. Unlike these two operations, hashing
does not allow the original text to be recovered from its output.
Depending on the number of keys that are employed for cryptographic opera-
tions, the following techniques are defined:

Symmetric Encryption. In this type of encryption the sender and the receiver
share a single key, known as a secret key [42]. The key is used for both
encryption and decryption and is kept secret between its owners. In general,
the sender uses it to encrypt the plaintext and sends the resulting ciphertext
to the receiver. The receiver applies the same key to decrypt the received text
and recover the plaintext. Symmetric encryption can operate in one of two modes:
stream mode or block mode. In stream mode, data is encrypted/decrypted one digit
(bit or byte) at a time. In block mode, a fixed-size block of data is
encrypted/decrypted at a time. There are different block modes, such as
Electronic Code Book (ECB) mode and Cipher Block Chaining (CBC) mode. In ECB
mode, each block of data is encrypted separately with the key. In CBC mode, the
encryption of each block depends on the key and on the ciphertext of the previous
block. CBC mode additionally uses an Initialization Vector (IV), a randomly
generated block of bits that randomizes the encryption. In CBC mode the
encryption is sequential: each plaintext block is XORed with the previous
ciphertext block before being encrypted. As a result, each block of ciphertext
depends on all plaintext blocks processed up to that point. Because the first
block has no predecessor, it is XORed with the IV, which makes each encrypted
message unique. Well-known symmetric encryption algorithms include the Advanced
Encryption Standard (AES) and the Data Encryption Standard (DES), both block
ciphers, and Rivest Cipher 4 (RC4), a stream cipher.
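The standard Java Cryptography Architecture (`javax.crypto`) can illustrate the contrast between ECB and CBC. The sketch below is not taken from the thesis prototype. The class and method names are illustrative, and it assumes a freshly generated 128-bit AES key with PKCS#5 padding. It encrypts a message made of two identical 16-byte blocks. ECB repeats the ciphertext block, while CBC with a random IV does not.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative comparison of AES in ECB mode and AES in CBC mode with a random IV.
class BlockModes {
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // ECB: every 16-byte block is encrypted independently under the same key,
    // so equal plaintext blocks produce equal ciphertext blocks.
    static byte[] encryptEcb(SecretKey key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // CBC: each plaintext block is XORed with the previous ciphertext block
    // (the first one with a fresh random IV). The IV is prepended to the output.
    static byte[] encryptCbc(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[16];
            RNG.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits off the 16-byte IV written by encryptCbc and decrypts the rest.
    static byte[] decryptCbc(SecretKey key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(ivAndCiphertext, 0, 16));
            return cipher.doFinal(ivAndCiphertext, 16, ivAndCiphertext.length - 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Storing the IV in front of the CBC ciphertext is a common convention. It is chosen here so that `decryptCbc` needs only the key. The IV itself does not have to be kept secret.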

Asymmetric Encryption. It is also known as public-key cryptography because no
secret key needs to be shared between the parties [47]. This encryption
involves a pair of keys: a public key, which can be known to every participant,
and a private key, which is known only to its owner. Usually, the
10 Nahla Naser Aburawi

public key is used for encryption and the private key for decryption. However,
some public-key techniques also allow the private key to be used for encryption
and the public key for decryption. Common public-key algorithms are
Rivest-Shamir-Adleman (RSA) and the Digital Signature Algorithm (DSA). Note that
DSA is used only for digital signatures, not for encryption.
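A minimal sketch of the public/private key split, using Java's standard `KeyPairGenerator` and `Cipher` classes, is shown below. It assumes a 2048-bit RSA key pair and OAEP padding; neither is prescribed by the text. `RsaExample` is a hypothetical helper name.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;

// Public-key encryption with RSA-OAEP: anyone holding the public key can encrypt,
// only the holder of the matching private key can decrypt.
class RsaExample {
    // The "ECB" part of the name is required by the JCA naming scheme;
    // RSA here encrypts a single short message, not a sequence of blocks.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(PublicKey publicKey, String message) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(PrivateKey privateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a 2048-bit key, each ciphertext is 256 bytes long. Because OAEP padding is randomized, encrypting the same message twice gives different ciphertexts.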

Hash function. It is a mathematical algorithm that maps data of arbitrary size
to a value of fixed size, without using any key [21]. The output of a hash
function is called a hash value or digest. One common use of hash values is to
index a fixed-size table, called a hash table. Using a hash function in this way
is called hashing. The best-known hashing algorithms are the Secure Hash
Algorithms (SHA-1, SHA-256, etc.) and the Message Digest 5 algorithm (MD5).
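The sketch below uses Java's standard `MessageDigest` class. It shows that SHA-256 always produces a 32-byte digest regardless of input length, and that no key is involved. The `bucketIndex` helper is a hypothetical illustration of using a digest to index a fixed-size hash table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashExample {
    // SHA-256 maps input of any length to a 32-byte digest; no key is used.
    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Uses the first four digest bytes to pick a slot in a table of tableSize slots.
    static int bucketIndex(String key, int tableSize) {
        byte[] d = sha256(key);
        int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        return Math.floorMod(h, tableSize);
    }
}
```

For example, `toHex(sha256("abc"))` returns `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, the standard SHA-256 test vector.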

The work presented in this thesis uses the symmetric encryption technique, in
particular, the AES encryption algorithm using ECB and CBC modes.

2.3 CryptDB System

This section presents an overview of the CryptDB system [34]. CryptDB is a
system that protects the privacy and confidentiality of the data in a database
by executing SQL queries over the encrypted database. CryptDB was designed to
address two threats. The first is a curious database administrator (DBA) who
wants to learn sensitive data; CryptDB prevents the DBA from learning private
information. The second is an attacker who tries to discover private data by
spying on the database management system (DBMS) server. Because all the data in
the database is stored encrypted, the database administrator cannot learn any
private data. CryptDB overcomes these risks by allowing the DBMS server to
execute SQL queries on encrypted data using a collection of efficient SQL-aware
encryption schemes. Figure 2.1 presents the architecture of CryptDB.

Figure 2.1: CryptDB architecture (from [34])

CryptDB has thoroughly explored access control for SQL queries on encrypted
relational data. The CryptDB architecture places a proxy between the users and
the server; more details are given in the next section. The basic idea is that,
by selecting an appropriate encryption scheme, the server never needs to decrypt
the stored data to plaintext, not even during query execution. One advantage of
CryptDB is that the server software does not need to be changed. The proxy
intercepts users' queries, rewrites them and passes them to the server for
execution. The issue of information leakage during encryption adjustment in
relational databases is discussed in [34, 36]. A special join-aware encryption
scheme that reduces this leakage is proposed in [35].

CryptDB is built on three novel concepts:
(i) an SQL-aware encryption strategy that maps SQL operations to encryption
schemes;
(ii) adjustable query-based encryption that adjusts the encryption level of each
data value based on user queries; and
(iii) onion encryption, which changes data encryption levels efficiently.
The main purpose of CryptDB is to guarantee the privacy of data in a relational
database (queried using SQL) against any adversary that can access the database
server. It covers a set of motivating scenarios, such as protecting against a
curious DBA, guarding against attackers who gain access to the database server
machine, and outsourcing an SQL database to the cloud [17, 36]. As shown in
Figure 2.2, CryptDB assumes that the application and the CryptDB frontend are not
compromised and that the adversary cannot obtain the keys.

Figure 2.2: CryptDB system overview (from [36])

2.4 Design of CryptDB

As mentioned in Section 2.3, this section presents the design of the CryptDB
architecture in more detail. CryptDB works by letting the DBMS server execute
SQL queries on encrypted data nearly as if these queries were running on
plaintext data [48]. In CryptDB, the structure of the query in the encrypted
case remains the same as in the original case. Some operations in the query,
such as summation or equality comparison, are processed on ciphertexts, in some
cases using modified operators. CryptDB allows the server to execute only the
queries that the users asked for. It provides the maximum security possible
given the mix of queries issued by the users. The database server evaluates
queries entirely on encrypted data and sends the result back to the client for
final decryption. Client machines do not perform any query processing, and
client-side applications run unchanged. CryptDB does not change the internals
of the existing DBMS.
As shown in Figure 2.2, the architecture of CryptDB consists of two parts: a
trusted client-side frontend and an untrusted DBMS server. The frontend keeps
track of three things: a secret master key MK, the database schema without
encryption, and the onion encryption level currently exposed at the server for
each data item. The server, in turn, stores the encrypted schema, the auxiliary
tables used by CryptDB, and the encrypted user data. Referring again to
Figure 2.2, the typical query flow in CryptDB is as follows:

• In step (1), the user issues an SQL query, which passes through the query
rewriter/encryptor (QRE). At this stage, the QRE anonymizes each table and column
name using the master key MK. It also encrypts each constant in the query with
an encryption scheme that allows the required operation.

• In step (2), the query is passed to the onion key manager (OKM), which checks
whether the server needs onion keys to execute the query. If so, the OKM provides
the necessary onion keys by issuing an UPDATE query at the server.

• In step (3), the OKM forwards the anonymized query to the server, which executes
it using standard SQL.

• In step (4), the server returns the query result.

• In step (5), the result decryptor (RD) decrypts the results and sends them back
to the user.
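Part of step (1), the anonymization of table and column names under MK, can be sketched as follows. The text does not specify how CryptDB derives the anonymized names. Purely for illustration, this sketch assumes a keyed hash (HMAC-SHA256) of each name under MK, truncated to a short pseudonym. `NameAnonymizer` and its methods are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: derive server-side pseudonyms for table and column names
// from the master key MK with HMAC-SHA256, and remember the reverse mapping.
class NameAnonymizer {
    private final SecretKeySpec masterKey;
    private final Map<String, String> reverse = new HashMap<>();

    NameAnonymizer(byte[] mk) {
        this.masterKey = new SecretKeySpec(mk, "HmacSHA256");
    }

    // Same name -> same pseudonym. Names are treated case-insensitively here
    // (an assumption about the SQL dialect).
    String anonymize(String name) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(masterKey);
            byte[] tag = mac.doFinal(name.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            StringBuilder pseudonym = new StringBuilder("x");
            for (int i = 0; i < 6; i++) {
                pseudonym.append(String.format("%02x", tag[i]));
            }
            reverse.put(pseudonym.toString(), name);
            return pseudonym.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Frontend-only lookup, used when relabelling results for the user.
    String original(String pseudonym) {
        return reverse.get(pseudonym);
    }
}
```

The pseudonyms are deterministic under MK, so the same plaintext name is always rewritten to the same server-side name. This is what lets rewritten queries refer consistently to the anonymized schema. The reverse map stands in for the frontend's record of the unencrypted schema.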

2.4.1 SQL-aware Encryption Strategy

As summarised in Table 2.1, R. A. Popa et al. [36] used existing encryption
schemes to implement encryption that allows SQL queries to be processed.
CryptDB encrypts all data values in a given column with the same encryption
layer, so that the same computation can be performed on every item in that
column. The encryption schemes that CryptDB uses, and the security properties
each one provides, are as follows:

• Random (RND). It provides the maximum privacy of all the layers. RND is
probabilistic: any two equal values are mapped to different ciphertexts. As a
result, no computation can be performed efficiently on ciphertexts at the RND
layer. To implement RND, AES in CBC mode with a random initialization vector was
used [11, 20, 30].

• Deterministic (DET). It provides a slightly weaker guarantee, as it leaks which
encrypted values correspond to the same plaintext value [10, 13, 29]. The benefit
of using DET is that it allows the server to process equality checks (i.e.
selections with equality filters, equality joins, etc.). DET needs to be a
pseudo-random permutation (PRP) [25]. To implement DET, Blowfish and AES were used
for 64-bit and 128-bit values, respectively [11, 20, 30].

• Order-preserving encryption (OPE). It allows order relations between data
values to be established from their encrypted values, without leaking any other
information about the data content. Using OPE, the server can implement MIN, MAX,
SORT and ORDER BY. The main disadvantage of OPE is that it reveals the order of
values [5, 37]. For this reason, the CryptDB proxy exposes OPE-encrypted columns
to the server only when user requests require it. The OPE scheme of [12] has
provable security guarantees: the encryption is equivalent to a random mapping
that preserves order.

• Homomorphic encryption (HOM). It is an encryption scheme that allows the server
to perform computations on ciphertexts. The result is produced in encrypted form
and decrypted at the proxy. The decrypted result matches the result of the same
operation executed on the plaintexts [16, 24, 32].

• Word search (SEARCH). It allows searches to be executed on encrypted text,
supporting operations such as MySQL's LIKE operator [18]. The cryptographic
protocol of Song et al. [43] was implemented for this purpose.

• Join (JOIN and OPE-JOIN). A separate encryption scheme is used to allow
equality joins between two columns. JOIN also supports all the operations
allowed by DET [27]. OPE-JOIN similarly allows range joins.
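The difference between the RND and DET layers can be illustrated in Java. This is a simplified sketch, not CryptDB's implementation. It assumes AES in CBC mode with a random IV for RND and AES in ECB mode under a separate key for DET, matching the AES modes this thesis reports using in Section 2.2. ECB acts as a single-block PRP only for short values. `RndDetSchemes` is a hypothetical name.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Simplified RND and DET layers, each under its own AES key.
class RndDetSchemes {
    private static final SecureRandom RNG = new SecureRandom();
    private final SecretKey rndKey = newKey();
    private final SecretKey detKey = newKey();

    private static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // RND: AES-CBC with a fresh random IV (prepended), so equal values encrypt differently.
    byte[] rnd(String value) {
        try {
            byte[] iv = new byte[16];
            RNG.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, rndKey, new IvParameterSpec(iv));
            byte[] body = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    String decryptRnd(byte[] ivAndCiphertext) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, rndKey, new IvParameterSpec(ivAndCiphertext, 0, 16));
            byte[] plain = c.doFinal(ivAndCiphertext, 16, ivAndCiphertext.length - 16);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // DET: AES-ECB, so equal values always encrypt to equal ciphertexts.
    // For values of at most 15 bytes the padded input is a single AES block.
    byte[] det(String value) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, detKey);
            return c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // All the server needs for an equality filter at the DET layer.
    static boolean serverEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

Equal names produce equal DET ciphertexts, so the server can evaluate an equality filter by comparing bytes. Two RND encryptions of the same name differ and reveal nothing about equality.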

The CryptGraphDB approach for querying encrypted graph databases, proposed in
this thesis and described in Chapter 3, uses some of the above encryption
schemes. They implement an encryption level that allows the server to perform
equality checks.

2.4.2 Adjustable Query-based Encryption

Adjustable query-based encryption is a crucial part of the CryptDB design: it
decides dynamically which level of encryption to reveal to the server.

Table 2.1: The onion layers and the operations they provide.

Onion layer   Operations
RND           update, delete, insert, project on column C, sum on column C
DET           group by, count, distinct, equality joins, equality filters
OPE           order by, MIN, MAX, sort, selection based on comparison
HOM           addition
SEARCH        word searches (e.g., using the “ILIKE” keyword)
JOIN          equality join on columns a and b
OPE-JOIN      range join (greater-than and less-than predicates)

The encryption level is adjusted according to the queries that must be executed
over the data. If no operation is needed on a column, it stays encrypted with
RND; if the column needs equality checks, DET suffices. Initially, each cell in
the database is encrypted independently into an onion, with the strongest
encryption layers outermost, as detailed in Figure 2.3. Each layer provides a
specific function, as reported in Subsection 2.4.1.
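The adjustment rule described above can be sketched as a small decision function. This is a hypothetical policy for illustration. It assumes the onion layer orders used in CryptDB, listed from outermost to innermost: the Eq onion is RND, DET, JOIN, and the Ord onion is RND, OPE, OPE-JOIN. It treats HOM and SEARCH as single-layer onions. The names `OnionPolicy`, `Onion` and `Op` are not from the thesis.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;

// Hypothetical adjustment policy: for each onion, pick the outermost (most secure)
// layer that still supports every operation the observed queries need on a column.
class OnionPolicy {
    enum Onion { EQ, ORD, ADD, SEARCH }

    enum Op { PROJECT, EQUALITY, EQUI_JOIN, ORDER, RANGE_JOIN, SUM, WORD_SEARCH }

    static Map<Onion, String> requiredLayers(Set<Op> ops) {
        Map<Onion, String> layers = new EnumMap<>(Onion.class);
        // Eq onion, outermost to innermost: RND -> DET -> JOIN.
        if (ops.contains(Op.EQUI_JOIN)) {
            layers.put(Onion.EQ, "JOIN");
        } else if (ops.contains(Op.EQUALITY)) {
            layers.put(Onion.EQ, "DET");
        } else {
            layers.put(Onion.EQ, "RND");
        }
        // Ord onion, outermost to innermost: RND -> OPE -> OPE-JOIN.
        if (ops.contains(Op.RANGE_JOIN)) {
            layers.put(Onion.ORD, "OPE-JOIN");
        } else if (ops.contains(Op.ORDER)) {
            layers.put(Onion.ORD, "OPE");
        } else {
            layers.put(Onion.ORD, "RND");
        }
        // SUM is served by HOM and WORD_SEARCH by SEARCH, and PROJECT needs no
        // peeling, so these operations never force an onion to be lowered.
        layers.put(Onion.ADD, "HOM");
        layers.put(Onion.SEARCH, "SEARCH");
        return layers;
    }
}
```

Consider the query `SELECT ID FROM Students WHERE name = 'Smith'`. The Name column needs only `EQUALITY`, so the Eq onion is lowered to DET and the Ord onion stays at RND.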

According to the above, the maximum privacy comes from the RND and HOM layers,
while the DET and OPE layers provide more functionality. CryptDB uses three
onions for numeric values and two onions for string values. The CryptDB
frontend also anonymizes the schema, to prevent the server from learning
information from column or table names.

Figure 2.3: Onion layers of encryption and the classes of computation they allow.
(from [34, 36])

2.4.3 Query Execution for Read Queries

This subsection shows how read queries are executed. Before an SQL query is
executed over encrypted data, the frontend encrypts, anonymizes and rewrites the
query, and then sends it to the untrusted DBMS server. Table 2.2 shows the
anonymized schema used to rewrite a query. In addition, each query constant is
encrypted with the encryption scheme that matches the current layer of the onion.

Table 2.2: Data layout at the server. When the frontend creates a table with the
schema on the left, the table created at the DBMS server is the one shown on the
right (adopted from [36]).

Students (frontend)   Table1 (DBMS server)
ID   Name             C1-Onion1  C1-Onion2  C1-Onion3  C2-Onion1  C2-Onion2
30   Smith            xr32g1     xt43q1     xo89u2     xq22e4     xu4o98

To demonstrate how this works, consider an example schema consisting of a table
Students with two columns, ID and Name. To begin, each column in the table is
encrypted in all of its onions (with RND and HOM as the outermost layers), as
presented in Figure 2.3. At this stage, the server can learn only the number of
rows and columns and the data size, not the data content. To see how the onion
layers are removed, consider the following SQL query:

SELECT ID
FROM Students
WHERE name = 'Smith'

This query requires the encryption of Name to be lowered to the DET layer, so
the frontend first issues the query

UPDATE Table1 SET C2-Onion1 = DECRYPT_RND(K_{1,2,RND}, C2-Onion1)

Next, the frontend issues

SELECT C1-Onion1 FROM Table1 WHERE C2-Onion1 = x9z63r2

where x9z63r2 is the encryption of Smith under the key K_{1,2,DET}. The result is
decrypted and sent back to the user.
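The two-step execution above (peel the RND layer, then filter on DET ciphertexts) can be simulated in memory. This toy sketch assumes that C2-Onion1 holds RND(DET(name)), with AES-CBC for RND and AES-ECB for DET. Plain Java lists stand in for the DBMS. The parameters `kRnd` and `kDet` play the roles of K_{1,2,RND} and K_{1,2,DET}. All class, method and token names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy version of the Students example: C2-Onion1 stores RND(DET(name)).
class ReadQueryDemo {
    private static final SecureRandom RNG = new SecureRandom();

    // One server-side row: C1-Onion1 is an opaque token, C2-Onion1 the Name onion.
    static final class Row {
        final String c1Onion1;
        byte[] c2Onion1;

        Row(String c1Onion1, byte[] c2Onion1) {
            this.c1Onion1 = c1Onion1;
            this.c2Onion1 = c2Onion1;
        }
    }

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] aes(String transformation, int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            if (iv == null) {
                cipher.init(mode, key);
            } else {
                cipher.init(mode, key, new IvParameterSpec(iv));
            }
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Frontend: inner DET layer (deterministic AES-ECB).
    static byte[] det(SecretKey kDet, String value) {
        return aes("AES/ECB/PKCS5Padding", Cipher.ENCRYPT_MODE, kDet, null,
                value.getBytes(StandardCharsets.UTF_8));
    }

    // Frontend: outer RND layer (AES-CBC with a random IV prepended).
    static byte[] rnd(SecretKey kRnd, byte[] inner) {
        byte[] iv = new byte[16];
        RNG.nextBytes(iv);
        byte[] body = aes("AES/CBC/PKCS5Padding", Cipher.ENCRYPT_MODE, kRnd, iv, inner);
        byte[] out = Arrays.copyOf(iv, iv.length + body.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    // Server, once given the RND key: UPDATE Table1 SET C2-Onion1 = DECRYPT_RND(...).
    static void peelRnd(List<Row> table, SecretKey kRnd) {
        for (Row row : table) {
            byte[] c = row.c2Onion1;
            row.c2Onion1 = aes("AES/CBC/PKCS5Padding", Cipher.DECRYPT_MODE, kRnd,
                    Arrays.copyOf(c, 16), Arrays.copyOfRange(c, 16, c.length));
        }
    }

    // Server: SELECT C1-Onion1 FROM Table1 WHERE C2-Onion1 = <DET constant>.
    static List<String> selectWhereEquals(List<Row> table, byte[] detConstant) {
        List<String> result = new ArrayList<>();
        for (Row row : table) {
            if (Arrays.equals(row.c2Onion1, detConstant)) {
                result.add(row.c1Onion1);
            }
        }
        return result;
    }
}
```

Before `peelRnd` runs, the two rows holding Smith have different onion values, so the equality filter cannot match anything. After the server removes the RND layer, both rows share the same DET ciphertext. The filter then returns their C1-Onion1 tokens, which the frontend would decrypt.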

2.4.4 Query Execution for Write Queries

CryptDB also supports queries that modify data on the server (INSERT, DELETE and
UPDATE). For all modification queries, the frontend processes predicates such as
the WHERE clause exactly as it does for read queries. For INSERT queries, the
frontend encrypts each inserted column value with every onion layer that has not
yet been removed. DELETE queries need no additional processing. For UPDATE
queries that set the value of a column to a constant, the value is encrypted with
the appropriate onions, as for INSERT.
A different case arises when an UPDATE query sets a column value based on an
existing column value, such as column = column + 1. In this case HOM is used to
perform the UPDATE, as it allows additions. Note, however, that an encryption
scheme that allows both comparison and addition at the same time is inherently
insecure. Suppose a malicious server can learn the order of values (OPE) and can
increase a value by one. It can then keep adding one to an item homomorphically
until the item equals another value. In this way the server can compute the
difference between any two values in the database, which is equivalent to
knowing their values. To address this problem, when a column is incremented and
then only projected (with no comparisons), the value is read from the HOM onion
rather than from the equality or order onion. Only the HOM onion is up to date.
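The homomorphic increment can be illustrated with the Paillier cryptosystem, a well-known additively homomorphic scheme. Multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts. The sketch below is a toy (small primes, no hardening) written with `java.math.BigInteger`. The class name `Paillier` and its methods are illustrative, and the text does not state which HOM scheme is used.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier cryptosystem (additively homomorphic). Key sizes and the lack of
// input validation make it unsuitable for real use.
class Paillier {
    private static final SecureRandom RNG = new SecureRandom();
    final BigInteger n;                // public modulus p * q
    final BigInteger nSquared;         // ciphertext modulus n^2
    final BigInteger g;                // generator, here n + 1
    private final BigInteger lambda;   // lcm(p - 1, q - 1), private
    private final BigInteger mu;       // lambda^{-1} mod n, private (valid for g = n + 1)

    Paillier(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, RNG);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(primeBits, RNG);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger pm1 = p.subtract(BigInteger.ONE);
        BigInteger qm1 = q.subtract(BigInteger.ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1));
        mu = lambda.modInverse(n);
    }

    // Enc(m) = g^m * r^n mod n^2 with a random r coprime to n (probabilistic).
    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Dec(c) = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n.
    BigInteger decrypt(BigInteger c) {
        BigInteger x = c.modPow(lambda, nSquared);
        return x.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    // Homomorphic addition: Enc(a) * Enc(b) mod n^2 decrypts to a + b mod n.
    BigInteger add(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }
}
```

To execute column = column + 1, the server would replace each stored ciphertext c with `add(c, encrypt(1))` and never see the plaintext. Decrypting the result at the frontend gives the incremented value, so 41 becomes 42.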

2.4.5 Computing Joins

Supporting joins is another important area. If two columns need to be joined,
they must be encrypted under the same key at the JOIN or OPE-JOIN level. The best
privacy for equality joins is obtained when the server cannot join columns for
which the user did not request a join. Columns that are never joined then do not
need to be encrypted with the same cryptographic JOIN keys.
R. A. Popa et al. [36] discussed the problem of computing joins. Suppose the user
requests a join of columns C1 and C2 and a join of columns C3 and C4; the server
should still be unable to join C2 and C3. Since it is not known in advance which
columns will be joined, the challenge is to decide which JOIN key each column
should be encrypted with.


R. A. Popa et al. [35] developed a new cryptographic primitive for computing
joins that lets the server dynamically adjust the JOIN encryption key of each
column. Initially, each column is encrypted with a different JOIN key. When a
user query needs a join, the server receives an onion key from the frontend. This
key allows the two columns to be re-encrypted with the same JOIN key, which
enables the join between them. Issues concerning the join method presented by
R. A. Popa et al., and the method proposed in this thesis, are discussed in
Chapter 3.
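The idea of re-keying a JOIN column on demand can be illustrated with a toy based on modular exponentiation. A value v in a column with key k is stored as H(v)^k mod p. The frontend can hand the server a delta = k' * k^(-1) mod (p - 1), and raising each stored value to delta yields H(v)^k'. This only illustrates the principle under these assumptions; it is not the construction of [35]. `ToyAdjustableJoin` and its methods are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Toy illustration of adjustable JOIN keys (not the construction of [35]):
// a value v in a column with key k is stored as H(v)^k mod p.
class ToyAdjustableJoin {
    private static final SecureRandom RNG = new SecureRandom();
    final BigInteger p;              // public prime modulus
    private final BigInteger order;  // p - 1; exponents are reduced modulo p - 1

    ToyAdjustableJoin(int bits) {
        p = BigInteger.probablePrime(bits, RNG);
        order = p.subtract(BigInteger.ONE);
    }

    // Per-column JOIN key, chosen invertible modulo p - 1 so it can be re-keyed later.
    BigInteger newColumnKey() {
        BigInteger k;
        do {
            k = new BigInteger(order.bitLength(), RNG).mod(order);
        } while (k.signum() == 0 || !k.gcd(order).equals(BigInteger.ONE));
        return k;
    }

    // Maps a value into [1, p - 1] via SHA-256.
    BigInteger hashToGroup(String value) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, d).mod(order).add(BigInteger.ONE);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // JOIN ciphertext of v under column key k: H(v)^k mod p (deterministic per column).
    BigInteger encrypt(String value, BigInteger k) {
        return hashToGroup(value).modPow(k, p);
    }

    // Frontend: delta that moves a column from key kFrom to key kTo.
    BigInteger delta(BigInteger kFrom, BigInteger kTo) {
        return kTo.multiply(kFrom.modInverse(order)).mod(order);
    }

    // Server: (H(v)^kFrom)^delta = H(v)^kTo mod p by Fermat's little theorem,
    // because kFrom * delta = kTo (mod p - 1).
    BigInteger adjust(BigInteger ciphertext, BigInteger delta) {
        return ciphertext.modPow(delta, p);
    }
}
```

After adjustment, equal values in the two columns have equal JOIN ciphertexts, so the server can join them by equality. It never sees the plaintext values or the column keys themselves.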

2.5 Graph Databases

This section presents graph database concepts, more specifically those of Neo4j
[39]. Graph databases have recently become very popular. Graph structures with
nodes and edges constitute a convenient data model that can represent many kinds
of scenarios. Querying graph databases may also be more efficient than querying
relational databases, especially for data traversal queries.
Several graph DBMS implementations are available, including GraphDB, Neo4j and
OrientDB, to name a few [14, 23, 50]. A graph database consists of two kinds of
element: nodes and relationships. A node represents an entity (a person, place
or thing), and a relationship represents how two nodes are connected. For
example, the two nodes (Blue) and (color) would have the relationship [is a type
of] pointing from (Blue) to (color).
Figure 2.4 shows a set of Employee nodes represented in a graph data model. Each
node (labelled “Employee”) represents a single person and is connected by
relationships describing how the employees are connected. As shown, (Smith)
knows both (Tom) and (Jones), and (Jones) knows (Tom).
The remainder of this section comprises two subsections. Subsection 2.5.1
presents the Neo4j graph database, and Subsection 2.5.2 presents Cypher, the
query language used to query Neo4j databases.

Figure 2.4: The connections [relationships] between different Employee (nodes).

2.5.1 Neo4j Database

As mentioned earlier in this thesis, the proposed approach to querying encrypted
graph databases is applied to the Neo4j database. Neo4j is a graph database
management system developed by Neo Technology Inc. in 2007. It is an open-source
graph database written in Java, and it provides an ACID-compliant transactional
backend for applications. Neo4j is available in both an enterprise edition and
a community edition [26]. The edition used to analyse the data in this thesis was
the enterprise edition; more details on this are given in Chapters 3 and 5. Neo4j
comes with the graph query language Cypher, which is discussed in detail in the
next subsection [19, 33].
The Neo4j data model is described below [14, 39]. Neo4j uses the property graph
model to store data. The main aspects of the property graph model are as follows.

• The data is represented in nodes, relationships, and properties.

• Properties are key-value pairs.

• In diagrams, nodes are drawn as circles and relationships as arrows.

• Nodes are connected by relationships.

• Nodes and relationships have properties.

The fundamental concepts of the property graph are as follows:

Nodes

• Nodes contain properties. The simplest graph is a single node.

• A node can have zero or more outgoing relationships.

• A node can have zero or more incoming relationships.

Relationships

• Relationships connect two nodes directly: the source node and the
target node.

• An outgoing relationship is a directed relationship from the source node.

• An incoming relationship is a directed relationship to the target node.

Labels

• Labels assign a common name to a group of nodes. A node may have
zero or more labels.

Properties

• Properties are key-value pairs used to add features to nodes and rela-
tionships.
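To make the property graph concepts concrete, the following in-memory Java sketch models labelled nodes with properties and directed, typed relationships. It includes helpers for outgoing and incoming relationships. It is a hypothetical illustration (records require Java 16 or later) and is not Neo4j's Java API. The names Ann and Ben in the usage note are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal in-memory property graph (illustration only, not Neo4j's API).
class PropertyGraph {
    // A node has an id, zero or more labels, and key-value properties.
    record Node(long id, Set<String> labels, Map<String, Object> properties) {}

    // A relationship is directed from source to target and has a type and properties.
    record Relationship(Node source, String type, Node target, Map<String, Object> properties) {}

    private final List<Node> nodes = new ArrayList<>();
    private final List<Relationship> relationships = new ArrayList<>();

    Node createNode(Set<String> labels, Map<String, Object> properties) {
        Node node = new Node(nodes.size(), labels, properties);
        nodes.add(node);
        return node;
    }

    Relationship relate(Node source, String type, Node target) {
        Relationship rel = new Relationship(source, type, target, Map.of());
        relationships.add(rel);
        return rel;
    }

    // Relationships leaving the node (the node is the source).
    List<Relationship> outgoing(Node node) {
        List<Relationship> result = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.source().equals(node)) {
                result.add(r);
            }
        }
        return result;
    }

    // Relationships arriving at the node (the node is the target).
    List<Relationship> incoming(Node node) {
        List<Relationship> result = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.target().equals(node)) {
                result.add(r);
            }
        }
        return result;
    }

    // Nodes carrying a given label, e.g. "Employee".
    List<Node> nodesWithLabel(String label) {
        List<Node> result = new ArrayList<>();
        for (Node n : nodes) {
            if (n.labels().contains(label)) {
                result.add(n);
            }
        }
        return result;
    }
}
```

For instance, create two Employee nodes, Ann and Ben, and call `relate(ann, "KNOWS", ben)`. Then `outgoing(ann)` contains one KNOWS relationship and `incoming(ann)` is empty, mirroring the outgoing/incoming distinction above.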

2.5.2 Cypher Language

Cypher is a declarative query language for querying a Neo4j database: it focuses
on expressing what to retrieve from the graph rather than how to retrieve it.
Cypher's syntax provides a natural way to express the patterns of nodes and
relationships to be matched in the graph during query execution. Users do not
need to describe how to select, insert, update or delete graph data [1, 31, 33,
46].

Cypher Clauses

The Cypher query language comprises several distinct clauses, as follows.

Querying the graph

• MATCH, which matches the given graph pattern against the nodes and
relationships in the graph.
• WHERE, which filters the matched results using predicates.
• RETURN, which returns the result data and also handles aggregation.
• ORDER BY, which sorts the query result.
• SKIP/LIMIT, which paginate the query result.

Updating the graph

• CREATE, which creates nodes and relationships.
• MERGE, which is typically used to create nodes uniquely.
• CREATE UNIQUE, which creates relationships uniquely.
• DELETE, which removes nodes and relationships.
• SET, which updates properties and labels.
• REMOVE, which removes properties and labels.
• FOREACH, which performs updating actions once per element in a list.
• WITH, which divides a query into multiple distinct parts and passes results
from one to the next.

The general structure of a Cypher query is as follows, where $value1 and $value2
are query parameters:

MATCH (node1:label1)-[:Relationship]->(node2:label2)
WHERE node1.propertyA = $value1 AND node2.propertyB = $value2
RETURN node2

Cypher Examples

To present the fundamental ideas of Cypher query execution, the example graph in
Figure 2.4 is used in the following example queries:

Example 1.

Find the Employee nodes in the graph that have the name Tom. First, we need to
name a variable, which is used as a reference to the Tom node. The answer is Tom.
MATCH (n:Employee)
WHERE n.name = 'Tom'
RETURN n

Example 2.

Find the age of Tom. Note that the same variable is used for both name and age,
so that both refer to the same node, Tom. The answer is 33.
MATCH (n:Employee)
WHERE n.name = 'Tom'
RETURN n.age

Example 3.

Find who knows Tom. In this case, we need to specify the direction of the
relationship: to find who knows Tom, we need the incoming relationship direction.
The answer is Jones.
MATCH (n:Employee)<-[:KNOWS]-(m)
WHERE n.name = 'Tom'
RETURN m

Example 4.

Find who Tom knows. In this case, we need to specify the direction of the
relationship: to find who Tom knows, we need the outgoing relationship direction.
The answer is Smith.

MATCH (n:Employee)-[:KNOWS]->(m)
WHERE n.name = 'Tom'
RETURN m

2.6 Other Querying Encrypted Database Technologies


This section reviews work related to that described in this thesis.
The work in [41] develops the idea of CryptDB further to include fine-grained
access control using advanced cryptographic primitives. The proposed encryption
procedure takes into account both the data requirements of the query and the
access rights of users and/or groups of users.
The work in [52] proposes the CryptGraph system, which runs graph analytics on
encrypted graphs. Its aim is to preserve the privacy of both the analytic
results and the user's graph data. CryptGraph enables users to encrypt their
graphs before uploading them to the cloud. Users can then obtain the results
after the encrypted graph has been analysed. In this way, only the user can
decrypt the analytic results and the graph data to recover their plaintext form.
In [48], the MONOMI system is presented, which securely executes analytical
queries over private data on an untrusted server. It encrypts the whole database
and executes queries over the encrypted data, using techniques such as grouped
homomorphic addition and pre-filtering.
Inspired by the CryptDB and PIRATTE concepts, the work in [8] focuses on document
databases. PIRATTE is a proxy employed to share encrypted files between a data
owner and several users of a social network. The work in [8] proposes a scheme
called Secure Document DataBase (SDDB) that meets three security requirements:
confidentiality, flexible access control, and querying over encrypted data. The
proposed scheme enhances the security of CryptDB by supporting data sharing
between multiple users via the PIRATTE concept.
A comparison between graph databases and relational databases is presented in
[49]. This work concludes that the graph database performed better than the
relational database on structural queries. The conclusion is based on executing
each query 10 times on 12 databases and computing the average execution time in
milliseconds.
The works in [43] and [18] address the problem of searching over encrypted data
and proving the security of such searches. They present techniques that prevent
the untrusted server from learning the plaintext from the search results.

2.7 Summary
This chapter has provided a review of the existing work that underpins the work
presented in this thesis. The chapter commenced with a review of cryptography.
The CryptDB system and its design were then presented. This included a
discussion of the SQL-aware encryption strategy, adjustable query-based
encryption with onion encryption, and query execution for both read and write
queries. The central theme of the work presented in this thesis is to apply the
principles of the CryptDB system to graph databases. A variety of graph
databases fall within the domain of this approach; the one used in this thesis
is Neo4j. The chapter then went on to consider graph databases, at which the
work presented in this thesis is directed, especially the Neo4j database and its
Cypher query language. The significance of the latter is that Cypher was used
for evaluation purposes with respect to the CryptGraphDB approach proposed later
in the thesis. The next chapter describes the mechanism for querying encrypted
graph databases.
Chapter 3

Querying Encrypted Graph Databases

3.1 Introduction

This chapter presents the proposed CryptGraphDB approach for querying graph
databases. CryptGraphDB is a database system that protects the privacy of the
data in a graph database by executing Cypher queries over the encrypted
database. CryptGraphDB deals with the threat of either a curious database
administrator (DBA) or a database attacker who attempts to learn private data
by spying on the DBMS server. Examples of such data include details concerning
online banking, medical records, billing data or employment records. Because the
server sees the database only in encrypted form and never receives the
decryption key, the DBA can learn nothing about the data. Figure 3.1 details
CryptGraphDB's architecture.
In this work, we aim to take the CryptDB principles [34], as implemented for
relational databases, and transfer them to graph databases. We first elaborate
on the requirements for the CryptGraphDB component for the Neo4j database system
and the Cypher query language. We found that SQL-aware encryption schemes can be
seamlessly re-used as Cypher-aware schemes. This preserves the performance
advantages of graph traversal queries over equivalent relational ones; more
details are given in the next section.


Figure 3.1: The typical query flow in Graph CryptDB (adapted from [34])

CryptGraphDB protects the confidentiality of data in graph databases by using an
appropriate encryption scheme for each query. Initially, the whole database is
encrypted, and the database server executes Cypher queries over the encrypted
data. Depending on the query, the data may need to be decrypted to a weaker
encryption layer. Query results are decrypted only at the client side. The
fundamental characteristics of CryptGraphDB are:
(1) a Cypher-aware encryption strategy that maps Cypher operations to encryption
schemes;
(2) adjustable query-based encryption that lets CryptGraphDB adjust the
encryption level of each data item based on user queries; and
(3) onion encryption to change data encryption levels efficiently.
CryptGraphDB achieves good performance by using a collection of Cypher-aware
encryption schemes. Only the information necessary to execute the various types
of Cypher queries is revealed, while the rest of the data remains hidden.

CryptGraphDB was implemented in two different ways: (i) using Neo4j embedded in
Java applications, by including the Neo4j library jars in the build; and (ii) by
creating a set of User-Defined Functions (UDFs) that can be called directly from
Cypher queries.

The remainder of this chapter is structured as follows. Section 3.2 presents the
proposed design, Section 3.3 introduces the implementation of the prototype, and
Section 3.4 presents a summary of the chapter and the main findings.
Chapter 3. Querying Encrypted Graph Databases 27

3.2 Proposed Design

This section describes the CryptGraphDB architecture (a schematic is given in
Figure 3.1), the implemented prototype and the experimental results. The work
presented in this thesis proposes a new approach that takes the principles of
CryptDB, as implemented for relational databases, and applies them to graph
databases. To begin, we elaborate the requirements for the proposed
CryptGraphDB component for the Neo4j database system and the Cypher query
language.
As stated in the Introduction, the main aim of CryptGraphDB is to protect data
confidentiality against the threats described there. It does so by enabling the
execution of Cypher queries over encrypted data on the DBMS server. Three ideas
originally proposed in CryptDB for relational DBMSs are adopted: a Cypher-aware
encryption strategy, adjustable query-based encryption, and onion encryption.
The main requirement for encryption adjustment is that it has to select an
appropriate encryption layer that:
(1) reveals enough information about the encrypted data for the query to be
executed; and
(2) does not reveal more information than necessary.

3.2.1 Cypher-aware encryption strategy

Existing encryption schemes were used to implement encryption that allows Cypher
queries to be processed. CryptGraphDB encrypts all data items in a given node
with the same layer of encryption. As a result, the same operation can be
applied to every value in that node. The encryption types that CryptGraphDB uses
are presented in Subsection 3.2.3.

3.2.2 Adjustable query-based encryption

Adjustable query-based encryption is the main feature of CryptGraphDB. Some
queries do not require any checking or comparison, so the node properties they
use are encrypted with the RND layer. This is a random level of encryption that
does not reveal any information: different instances of the same plaintext
value are likely to be mapped to different ciphertexts. For node properties that
require equality checks during query execution, the DET layer is used. This is
the deterministic level of encryption, which maps equal plaintext values to
equal ciphertexts. CryptGraphDB therefore needs to adjust the encryption level
of each data item based on user queries. This can be done in advance if the set
of queries is fixed beforehand. Otherwise it must be done at run time, in which
case adjustable onion-layered encryption should be used.

3.2.3 Onion Layers of Encryption

Onion layers of encryption allow data encryption levels to be changed efficiently. To implement adjustable query-based encryption, each data item is encrypted multiple times, in layers forming an onion, and each layer of each onion enables some kind of functionality. For example, the outermost layers, such as the RND and HOM layers, give maximum security, while the inner layers, such as the OPE and DET layers, provide more functionality. Adjustable query-based encryption thus dynamically adapts the layer of encryption on the DBMS server, as illustrated in Figure 3.2.

Figure 3.2: Onion layers of encryption and the groups of computation they perform
(adapted from [36]).

Different cryptographic schemes can be cascaded into onion layers (here we follow the original CryptDB design):

• Random (RND). A probabilistic scheme that provides maximum security: two equal plaintext values will be mapped to different ciphertexts. RND does not reveal any information about the plaintext and does not allow any computation over the ciphertexts.

• Homomorphic encryption (HOM). A method of encryption that allows calculations to be performed securely on encrypted data (in CryptDB, the Paillier cryptosystem, which supports summation).

• Word search (SEARCH). A cryptographic protocol for keyword search over encrypted text. SEARCH allows the server to detect occurrences of a given (encrypted) word in a node.

• Deterministic (DET). It leaks only which encrypted values correspond to the same data value, and no more. DET allows equality checks to be performed.

• Order-preserving encryption (OPE). It allows order relations and comparisons between data values to be determined from their encrypted values.
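The interplay of the RND and DET layers in one Eq onion can be shown with a minimal sketch. It assumes Java's JCA, as in the prototype of Section 3.3. The DET layer is stood in by AES in ECB mode and the RND layer by AES-CBC with a fresh random IV. These modes are illustrative choices, not CryptDB's exact constructions. Peeling the RND layer is what an adjustment does, and it turns two unrelated-looking ciphertexts of the same value into equal DET ciphertexts.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class EqOnion {
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Inner DET layer: AES/ECB is deterministic, so equal plaintexts give equal ciphertexts.
    static byte[] det(SecretKey key, String value) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key);
            return c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Outer RND layer: AES/CBC with a fresh random IV, stored in front of the ciphertext.
    static byte[] rnd(SecretKey key, byte[] inner) {
        try {
            byte[] iv = new byte[16];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] ct = c.doFinal(inner);
            byte[] out = Arrays.copyOf(iv, 16 + ct.length);
            System.arraycopy(ct, 0, out, 16, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Adjustment: removing the RND layer exposes the DET ciphertext underneath.
    static byte[] peelRnd(SecretKey key, byte[] outer) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(outer, 0, 16));
            return c.doFinal(outer, 16, outer.length - 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey detKey = newKey(), rndKey = newKey();
        byte[] a = rnd(rndKey, det(detKey, "Tom"));
        byte[] b = rnd(rndKey, det(detKey, "Tom"));
        // At RND the two copies of "Tom" look unrelated to the server ...
        System.out.println("equal at RND: " + Arrays.equals(a, b));
        // ... after adjustment to DET they are equal, which enables equality checks.
        System.out.println("equal at DET: " + Arrays.equals(peelRnd(rndKey, a), peelRnd(rndKey, b)));
    }
}
```

ECB is used here only because it is deterministic. It also reveals repeated 16-byte blocks within a single value, so it should not be read as a recommended DET construction.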

3.2.4 Join Operator in Relational CryptDB

In a relational database, a join operator is used to combine two or more tables based on some common information. To support SQL queries with joins over encrypted data, CryptDB uses a new cryptographic primitive called JOIN. JOIN allows the server to compare values from two columns and is realised as a dedicated join layer in the onion layers of encryption. If two columns are to be joined, they need to be encrypted with the same key at the JOIN or OPE-JOIN level. It has been noted for CryptDB [34, 48] that using the DET layer for SQL queries that include a join operation might lead to unnecessary leaks of information, such as cross-column equalities. Join-aware encryption schemes were designed to overcome this drawback. The authors of [34] and [48] proposed JOIN and OPE-JOIN because DET encryption must use a different key for each column. Allowing equality joins between two columns therefore requires a separate encryption level. For inequality joins, the server performs the join on columns at the OPE-JOIN level.
In [39] it was concluded that graph databases have no need to perform joins, and accordingly there is no join operator in the Cypher query language. Because of the expressive structure of graph databases, a join operator has no role to play. A graph database uses nodes to represent entities and relationships to express how those nodes are connected to each other. To follow a path, we first find a starting node in the graph database. Then, exploiting index-free adjacency, we move from one node to the next over the connecting relationships. Every hop along the path is equivalent to a join operation. As a result, the JOIN layer in the onion layers of encryption was omitted in [4]. More details on this topic can be found in [35].
Regardless of the above, consider the natural translation of relational databases to graph databases, such as the workflow proposed in the Neo4j manual (export a relational database instance to a CSV file and upload it to Neo4j as a graph database instance). Under this translation, one can show that the issue of unnecessary leaks remains. Furthermore, the original join-aware encryption scheme can be used as a "property-aware" encryption scheme for Cypher query execution. Further improvement in security protection in CryptGraphDB will come from Traversal-aware Encryption Adjustment, applied dynamically during the execution of graph traversal queries. More details on such strategies of encryption adjustment, and an analysis of the related trade-offs between efficiency and security protection, will be given later in this thesis.

3.2.5 Information leaks in Relational CryptDB

As reported above, the focus of CryptDB [34] is on allowing the DBMS to execute relational queries over encrypted data in much the same way as it executes the same queries over the original data. SQL queries are built from a small set of operators, such as equality comparison or summation, and only these operators need to be supported over ciphertexts. To implement encryption that supports SQL query processing, existing encryption schemes are used.
In this scheme, if two columns are to be joined, they need to be encrypted with the same key at the JOIN or OPE-JOIN level. A separate encryption level is therefore necessary to allow equality joins between two columns. Inequality joins are performed by the server on columns at the OPE-JOIN level.
Table 3.1 illustrates the issue of information leaks for simple encryption adjustment schemes. The upper part shows two tables, Student (left) and Enroll (right). The middle and lower parts show the same tables at the RND and DET layers, respectively.

Table 3.1: Data layout at the server when the frontend creates encrypted tables
using the schema at the top; the tables created at the server are given at the
bottom.

                    STUDENT                        ENROLL
Non-encrypted database
                    stuID   Name    Major          stuID   Name    ClassNumber
                    1       Smith   Math           1       Smith   MTH200
                    2       Tom     CSC            3       Mary    CSC201
                    3       Mary    CSC            4       Jones   CSC205
An encrypted database, RND layer
                    stuID   Name    Major          stuID   Name    ClassNumber
                    X98e7   x5a8c   X4a87          qwe70   x9g6h   oi54f
                    X7a8q   x78as   gh66y          asd8q   xu33k   86uyt
                    Xq74f   x1cf8   tri13          qwe4a   x7j8u   dft63
An encrypted database, DET layer
                    stuID   Name    Major          stuID   Name    ClassNumber
                    as99e   X88df   RtE54          as99e   X88df   khj33
                    uyt57   x3s4t   FM98w          nhy97   X6s5g   qw12l
                    nhy97   X6s5g   FM98w          opo44   x7rz7p  hg80u

Consider the following SQL query:

SELECT Student.Name
FROM Student, Enroll
WHERE Student.Name = Enroll.Name

The outcome of executing the above query would be Smith and Mary. Following the CryptDB mechanism, the data is initially encrypted at the RND layer, as illustrated in the middle part of Table 3.1. The encryption of Name is then lowered to the DET layer, as shown in the lower part of Table 3.1. This requires a query that updates the table. Here X88df and X6s5g are the encryptions of Smith and Mary, respectively. The results demonstrate two things. First, the server can recognise that two pairs of ciphertexts, X88df and X6s5g, correspond to the same plaintext values in both tables. Second, unnecessary information is revealed, such as cross-column equalities of stuID.
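The leak in Table 3.1 can be reproduced with a small sketch. Several assumptions apply: AES/ECB with Base64 encoding stands in for DET, and keys are generated at random. The Name columns are those of Table 3.1. The server-side function equalPairs uses no key at all. With per-column keys it finds nothing. Once both columns share a key, as a join requires, it finds the Smith and Mary matches.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class DetJoinLeak {

    // Name columns of Student and Enroll, as in Table 3.1.
    static final List<String> STUDENT_NAMES = Arrays.asList("Smith", "Tom", "Mary");
    static final List<String> ENROLL_NAMES = Arrays.asList("Smith", "Mary", "Jones");

    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in DET layer: AES/ECB, Base64-encoded, applied to a whole column with one key.
    static List<String> encryptColumn(SecretKey key, List<String> column) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key);
            List<String> out = new ArrayList<>();
            for (String v : column) {
                out.add(Base64.getEncoder().encodeToString(c.doFinal(v.getBytes(StandardCharsets.UTF_8))));
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the server can compute without any key: pairs of rows with equal ciphertexts.
    static int equalPairs(List<String> left, List<String> right) {
        int count = 0;
        for (String l : left) {
            for (String r : right) {
                if (l.equals(r)) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A different key per column: equal names encrypt differently, nothing matches.
        int perColumn = equalPairs(encryptColumn(newKey(), STUDENT_NAMES),
                                   encryptColumn(newKey(), ENROLL_NAMES));
        // One key for both columns: the server sees that Smith and Mary occur in both.
        SecretKey shared = newKey();
        int sharedKey = equalPairs(encryptColumn(shared, STUDENT_NAMES),
                                   encryptColumn(shared, ENROLL_NAMES));
        System.out.println("matches with per-column keys: " + perColumn);
        System.out.println("matches with a shared key: " + sharedKey);
    }
}
```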

3.2.6 CryptGraphDB principles

This sub-section gives more detail on how CryptGraphDB executes Cypher queries over encrypted data. The steps required to process a query in CryptGraphDB are illustrated in Figure 3.3. They are as follows:

1. The application issues a query that is intercepted and rewritten by the proxy:
it anonymizes each label, node, and relationship name, and encrypts each
constant in the query with an encryption scheme that allows the required
operation.

2. Using multi-layered encryption and encryption adjustment, the proxy determines whether the encryption layers need to be adjusted before the DBMS executes the query. If so, it issues an UPDATE query to adjust the encryption layer of the appropriate nodes, while the semantics of the query are preserved.

3. The proxy sends the encrypted query to the DBMS server, where it is executed as a standard Cypher query.

4. The DBMS server returns the encrypted result of the query to the proxy, which decrypts it and sends it back to the application.

To rewrite the query in encrypted form, the syntactic structure of the query is kept the same, but its syntactic components are encrypted or replaced as follows:

• Property names: encrypted/renamed.

• Property values: taken from the encryption at the current layer.

• Node names: encrypted/renamed.

• Relationship names: encrypted/renamed.

• Relationship values: taken from the encryption at the current layer.



Figure 3.3: The CryptGraphDB Design.

3.2.7 Case Study

To illustrate how CryptGraphDB processes a Cypher query, consider the example scenario shown in Figure 3.4 (upper part). It consists of graph database labels, nodes, and relationships, as follows:

Nodes:

(a:actor{name:'Tom Hanks'})
(m:movie{title:'The Matrix'})
(d:director{name:'Andy Wach'})

Relationships:

(a)-[:Acted_In{role:'Keneu'}]->(m)
(d)-[:Directed{In:'1999'}]->(m)

Initially, each data item is wrapped in onions of encryption, with RND and HOM as the outermost layers. Integer values receive at most three onions and non-integer values two. At this stage, the server (or rather an attacker who has taken over the server, or a curious administrator) can learn nothing about the data content other than the number of nodes, properties, and relationships.

Figure 3.4: Three-node schema at the server side with the corresponding plaintexts (part a), and the data under CryptGraphDB with the corresponding ciphertexts (part b). Ciphertexts shown are not full-length.
Given a query of the form:

MATCH (a:actor)-->(m)<--(d:director)
WHERE a.name = "Tom Hanks"
RETURN m.title

This query requires an equality check, so the Eq onion has to be adjusted to a lower layer. The adjustment results are shown in Figure 3.4 (lower part).

Step 1. An UPDATE query is needed to rename the label actor to actor1 and to decrypt the values of the name property to the DET layer:

UPDATE actor1 SET P-Onion = DECRYPT_RND(P-Onion)
SET P1-Eq = DECRYPT_RND(K_{N1,P1,Eq,RND}, P1-Eq, P1-IV)
SET P3-Eq = DECRYPT_RND(K_{N3,P3,Eq,RND}, P3-Eq, P3-IV)

Now the P1-Eq and P3-Eq properties are at the DET layer:

$D_{K_{N1,P1,\mathrm{Eq},\mathrm{RND}}}(\mathrm{XQ55U}, \mathrm{XE23R}) = \mathrm{x7bf8}$

$D_{K_{N3,P3,\mathrm{Eq},\mathrm{RND}}}(\mathrm{XS36E}, \mathrm{XWW12}) = \mathrm{xbg78}$
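The keys in Step 1, indexed by N1, P1, Eq and RND, imply one key per (node, property, onion, layer) tuple. The sketch below shows one way a proxy could derive such keys from a single master key. This is an assumption for illustration, not the key management of CryptGraphDB or CryptDB: HMAC-SHA256 is used as a pseudorandom function and truncated to a 128-bit AES key. The tuple components are passed in as plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class KeyDerivation {

    // Derives a 128-bit AES key for one (node label, property, onion, layer) tuple
    // from the proxy's master key, using HMAC-SHA256 as a pseudorandom function.
    static SecretKey deriveKey(byte[] masterKey, String label, String property,
                               String onion, String layer) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            // A NUL separator keeps tuples such as ("N1", "P12") and ("N1P", "12") apart.
            String context = String.join("\0", label, property, onion, layer);
            byte[] digest = mac.doFinal(context.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] master = new byte[32]; // all-zero master key, for the demo only
        SecretKey rnd = deriveKey(master, "N1", "P1", "Eq", "RND");
        SecretKey det = deriveKey(master, "N1", "P1", "Eq", "DET");
        // Each layer of each onion gets its own key.
        System.out.println("distinct layer keys: "
                + !Arrays.equals(rnd.getEncoded(), det.getEncoded()));
    }
}
```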

Step 2. The proxy encrypts Tom Hanks to the DET layer of its Eq onion, obtaining x7bf8:

$E_{K_{N1,P1,\mathrm{Eq},\mathrm{DET}}}(\text{Tom Hanks}) = \mathrm{x7bf8}$

Then the proxy generates a query and sends it to the DBMS to be executed:

MATCH (aa:actor1)-->(mm)<--(dd:director1)
WHERE aa.`P1-Eq` = "x7bf8"
RETURN mm.`P2-Eq`, mm.`P2-IV`

Here, x7bf8 is the encryption of Tom Hanks. The proxy requests the random IV from property P2-IV in order to decrypt the RND ciphertext from P2-Eq.
Step 3. The proxy receives the encrypted RND-level result x684, with IV x456, and decrypts it using:

$D_{K_{N2,P2,\mathrm{Eq},\mathrm{DET}}}(D_{K_{N2,P2,\mathrm{Eq},\mathrm{RND}}}(\mathrm{x684}, \mathrm{x456})) = \text{The Matrix}$

Finally, the proxy sends the decrypted result, The Matrix, to the application.

In this way CryptGraphDB, like CryptDB, can use various encryption schemes that support several operations, such as equality checks, order comparisons, and some arithmetic calculations. In both the original CryptDB and CryptGraphDB, encryption layer adjustment happens before query execution, which has noticeable consequences in the context of CryptGraphDB. In principle, CryptGraphDB can work with all these onion layers of encryption, but this thesis considers only equality checking. The required adjustment is performed everywhere in the database instance. For example, if a query requires equality checks on the values of some property, the encryption layers of that property's values are adjusted in all nodes where the property is present. As we will see, this may lead to unnecessary information leaks. Chapter 4 presents an approach that reveals less information when performing an encryption adjustment, at least for some types of queries.
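The global nature of this adjustment is visible in the proxy's bookkeeping, sketched below under assumptions. The class ProxyLayerState and its method prepareEquality are hypothetical. The UPDATE ... DECRYPT_RND statement follows the pseudocode notation of this thesis rather than real Cypher. The proxy records the current Eq-onion layer per (label, property). The first equality query on a property triggers an adjustment that applies to every node with that label, and later queries reuse it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ProxyLayerState {

    enum Layer { RND, DET }

    // Current layer of the Eq onion per "label.property"; absent means still at RND.
    private final Map<String, Layer> eqLayer = new HashMap<>();

    // Called before a query that checks equality on label.property. Returns the
    // adjustment statement the proxy must send first, or empty if none is needed.
    // The statement is global: it lowers the property in every node with that label.
    Optional<String> prepareEquality(String label, String property) {
        String key = label + "." + property;
        if (eqLayer.getOrDefault(key, Layer.RND) == Layer.DET) {
            return Optional.empty();
        }
        eqLayer.put(key, Layer.DET);
        return Optional.of("UPDATE " + label + " SET " + property
                + " = DECRYPT_RND(" + property + ")");
    }

    public static void main(String[] args) {
        ProxyLayerState proxy = new ProxyLayerState();
        System.out.println(proxy.prepareEquality("person", "name")); // adjustment issued
        System.out.println(proxy.prepareEquality("person", "name")); // already at DET
    }
}
```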

3.3 Implemented Prototype

The first prototype of a fragment of the above CryptGraphDB design was imple-
mented using a Java API. The main focus here was to evaluate the efficiency of
execution of Cypher queries over an encrypted Neo4j database. Several ways for
integrating Neo4j with client applications have been proposed. These include
Neo4j Server REST extension, Neo4j server with JDBC, and Neo4j’s Embedded
Java API [45].
The embedded Java API tightly integrates the client Java program with the Neo4j server by embedding the database into the application. It therefore does not allow for the actual separation of the CryptGraphDB component and the server assumed by the design. However, it allows quick prototyping, in which the client, the proxy and the server are all parts of the same application, and experimentation with the execution of queries on the encrypted store.
For the first prototype, only one encryption layer, the DET layer, was implemented. It uses the Advanced Encryption Standard (AES) algorithm available in the Java Cryptography Architecture (JCA). The Neo4j native Java API was used to create a Neo4j database at the chosen path, as shown here:

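import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Neo4j 2.x embedded API (in Neo4j 3.x, newEmbeddedDatabase takes a java.io.File)
GraphDatabaseFactory dbFactory = new GraphDatabaseFactory();
GraphDatabaseService db = dbFactory.newEmbeddedDatabase("C:/Neo4j");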

Note that database operations must be wrapped in a transaction, which is started as follows:

import org.neo4j.graphdb.Transaction;
try (Transaction tx = db.beginTx()) { /* ... database operations ... */ tx.success(); }

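To execute a Cypher query, we call the execute method, as follows:

// Neo4j 2.x: org.neo4j.cypher.javacompat.ExecutionEngine and ExecutionResult
ExecutionEngine execEngine = new ExecutionEngine(db);
ExecutionResult execResult = execEngine.execute(
        "MATCH (n) WHERE NOT ((n)--()) RETURN n"); // all orphan nodes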

To illustrate how CryptGraphDB processes a query Q, assume that we have the following Cypher query:

MATCH (A:PEOPLE)-[:KNOWS]-(B:PEOPLE)
WHERE A.name= "John"
RETURN B.name

Processing the above query Q in CryptGraphDB involves three steps:

1. Initially, each value in the graph is encrypted at the RND layer. Since the query needs an equality check, the encryption of the relevant values is lowered to the DET layer by issuing an update query with SET clauses.

2. The proxy encrypts "John" to the DET layer of its Eq onion, then generates the query and sends it to the DBMS:

MATCH (AA:PEOPLE1)-[:KNOWS1]-(BB:PEOPLE1)
WHERE AA.name1="c9Yz1Og1PdfVKBrVnOk46Q"
RETURN BB.name1;

where c9Yz1Og1PdfVKBrVnOk46Q is the DET encryption of John.

3. The proxy receives the encrypted result, decrypts it, and finally sends it back to the application.
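The rewriting in step 2 can be sketched as follows. This is a hypothetical illustration, not the prototype's code. The class name, the regular expression and the renaming map are assumptions. Only double-quoted string constants are encrypted, and variables such as A and B are left unchanged, whereas the prototype also renamed them. As a stand-in DET scheme, detBase64 uses AES/ECB and unpadded Base64. A short constant such as John therefore yields a 22-character string, the same length as the ciphertext shown above; the exact value depends on the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class QueryRewriter {
    // Matches a double-quoted constant, or a name after ':' (label, relationship type)
    // or after '.' (property name).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([:.])(\\w+)");

    private final Map<String, String> renames;           // plaintext name -> anonymised name
    private final UnaryOperator<String> encryptConstant; // encryption of a constant

    QueryRewriter(Map<String, String> renames, UnaryOperator<String> encryptConstant) {
        this.renames = renames;
        this.encryptConstant = encryptConstant;
    }

    // Stand-in DET scheme: AES/ECB, then Base64 without padding, so that the
    // ciphertext can be embedded in a Cypher string literal.
    static String detBase64(byte[] aesKey, String value) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"));
            byte[] ct = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().withoutPadding().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keeps the syntactic structure of the query; only names and constants change.
    String rewrite(String cypher) {
        Matcher m = TOKEN.matcher(cypher);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = "\"" + encryptConstant.apply(m.group(1)) + "\"";
            } else {
                // Names without a mapping (e.g. the bound in *1..3) are kept as they are.
                replacement = m.group(2) + renames.getOrDefault(m.group(3), m.group(3));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> renames = new HashMap<>();
        renames.put("PEOPLE", "PEOPLE1");
        renames.put("KNOWS", "KNOWS1");
        renames.put("name", "name1");
        byte[] key = new byte[16]; // all-zero demo key; a real proxy would use a secret key
        QueryRewriter proxy = new QueryRewriter(renames, v -> detBase64(key, v));
        System.out.println(proxy.rewrite(
                "MATCH (A:PEOPLE)-[:KNOWS]-(B:PEOPLE) WHERE A.name = \"John\" RETURN B.name"));
    }
}
```

Replacing the encryption lambda with v -> "ENC(" + v + ")" makes the output readable: MATCH (A:PEOPLE1)-[:KNOWS1]-(B:PEOPLE1) WHERE A.name1 = "ENC(John)" RETURN B.name1.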

3.3.1 Experimental Setup

The implemented prototype system was used to conduct experiments and study the efficiency of Cypher queries executed on an encrypted graph database. For the first group of experiments, an instance of a graph database with 250 nodes and 200 relationships was created manually. Some nodes were intentionally left orphaned, meaning they have neither incoming nor outgoing edges. Several types of relationships were used, with path lengths ranging from a single (binary) relationship up to paths of length ten.

3.3.2 Queries

The set of queries was designed to test some common types of queries. For example, traversals are important for identifying data objects (nodes) that originate from, or are affected by, some starting object or node. Another popular operation is searching for particular values of a specific property. The queries were divided into two types: structural and data queries.

The structural queries:

• S0: Find all orphan nodes, that is, all nodes in the graph with no incoming and no outgoing edges.

• S1: Traverse all nodes, transitive closure / regular expression.

• S2: Traverse all the nodes in the graph.

• S3: Find all nodes with an incoming relationship, and count them.

• S4: Find all nodes with an outgoing relationship, and count them.

• S5: Find all nodes with both an incoming and an outgoing relationship, and count them.

• S6: Traverse nodes by their ID.

The data queries:

• C1: Count the number of nodes whose value of a specific property equals a given value.

• C2: Get the nodes on a path following a specific relationship type.

• C3: Count the number of nodes in the graph.



3.3.3 Results/Analysis

Each query was run 12 times on the database and the execution times were collected; all times are in milliseconds (ms). The longest and the shortest times were dropped and the remaining ten were averaged, to limit the effect of caching on the timing. The data values were chosen randomly in advance, and the same values were used for both the non-encrypted and the encrypted databases.
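The timing protocol can be sketched as follows. The sketch assumes wall-clock measurement with System.nanoTime, as in the Java interface experiments. The Runnable workload in main is a placeholder, not one of the queries S0 to S6 or C1 to C3.

```java
import java.util.Arrays;

public class TrimmedMean {

    // Runs the workload the given number of times; returns each run's wall-clock time in ms.
    static double[] measure(Runnable query, int runs) {
        double[] times = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            times[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return times;
    }

    // Drops the single shortest and the single longest time and averages the rest
    // (12 runs -> 10 averaged).
    static double trimmedMean(double[] times) {
        if (times.length < 3) throw new IllegalArgumentException("need at least three runs");
        double[] sorted = times.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for the execution of a Cypher query.
        double[] times = measure(() -> Arrays.sort(new int[100_000]), 12);
        System.out.printf("trimmed mean: %.3f ms%n", trimmedMean(times));
    }
}
```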
A summary of the execution times for the structural queries on non-encrypted and encrypted instances of the same database is given in Table 3.2. For the structural queries S0 to S6 (Table 3.2, upper part), the differences were within about 1% in either direction. The non-encrypted database was slightly faster for S0, S3, S5 and S6, and slightly slower for S1, S2 and S4. For the data queries C1 to C3 (Table 3.3, upper part), the differences were also small, the largest being a 5.63% slowdown for C3. Overall, the slowdowns and accelerations observed for some of the queries were insignificant.

Table 3.2: The structural queries on non-encrypted and encrypted databases with the Java interface and the Neo4j interface (time in milliseconds).

Query               S0        S1        S2        S3        S4        S5        S6
Java interface
Non-encrypted DB    5030.8    5115.7    5256.6    5038.1    4931.1    5326      5455.5
Encrypted DB        5058.4    5092.5    5252.1    5081.8    4881.2    5338.3    5471.9
Slowdown            0.54%     -0.45%    -0.085%   0.86%     -1.012%   0.23%     0.3%
Neo4j interface
Non-encrypted DB    278.5     49        76.3      64.3      68.4      89.1      94.8
Encrypted DB        360.7     46.1      80.3      75.1      41        63        85.2
Slowdown            29.5%     -5.92%    5.24%     16.8%     -40.1%    -29.3%    -10.13%
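The Slowdown rows of Tables 3.2 and 3.3 are consistent with the relative difference below. The formula is inferred from the reported values rather than stated in the text. T_enc and T_plain are the mean times on the encrypted and non-encrypted instances, and negative values mean the encrypted instance was faster. The worked example uses S0 on the Neo4j interface.

```latex
\text{Slowdown} = \frac{T_{\mathrm{enc}} - T_{\mathrm{plain}}}{T_{\mathrm{plain}}} \times 100\%,
\qquad
\frac{360.7 - 278.5}{278.5} \times 100\% = \frac{82.2}{278.5} \times 100\% \approx 29.5\%.
```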

For the sake of comparison, we executed the same sets of structural queries, again over encrypted and non-encrypted instances, using the native Neo4j interface (Enterprise edition). The results can be seen in Table 3.2 (lower part). The results for the same group of data queries can be seen in Table 3.3 (lower part). Unlike the case of the Java API, where the wall-clock time was measured using Java methods, here the processor time reported by Neo4j is shown. This explains the one to two orders of magnitude difference between the Java interface and Neo4j interface figures. Slowdowns are reported for queries S0, S2 and S3 run over the encrypted instance, while the other queries were in fact accelerated. The results look encouraging for using CryptDB-like approaches for graph databases. However, more empirical evidence is required with respect to larger databases. The execution times are compared in Figure 3.5.

Table 3.3: The data queries on non-encrypted and encrypted databases with the Java interface and the Neo4j interface (time in milliseconds).

Query               C1        C2        C3
Java interface
Non-encrypted DB    5491.2    5248.8    5216.8
Encrypted DB        5478.2    5287.1    5510.4
Slowdown            -0.24%    0.73%     5.63%
Neo4j interface
Non-encrypted DB    63.8      71.8      79.2
Encrypted DB        59.5      40.7      53.8
Slowdown            -6.74%    -43.31%   -32.07%

Figure 3.5: Querying Non-Encrypted and Encrypted Graph Database with Java
Interface and Neo4j Interface.

3.4 Summary
In this chapter the proposed CryptGraphDB approach for executing queries over encrypted graph databases has been presented. The design of CryptGraphDB was outlined. Experiments were reported with an initial prototype system implemented with the Embedded Java API for the Neo4j graph DBMS. The experiments showed promising performance for CryptGraphDB, which warrants further development of the system. Future directions in the development of encryption adjustment schemes specific to graph databases, such as property-aware and traversal-aware schemes, were also outlined. The implementation of these schemes and the investigation of the related trade-offs between efficiency and security is a topic for future research. In the next chapter a dynamic scheme called traversal-aware encryption adjustment (TAEA) is presented.
Chapter 4

Traversal-Aware Encryption
Adjustment for Graph Databases

4.1 Introduction

In recent years there has been growing interest in querying encrypted data in databases [6, 7, 9, 28, 40] to address data security concerns. Until very recently most of the work was on relational databases. In [3] we extended CryptDB to graph databases. The challenge is how best to transform a query into a format appropriate for querying an encrypted database. This approach, a data processing method for querying encrypted data, is inspired by the work in [34]. CryptDB provides a powerful mechanism for protecting data against server-based attacks. One of the techniques used in CryptDB and CryptGraphDB is layered encryption combined with encryption adjustment. Both original approaches apply static adjustment before query execution.
In this chapter, the proposed Traversal-Aware Encryption Adjustment (TAEA) mechanism for graph databases is presented. TAEA is directed at adjusting the encryption layers in a dynamic context. By contrast, the CryptGraphDB approach with simple encryption adjustment presented in the previous chapter was directed at the static context. Like CryptGraphDB, TAEA takes a data processing approach to querying encrypted data in graph databases.
The main idea is that applying TAEA to graph database querying leads to demonstrably fewer unnecessary leaks of information. The scheme is dynamic: the encryption layer adjustment does not happen before query execution, but progresses gradually alongside it. TAEA aims to provide a sensible trade-off between data security protection and data processing efficiency. To this end, it utilizes multi-layered encryption and encryption adjustment, which allow the release of information about the data elements required for query execution to be controlled to some extent. It is argued here that simple adjustment policies (as in CryptGraphDB) require fewer queries and updates, but may reveal unnecessary data when adjusting the encryption layers. Instead, the proposed TAEA mechanism, as the name suggests, adjusts encryption layers dynamically as query execution progresses. In this way less information is revealed to any adversary watching the execution of the query on the encrypted store. The TAEA method is fully described later in this chapter, together with its implementation and evaluation.
The remainder of this chapter is organised as follows. Section 4.2 describes encryption layers and simple adjustment. Section 4.3 then presents the proposed Traversal-Aware Encryption Adjustment method and its evaluation. Finally, Section 4.4 gives a summary of the chapter.

4.2 Encryption Layers and Simple Adjustment

In this section, we briefly outline the concepts of encryption layers and encryption adjustment in the context of querying encrypted databases. Onion layers of encryption, considered in [4, 34], allow data encryption levels to be changed on demand in an efficient way. The main idea is to encrypt each data item in one or more onions, where each layer of each onion enables some kind of functionality, as explained in [4, 34]. Initially, each data item in the database is encrypted in all its onions, with the most secure encryption scheme as the outermost layer. The inner layers, such as OPE and DET, provide more functionality. At this point, the server can learn nothing about the data other than the number of nodes, properties, and the data size. Depending on the data access requirements of a particular query, the level of encryption is adjusted before query execution. Different cryptographic algorithms can be cascaded into onion layers, as described in sub-section 3.2.3.

The fundamental characteristic of encryption adjustment is the selection of an appropriate encryption layer. That layer reveals the information about the encrypted data needed for executing the query, but no more information about the data than required.

An example of query processing with simple encryption adjustment


Assume we have a graph data store, illustrated in the left part of Table 4.1, in which person nodes have two properties, name and age. Given a query of the form:

MATCH (node:person)-[:knows]->()
WHERE node.name = "Tom"
RETURN node.age

The above Cypher query requires an equality check. To perform it, the name property must be moved from the RND layer, shown in the middle part of Table 4.1, to the DET layer, shown in the right part. The values in the middle and right parts of the table are ciphertexts (ciphertexts shown are not full-length).

Table 4.1: The person data in plaintext (left part), and its layout at the server at the RND layer (middle part) and at the DET layer (right part).

person
name    age    name at RND     age at RND      name at DET     age at DET
Tom     29     a4a895a87052    e6ba69bdf08c    UD82Pv8uGNi7    33TPfYgeYDKb
Smith   22     9d60b415e6e7    686097aa7a7a    j39IjDVyx/+     NMtqlsMp8Qaf

Step (1). Onion layers are decrypted using UDFs (user-defined functions) that run on the DBMS server; more details on this are given in the next chapter. To decrypt the Eq onion of the name property to the DET layer, the proxy creates the following query using the DECRYPT_RND UDF:

UPDATE person SET name = DECRYPT_RND(name)

Now all values of the name property are at the DET layer, for example:

$D_{\mathit{name},\mathrm{RND}}(\mathrm{a4a895a87052}) = \mathrm{UD82Pv8uGNi7}$



Step (2). The proxy encrypts Tom to its DET-layer value UD82Pv8uGNi7, then generates the following query and sends it to the DBMS:

MATCH (node:person)-[:knows]->()
WHERE node.name = "UD82Pv8uGNi7"
RETURN node.age

Step (3). The proxy receives the encrypted RND-level result e6ba69bdf08c and decrypts it using:

$D_{\mathit{age},\mathrm{DET}}(D_{\mathit{age},\mathrm{RND}}(\mathrm{e6ba69bdf08c})) = 29$

Lastly, the proxy sends the decrypted result 29 back to the application.

4.3 Traversal-Aware Encryption Adjustment TAEA

The idea of traversal-aware encryption adjustment is natural and simple. For some types of queries, the processing can be split naturally into a well-defined sequence of stages. This is true, for example, for path or traversal queries such as the following:
A. Bounded traversal

MATCH (node1:Label1)-[:RELATIONSHIP*1..n]-(node2:Label2)
WHERE node1.propertyA = {Value} AND ...
RETURN node1.propertyA, node2.propertyB

B. Unbounded traversal

MATCH (node1:Label1)-[:RELATIONSHIP*]-(node2:Label2)
WHERE node1.propertyA = {Value} AND ...
RETURN node1.propertyA, node2.propertyB

In both cases, during query execution, the paths that start at nodes with particular property values and progress along the specified relationships are traversed. The execution may perform additional checks on properties of the encountered nodes if these are required by the conditions following AND in the above queries. When such queries are executed over an encrypted graph database in the original CryptGraphDB, encryption layer adjustment may be required for the properties of all nodes that may be encountered during the traversal, if condition checks are present in the query. It is generally not possible to identify before query execution which nodes will be traversed. Therefore the simple encryption adjustment is done everywhere (on all nodes) where the properties required for the checks are present. In contrast, in TAEA the adjustment is done not everywhere but only along the query execution path. The traversal-aware encryption adjustment scheme is dynamic: the encryption adjustment does not happen before query execution, but progresses gradually alongside it. The proposed scheme follows the simple principles defined in [3]:

• Encryption adjustments and traversal query execution are interlaced;

• The adjustments happen in between traversal steps;

• The adjustment is performed to enable one step of traversal using all infor-
mation accumulated to this step, in particular, the set of nodes traversed so
far.

Intuitively, following these principles gives us a chance to perform a more focused adjustment, not everywhere but only along the query execution path. We confirm this intuition in the following subsection by considering a case study.
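The contrast between the two adjustment policies can be sketched for the one-hop query of the case study in sub-section 4.3.1. The graph is hypothetical. The Tom, Smith and Lee nodes follow the non-encrypted walk-through of the case study, while the Ann and Bob nodes are invented to stand for nodes not connected to Tom. The sketch works on plaintext and performs no encryption. It only records which nodes would have their age lowered from RND to DET under each policy.

```java
import java.util.*;

public class TaeaSketch {

    // Query: MATCH (n1)-[:Knows]->(n2) WHERE n1.name = startName AND n2.age = age RETURN n2.
    // ageAdjusted collects the nodes whose age would be lowered from RND to DET.
    static Set<String> run(Map<String, String> names, Map<String, Integer> ages,
                           Map<String, List<String>> knows, String startName, int age,
                           boolean traversalAware, Set<String> ageAdjusted) {
        if (!traversalAware) {
            ageAdjusted.addAll(ages.keySet()); // simple policy: everywhere, before execution
        }
        // Stage 1: start nodes by name (name is lowered everywhere under both policies).
        List<String> starts = new ArrayList<>();
        for (Map.Entry<String, String> e : names.entrySet()) {
            if (e.getValue().equals(startName)) starts.add(e.getKey());
        }
        // Stage 2: one hop over Knows.
        Set<String> reached = new TreeSet<>();
        for (String s : starts) {
            reached.addAll(knows.getOrDefault(s, Collections.emptyList()));
        }
        if (traversalAware) {
            ageAdjusted.addAll(reached); // TAEA: between the stages, only on reached nodes
        }
        // Stage 3: equality check on age, evaluated only on the reached nodes.
        Set<String> result = new TreeSet<>();
        for (String n : reached) {
            if (Integer.valueOf(age).equals(ages.get(n))) result.add(n);
        }
        return result;
    }

    // Hypothetical instance: t=Tom(29) knows s1=Smith(22), s2=Smith(35), l=Lee(18);
    // a=Ann(22) and b=Bob(40) are not connected to Tom.
    static Set<String> caseStudy(boolean traversalAware, Set<String> ageAdjusted) {
        Map<String, String> names = new HashMap<>();
        Map<String, Integer> ages = new HashMap<>();
        String[][] people = {{"t", "Tom", "29"}, {"s1", "Smith", "22"}, {"s2", "Smith", "35"},
                             {"l", "Lee", "18"}, {"a", "Ann", "22"}, {"b", "Bob", "40"}};
        for (String[] p : people) {
            names.put(p[0], p[1]);
            ages.put(p[0], Integer.parseInt(p[2]));
        }
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("t", Arrays.asList("s1", "s2", "l"));
        knows.put("a", Arrays.asList("b"));
        return run(names, ages, knows, "Tom", 22, traversalAware, ageAdjusted);
    }

    public static void main(String[] args) {
        for (boolean taea : new boolean[] {false, true}) {
            Set<String> adjusted = new TreeSet<>();
            Set<String> result = caseStudy(taea, adjusted);
            System.out.println((taea ? "TAEA" : "simple") + ": result " + result
                    + ", age lowered on " + adjusted.size() + " nodes");
        }
    }
}
```

Both policies return s1 (Smith, 22). Under the simple policy, however, Ann's age would also be at DET. The server could then see that Ann and the first Smith share an age, which is exactly the kind of unrelated equality the case study points out.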

4.3.1 Case Study


For the case study, we consider the graph database instance presented in Figure 4.1. In this example scenario, there are nodes with the label Person and two properties of interest: name and age. Consider the following Cypher query:

MATCH (node1:person)-[:Knows]->(node2)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2

The execution of this query is considered in three modes: (1) non-encrypted; (2) encrypted using the original CryptGraphDB adjustment; and (3) encrypted using traversal-aware adjustment.

Figure 4.1: Data layout schema at the server of the graph database.

Non-encrypted mode

The WHERE clause consists of two conditions. Evaluating the first, name = "Tom", yields three nodes to be traversed and checked: {Smith,22}, {Smith,35} and {Lee,18}. These are reachable from the {Tom,29} node in one step via the Knows relationship. Checking the second condition, age = 22, on these nodes gives the final result, the {Smith,22} node.

Original CryptGraphDB adjustment

Initially, each property in the graph is wrapped in onions of encryption, with RND as the outermost layer. At this point, the server can learn nothing about the data content other than the number of nodes, properties, and relationships. To execute the query over the encrypted store, the encryption of name and age must be lowered to the DET layer (as we need equality checks). In this case, an UPDATE query is required:

UPDATE Label SET P2 Onion1 = DECRYPT_RND

The query then returns P1, P2, P3, etc., WHERE P1 = "D1" AND P2 = "G83". Here D1 and G83 are the encryptions of Tom and 22, respectively. The results are decrypted and returned to the user.
More details on this follow.
In step (1), the proxy sends the following to the DBMS:

UPDATE Database SET Property1 = DECRYPT_RND(Property)

All the properties are at the RND layer, as illustrated in Figure 4.2, so the DBMS decrypts the entire name and age properties to the DET layer:

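$D_{P,\mathrm{Eq},\mathrm{RND}}(\mathrm{X11}) = \mathrm{D1}$
$D_{P,\mathrm{Eq},\mathrm{RND}}(\mathrm{G71}) = \mathrm{G68}$
$D_{P,\mathrm{Eq},\mathrm{RND}}(\mathrm{X77}) = \mathrm{D6}$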

The proxy updates its internal state to record that the entire name and age properties are now at the DET layer in the DBMS, as can be seen in Figure 4.3.
In step (2), the proxy encrypts Tom and 22 to the DET layer of their Eq onions, obtaining D1 and G83, respectively. The proxy then generates a query and sends it to the DBMS:

MATCH (node1:person1)-[:Knows1]->(node2)
WHERE node1.name1 = "D1" AND node2.age1 = "G83"
RETURN node2

In step (3), the proxy decrypts the result and sends Smith and 22 back to the application.
We notice that with this encryption adjustment procedure, after query execution, the equalities among the {name,age} values of every node become apparent to the server. For nodes that are not connected to the Tom node, this equality information is unrelated to the query result and is not needed to compute it.

Traversal-Aware Encryption Adjustment

Consider the example schema shown in Figure 4.1. Initially, each node and
property in the graph is encrypted using the RND scheme, as shown in Figure 4.2.
Return now to our running query example Q:

MATCH (node1:person)-[:Knows]->(node2)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2

Figure 4.2: Encryption at the RND layer. Ciphertexts shown are not full-length.

Figure 4.3: Encryption at the DET layer. Ciphertexts shown are not full-length.

In order to execute this query we need to adjust the encryption of name to the
DET layer, by issuing this query:

UPDATE Label SET P1 Onion1 = DECRYPT_RND

where P1 corresponds to name. We then execute the query Q1, which performs the initial search for nodes where the path required by the original query Q may start:

MATCH (node1:person1)-[:Knows1]->(node2)
WHERE node1.name1 = "D1"
RETURN node2 AS result

Here the variable result is an alias for the result column of Q1, the property name1 corresponds to name, and D1 is the encryption of Tom. The outcome shows that only three nodes are reached by outgoing Knows relationships from the Tom node. Before the second part of Q, WHERE node2.age = "22", can be processed, the encryption of the age property must be lowered to the DET level for the nodes in result only, as illustrated in Figure 4.4.

Figure 4.4: Adjustment using traversal-aware encryption adjustment. Ciphertexts shown are not full-length.

Then we execute Q2 to implement the next step of Q:

MATCH (node1: person1)-[:Knows1]->(result)
WHERE result.age1 = "G83"
RETURN result

Finally, the proxy receives the encrypted result {D6, G83}, decrypts it and sends {Smith,22} back to the user.
This solution improves on the original encryption adjustment procedure because it does not lower every age value to the DET layer, and therefore does not reveal all equalities among age values (at the DET layer, equal plaintexts have equal ciphertexts). For instance, in this graph two nodes share the age value 38 (e.g. {Perry,38}); after the above query has been executed, both remain at the RND layer, as G36 and G72.
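The traversal-aware variant can be simulated in the same spirit. In this hypothetical sketch, plaintext comparisons stand in for comparisons of DET ciphertexts, and a layer map records which (node, property) pairs the server sees at DET. The values again come from Table 5.1. Tom(29)'s three Knows neighbours follow the text; the other edges are invented for the example.

```python
# (name, age) pairs from Table 5.1; node IDs are illustrative.
NODES = {
    0: ("Tom", "29"), 1: ("Smith", "22"), 2: ("Tom", "39"), 3: ("Lee", "18"),
    4: ("Smith", "35"), 5: ("Jones", "32"), 6: ("Perry", "47"),
    7: ("Sara", "38"), 8: ("Perry", "38"), 9: ("Tom", "40"),
}
# Tom(29)'s three Knows neighbours follow the text; other edges are invented.
KNOWS = {0: [1, 4, 3], 5: [6], 6: [7], 7: [8]}


def traversal_aware_single_hop(nodes, knows, start_name, target_age):
    """Simulate TAEA for (a)-[:Knows]->(b) WHERE a.name = .. AND b.age = .."""
    layer = {(n, p): "RND" for n in nodes for p in ("name", "age")}
    for n in nodes:                          # name: lowered for every node
        layer[(n, "name")] = "DET"
    starts = [n for n, (name, _) in nodes.items() if name == start_name]
    reached = sorted({m for s in starts for m in knows.get(s, [])})
    for m in reached:                        # age: lowered only where reached
        layer[(m, "age")] = "DET"
    result = [m for m in reached if nodes[m][1] == target_age]
    ages_at_det = sorted(n for n in nodes if layer[(n, "age")] == "DET")
    return result, ages_at_det


result, ages_at_det = traversal_aware_single_hop(NODES, KNOWS, "Tom", "22")
print(result, ages_at_det)  # [1] [1, 3, 4]
```

Nodes 7 and 8 share the age 38 but stay at RND, so their equality is never exposed.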

4.3.2 Discussion

The advantage of the proposed approach is that it does not reveal more DET-layer information than necessary. Figure 4.5 contrasts the result of applying the original CryptGraphDB encryption adjustment strategy (part (a)) with that of the traversal-aware encryption adjustment strategy (part (b)). When the original CryptGraphDB strategy is applied, more information is revealed than necessary, such as equalities among the ages of nodes that are not connected to the Tom node. By contrast, when the traversal-aware encryption adjustment strategy is applied, only the information required to execute the query is revealed. TAEA lowers the age property to the DET layer only for the nodes connected to Tom, while the remaining age values stay at the RND layer. Because encryption layers are adjusted dynamically as query execution progresses, less information is revealed to a potential adversary observing the execution of the query on the encrypted store.
The values shown for both approaches in Figure 4.5 are reproduced from Figure 4.1. The figure shows that TAEA reveals considerably less than the original CryptGraphDB approach: TAEA adjusts only the property values required to execute the query, whereas the original approach adjusts all values.

(a) Original CryptGraphDB Adjustment

(b) Traversal-Aware Encryption Adjustment

Figure 4.5: Comparison of applying the original CryptGraphDB adjustment with traversal-aware encryption adjustment. Ciphertexts shown are not full-length.

Figure 4.5 part (a) shows that all age values are adjusted to the DET layer, so equalities between the age values of different nodes are revealed. For example, the nodes whose original age is 38 share the same DET ciphertext, G49. In contrast, Figure 4.5 part (b) shows that most age values are kept at the RND layer and only some are adjusted to DET. In particular, the age values equal to 38 remain at the RND layer, as the distinct ciphertexts G36 and G72.

4.4 Summary
In this chapter, the Traversal-Aware Encryption Adjustment (TAEA) approach for graph databases has been presented. This approach is a novel solution that supports executing Cypher queries over encrypted graph databases. The approach offers two advantages: (i) it reveals only the information required to execute the query, and (ii) it adjusts encryption layers dynamically as query execution progresses. The evaluation considered the execution of a Cypher query in three modes (non-encrypted, original CryptGraphDB adjustment, and traversal-aware encryption adjustment), and the proposed TAEA approach was compared with the original CryptGraphDB adjustment. The proposed approach produced better results. We have shown that, when querying encrypted graph databases, dynamic traversal-aware encryption adjustment reveals less of the database content than static adjustment performed before query execution. The next chapter presents the implementation of the TAEA approach for CryptGraphDB, together with an empirical evaluation of the related trade-off between security and query execution efficiency.
Chapter 5

Towards Implementation of TAEA

5.1 Introduction

Chapters 3 and 4 presented the proposed CryptGraphDB and TAEA approaches, both directed at querying encrypted graph databases and adjusting the encryption layers alongside query execution. This chapter considers the implementation of the Traversal-Aware Encryption Adjustment (TAEA) approach introduced in Chapter 4. More specifically, it presents different modes of Cypher query execution using the TAEA principles presented in the previous chapter.
Recall from Chapter 3 that the original CryptDB approach has been transferred to the context of graph databases. The basic idea is the same as in relational CryptDB: the graph query is translated into an encrypted form, which is then executed on the server without decrypting any data. The encrypted results are then sent back to the user, where they are finally decrypted. The proposed design is implemented for the Neo4j graph DBMS, with Cypher as the query language. It has been confirmed that SQL-aware encryption schemes can be smoothly reused as Cypher-aware encryption schemes while retaining the performance benefits of graph traversal queries. The efficiency of query execution was reported for different types of queries on encrypted and non-encrypted Neo4j graph databases.
As discussed in Chapter 4, a traversal-aware encryption adjustment mechanism is considered here for a subset of Cypher queries (traversal queries), and its efficiency is evaluated empirically. The proposed mechanism reveals only the information required to execute the query. Not all property values in the graph are adjusted to the DET layer, only the required ones, while the rest remain at the RND layer. The traditional approach searches an encrypted database by decrypting all the data to the DET layer and then finding the required records. Apart from representing a significant security risk, this traditional approach is resource-intensive, particularly for a large number of records.
We empirically tested the proposed TAEA approach using Neo4j datasets of various sizes and a set of Cypher queries; more details are given in Section 5.3. The implementation of TAEA is considered for four different types of queries: (i) queries with a single relationship, (ii) queries with multiple relationships, (iii) bounded traversal and (iv) unbounded traversal.
The rest of this chapter is organized as follows. Section 5.2 describes the idea behind the traversal-aware encryption adjustment approach and presents case studies. Section 5.3 reports experiments and a performance analysis. Section 5.4 then presents the evaluation of the proposed approach. Finally, Section 5.5 concludes the chapter with a brief summary.

5.2 Implementation of Traversal-Aware Encryption Adjustment

The work presented in this section begins with an analysis of the proposed Traversal-Aware Encryption Adjustment approach, discussed in Chapter 4. It considers whether the approach's operation could be improved upon and, if so, how this might be achieved. The idea of Traversal-Aware Encryption Adjustment (TAEA) is quite simple. During query execution, paths are traversed that start at nodes with specific property values (for example, a specific name) and progress along specified relationships. The execution may perform additional checks on some properties of the nodes encountered. Therefore, encryption is adjusted not everywhere, but only along the query execution path. The scheme is dynamic: encryption adjustment does not happen before query execution, but progresses gradually alongside it.
In this section, we describe an implementation of TAEA for a subset of Cypher queries, starting with simple examples. In general, executing a query with TAEA over an encrypted graph data store requires executing interlaced partial queries and encryption adjustment updates. These partial queries and updates could be composed using the WITH construct of Cypher, so that the whole sequence executes automatically ("in one go"). For simplicity, however, we present here the required sequence of separate queries and updates. The composition is discussed later in subsection 5.3.2.

Query with a single relationship

Consider a query Q consisting of one link and two search criteria:

MATCH (node1:label1)-[:Relationship]->(node2:label2)
WHERE node1.propertyA = {value1} AND node2.propertyB = {value2}
RETURN node2

Using the TAEA scheme, the query Q is processed as follows:

1. Each value starts out at the most private encryption level, where data is encrypted using the RND scheme.

2. To check the equality in the first part of the WHERE clause, node1.propertyA = value1, we need to lower the encryption of propertyA to the DET level. The proxy issues the update UPDATE Label1 SET propertyA = DECRYPT_RND(propertyA) to the server. DECRYPT_RND is a user-defined function (UDF) implementing RND-layer decryption, discussed in sub-section 5.2.1.

3. Execute the query Q1, which performs the initial search node1.propertyA = encrypted value1 for the nodes where the path required by the main query Q starts. Here encrypted value1 is the DET encryption of value1:

MATCH (node1:label1)-[:Relationship]->(node2:label2)
WHERE node1.propertyA = {encrypted value1}
RETURN node2 AS result

Where result is used as an alias for the result column name.

4. Lower the encryption of propertyB to the DET level only for the nodes in result, i.e. those reached from node1 by Q1.

5. Process the second part of the query Q:

MATCH (node1:label1)-[:Relationship]->(result:label2)
WHERE result.propertyB = {encrypted value2}
RETURN result

6. Finally, the proxy decrypts the results returned by the server and returns them to the user.
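The steps above can be sketched as a small proxy-side helper that emits the interlaced statements in order. This is a hypothetical sketch, not the thesis's proxy. It assumes the decryptRND(value, ID(n)) form of the UDF used in the case study of sub-section 5.2.2. It also carries the nodes found by Q1 into later steps as a list of internal node IDs, which is one way to restrict the adjustment to those nodes. The parameter names $encValue1, $encValue2 and $resultIds are illustrative.

```python
def taea_single_relationship_plan(label1, rel, label2, prop_a, prop_b):
    """Hypothetical proxy helper: the interlaced updates/queries of steps 2-5.

    The proxy fills in $encValue1/$encValue2 with DET ciphertexts of the
    constants and $resultIds with the node IDs returned by Q1.
    """
    return [
        # Step 2: lower propertyA to DET for every label1 node.
        f"MATCH (n:{label1}) SET n.{prop_a} = decryptRND(n.{prop_a}, ID(n))",
        # Step 3 (Q1): initial search; the proxy keeps the returned IDs.
        f"MATCH (node1:{label1})-[:{rel}]->(node2:{label2}) "
        f"WHERE node1.{prop_a} = $encValue1 "
        f"RETURN DISTINCT ID(node2) AS result",
        # Step 4: lower propertyB only for the nodes found by Q1.
        f"MATCH (m) WHERE ID(m) IN $resultIds "
        f"SET m.{prop_b} = decryptRND(m.{prop_b}, ID(m))",
        # Step 5 (Q2): the equality check now runs on those nodes only.
        f"MATCH (m) WHERE ID(m) IN $resultIds AND m.{prop_b} = $encValue2 "
        f"RETURN m",
    ]


for statement in taea_single_relationship_plan(
        "Person", "KNOWS", "Person", "name", "age"):
    print(statement)
```

A real proxy would also have to record which values are already at DET, so that decryptRND is never applied twice to the same value.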

Query with multiple relationships

Consider now a query Q with multiple relationships and three search criteria, as follows:

MATCH (node1:label)-[:Rel1]->(node2:label)-[:Rel2]->(node3:label)
WHERE node1.propertyA = {value1} AND node2.propertyB = {value2}
AND node3.propertyC = {value3}
RETURN node3

Processing a query Q of the above form under TAEA is as follows:

1. Each value in the graph is encrypted using the RND scheme.

2. For the first condition of Q, node1.propertyA = value1, we need to lower the encryption of propertyA to the DET level using the DECRYPT_RND UDF: UPDATE Label SET propertyA = DECRYPT_RND(propertyA).

3. As Q has multiple links, we start with the first part R1, (node1)-[:Rel1]->(node2). We execute the query Q1, which performs the initial search node1.propertyA = encrypt(value1) for the nodes where the path required by the main query Q starts:

MATCH (node1:label)-[:Rel1]->(node2:label)-[:Rel2]->(node3:label)
WHERE node1.propertyA = {encrypt(value1)}
RETURN node2 AS result

Here, result is an alias for the result column of Q1, while encrypt(value1) is the encryption of value1.

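4. To implement the second part Q2 of Q, we need to lower the encryption of propertyB to the DET layer only for the nodes in result, i.e. those reached from node1 via Rel1.

5. Process the second part Q2 of the query Q, returning the node3 candidates reached from the matching result nodes:

MATCH (result)-[:Rel2]->(node3:label)
WHERE result.propertyB = {encrypted value2}
RETURN node3 AS result2

6. Lower the encryption of propertyC to the DET layer only for the nodes in result2, and process the last part Q3:

MATCH (result2)
WHERE result2.propertyC = {encrypted value3}
RETURN result2

7. Finally, the proxy decrypts the results and sends them back to the user.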

General Case

We now consider the general case of a simple traversal query of the form:

MATCH (node_1:label_1)-[:Relationship1]->...(node_i:label_i)-[:Relationship_i]->...(node_k:label_k)
WHERE node_1.property_1 = {value1} AND ... AND node_i.property_i = {value_i} AND ...
AND node_k.property_k = {value_k}
RETURN node_k

The following is the process for resolving a query Q of the above form using the TAEA scheme:

1. Encrypt all values at RND layer.

2. Lower the encryption of property_1 to the DET level using the decryptRND UDF: SET property_1 = decryptRND(property_1).

3. Execute Q1, the first part of Q, which finds the nodes where the required path starts:

MATCH (node_1:label_1)-[:Relationship1]->(node_i:label_i)
WHERE node_1.property_1 = {encrypt(value1)}
RETURN node_i AS result

Here, result is an alias for the result column of Q1, while encrypt(value1) is the encryption of value1.

4. To execute the second part Q2 of Q, lower the encryption of property_i to the DET layer only for the nodes in result of Q1. Then execute Q2 as:

MATCH (result)-[:Relationship_i]->(node_k:label_k)
WHERE result.property_i = {encrypt(value_i)}
RETURN node_k AS result_1

5. For Q3, lower the encryption of property_k to the DET layer only for the nodes in result_1 of Q2, ...

6. Finally, proxy decrypts the results and sends them back to the user.
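The general procedure can be simulated hop by hop. The sketch below is a toy model under stated assumptions. Plaintext values stand in for ciphertexts, and the `det` set plays the role of the proxy's record of adjusted (node, property) pairs. The start property is lowered globally, as in step 2, while every later property is lowered only on the current frontier. The class and function names, the gender values and the edges are hypothetical; the names and ages come from Table 5.1.

```python
from dataclasses import dataclass, field


@dataclass
class EncryptedGraphModel:
    """Toy model: plaintexts stand in for ciphertexts; `det` records the
    (node, property) pairs already lowered to the DET layer."""
    props: dict                      # node_id -> {property: value}
    edges: dict                      # (node_id, relationship) -> [node_id, ...]
    det: set = field(default_factory=set)

    def lower_to_det(self, node_ids, prop):
        # Idempotent, mirroring a proxy that logs each adjustment once.
        self.det.update((n, prop) for n in node_ids)

    def select(self, node_ids, prop, value):
        # The server can only test equality on DET-layer values.
        if any((n, prop) not in self.det for n in node_ids):
            raise RuntimeError(f"{prop} is still at RND for some nodes")
        return [n for n in node_ids if self.props[n].get(prop) == value]


def taea_chain(g, rels, conditions):
    """conditions[0] applies to node_1, conditions[i] to the node reached
    over rels[i - 1]; requires len(conditions) == len(rels) + 1."""
    prop, value = conditions[0]
    everyone = list(g.props)
    g.lower_to_det(everyone, prop)                  # step 2: global
    current = g.select(everyone, prop, value)       # Q1
    for rel, (prop, value) in zip(rels, conditions[1:]):
        frontier = sorted({m for n in current
                           for m in g.edges.get((n, rel), [])})
        g.lower_to_det(frontier, prop)              # only along the path
        current = g.select(frontier, prop, value)   # Q2, Q3, ...
    return current


g = EncryptedGraphModel(
    props={1: {"name": "Jones", "age": "32"},
           2: {"name": "Perry", "age": "47"},
           3: {"name": "Sara", "age": "38", "gender": "Female"},
           4: {"name": "Lee", "age": "18"},
           5: {"name": "Perry", "age": "38", "gender": "Male"}},
    edges={(1, "KNOWS"): [2, 4], (2, "KNOWS"): [3, 5], (4, "KNOWS"): [5]},
)
print(taea_chain(g, ["KNOWS", "KNOWS"],
                 [("name", "Jones"), ("age", "47"), ("gender", "Female")]))  # [3]
```

Nodes 3 and 5 both have age 38. Age is only tested at the second hop, on nodes 2 and 4, so their age values never leave the RND layer.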

5.2.1 Implementation

A prototype system for evaluating the performance of traversal-aware encryption adjustment was implemented. To build this prototype, we utilized the AES (Advanced Encryption Standard) algorithm [2, 22]. For the RND layer, we used AES in CBC mode with an initialization vector (IV) obtained as the hash of the ID of the node to which the encrypted data belong. For the DET layer, we used AES in ECB mode. AES is used with 128-bit keys.
We created a set of User-Defined Functions (UDFs) to be called directly from Cypher queries [44]. The functions encryptDET, encryptRND, decryptRND and decryptDET implement encryption and decryption at the DET and RND layers. The UDFs are written in Java, packaged in a JAR file and deployed into $NEO4J_HOME/plugins, after which they can be called in the same way as any other Cypher function. The Neo4j Enterprise edition was used to execute the queries.
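The layering described above can be sketched in Python. The Python standard library has no AES, so this sketch substitutes a toy HMAC-SHA256 stream cipher. A SIV-style nonce (a MAC of the plaintext) provides the deterministic behaviour that AES-ECB gives the DET layer. An IV derived from the node ID takes the place of AES-CBC for the RND layer. The function names only mirror encryptDET, encryptRND, decryptRND and decryptDET and are not the prototype's Java UDFs. The keys are fixed demo values. This is an illustration, not a secure implementation.

```python
import hashlib
import hmac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256(key, nonce || counter) blocks, truncated to `length` bytes.
    blocks = [
        hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def _subkey(key: bytes, purpose: bytes) -> bytes:
    # Separate sub-keys for the DET MAC and the DET keystream.
    return hmac.new(key, purpose, hashlib.sha256).digest()


def encrypt_det(key: bytes, plaintext: bytes) -> bytes:
    # SIV-style: the nonce is a MAC of the plaintext, so equal plaintexts
    # always give equal ciphertexts (the property ECB gives the prototype).
    siv = hmac.new(_subkey(key, b"mac"), plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(_subkey(key, b"enc"), siv, len(plaintext))
    return siv + _xor(plaintext, stream)


def decrypt_det(key: bytes, ciphertext: bytes) -> bytes:
    siv, body = ciphertext[:16], ciphertext[16:]
    plaintext = _xor(body, _keystream(_subkey(key, b"enc"), siv, len(body)))
    check = hmac.new(_subkey(key, b"mac"), plaintext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(siv, check):
        raise ValueError("DET ciphertext failed its integrity check")
    return plaintext


def rnd_iv(node_id: int, prop: str) -> bytes:
    # As in the prototype, the IV comes from a hash of the node ID; the
    # property name is mixed in here so two properties of one node never
    # share a keystream.
    return hashlib.sha256(f"{node_id}/{prop}".encode()).digest()[:16]


def encrypt_rnd(key: bytes, inner: bytes, node_id: int, prop: str) -> bytes:
    return _xor(inner, _keystream(key, rnd_iv(node_id, prop), len(inner)))


def decrypt_rnd(key: bytes, outer: bytes, node_id: int, prop: str) -> bytes:
    return encrypt_rnd(key, outer, node_id, prop)  # XOR undoes itself


K_DET, K_RND = b"\x01" * 16, b"\x02" * 16  # fixed demo keys (hypothetical)
tom_a = encrypt_rnd(K_RND, encrypt_det(K_DET, b"Tom"), node_id=0, prop="name")
tom_b = encrypt_rnd(K_RND, encrypt_det(K_DET, b"Tom"), node_id=2, prop="name")
print(tom_a == tom_b)                         # False: RND hides the equality
det_a = decrypt_rnd(K_RND, tom_a, 0, "name")  # lowered to DET
det_b = decrypt_rnd(K_RND, tom_b, 2, "name")
print(det_a == det_b, decrypt_det(K_DET, det_a))  # True b'Tom'
```

As in the prototype, deriving the RND IV from the node ID means that re-encrypting the same value on the same node produces the same ciphertext. Only values stored on different nodes are hidden from each other.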

Figure 5.1: Example data layout schema at the server of the graph database.

5.2.2 Case Study

In this sub-section, we present several examples of queries executed on a particular graph data store under different encryption adjustment policies, using the implemented UDFs for encryption and decryption. Suppose we have a graph database of Person nodes with two properties of interest, name and age, connected by KNOWS relationships, as illustrated in Figure 5.1. Consider the following Cypher query:

MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2

For this study, we considered the execution of the above query in three modes:
(1) non-encrypted; (2) encrypted with simple adjustment; and (3) with traversal-
aware encryption adjustment.

Non-encrypted

The WHERE clause has two conditions. The first, node1.name = "Tom", yields three nodes to be traversed, {Smith,22}, {Smith,35} and {Lee,18}. These are reached from the {Tom,29} node in one step via the KNOWS relationship. The second condition, node2.age = "22", is then applied to these nodes, giving the {Smith,22} node as the final result.

Encrypted with simple adjustment

Initially, each value in the graph is encrypted with RND as the outermost layer, as follows:

MATCH (n)
SET n.name = encryptRND(encryptDET(n.name), ID(n)),
    n.age = encryptRND(encryptDET(n.age), ID(n))

where encryptRND and encryptDET are the user-defined functions implementing encryption, described in sub-section 5.2.1. Resolving the query requires lowering the encryption of name and age to the DET level (as equality must be checked). To do so, we update the data using a SET clause:

MATCH (n)
SET n.name = decryptRND(n.name, ID(n)), n.age = decryptRND(n.age, ID(n))

Next, the proxy generates a query and sends it to the DBMS:

MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "UD82Pv8uGNi7" AND node2.age = "NMtqlsMp8Qaf"
RETURN node2

where "UD82Pv8uGNi7" and "NMtqlsMp8Qaf" are the DET encryptions of "Tom" and "22", respectively. Lastly, the proxy receives the encrypted result {j39IjDVyx/+,NMtqlsMp8Qaf}, decrypts it and sends {Smith,22} to the application.

Table 5.1: Plaintext data, encryption at the RND layer and encryption at the DET layer (ciphertexts shown are not full-length).

name   age  name at RND   age at RND    name at DET   age at DET
Tom    29   a4a895a87052  e6ba69bdf08c  UD82Pv8uGNi7  33TPfYgeYDKb
Smith  22   9d60b415e6e7  686097aa7a7a  j39IjDVyx/+   NMtqlsMp8Qaf
Tom    39   9b078f653478  21da9938c098  UD82Pv8uGNi7  Ss67Waxq2n+m
Lee    18   6cf77f7817b1  bcb86ac44437  RQpqwfEE8Kbm  fxEYkxe7g+P27L
Smith  35   e7a86cbc36ff  83e6b8ab0edc  j39IjDVyx/+   5K6xJRUEJ2s+
Jones  32   141a99a21cf4  dfd2e8d1dfa2  ax+/5Q23fEl4  z0sfDuU2mIP/
Perry  47   ca06d68f7c6b  c1051d53aae2  0nPCg1bAxh8R  oSLl00rhMbeZ
Sara   38   a5cb936cd7ed  7ed31f9f083d  Z+NQr9J7iSRi  V01kYVwG13GU
Perry  38   c32f8d5d66a1  49521d4f028e  0nPCg1bAxh8R  V01kYVwG13GU
Tom    40   5f2041a58089  56c26c25e4d   UD82Pv8uGNi7  rXhFoilgAFoO

Encrypted using Traversal-Aware Encryption Adjustment

Initially, each value in the database is encrypted at the most secure RND layer, as listed in Table 5.1, so the server can learn nothing about the data values.
Returning to the example query Q:

MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2

The condition WHERE node1.name = "Tom" requires lowering the encryption of name to the DET level. The proxy therefore first removes the RND onion layer by issuing the update UPDATE Label SET name = DECRYPT_RND(name) to the server. DECRYPT_RND is the user-defined decryption function discussed in sub-section 5.2.1.
Then we execute the query Q1, which performs the initial search for nodes where the path required to resolve the original query Q may start:

MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "UD82Pv8uGNi7"
RETURN node2 AS output

Here the variable output is an alias for the result column of Q1, and UD82Pv8uGNi7 is the DET encryption of Tom. The first stage of query resolution yields three nodes, reached by outgoing KNOWS relationships from the node with name = "UD82Pv8uGNi7".
Before processing the second part of Q, WHERE node2.age = "22", we need to lower the encryption of the age property to the DET layer ONLY for the nodes in output.
Then we execute the query Q2, implementing the next step of Q execution:

MATCH (n:person)-[:KNOWS]->(output)
WHERE output.age = "NMtqlsMp8Qaf"
RETURN output

Lastly, the proxy receives the encrypted result {j39IjDVyx/+q,NMtqlsMp8Qaf}, decrypts it and sends {Smith,22} back to the application. This solution improves on the simple adjustment by not lowering all age properties to the DET layer, but only those that query resolution requires.

Bounded traversal

To investigate how traversal-aware adjustment works with a bounded variable-length path, we return to the example database in Figure 5.1. We consider a path of between 1 and 3 relationships from node1 to node2, as in the following query Q:

MATCH (node1)-[*1..3]->(node2)
WHERE node1.name = 'Smith' AND node2.age = '38'
RETURN node2

Initially, all values are held at the RND layer. We then lower the name values to the DET layer using the update UPDATE Label SET P = DECRYPT_RND(P), where P corresponds to name. Thereafter, we execute the query Q1, which performs the initial search for nodes where the path required by the original query Q starts:

MATCH (node1)-[*1..3]->(node2)
WHERE node1.name = "j39IjDVyx/+"
RETURN node2 AS output

Again, output is an alias for the result column of Q1, and j39IjDVyx/+ is the DET encryption of Smith. Executing Q1 yields six nodes in output for the condition node1.name = "j39IjDVyx/+". Before processing the second part of Q, WHERE node2.age = "38", we need to lower the encryption of the age property to the DET layer for the nodes in output.
Next we execute the query Q2, implementing the next step of Q execution:

MATCH (node1)-[*1..3]->(output)
WHERE output.age = 'V01kYVwG13GU'
RETURN output

where V01kYVwG13GU is the DET encryption of "38". Finally, the proxy receives the encrypted result {Z+NQr9J7iSRi,V01kYVwG13GU}, decrypts it and sends {Sara,38} back to the application.
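In this example, age is lowered for exactly the end nodes of 1 to 3 hop paths from the start nodes. A breadth-first search over (node, depth) states computes this set, as in the hypothetical sketch below. The edges and node IDs are invented, with values from Table 5.1, and do not reproduce Figure 5.1. With a lower bound of 1, plain reachability gives the same end nodes as Cypher's matching, which never reuses a relationship within one path; for larger lower bounds the two can differ.

```python
from collections import deque

# Invented KNOWS chain: Smith(1) -> Jones(5) -> Perry(6) -> Sara(7) -> Perry(8).
EDGES = {1: [5], 5: [6], 6: [7], 7: [8]}
AGES = {1: "22", 5: "32", 6: "47", 7: "38", 8: "38"}


def reachable_within(edges, starts, min_hops, max_hops):
    """End nodes of paths from `starts` with min_hops..max_hops relationships."""
    found = set()
    queue = deque((s, 0) for s in starts)
    seen = set(queue)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in edges.get(node, []):
            state = (nxt, depth + 1)
            if state not in seen:
                seen.add(state)
                queue.append(state)
                if depth + 1 >= min_hops:
                    found.add(nxt)
    return found


output = reachable_within(EDGES, starts=[1], min_hops=1, max_hops=3)
lowered = sorted(output)                 # only these ages are moved to DET
result = [n for n in lowered if AGES[n] == "38"]
print(lowered, result)  # [5, 6, 7] [7]
```

Node 8 also has age 38, but it is four hops away, so its age stays at the RND layer.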

Unbounded traversal

Next, we consider the effect of an unbounded path length, that is, a variable-length path with any number of relationships from node1 to node2. With reference to the example graph in Figure 5.1, assume the following query Q:

MATCH (node1)-[*]->(node2)
WHERE node1.name = 'Smith' AND node2.age = '38'
RETURN node2

To resolve the query, name must be at the DET layer. We then execute the query Q1, which performs the initial search for nodes where the path required by the original query Q starts:

MATCH (node1)-[*]->(node2)
WHERE node1.name = "j39IjDVyx/+"
RETURN node2 AS output

At this stage, output is an alias for the result column of Q1, and j39IjDVyx/+ is the DET encryption of Smith. Executing Q1 shows that there are seven nodes in output for the filter node1.name = "j39IjDVyx/+".
Once the encryption of the age property of the nodes in output has been lowered to the DET layer, we can process the second part of Q, WHERE node2.age = "38", by executing Q2:

MATCH (node1)-[*]->(output)
WHERE output.age = 'V01kYVwG13GU'
RETURN output

where V01kYVwG13GU is the DET encryption of "38". The proxy receives the encrypted results {Z+NQr9J7iSRi,V01kYVwG13GU} and {0nPCg1bAxh8R,V01kYVwG13GU}, decrypts them to {Sara,38} and {Perry,38}, and returns them to the user.
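For an unbounded pattern, the frontier is the full set of nodes reachable from the start nodes, and the search must cope with cycles. The hypothetical sketch below uses an iterative depth-first search with a visited set. The edges, including the cycle 8 -> 5, are invented for illustration, with values from Table 5.1. The resulting set of end nodes matches that of Cypher's [*] pattern, which terminates on cycles because a path never reuses a relationship.

```python
# Invented KNOWS edges with a cycle (8 -> 5); ages from Table 5.1.
EDGES = {1: [5], 5: [6], 6: [7], 7: [8], 8: [5]}
AGES = {1: "22", 5: "32", 6: "47", 7: "38", 8: "38"}


def reachable(edges, starts):
    """Every node reachable from `starts` by one or more relationships.
    The `seen` set makes the search terminate on cyclic graphs."""
    seen = set()
    stack = [m for s in starts for m in edges.get(s, [])]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return seen


output = sorted(reachable(EDGES, [1]))
result = [n for n in output if AGES[n] == "38"]
print(output, result)  # [5, 6, 7, 8] [7, 8]
```

Compared with the bounded case, the unbounded frontier is larger, so more age values are lowered to DET. This matches the seven nodes versus six reported above.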

5.3 Experiments and Performance Analysis

In this section we report on the results of experiments conducted to show the validity of the approach and estimate its performance.

5.3.1 Datasets

To study the traversal-aware encryption adjustment concept, a variety of datasets was used. A total of five Neo4j databases were constructed, as detailed in Table 5.2. The databases differ in their numbers of nodes, relationships and properties.
For the case study, we consider a particular graph database instance. In this
example scenario we have nodes with the label Person, and a group of properties
of interest: name, age, and gender.
Graph datasets containing 10, 100, 500, 1,000 and 10,000 nodes were created to assess the execution time of queries over non-encrypted and encrypted data. The generation of queries by the proxy, and of inputs for the experiments, is not yet automated. The datasets were generated using Cypher writing clauses (clauses that write data to the database) to create each database with the required number of nodes and relationships. For instance, to create two nodes we use:

CREATE (a:Person {name: "Tom", age: "29", gender: "Male"}),
       (b:Person {name: "Smith", age: "22", gender: "Male"})

and to create a relationship between node (a) and node (b), in the same statement, we use CREATE (a)-[:KNOWS]->(b).

Table 5.2: Graph databases of different sizes.

Database  # of Nodes  # of Relationships  # of Properties
DB1       10          9                   30
DB2       100         82                  300
DB3       500         410                 1,500
DB4       1,000       820                 3,000
DB5       10,000      8,426               30,000
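A generator for such statements might look as follows. This is a hypothetical sketch, not the script used for the thesis. It builds a single CREATE statement with Person nodes whose name, age and gender are stored as strings, as in the example above. It links the nodes by a random KNOWS tree, so it creates n - 1 relationships rather than the counts listed in Table 5.2. For the larger databases, splitting the work over several statements would likely be preferable to one very long statement.

```python
import random

NAMES = ["Tom", "Smith", "Lee", "Jones", "Perry", "Sara"]  # from Table 5.1


def person_graph_cypher(n_nodes: int, seed: int = 0) -> str:
    """Build one Cypher CREATE statement with n_nodes Person nodes and a
    random KNOWS tree of n_nodes - 1 relationships (hypothetical generator)."""
    rng = random.Random(seed)  # seeded so that a dataset can be regenerated
    parts = []
    for i in range(n_nodes):
        name = rng.choice(NAMES)
        age = rng.randint(18, 60)
        gender = rng.choice(["Male", "Female"])
        parts.append(f'(p{i}:Person {{name: "{name}", age: "{age}", '
                     f'gender: "{gender}"}})')
    for i in range(1, n_nodes):
        parent = rng.randrange(i)  # link each node to an earlier node
        parts.append(f"(p{parent})-[:KNOWS]->(p{i})")
    return "CREATE " + ",\n       ".join(parts)


print(person_graph_cypher(3))
```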
The system used for testing ran Windows 10 on an Intel Core 2 Duo CPU running at 3.40 GHz with 16 GB of RAM. The benchmarking program was the only application running when the results were produced, but the machine was connected to the Internet and standard system processes were running.

5.3.2 Queries

The approach was tested on these datasets, which vary in their numbers of nodes and relationships, using a set of queries performed in three modes: (1) queries Q1 to Q5 over non-encrypted data; (2) queries Q01 to Q05 over encrypted data with simple adjustment; and (3) queries Q001 to Q005 over encrypted data with traversal-aware encryption adjustment. This set of queries was selected to cover some commonly used queries in graph databases, as listed below:

Queries over non-encrypted data

Q1 : Find all orphan nodes (no incoming edges and no outgoing edges).

MATCH (node)
WHERE not((node)-[ ]-())
RETURN node

Q2 : Basic Relationships Matching

MATCH (node1)-[:KNOWS]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '22'
RETURN node2.name, node2.age

Q3 : Adding Relationship Length

MATCH (node1)-[:KNOWS]->(node2)-[:KNOWS]->(node3)
WHERE node1.name = 'Jones' AND node2.age = '47' AND node3.gender = 'Female'
RETURN node3.name, node3.age, node3.gender

Q4 : Variable length relationships

MATCH (node1)-[:KNOWS*1..3]->(node2)
WHERE node1.name = 'Jones' AND node2.age = '38'
RETURN node2.name, node2.age

Q5 : Infinite Length and Length Limit

MATCH (node1)-[:KNOWS*]->(node2)
WHERE node1.name = 'Jones' AND node2.age = '38'
RETURN node2.name, node2.age

Using Simple Encryption Adjustment

Q01 : Find all orphan nodes (no incoming edges and no outgoing edges).

MATCH (node)
WHERE not((node)-[ ]-())
RETURN node

Q02 : Basic Relationships Matching

MATCH (node1)-[:KNOWS]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Tom'), ID(node1)), ID(node1))
  AND node2.age = decryptRND(encryptRND(encryptDET('22'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q03 : Adding Relationship Length

MATCH (node1)-[:KNOWS]->(node2)-[:KNOWS]->(node3)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'), ID(node1)), ID(node1))
  AND node2.age = decryptRND(encryptRND(encryptDET('47'), ID(node2)), ID(node2))
  AND node3.gender = decryptRND(encryptRND(encryptDET('Female'), ID(node3)), ID(node3))
RETURN decryptDET(node3.name), decryptDET(node3.age), decryptDET(node3.gender)

Q04 : Variable length relationships

MATCH (node1)-[:KNOWS*1..3]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'), ID(node1)), ID(node1))
  AND node2.age = decryptRND(encryptRND(encryptDET('47'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q05 : Infinite Length and Length Limit

MATCH (node1)-[:KNOWS*]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'), ID(node1)), ID(node1))
  AND node2.age = decryptRND(encryptRND(encryptDET('38'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Using Traversal-Aware Encryption Adjustment

Q001 : Find all orphan nodes (no incoming edges and no outgoing edges).

MATCH (node)
WHERE not((node)-[ ]-())
RETURN node

Q002 : Basic Relationships Matching


MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Tom'), ID(node1)),
  ID(node1))})-[:KNOWS]->(node2)
SET node2.age = decryptRND(node2.age, ID(node2))
WITH node2
MATCH (node1:Person)-[:KNOWS]->(node2)
WHERE node2.age = decryptRND(encryptRND(encryptDET('22'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q003 : Adding Relationship Length

MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Jones'), ID(node1)),
  ID(node1))})-[:KNOWS]->(node2)-[:KNOWS]->(node3)
SET node2.age = decryptRND(node2.age, ID(node2))
WITH node2
MATCH (node1)-[:KNOWS]->(node2 {age: decryptRND(encryptRND(encryptDET('47'),
  ID(node2)), ID(node2))})-[:KNOWS]->(node3)
SET node3.gender = decryptRND(node3.gender, ID(node3))
WITH node3
MATCH (node1)-[:KNOWS]->(node2)-[:KNOWS]->(node3 {gender: decryptRND(
  encryptRND(encryptDET('Female'), ID(node3)), ID(node3))})
RETURN decryptDET(node3.name), decryptDET(decryptRND(node3.age, ID(node3))),
  decryptDET(node3.gender)

Q004 : Variable length relationships

MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Jones'), ID(node1)),
  ID(node1))})-[:KNOWS*1..3]->(node2)
SET node2.age = decryptRND(node2.age, ID(node2))
WITH node2
MATCH (node1:Person)-[:KNOWS*1..3]->(node2)
WHERE node2.age = decryptRND(encryptRND(encryptDET('47'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q005 : Infinite Length and Length Limit

MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Jones'), ID(node1)),
  ID(node1))})-[:KNOWS*]->(node2)
SET node2.age = decryptRND(node2.age, ID(node2))
WITH node2
MATCH (node1:Person)-[:KNOWS*]->(node2)
WHERE node2.age = decryptRND(encryptRND(encryptDET('38'), ID(node2)), ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

As shown above, executing each of Q002 to Q005 requires several queries/updates, unlike the single query of the non-encrypted versions. To make a fair comparison, we composed the query/update parts of each Q00i using WITH clauses. WITH allows the query parts to be chained together, passing the outputs of one part on as the starting points or criteria of the next. The first condition in these queries is WHERE node1.name = 'value', so the encryption level of name must be adjusted to the DET layer to allow equality checking. Take, for example, Q002:

(1) MATCH (node1)-[:KNOWS]->(node2)
(2) WHERE node1.name=decryptRND(encryptRND(encryptDET('Tom'),
ID(node1)),ID(node1))
(3) SET node2.age= decryptRND(node2.age,ID(node2))
(4) with node2
(5) MATCH (node1:Person)-[:KNOWS]->(node2)
(6) WHERE node2.age = decryptRND(encryptRND(encryptDET('22'),
ID(node2)),ID(node2))
(7) return decryptDET(node2.name), decryptDET(node2.age)

In step (1), the MATCH clause determines the direction and depth of the relationship. In step (2), since all values are held in the RND layer, the name property must be decrypted down to the DET layer to allow equality checking. In step (3), we lower the encryption of the age property for the nodes matched in the previous step. In step (4), WITH passes the previous result on, so that it becomes the starting criteria for the next part of the query. In step (5), we again determine the direction of the relationship. In step (6), we apply the second condition. In step (7), we return the result in plaintext.
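To make the layering in steps (2) and (3) concrete, the following Python sketch simulates the DET and RND layers. It is a toy under stated assumptions, not the thesis implementation: encrypt_det, encrypt_rnd, decrypt_rnd and decrypt_det are hypothetical stand-ins for the UDFs encryptDET, encryptRND, decryptRND and decryptDET, the keys are made up, and an HMAC-SHA256 keystream replaces a real cipher such as AES, so it offers no real security. As in the queries above, the node ID acts as the IV of the RND layer.

```python
import hashlib
import hmac

# Hypothetical demo keys; a real deployment would use securely generated keys.
DET_KEY = b"demo-det-key"
RND_KEY = b"demo-rnd-key"


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from HMAC-SHA256(key, nonce || counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_det(value: str) -> bytes:
    """Toy DET layer: a fixed nonce, so equal plaintexts give equal ciphertexts."""
    data = value.encode("utf-8")
    return xor(data, keystream(DET_KEY, b"det", len(data)))


def decrypt_det(ciphertext: bytes) -> str:
    return xor(ciphertext, keystream(DET_KEY, b"det", len(ciphertext))).decode("utf-8")


def encrypt_rnd(ciphertext: bytes, node_id: int) -> bytes:
    """Toy RND layer: the node ID is the IV, like encryptRND(value, ID(n))."""
    return xor(ciphertext, keystream(RND_KEY, node_id.to_bytes(8, "big"), len(ciphertext)))


def decrypt_rnd(ciphertext: bytes, node_id: int) -> bytes:
    # XOR with the same keystream undoes encrypt_rnd.
    return encrypt_rnd(ciphertext, node_id)


# Every name starts at the RND layer, wrapped around the DET layer.
names = {0: "Tom", 1: "Tom", 2: "Ann"}
stored = {nid: encrypt_rnd(encrypt_det(v), nid) for nid, v in names.items()}

# Lower the name property to the DET layer, then test equality against a DET token.
lowered = {nid: decrypt_rnd(c, nid) for nid, c in stored.items()}
token = decrypt_rnd(encrypt_rnd(encrypt_det("Tom"), 7), 7)  # the node ID cancels out
matches = sorted(nid for nid, c in lowered.items() if c == token)
print(matches)  # [0, 1]
```

The two "Tom" values differ at the RND layer (with overwhelming probability), so the server cannot link them; only after lowering to the DET layer can an equality test match them. The token line also shows that decryptRND(encryptRND(encryptDET(v), id), id) reduces to encryptDET(v).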

5.3.3 Results

Each query was run over all the databases presented in Table 5.2, and the execution times (in milliseconds) were recorded as detailed in Table 5.3. Table 5.3 is revealing in several ways. It gives, first, the retrieval times of queries (Q1 − Q5) over non-encrypted data for the different graph database sizes; second, those of queries (Q01 − Q05) over encrypted databases using simple encryption adjustment; and third, those of queries (Q001 − Q005) over encrypted databases using traversal-aware encryption adjustment.
From the results presented in Table 5.3 it can clearly be seen that, of the queries Q1, Q01, and Q001, which find orphan nodes, the execution time of Q01 increases across all five databases, albeit to widely varying degrees (from 21 ms on DB1 to 670 ms on DB5). In these queries every node is iterated through and checked for the presence of edges. Query Q001, in contrast, shows a reasonable retrieval time compared with Q1; the increase in its execution time on DB5 (253 ms) is to be expected as the database grows.

Table 5.3: Retrieval times of queries (Q1 − Q5) over non-encrypted data using different graph database sizes, queries (Q01 − Q05) over encrypted databases using simple encryption adjustment, and queries (Q001 − Q005) over encrypted databases using traversal-aware encryption adjustment (time in milliseconds).

Query    DB1   DB2   DB3   DB4    DB5
Q1         1     1     4     4      4
Q01       21    73   194   369    670
Q001       4     7    23    39    253
Q2         2     4     2     2      2
Q02       22    89   222   415  1,191
Q002       8    25    55    98    793
Q3         2     4     2     2      2
Q03       22    87   225   416  1,225
Q003       9    25    51    96    742
Q4         4     3     2     2      2
Q04       22    92   214   434  1,195
Q004      12    23    54    93    743
Q5         3     4     4     4      4
Q05       22    87   224   424  1,175
Q005      10    20    53    90    893

Table 5.4 shows the encryption adjustment times (in milliseconds) for the RND layer and for the DET layer, using simple adjustment and traversal-aware encryption adjustment. Note that the decryption time has been taken into account in the running times of the encrypted queries, i.e. (Q'i = Qi + decryption time). We observe from Table 5.4 that adjusting to the DET layer with TAEA takes less time than with simple adjustment on every database (for example, 238 ms against 645 ms on DB5). This is because traversal-aware encryption adjustment does not adjust the encryption layer everywhere in the database: it only adjusts the values on the paths required to execute the query, whereas simple encryption adjustment adjusts all items in the database.
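The difference between the two policies can be illustrated by counting which values each one lowers to the DET layer. The Python sketch below is hypothetical: the toy graph, the function names, and the use of plaintext comparison in place of DET-layer equality are assumptions made for illustration, and the name property is assumed to be at the DET layer already. Both policies answer a Q002-style query identically, but they lower different numbers of age values.

```python
# Hypothetical toy graph: node id -> properties, plus directed KNOWS edges.
people = {
    0: {"name": "Tom", "age": "30"},
    1: {"name": "Ann", "age": "22"},
    2: {"name": "Bob", "age": "41"},
    3: {"name": "Eve", "age": "22"},
    4: {"name": "Dan", "age": "35"},
}
knows = [(0, 1), (0, 2), (3, 4), (4, 0)]


def simple_adjustment(nodes):
    """Simple policy: lower the age of every node to DET before the query runs."""
    return {(nid, "age") for nid in nodes}


def traversal_aware_adjustment(nodes, edges, start_name):
    """TAEA-style policy: lower age only for nodes reached from the matched start nodes."""
    starts = {nid for nid, props in nodes.items() if props["name"] == start_name}
    return {(dst, "age") for src, dst in edges if src in starts}


def run_query(nodes, edges, start_name, age, lowered):
    """MATCH (a)-[:KNOWS]->(b) WHERE a.name = start_name AND b.age = age RETURN b.name.

    Plaintext comparison stands in for DET-layer equality; a value still at
    the RND layer (not in `lowered`) can never match.
    """
    hits = []
    for src, dst in edges:
        if (nodes[src]["name"] == start_name
                and (dst, "age") in lowered
                and nodes[dst]["age"] == age):
            hits.append(nodes[dst]["name"])
    return sorted(hits)


simple = simple_adjustment(people)
taea = traversal_aware_adjustment(people, knows, "Tom")
print(len(simple), len(taea))                         # 5 2
print(run_query(people, knows, "Tom", "22", simple))  # ['Ann']
print(run_query(people, knows, "Tom", "22", taea))    # ['Ann']
```

Under the simple policy the server can also see that Ann and Eve share the same age ciphertext, even though Eve lies on no path the query needs; under the traversal-aware policy Eve's age stays at the RND layer.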

Table 5.4: Retrieval times of encryption adjustment using simple adjustment and traversal-aware encryption adjustment (time in milliseconds).

Database   RND layer   DET using simple adjustment   DET using TAEA
DB1               31                            20                6
DB2               34                            72                5
DB3              122                           192               19
DB4              188                           368               35
DB5            1,108                           645              238

The queries (Q2 − Q5) over non-encrypted data were clearly the fastest; this was expected, since queries over non-encrypted databases do not require any encryption layer adjustment. On the other hand, comparing the two encrypted query sets, the execution times of (Q002 − Q005) are lower than those of (Q02 − Q05) on every database.
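This comparison can be checked directly against Table 5.3. The short Python sketch below copies the encrypted-query times from the table (the dictionary layout and the function name are our own) and computes, for each query and database, the ratio of the simple-adjustment time to the traversal-aware time, pairing Q0i with Q00i in order; a ratio above 1 means TAEA was faster.

```python
# Times in ms copied from Table 5.3; list positions are DB1..DB5.
simple = {  # Q0i: simple encryption adjustment
    "Q01": [21, 73, 194, 369, 670],
    "Q02": [22, 89, 222, 415, 1191],
    "Q03": [22, 87, 225, 416, 1225],
    "Q04": [22, 92, 214, 434, 1195],
    "Q05": [22, 87, 224, 424, 1175],
}
taea = {  # Q00i: traversal-aware encryption adjustment
    "Q001": [4, 7, 23, 39, 253],
    "Q002": [8, 25, 55, 98, 793],
    "Q003": [9, 25, 51, 96, 742],
    "Q004": [12, 23, 54, 93, 743],
    "Q005": [10, 20, 53, 90, 893],
}


def speedups(simple_times, taea_times):
    """Ratio simple/TAEA per query and database; above 1 means TAEA was faster."""
    result = {}
    for s_times, (t_name, t_times) in zip(simple_times.values(), taea_times.items()):
        result[t_name] = [round(s / t, 2) for s, t in zip(s_times, t_times)]
    return result


ratios = speedups(simple, taea)
print(ratios["Q001"])                                     # [5.25, 10.43, 8.43, 9.46, 2.65]
print(all(r > 1 for rs in ratios.values() for r in rs))  # True
print(max(times[-1] for times in taea.values()))         # 893
```

The last line confirms that the slowest traversal-aware query on DB5 (Q005, 893 ms) stays under one second.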
From an overall perspective, the retrieval time for non-encrypted databases is small and roughly similar for all datasets. For the encrypted cases: (i) with simple adjustment, the retrieval time is higher because the decryption times are taken into account; (ii) with traversal-aware encryption adjustment, the execution time clearly grows with the size of the database, but remains in a practically feasible range (under a second) for the largest dataset considered. As observed during query execution, the traversal-aware approach of Aburawi et al. (Chapter 4) performed better than simple encryption adjustment in adjusting the encryption layers from the RND layer to the DET layer.
5.4 Evaluation

This section presents the evaluation of the proposed TAEA-based approach. Experiments were conducted on five datasets, using the variety of Cypher queries described above. The first set of experiments was run over the non-encrypted databases, the second over encrypted graph databases using traversal-aware encryption adjustment, and the third over encrypted graph databases using simple encryption adjustment. The objectives of the evaluation were as follows:

1. To determine the trade-off between data security protection and data processing efficiency in both approaches.

2. To determine whether the proposed approach can be effectively applied to encrypted data in the graph database.

3. To determine whether the differences in the execution times obtained were significant.

4. To provide a comparison between the proposed TAEA approach and the approach of simple encryption adjustment of Aburawi et al. [4], as identified previously.

Our technique shows a clear advantage by adjusting the encryption layers dynamically as query execution progresses. In this way, less information is revealed to any adversary observing the execution of the query on the encrypted store while, as demonstrated in this chapter, the technique remains reasonably efficient.

Security

The case studies considered have shown the trade-off between the simple and the traversal-aware encryption adjustment policies. The simple policy requires fewer queries and updates; the traversal-aware policy, on the other hand, provides better security, as it reveals less information to a possible server-side attacker. With the latter policy, as observed above, not all age property values were adjusted to the DET layer, only those required for query execution to progress.
Performance

The experiments and the performance of the proposed scheme were reported in Section 5.3. To evaluate the proposed mechanism, a collection of five databases was created and the proposed approach was tested using five types of Cypher queries. The evaluation was conducted through experiments measuring the execution time of a set of queries directed at both non-encrypted and encrypted data for different dataset sizes. Our results are encouraging, but still need to be validated using larger datasets.

Implementability

Similar to the methods in [4, 3, 34], and unlike those in [15, 52], the proposed mechanism does not require changes to the internal structure of the DBMS, because it is implemented as a set of layers above the DBMS. In particular, the proposed approach is compatible with concurrency control in a multi-user DBMS, but the related security aspects and performance evaluation in a multi-user environment need to be addressed in future work.

5.5 Summary
In this chapter we reported on the implementation and evaluation of the traversal-aware encryption adjustment mechanism for querying encrypted data in graph databases. The fundamental idea of the approach was discussed in detail in Chapter 4. To evaluate the proposed mechanism, the variety of datasets described in Subsection 5.3.1 was used. These were applied over the non-encrypted and the encrypted databases; in this manner, five datasets were created. The proposed approach was tested using five types of Cypher query. The method provides better security protection against server-side attacks while keeping good implementability and reasonable query execution performance. In the next chapter, the implementation of TAEA based on extended databases is presented.
Chapter 6

Implementation of the TAEA based on extended databases

6.1 Introduction

In Chapter 5, the implementation of the proposed traversal-aware encryption adjustment (TAEA) approach was presented. The approach is directed at querying encrypted graph databases and adjusting the encryption layers alongside query execution. In this chapter, further experiments over larger Neo4j databases are presented. In particular, this chapter presents different modes of Cypher query execution over non-encrypted and encrypted databases, where query execution over the encrypted databases uses both simple encryption adjustment and traversal-aware encryption adjustment.

The remainder of this chapter is organized as follows. Section 6.2 presents further experiments on the proposed approach using two databases of different sizes and a variety of Cypher queries. Section 6.3 then presents the evaluation of the proposed approach by comparing the query execution times of simple encryption adjustment and traversal-aware encryption adjustment. The chapter concludes with a brief summary in Section 6.4.

6.2 Experiments and Performance Analysis

This section presents a further investigation of the performance of the proposed approach, in which more queries and bigger datasets were considered. The datasets, queries, and results used in this investigation are discussed in the following three subsections, Subsections 6.2.1 to 6.2.3.

6.2.1 Datasets

The experiments used two Neo4j databases to find friends-of-friends connections to various depths, and also to query using different query patterns. The first dataset included 50,000 people and the second 1,000,000 people, each with a different number of friends, and each node with a different number of properties. The results of the experiments are listed in Subsection 6.2.3. We refer to each database and its components as follows:

non-encrypted database

• NON-DB1 (50,000 nodes, 150,000 properties, and 40,395 relationships)
• NON-DB2 (1,000,000 nodes, 3,000,000 properties, and 807,900 relationships)

encrypted database using simple encryption adjustment

• ENC-S-DB1 (50,000 nodes, 150,000 properties, and 40,395 relationships)
• ENC-S-DB2 (1,000,000 nodes, 3,000,000 properties, and 807,900 relationships)

encrypted database using traversal-aware encryption adjustment

• ENC-T-DB1 (50,000 nodes, 150,000 properties, and 40,395 relationships)
• ENC-T-DB2 (1,000,000 nodes, 3,000,000 properties, and 807,900 relationships)
6.2.2 Queries

To investigate the proposed approach further, we considered the execution of the queries listed below in three styles: (1) non-encrypted; (2) encrypted with simple adjustment; and (3) encrypted with traversal-aware encryption adjustment. First, we executed queries (Q1 − Q3) to find friends-of-friends connections to depths of two, four, and six degrees in each of the three styles. The syntax -[:TYPE*min..max]-> is used to match nodes connected by a variable number of relationships; min defaults to 1, and when max is omitted the length is unbounded. We then executed queries (Q4 and Q5) in two further ways: using a single query pattern and using a multiple query pattern. As described in the methods, the first step is to encrypt all the values in the database to the RND layer; the layers are then adjusted according to the approach used.
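The end nodes matched by a pattern such as (node1)-[:FRIEND_OF*1..k]->(node2) are exactly the nodes whose age TAEA must adjust, so it helps to see how the bounds behave. The following Python sketch is a simplified model, not Neo4j's query planner: it enumerates paths depth-first under Cypher's rule that a relationship may appear at most once in a path, and collects the end nodes whose path length lies between min_hops and max_hops (max_hops=None models the unbounded -[:TYPE*]->). The function name and the edge list are hypothetical.

```python
def var_length_endpoints(edges, start, min_hops=1, max_hops=None):
    """Return the end nodes of all paths start-[*min_hops..max_hops]->end.

    A relationship may appear at most once per path (relationship
    uniqueness); nodes may repeat. max_hops=None means unbounded.
    """
    adjacency = {}
    for rel_id, (src, dst) in enumerate(edges):
        adjacency.setdefault(src, []).append((rel_id, dst))

    endpoints = set()

    def walk(node, used, depth):
        if depth >= min_hops:
            endpoints.add(node)
        if max_hops is not None and depth == max_hops:
            return
        for rel_id, nxt in adjacency.get(node, []):
            if rel_id not in used:
                used.add(rel_id)
                walk(nxt, used, depth + 1)
                used.remove(rel_id)  # backtrack so other paths may reuse it

    walk(start, set(), 0)
    return endpoints


# Hypothetical FRIEND_OF edges: a chain 0->1->2->3->4 plus a cycle back 2->0.
friend_of = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 0)]
print(sorted(var_length_endpoints(friend_of, 0, 1, 2)))  # [1, 2]
print(sorted(var_length_endpoints(friend_of, 0, 1, 4)))  # [0, 1, 2, 3, 4]
print(sorted(var_length_endpoints(friend_of, 0)))        # [0, 1, 2, 3, 4]
```

The start node 0 becomes a possible endpoint once the depth bound reaches the length of the cycle (3 hops here), which illustrates why deeper patterns can reach, and therefore adjust, more nodes.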
First step (both approaches): encrypt all the values in the database to the RND layer using the following Cypher statement.

match (n)
SET
n.name = encryptRND(encryptDET(n.name),ID(n)),
n.age = encryptRND(encryptDET(n.age),ID(n)),
n.gender = encryptRND(encryptDET(n.gender),ID(n))

Second step (simple encryption adjustment): decrypt all values from the RND layer to the DET layer as follows.

match (n)
SET
n.name = decryptRND(n.name, ID(n)),
n.age=decryptRND(n.age, ID(n)),
n.gender = decryptRND(n.gender, ID(n))

Second step (TAEA): decrypt from the RND layer to the DET layer only the values related to the Cypher query (in this case, name), as follows.

match (n)
SET n.name = decryptRND(n.name,ID(n))
Queries over non-encrypted data

Q1 : Variable length relationships (Depth 2).

MATCH (node1)-[:FRIEND_OF*1..2]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '35'
RETURN node2.name, node2.age

Q2 : Variable length relationships (Depth 4).

MATCH (node1)-[:FRIEND_OF*1..4]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '26'
RETURN node2.name, node2.age

Q3 : Variable length relationships (Depth 6).

MATCH (node1)-[:FRIEND_OF*1..6]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '40'
RETURN node2.name, node2.age

Q4 : Querying using a single query pattern.

MATCH (node1)-[:FRIEND_OF]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '22'
RETURN node2.name, node2.age

Q5 : Querying using a multiple query pattern.

MATCH (node1)-[:FRIEND_OF]->(node2)-[:FRIEND_OF]->(node3)
WHERE node1.name = 'Jones' AND node2.age = '47' AND node3.gender = 'Female'
RETURN node3.name, node3.age, node3.gender

Using Simple Encryption Adjustment

Q01 : Variable length relationships (Depth 2).

MATCH (node1)-[:FRIEND_OF*1..2]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('35'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q02 : Variable length relationships (Depth 4).

MATCH (node1)-[:FRIEND_OF*1..4]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('26'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q03 : Variable length relationships (Depth 6).

MATCH (node1)-[:FRIEND_OF*1..6]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'), ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('40'), ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q04 : Querying using a single query pattern.

MATCH (node1)-[:FRIEND_OF]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('22'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)

Q05 : Querying using a multiple query pattern.

MATCH (node1)-[:FRIEND_OF]->(node2)-[:FRIEND_OF]->(node3)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Jones'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('47'),ID(node2)),ID(node2))
AND
node3.gender = decryptRND(encryptRND(encryptDET('Female'),ID(node3)),ID(node3))
RETURN decryptDET(node3.name), decryptDET(node3.age), decryptDET(node3.gender)

Using Traversal-Aware Encryption Adjustment

Q001 : Variable length relationships (Depth 2).

MATCH (x {name:decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..2]->(y)
SET y.age= decryptRND(y.age, ID(y))
with y
MATCH (x:Person)-[:FRIEND_OF*1..2]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('35'),ID(y)),ID(y))
return decryptDET(y.name), decryptDET(y.age)

Q002 : Variable length relationships (Depth 4).

MATCH (x {name:decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..4]->(y)
SET y.age= decryptRND(y.age, ID(y))
with y
MATCH (x:Person)-[:FRIEND_OF*1..4]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('26'),ID(y)),ID(y))
return decryptDET(y.name), decryptDET(y.age)

Q003 : Variable length relationships (Depth 6).

MATCH (x {name:decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..6]->(y)
SET y.age= decryptRND(y.age, ID(y))
with y
MATCH (x:Person)-[:FRIEND_OF*1..6]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('40'),ID(y)),ID(y))
return decryptDET(y.name), decryptDET(y.age)

Q004 : Querying using a single query pattern.

MATCH (x {name :decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF]->(y)
SET y.age= decryptRND(y.age, ID(y))
with y
MATCH (x:Person)-[:FRIEND_OF]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('22'),ID(y)),ID(y))
return decryptDET(y.name),decryptDET(y.age)

Q005 : Querying using a multiple query pattern.

MATCH (n1 {name:decryptRND(encryptRND(encryptDET('Jones'),ID(n1)),ID(n1))})
-[:FRIEND_OF]->(n2)-[:FRIEND_OF]->(n3)
SET n2.age= decryptRND(n2.age, ID(n2))
with n2
MATCH (n1)-[:FRIEND_OF]->(n2 {age:decryptRND(encryptRND(encryptDET('47'),
ID(n2)),ID(n2))})-[:FRIEND_OF]->(n3)
SET n3.gender= decryptRND(n3.gender, ID(n3))
with n3
MATCH (n1)-[:FRIEND_OF]->(n2)-[:FRIEND_OF]->(n3 {gender:decryptRND
(encryptRND(encryptDET('Female'),ID(n3)),ID(n3))})
return decryptDET(n3.name), decryptDET(decryptRND(n3.age,ID(n3))),
decryptDET(n3.gender)
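The staged structure of Q005 can be simulated to show which values each stage lowers. In the hypothetical Python sketch below, the toy data and the function names are our own, plaintext comparison stands in for DET-layer equality, and name is assumed to be at the DET layer after the second step above. Only the SET adjustments are recorded; the decryption in the RETURN clause is not persisted and is therefore not counted.

```python
# Hypothetical toy data; each value would sit at the RND layer until lowered.
people = {
    1: {"name": "Jones", "age": "50", "gender": "Male"},
    2: {"name": "Amy", "age": "47", "gender": "Female"},
    3: {"name": "Ben", "age": "33", "gender": "Male"},
    4: {"name": "Cat", "age": "29", "gender": "Female"},
    5: {"name": "Dev", "age": "61", "gender": "Male"},
    6: {"name": "Fay", "age": "47", "gender": "Female"},
}
friend_of = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]


def successors(edges, node):
    return [dst for src, dst in edges if src == node]


def taea_q005(nodes, edges, name, age, gender):
    """Simulate the three chained stages of Q005; return (names, lowered values)."""
    lowered = set()  # (node id, property) pairs adjusted from RND to DET
    # Stage 1: match the start node(s) by name, lower age on their neighbours.
    starts = [n for n, props in nodes.items() if props["name"] == name]
    hop1 = [m for s in starts for m in successors(edges, s)]
    lowered |= {(m, "age") for m in hop1}
    # Stage 2: keep neighbours with the requested age, lower gender one hop on.
    kept = [m for m in hop1 if nodes[m]["age"] == age]
    hop2 = [m for k in kept for m in successors(edges, k)]
    lowered |= {(m, "gender") for m in hop2}
    # Stage 3: keep second-hop nodes with the requested gender.
    names = sorted(nodes[m]["name"] for m in hop2 if nodes[m]["gender"] == gender)
    return names, lowered


names, lowered = taea_q005(people, friend_of, "Jones", "47", "Female")
print(names)         # ['Cat']
print(len(lowered))  # 4 (simple adjustment would lower all 12 age/gender values)
```

Fay has age 47 and gender Female, but she is reached only through Ben (age 33), so neither of her values is lowered.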

6.2.3 Results

This subsection presents the results obtained from executing the above queries over databases of different sizes. Execution times were collected after executing the queries over the two datasets and recorded in milliseconds, as presented in Table 6.1. The table is interesting in several ways. First, the upper part gives the execution times for non-encrypted data. Second, the middle part gives the execution times for encrypted data using simple encryption adjustment. Third, the lower part presents the execution times for encrypted data using traversal-aware encryption adjustment.
Table 6.1 highlights that the larger the dataset, the longer it takes to find matches. It can easily be observed from the upper part that the retrieval times over non-encrypted databases are lower than over the encrypted databases; this was expected, as queries over non-encrypted databases do not need any encryption layer adjustment.
To begin, the execution times of queries Q01 − Q05 on ENC-S-DB1 (middle part) and of queries Q001 − Q005 on ENC-T-DB1 (lower part) were similar, with both gradually increasing from Q1 to Q3. However, the execution times of queries Q01 − Q05 on ENC-S-DB2 (middle part) and of queries Q001 − Q005 on ENC-T-DB2 (lower part) remained considerably higher than those on the other databases.

Table 6.1: Retrieval times of queries (Q1 − Q5) over non-encrypted databases (upper part), queries (Q01 − Q05) over encrypted databases using simple encryption adjustment (middle part), and queries (Q001 − Q005) over encrypted databases using traversal-aware encryption adjustment (lower part) (time in milliseconds).

Non-encrypted database
Query        Q1     Q2     Q3     Q4     Q5
NON-DB1     193     95    150     41     31
NON-DB2     845   1075   1346    418    349

Encrypted database using simple adjustment
Query       Q'1    Q'2    Q'3    Q'4    Q'5
ENC-S-DB1  3682   3864   4306   3084   3282
ENC-S-DB2  9245   9315   9375   8741   8631

Encrypted database using TAEA
Query      Q''1   Q''2   Q''3   Q''4   Q''5
ENC-T-DB1  2984   3010   3058   2804   3077
ENC-T-DB2  9317   9378   9407   6978   6915
From an overall perspective, the query execution results for the different approaches and database sizes stated above are compared in Figure 6.1. As can be seen in this figure, TAEA outperforms simple encryption adjustment; for example, every query executes faster on ENC-T-DB1 than on ENC-S-DB1.
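This statement can be checked against Table 6.1 with a few lines of Python; the times are copied from the table, while the dictionary layout and the function name are our own.

```python
# Times in ms copied from Table 6.1; list positions are Q1..Q5.
simple = {"DB1": [3682, 3864, 4306, 3084, 3282],  # ENC-S-DB1
          "DB2": [9245, 9315, 9375, 8741, 8631]}  # ENC-S-DB2
taea = {"DB1": [2984, 3010, 3058, 2804, 3077],    # ENC-T-DB1
        "DB2": [9317, 9378, 9407, 6978, 6915]}    # ENC-T-DB2


def queries_where_taea_faster(simple_times, taea_times):
    """Map each database to the (1-based) query numbers on which TAEA took less time."""
    return {
        db: [i + 1 for i, (s, t) in enumerate(zip(simple_times[db], taea_times[db])) if t < s]
        for db in simple_times
    }


faster = queries_where_taea_faster(simple, taea)
print(faster)                                # {'DB1': [1, 2, 3, 4, 5], 'DB2': [4, 5]}
print(sum(len(q) for q in faster.values()))  # 7
```

TAEA is faster in 7 of the 10 query/database pairs: every query on the smaller database, but only the single- and multiple-pattern queries (Q4 and Q5) on the larger one.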

6.3 Evaluation

This section evaluates the proposed approach using the results obtained in the previous subsection. The approach provides better security protection against server-side attacks while keeping good implementability and reasonable query execution performance.
The TAEA approach shows a clear advantage over simple encryption adjustment. Simple encryption adjustment does not require many update queries. However, TAEA provides better security since, as noted above, not all age and gender property values were adjusted to the DET layer, only those required for query execution to progress. To evaluate the proposed approach, two databases were created and the approach was tested using five types of Cypher queries; the evaluation was conducted by measuring execution times experimentally. On both ENC-S-DB2 and ENC-T-DB2, each query shows a reasonable execution time given the size of the database.

Figure 6.1: Comparison between the proposed CryptGraphDB using simple adjustment and TAEA on different database sizes.
The results presented in the tables also indicate that TAEA not only provides better security but is also, in some cases, more efficient for query execution over large databases; the reasons for this were discussed in Chapter 5. Indeed, the execution time of every query using TAEA over ENC-T-DB1 is smaller than that of the same query using the simple adjustment mechanism over the same database. Furthermore, the execution times of queries Q004 and Q005 using TAEA over ENC-T-DB2 are shorter than those of the corresponding queries Q04 and Q05 using the simple adjustment mechanism over ENC-S-DB2.
6.4 Summary
In this chapter, the proposed CryptGraphDB approach has been presented with both simple adjustment and TAEA. To evaluate the approach, two databases of different sizes were used. The retrieval times were collected by executing a set of Cypher queries in the different modes reported previously. The reported evaluation showed that TAEA not only provides better security, but can also be more efficient for query execution over large databases. We observe from Table 6.1 that TAEA outperforms simple encryption adjustment in most cases. The next chapter concludes this thesis with a summary of the contributions and main findings, and some suggestions for future research directions.
Chapter 7

Conclusions and Future Works

This chapter provides a summary of the work presented in this thesis, a review of the main findings related to the research question and research issues identified in Chapter 1, and some suggestions for future work.

7.1 Summary

In this thesis, a flexible mechanism for executing queries over encrypted graph databases, called CryptGraphDB, has been proposed. It utilizes multi-layered encryption and encryption adjustment to provide a reasonable trade-off between data security protection and data processing efficiency. The CryptGraphDB approach was considered in the context of both simple encryption adjustment, directed at static adjustment, and traversal-aware encryption adjustment (TAEA), directed at adjusting the encryption layers dynamically.
The thesis commenced, in Chapter 2, with a review of previous work relevant to this thesis. The following chapters covered the proposed CryptGraphDB approach. These chapters were all structured in a similar way, starting with a description of the proposed approach and ending with a review of the evaluation conducted for it. The proposed approach was tested using a set of Cypher queries and graph databases of different sizes.
In more detail, Chapter 3 presented the CryptGraphDB approach designed to query encrypted graph databases. The fundamental idea was to use standard Cypher queries over an encrypted graph database. The CryptGraphDB design was described, and experiments were reported with an initial prototype scheme introduced for the Neo4j graph DBMS with its Embedded Java API.
Chapter 4 considered the proposed TAEA approach to adjusting the encryption layers dynamically. The TAEA mechanism is directed at querying encrypted data in graph databases. The approach offers two advantages: (1) it only reveals the information needed to perform the query; and (2) it dynamically adjusts the encryption layers as query execution progresses.
Chapter 5 reported on the implementation and evaluation of the traversal-aware encryption adjustment mechanism for querying encrypted data in graph databases. Five datasets were used to evaluate the proposed mechanism, applied to both non-encrypted and encrypted databases. Five types of Cypher queries were used to test the approach.
Chapter 6 presented the implementation of TAEA based on extended databases. Two datasets were used: the first contained 50,000 nodes and the second 1,000,000 nodes. A variety of Cypher queries was executed as well, ranging from queries of different traversal depths to queries using single and multiple query patterns.

7.2 Main Findings and Contributions

This section presents the main findings of the work described in this thesis. The original research question from Chapter 1 was "How to query encrypted graph database?", with the more detailed question "How do we encrypt graph database and apply encrypted queries on it using CryptGraphDB approach?". In Chapter 1 a number of additional questions were also posed. Before returning to the primary overriding research question, each of these is considered as follows:

1. What are the features of CryptDB that can best be adopted to be implemented on graph databases? We re-used the SQL-aware encryption strategy of CryptDB, implementing it on graph databases as a Cypher-aware encryption strategy: an existing encryption scheme was used to implement encryption that enables the processing of Cypher queries. We also re-used adjustable query-based encryption, since CryptGraphDB needs to be able to adjust the encryption layer of each data item based on user queries. This was considered in detail in Chapter 3.

2. What is the effect of adjusting the encryption layers dynamically, using the traversal-aware encryption adjustment scheme? The traversal-aware encryption adjustment scheme is dynamic: the adjustment of the encryption layers does not occur before the execution of the query, but progresses alongside it. To ensure a reasonable trade-off between data security protection and data processing efficiency, TAEA uses multi-layered encryption and encryption adjustment to control, to some extent, the release of the data element information needed for query execution. This was discussed in detail in Chapter 4.

3. Given a set of Cypher queries, how should these queries be implemented over encrypted graph databases using the proposed approach? A prototype scheme has been introduced to evaluate the efficiency of traversal-aware encryption adjustment. To build this prototype, a set of User-Defined Functions (UDFs) was created to allow multi-layered encryption and encryption adjustment; these are called in the same manner as any other Cypher function. This was reported in detail in Chapters 5 and 6.

Returning to the overriding research question from Chapter 1, "How to query encrypted graph database?", and the more detailed question, "How do we encrypt graph database and apply encrypted queries on it using CryptGraphDB approach?", it has been shown that CryptGraphDB achieves promising efficiency, which justifies further development of the system. CryptGraphDB provides better protection against server-side attacks while keeping good implementability and reasonable query execution performance.
Since its publication at ICISSP 2018 [4], our first paper has been cited by [51], which is a good indication of interest in this work.
Finally, to complete the main findings reported in this chapter, the research contributions of this thesis (initially listed in Chapter 1) are restated here for completeness:
• General principles, design and a prototype implementation of the CryptGraphDB system have been presented.

• Two encryption adjustment schemes, simple and traversal-aware, have been introduced. It was shown that the traversal-aware scheme provides better security protection.

• In general, for small and medium-sized databases, simple encryption adjustment is in some cases more efficient than the traversal-aware scheme. For large databases, traversal-aware adjustment can be more efficient, depending on the data and the query.

7.3 Future work

The research described in this thesis has indicated a number of potential directions for future research. These are briefly introduced below.

• Alternative cryptographic algorithms to implement the encryption layers.

• Including more Cypher queries to evaluate the efficiency of the prototype.

• Implementing and researching the other onion types as well, such as the Order onion and the Search onion, to build an industrial graph equivalent of CryptDB for Neo4j. This could also lead to applied database publications.

• More investigation of the trade-offs between efficiency and security on different datasets.

• Incorporating concurrency into the implementation. In particular, research should move from query operations (reads) to update operations (writes) such as INSERT, DELETE, and UPDATE. This complicates the application, since different encryption keys are needed when writing to the same data field.

• Automating query generation and input generation to allow parameterized density (edge-to-node ratio) experimentation with different kinds of queries against input database graphs. This would make it easier to extend the TAEA approach to bigger datasets and may result in a major publication in applied databases (e.g., SIGMOD).

• Making the implementation portable beyond Neo4j to other NoSQL technologies. This would lead to a more concept-oriented theoretical research direction in the area of secure databases.
Bibliography

[1] Cypher Query Language Reference. Apache License 2.0, 2018.

[2] Ako Muhammad Abdullah. Advanced encryption standard (aes) algorithm to


encrypt and decrypt data. In: Cryptography and Network Security, 1(1):1–12,
2017.

[3] Nahla Aburawi, Frans Coenen, and Alexei Lisitsa. Traversal-aware encryp-
tion adjustment for graph databases. Proceedings of 7th International Confer-
ence on Data Science, Technology and Applications (DATA 2018), 1(1):381–387,
2018.

[4] Nahla Aburawi, Alexei Lisitsa, and Frans Coenen. Querying encrypted graph
databases. In 4th International Conference on Information Systems Security
and Privacy, pages 447–451. Proceedings, 2018.

[5] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu.
Order preserving encryption for numeric data. In Proceeding SIGMOD ’04
Proceedings of the 2004 ACM SIGMOD international conference on Manage-
ment of data, pages 563–574. Proceedings, 2004.

[6] Jaafer Al-Saraireh. An efficient approach for query processing over encrypted
database. In Journal of Computer Science, pages 548–557. Journal, 2017.

[7] Shaukat Ali, Azhar Rauf, and Saeed Mahfooz. Update query over encrypted
data. In International Conference on Computer Networks and Information Tech-
nology, pages 279–282. Proceedings, 2011.

[8] Maryam Almarwani, Boris Konev, and Alexei Lisitsa. Flexible access control and confidentiality over encrypted data for document-based database. In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP), pages 606–614. Proceedings, 2019.

[9] Arvind Arasu, Ken Eguro, Raghav Kaushik, and Ravishankar Ramamurthy.
Querying encrypted data. Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data, 3(1):1259–1261, 2014.

[10] Mihir Bellare, Alexandra Boldyreva, and Adam O’Neill. Deterministic and
efficiently searchable encryption. In Proceedings of the 27th International
Cryptology Conference (CRYPTO), pages 535–552. Proceedings, 2007.

[11] Dobre Blazhevski, Adrijan Bozhinovski, Biljana Stojchevska, and Veno Pachovski. Modes of operation of the AES algorithm. In The 10th Conference for Informatics and Information Technology, pages 212–216. Proceedings, 2013.

[12] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O’Neill. Order
preserving symmetric encryption. In Proceedings of the 28th Annual Interna-
tional Conference on the Theory and Applications of Cryptographic Techniques
(Eurocrypt), pages 224–241. Proceedings, 2009.

[13] Alexandra Boldyreva, Serge Fehr, and Adam O’Neill. On notions of security
for deterministic encryption, and efficient constructions without random ora-
cles. In Proceedings of the 28th International Cryptology Conference (CRYPTO),
pages 335–359. Proceedings, 2008.

[14] Rik Van Bruggen. Learning Neo4j. Packt Publishing Ltd., Birmingham, UK, 1st edition, 2014.

[15] Melissa Chase and Seny Kamara. Structured encryption and controlled
disclosure. In Proceedings of Advances in Cryptology - ASIACRYPT, pages
577–594. Proceedings, 2010.

[16] Michael Cooney. IBM touts encryption innovation; new technology performs calculations on encrypted data without decrypting it. In Computer World, 2009.

[17] Carlo Curino, Raluca A. Popa, Evan P. C. Jones, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, and Nickolai Zeldovich. Relational cloud: A database-as-a-service for the cloud. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR 2011), pages 235–240, 2011.

[18] Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. Searchable
symmetric encryption: improved definitions and efficient constructions. In
Proceedings of the 13th ACM conference on Computer and communications
security, pages 79–88. Proceedings, 2006.

[19] Cypher. Introduction to Cypher. https://neo4j.com/developer/cypher-query-language/, September 2015. Accessed on 2018-02-08.

[20] Anand Desai. New paradigms for constructing symmetric encryption schemes
secure against chosen-ciphertext attack. In Proceedings of the 20th Annual
International Conference on Advances in Cryptology, pages 394–412. Proceed-
ings, 2000.

[21] Swathi Edem, Gupta Vivek, and G. Sandhya Rani. Role of hash function in cryptography. National Conference on Computer Security, Image Processing, Graphics, Mobility and Analytics, 1(1), 2016.

[22] Niels Ferguson. AES-CBC + Elephant diffuser: A disk encryption algorithm for Windows Vista. IOSR Journal of Engineering (IOSRJEN), 3(8), 2006.

[23] Klint Finley. Five graph databases to consider. https://readwrite.com/2011/04/20/5-graph-databases-to-consider/, April 2011. Accessed on 2018-04-11.

[24] Tingjian Ge and Stan Zdonik. Answering aggregation queries in a secure system model. In Proceedings of the 33rd International Conference on Very Large DataBases (VLDB), pages 519–530. Proceedings, 2007.

[25] Oded Goldreich. Foundations of Cryptography, Volume I: Basic Tools. Cambridge University Press, 2001.

[26] Jose Guia, Valeria G. Soares, and Jorge Bernardino. Graph databases: Neo4j
analysis. In Proceedings of the 19th International Conference on Enterprise
Information Systems (ICEIS 2017), volume 1, pages 351–356. Proceedings,
2017.

[27] Florian Kerschbaum, Martin Härterich, Patrick Grofig, Mathias Kohler, An-
dreas Schaad, Axel Schröpfer, and Walter Tighzert. Optimal re-encryption
strategy for joins in encrypted databases. In Data and Applications Security
and Privacy, pages 195–210. Proceedings, 2013.

[28] Lianzhong Liu and Jingfen Gai. A method of query over encrypted data in
database. In International Conference on Computer Engineering and Technol-
ogy, pages 23–27. Proceedings, 2009.

[29] Daniele Micciancio and Panagiotis Voulgaris. A deterministic single exponential time algorithm for most lattice problems based on Voronoi cell computations. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC), pages 1364–1391. Proceedings, 2010.

[30] Harshini Selva Mohan and Abbadi Raji Reddy. Revised AES and its modes of operation. In International Journal of Information Technology and Knowledge Management, pages 31–36. Journal, 2012.

[31] Neo4j. Introducing the Neo4j graph platform. https://neo4j.com/, March 2015. Accessed on 2018-01-13.

[32] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the 18th Annual International Conference on the Theory and Applications of Cryptographic Techniques (Eurocrypt), pages 223–238. Springer-Verlag, 1999.

[33] Onofrio Panzarino. Learning Cypher. Packt Publishing Ltd., Birmingham, UK, 1st edition, 2014.

[34] Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari Balakrishnan. CryptDB: Protecting confidentiality with encrypted query processing. 23rd ACM Symposium on Operating Systems Principles, 1(1):85–100, 2011.

[35] Raluca Ada Popa and Nickolai Zeldovich. Cryptographic treatment of CryptDB’s adjustable join. Technical Report MIT-CSAIL-TR-2012-006, Computer Science and Artificial Intelligence Laboratory, 6(1):289–300, 2012.

[36] Raluca Ada Popa, Nickolai Zeldovich, and Hari Balakrishnan. CryptDB: A practical encrypted relational DBMS. In Computer Science and Artificial Intelligence Laboratory, Cambridge, MA. Technical Report MIT-CSAIL-TR-2011-005, 2011.

[37] K. Srinivasa Reddy and Ramachandram Sirandas. A new randomized order preserving encryption scheme. In International Journal of Computer Applications, pages 41–46. Journal, 2014.

[38] Catherine M. Ricardo. Databases Illuminated. Jones and Bartlett, United States of America, 1st edition, 2004.

[39] Ian Robinson, Jim Webber, and Emil Eifrem. Graph Databases. O’Reilly
Media, Inc., United States of America, 1st edition, 2013.

[40] Eyad Saleh, Ahmad Alsa’deh, Ahmad Kayed, and Christoph Meinel. Processing
over encrypted data: Between theory and practice. In SIGMOD, pages 5–16.
Newsletter, 2016.

[41] Muhammad Sarfraz, Mohamed Nabeel, Jianneng Cao, and Elisa Bertino. DBMask: Fine-grained access control on encrypted relational databases. In 5th ACM Conference on Data and Application Security and Privacy, pages 1–11. Proceedings, 2015.

[42] Qahtan M. Shallal and Mohammad U. Bokhari. A review on symmetric key encryption techniques in cryptography. International Journal of Computer Applications, 147(10):43–48, 2016.

[43] Dawn Xiaodong Song, David Wagner, and Adrian Perrig. Practical techniques
for searches on encrypted data. In IEEE Symposium of Security and Privacy,
pages 44–. Proceedings, 2000.

[44] Neo4j Staff. User defined procedures and functions. https://neo4j.com/developer/procedures-functions, June 2017. Accessed on 2019-04-18.

[45] Neo4j Staff. Using Neo4j embedded in Java applications. https://neo4j.com/docs/java-reference/current/tutorials-java-embedded/, July 2017. Accessed on 2019-03-08.

[46] openCypher team. openCypher. https://www.opencypher.org/, April 2018. Accessed on 2019-06-13.

[47] Rc Tripathi and Shubham Agrawal. Comparative study of symmetric and asymmetric cryptography techniques. International Journal of Advance Foundation and Research in Computer (IJAFRC), 1(1):68–76, 2014.

[48] Stephen Tu, M. Frans Kaashoek, Samuel Madden, and Nickolai Zeldovich.
Processing analytical queries over encrypted data. Proceedings of the 39th In-
ternational Conference on Very Large Data Bases (VLDB), 6(1):289–300, 2013.

[49] Chad Vicknair, Michael Macias, Zhendong Zhao, Xiaofei Nan, Yixin Chen, and Dawn Wilkins. A comparison of a graph database and a relational database. In Proceedings of the 48th Annual Southeast Regional Conference, pages 42:1–42:6. Proceedings, 2010.

[50] Aleksa Vukotic, Nicki Watt, Tareq Abedrabbo, Dominic Fox, and Jonas Partner.
Neo4j in Action. Manning Publications Co., Shelter Island, NY, USA, 1st edition,
2015.

[51] Harsha Vyawahare, Pravin P. Karde, and Vilas M. Thakare. A hybrid database
approach using graph and relational database. In Proceedings of the 3rd IEEE
International Conference on Research in Intelligent and Computing in Engineering (RICE), pages 1–4. Proceedings, 2018.

[52] Pengtao Xie and Eric Xing. CryptGraph: Privacy preserving graph analytics on encrypted graph. In arXiv:1409.5021. Journal, 2015.
