CryptDB Mechanism On Graph Databases Nahla Naser Aburawi University of Liverpool
March 2020
Dedication
Glossary
List of Abbreviations
Acknowledgements
All praise is due to Allah, the Most Merciful and the Most Gracious, who gave me
the strength and power to overcome all the challenges that came to me during my
Ph.D. study.
I would like to express my sincere gratitude to my primary supervisor Dr. Alexei
Lisitsa for his continuous support of my Ph.D. study and related research, and for
his patience, motivation and encouragement throughout this
journey. His guidance helped me throughout the research and the writing of this
thesis. He always made time for the discussions that made the completion
of my Ph.D. possible. I could not have imagined having a better supervisor and
mentor for my Ph.D. study.
I would also like to express my gratitude to my second supervisor Prof. Frans
Coenen, for his suggestions, assistance, and valuable comments. I would like to
extend my profound gratitude to both advisors, Prof. Sven Schewe and Dr. Andre
Hernich, for providing me with constructive feedback and suggestions at various
times. All staff members and colleagues at the Computer Science department at
the University of Liverpool were helpful whenever necessary. My gratitude also
extends to the Libyan government/Ministry of Higher Education for awarding me
this opportunity so that my dream of getting a Ph.D. could come true.
My sincere thanks go to my husband and my children for their prayers, support,
trust, and encouragement, and I am profoundly sorry for what you went through
while I was busy. My deepest thanks also go to my mother, brothers, and sister for
supporting me spiritually throughout the writing of this thesis and in my life in general.
Abstract
The work presented in this thesis is concerned with database security. In
particular, we address the problem of querying encrypted data in graph
databases. The thesis considers the most popular database security methods
from the literature: (i) multi-layered encryption and (ii) encryption adjustment.
Encryption is one of the effective ways to protect sensitive data in a database
from various attacks. Querying encrypted data, however, poses a dilemma: either the
data is decrypted before querying, leaving it vulnerable to server-side
attacks, or computationally expensive methods for querying encrypted data
have to be applied. The approach presented in this thesis is inspired by the CryptDB system
for relational databases (R. A. Popa et al, 2010). Before processing, a graph query
is translated into an encrypted form, which is then executed on a server without full
decryption of the data; the encrypted results are sent back to a client, where they
are finally decrypted. In this way, data privacy is protected at the server side. The
thesis proposes CryptGraphDB, a flexible mechanism for executing queries over
encrypted graph databases. It utilizes multi-layered encryption
and encryption adjustment in order to provide a reasonable trade-off between
data security protection and data processing efficiency. The thesis presents the
design principles and the prototype implementation, and reports the empirical data
obtained by experimentation with the prototype. The prototype was implemented
for the Neo4j graph DBMS and the Cypher query language in two versions: (i)
utilizing the Java API and (ii) using user-defined functions (UDFs). The efficiency of
query execution for various types of queries on encrypted and non-encrypted
Neo4j graph databases is reported. In the context of the CryptGraphDB approach,
two encryption adjustment policies have been considered:
simple adjustment and traversal-aware adjustment. Under the first policy,
all encryption level adjustments are performed statically before query execution,
while under the second the encryption levels are updated dynamically. We show that
by dynamically adjusting encryption layers as query execution progresses, we can
correctly execute the query on the encrypted graph store while revealing less information
to the adversary than in the case of static adjustment done prior to execution.
Contents
Dedication
Glossary
List of Abbreviations
Acknowledgements
Abstract
Contents
List of Figures
1 Introduction
1.1 Overview
1.2 Research Question
1.3 Research Contributions
1.4 Publications
1.5 Structure of Thesis
5.5 Summary
References
List of Figures
3.1 The typical query flow in Graph CryptDB (adapted from [34])
3.2 Onion layers of encryption and the groups of computation they perform (adapted from [36]).
3.3 The CryptGraphDB Design.
3.4 Three nodes schema at server-side (part a) along with the corresponding plaintexts and data under CryptGraphDB (part b) along with the corresponding ciphertexts. Ciphertexts shown are not full-length.
3.5 Querying Non-Encrypted and Encrypted Graph Database with Java Interface and Neo4j Interface.
5.1 Example data layout schema at the server of the graph database.
List of Tables
3.1 Data layout at the server when the frontend creates encrypted tables using the schema at the top; the tables created at the server are given at the bottom.
3.2 The structural queries on non-encrypted and encrypted databases with Java interface and Neo4j interfaces (time in milliseconds).
3.3 The data queries on non-encrypted and encrypted databases with Java interface and Neo4j interfaces (time in milliseconds).
4.1 Data layout at the server (left part), where the application table created at the server (middle and left parts).
5.1 Plain text data, encryption at the RND layer and encryption at the DET layer (ciphertexts shown are not full-length).
5.2 Different graph database sizes.
5.3 Retrieval times of queries (Q1-Q5) over non-encrypted data using different graph database sizes, queries (Q'1-Q'5) over encrypted databases using simple encryption adjustment, and queries (Q''1-Q''5) over encrypted databases using traversal-aware encryption adjustment (time in milliseconds).
5.4 Retrieval times of encryption adjustment using simple adjustment and traversal-aware encryption adjustment (time in milliseconds).
Chapter 1
Introduction
1.1 Overview
no need to perform join [39] and it is absent in the Cypher query language [19].
Nevertheless, as noted in [4], the issue of unnecessary leaks remains for
graph databases as well.
The rest of this introductory chapter is organised as follows. The research
question and associated research issues are discussed in Section 1.2. Section
1.3 lists the contributions of the thesis. Section 1.4 presents details of published
work produced as a result of the presented research, followed in Section 1.5 by
an outline of the structure of the remainder of this thesis.
"How can an encrypted graph database be queried? In more detail: how
do we encrypt a graph database and apply encrypted queries to it using
the CryptGraphDB approach?"
The term CryptGraphDB used here refers to the prototype system that we have
developed. CryptGraphDB works by allowing the graph DBMS server to execute
Cypher queries on encrypted data almost as if it were executing the same queries
on plain-text data.
To answer the above research question, the following research
sub-questions need to be considered:
1. What are the features of CryptDB that can best be adapted for implementation
on graph databases?
3. Given a set of Cypher queries, how should these queries be implemented
over encrypted graph databases using the proposed approach?
The main research contribution of the thesis is the development of the principles and
the design of the CryptGraphDB system for querying encrypted graph databases.
The system protects data confidentiality even against attackers
who can access all server data. Data encryption is used to protect
confidentiality. One of the most challenging and
important aspects of data encryption is how to query the encrypted data without
fully decrypting it, so as to provide a reasonable trade-off between data
security protection and data processing efficiency.
Using multi-layered encryption and appropriate encryption adjustment
procedures is the key to finding the right balance between security and
query execution performance. The graph query is translated into an
encrypted form before processing, and the encryption layers of the data are adjusted
accordingly at the server side. Subsequently, the query is executed on the server
and the encrypted results are sent back to the user, where they are finally decrypted.
In this thesis, a novel encryption adjustment scheme for graph databases, called
traversal-aware encryption adjustment, is proposed. The proposed approach
reveals only the information required to execute the query. The scheme is dynamic:
the encryption layer adjustment does not happen before query execution, but
progresses gradually alongside it.
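To make the contrast between static and dynamic adjustment concrete, the following is a minimal Java sketch; it is not the CryptGraphDB implementation. The toy graph, the property names name and age, the class AdjustmentSketch, and the idea of counting "lowered" properties are all assumptions introduced for illustration, and a plaintext comparison stands in for an equality check on DET ciphertexts. The sketch assumes a two-step traversal query that first selects a start node by name and then filters its neighbours by age.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdjustmentSketch {
    // Hypothetical toy graph: node id -> ids of the nodes it points to.
    static final Map<Integer, List<Integer>> EDGES = Map.of(
        1, List.of(2, 3),
        2, List.of(3),
        3, List.<Integer>of(),
        4, List.of(5),
        5, List.<Integer>of());

    // Hypothetical names; comparing them stands in for comparing DET ciphertexts.
    static final Map<Integer, String> NAME = Map.of(
        1, "Smith", 2, "Jones", 3, "Tom", 4, "Mary", 5, "Ann");

    // Simple (static) adjustment: before execution, lower both queried
    // properties from RND to DET on every node in the store.
    static Set<String> simpleAdjustment() {
        Set<String> lowered = new HashSet<>();
        for (int id : EDGES.keySet()) {
            lowered.add(id + ".name");
            lowered.add(id + ".age");
        }
        return lowered;
    }

    // Traversal-aware (dynamic) adjustment, as assumed here: lower "name"
    // to locate the start node, then lower "age" only on the nodes that
    // the traversal actually reaches.
    static Set<String> traversalAware(String startName) {
        Set<String> lowered = new HashSet<>();
        List<Integer> starts = new ArrayList<>();
        for (int id : EDGES.keySet()) {
            lowered.add(id + ".name");
            if (NAME.get(id).equals(startName)) {
                starts.add(id);
            }
        }
        for (int start : starts) {
            for (int neighbour : EDGES.get(start)) {
                lowered.add(neighbour + ".age");
            }
        }
        return lowered;
    }

    public static void main(String[] args) {
        System.out.println("simple: " + simpleAdjustment().size());
        System.out.println("traversal-aware: " + traversalAware("Smith").size());
    }
}
```

In this toy setting the dynamic policy lowers fewer property values than the static one, which is the kind of reduction in revealed information that the paragraph above describes; the actual scheme is detailed later in the thesis.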
The contributions are presented in the thesis as outlined below.
1.4 Publications
Three peer reviewed publications have arisen out of the work presented in this
thesis. These are listed below together with a brief description of each.
1. Aburawi Nahla, Lisitsa Alexei and Coenen Frans (2018). "Querying
encrypted graph databases", In Proceedings of 4th International
Conference on Information Systems Security and Privacy (ICISSP 2018),
SCITEPRESS-Science and Technology Publications, pp. 447-451. This
paper proposed the first approach to querying encrypted data in a graph
database presented in this thesis. The main concept of this approach is
that, before processing, a graph query is translated into an encrypted form, which
is then executed on a server without decrypting any data; the encrypted results
are sent back to a client, where they are finally decrypted. In this way data
privacy is protected at the server side. The paper also presented the design of the
system and empirical data obtained by experimentation with a prototype
implemented for the Neo4j graph DBMS and the Cypher query language, utilizing
the Java API. The efficiency of query execution for various types of queries on
encrypted and non-encrypted Neo4j graph databases was reported. The
content of this paper is included in Chapter 3.
context of Chapter 4.
3. Aburawi Nahla, Coenen Frans and Lisitsa Alexei (2020). "Querying
Encrypted Data in Graph Databases". In: Khalaf M., Al-Jumeily D., Lisitsa
A. (eds) Applied Computing to Support Industry: Innovation and
Technology. (ACRIT 2019). Communications in Computer and Information
Science, vol 1174, pp. 367-382. Springer, Cham. This paper considered
encryption as an effective way to protect sensitive data in a database
from various attacks. Querying encrypted data, however, becomes a
challenge: either the data is decrypted before querying, leaving it
vulnerable to server-side attacks, or computationally
expensive methods for querying encrypted data have to be applied. A flexible mechanism for
the execution of queries over encrypted graph databases was presented in
the paper. Data privacy is protected at the server side through the use
of multi-layered encryption and encryption layer adjustment done
dynamically during the execution of queries. The proposed scheme reveals less
information to the adversary than static adjustment done prior
to execution. The scheme was implemented for a subset of Cypher
graph queries (graph traversal queries) running on a Neo4j graph database.
The experimental results show the efficiency of query execution for various
types of queries on encrypted data in graph database stores. The content of
this paper is covered in Chapter 5.
2.1 Introduction
This chapter presents a review of the background and related work with respect to
the work described in this thesis. As mentioned in Chapter 1, the work presented
in this thesis is directed at exploring and proposing an approach to querying
encrypted data in graph databases. The chapter commences, in Section 2.2, with a
review of the background of cryptography. Section 2.3 then presents a review of
the background of the CryptDB system. The chapter then continues, in Section 2.4,
with a more detailed look at the CryptDB design and how it works. This is
followed by Section 2.5, which considers graph databases (Neo4j) and the query
language (Cypher) that is used. Section 2.6 then analyses other technologies for
querying encrypted databases. The chapter ends with a summary in Section 2.7.
Chapter 2. Background and Related Work 9
is achieved by changing the ciphertext back into the original text. Both
operations also take a parameter, known as a key, that makes them possible.
Unlike these two operations, hashing does not allow the
original text to be recovered from its output.
Depending on the number of keys that are employed for cryptographic
operations, the following techniques are defined:
public key is used for encryption and the private key is used for decryption.
However, some public key techniques allow the private key to be used for
encryption and the public key for decryption. Common public key cryptographic
algorithms are Rivest-Shamir-Adleman (RSA) and the Digital Signature Algorithm
(DSA).
The work presented in this thesis uses the symmetric encryption technique, in
particular, the AES encryption algorithm using ECB and CBC modes.
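As a rough illustration of the difference between the two modes just named, the sketch below uses the standard javax.crypto API to encrypt the same value twice under AES/ECB and twice under AES/CBC with a fresh random IV. The class name AesModesSketch, the helper names, the sample value and the convention of prepending the IV to the ciphertext are assumptions made for this example, not details taken from the thesis prototype.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

class AesModesSketch {
    // Generates a fresh 128-bit AES key for the demonstration.
    static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // ECB uses no IV, so equal plaintexts give equal ciphertexts under one key.
    static byte[] encryptEcb(SecretKey key, String plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
    }

    // CBC with a fresh random IV; the IV is prepended to the ciphertext
    // (an assumed storage convention) so that it is available for decryption.
    static byte[] encryptCbc(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + body.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    // Reads the IV from the first 16 bytes and decrypts the remainder.
    static String decryptCbc(SecretKey key, byte[] ivAndBody) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(ivAndBody, 0, 16));
        byte[] plain = cipher.doFinal(ivAndBody, 16, ivAndBody.length - 16);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        boolean ecbEqual = Arrays.equals(encryptEcb(key, "Smith"), encryptEcb(key, "Smith"));
        boolean cbcEqual = Arrays.equals(encryptCbc(key, "Smith"), encryptCbc(key, "Smith"));
        System.out.println("ECB ciphertexts equal: " + ecbEqual);
        System.out.println("CBC ciphertexts equal: " + cbcEqual);
        System.out.println("CBC round trip: " + decryptCbc(key, encryptCbc(key, "Smith")));
    }
}
```

The deterministic behaviour of ECB is what permits equality checks on ciphertexts, at the price of revealing which values repeat, while the random IV in CBC hides repetitions but rules out such checks.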
CryptDB has thoroughly explored access control for SQL queries on encrypted
relational data. The CryptDB architecture assumes a proxy between the users
and the server. More details on this will be given in the next section. The basic idea
is that, by selecting an appropriate encryption scheme, the data stored in
encrypted form in the database server does not need to be decrypted at the server
side, not even during the execution of queries. One of CryptDB's advantages is that
there is no need to change the server software. The whole process is implemented by
intercepting users' queries, rewriting them and passing them to the server for
execution. In [34, 36] the issue of information leaks during encryption adjustment
in relational databases is discussed, and a special Join-aware encryption scheme
to reduce the leaks is proposed [35].
CryptDB follows three novel concepts: (i) an SQL-aware encryption
strategy that maps SQL operations to encryption schemes; (ii) adjustable
query-based encryption that adjusts the encryption level of each data value based on user
queries; and (iii) onion encryption to change data encryption levels in an efficient way.
The main purpose of CryptDB is to guarantee the privacy of data in a relational
database (queried using SQL) against any adversary that can access the
database server. It covers a set of motivating scenarios, such as protecting against
a curious database administrator (DBA), guarding against attackers that gain
access to the database server machine, and outsourcing an SQL database to the
cloud [17, 36]. As shown in Figure 2.2, the assumption is that the application and
the CryptDB frontend are not compromised, and that the adversary cannot obtain the
keys.
again to Figure 2.2, the typical query flow in CryptDB is presented, as follows:
• In step (1), an SQL query is issued by the user and passed through the
query rewriter/encryptor (QRE). At this stage, the QRE anonymizes each table and
column name using the master key MK, and encrypts each constant in
the query with an encryption scheme that allows the required operation,
• In step (2), the query is passed to the onion key manager (OKM), which checks
whether the server needs to know any onion keys to execute the query. If so,
the OKM provides the necessary onion keys by issuing an UPDATE query at the
server,
• In step (3), the anonymized query is forwarded by the OKM to the server,
which executes it using standard SQL,
• In step (5), the result decryptor (RD) decrypts the results and sends
them back to the user.
We observe from Table 2.1 that R. A. Popa et al. [36] used some existing encryption
schemes to implement encryption that allows SQL queries to be processed.
CryptDB uses the same layer of encryption for all data
values in a given column, so that the same computation can be performed on every item
in that column. The encryption schemes, and the security properties that CryptDB
requires from each of them, are as follows:
• Random (RND). It provides the maximum privacy among the encryption layers.
The RND scheme is designed so that any two equal values are mapped
to different ciphertexts. No computation can be performed efficiently on the
ciphertext at the RND layer. To implement RND, AES in UFE mode was used
[11, 20, 30].
using DET is allowing the server to process equality checks (i.e. selects with
equality filters, equality joins etc.). Here, DET needs to be a pseudo-random
permutation (PRP) [25]. To implement DET, AES and Blowfish were used for
64-bit and 128-bit values [11, 20, 30].
• Join (JOIN and OPE-JOIN). A separate encryption scheme is used to allow
equality joins between two columns. JOIN also supports all operations
allowed by DET [27].
adjustment level is based on the queries that are required over the data: if
no operation is needed, the column is encrypted with RND, while if equality
checks need to be performed on the column, DET suffices. Initially, each cell in the database
is encrypted independently into an onion, so that each item in the table is encrypted with
the strongest encryption layer, as detailed in Figure 2.3. Each layer provides a
specific functionality, as reported in subsection 2.4.1.
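The layering idea can be sketched in Java as follows. This is a simplified two-layer onion written for illustration only: the inner DET layer is approximated with AES/ECB and the outer RND layer with AES/CBC and a random IV, and the class name OnionSketch, the fixed demo keys and the IV-prefix convention are all assumptions rather than the schemes CryptDB itself uses. In CryptDB the peeling step runs at the server once the proxy has supplied the layer key; here everything lives in one class for brevity.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

class OnionSketch {
    // Hypothetical 16-byte demo keys, one per layer.
    static final SecretKeySpec DET_KEY = key("det-layer-key-01");
    static final SecretKeySpec RND_KEY = key("rnd-layer-key-01");

    static SecretKeySpec key(String text) {
        return new SecretKeySpec(text.getBytes(StandardCharsets.UTF_8), "AES");
    }

    // Inner layer (stand-in for DET): AES/ECB, equal inputs give equal outputs.
    static byte[] det(int mode, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(mode, DET_KEY);
        return cipher.doFinal(data);
    }

    // Outer layer (stand-in for RND): AES/CBC with a fresh IV stored in front.
    static byte[] rndEncrypt(byte[] inner) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, RND_KEY, new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(inner);
        byte[] out = Arrays.copyOf(iv, iv.length + body.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    // Adjustment: remove the outer layer, exposing the DET ciphertext.
    static byte[] peelToDet(byte[] onion) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, RND_KEY, new IvParameterSpec(onion, 0, 16));
        return cipher.doFinal(onion, 16, onion.length - 16);
    }

    // Builds the onion: DET inside, RND outside.
    static byte[] wrap(String value) throws Exception {
        return rndEncrypt(det(Cipher.ENCRYPT_MODE, value.getBytes(StandardCharsets.UTF_8)));
    }

    // Full decryption, as done by the trusted side that holds both keys.
    static String unwrap(byte[] onion) throws Exception {
        return new String(det(Cipher.DECRYPT_MODE, peelToDet(onion)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] first = wrap("Smith");
        byte[] second = wrap("Smith");
        System.out.println("equal at RND layer: " + Arrays.equals(first, second));
        System.out.println("equal at DET layer: "
            + Arrays.equals(peelToDet(first), peelToDet(second)));
        System.out.println("plaintext: " + unwrap(first));
    }
}
```

Two onions of the same value are unlinkable while the outer layer is in place and become comparable only after it is peeled, which is why adjustment is deferred until a query actually needs an equality check.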
According to the above, the maximum privacy comes from layers RND and HOM,
while layers DET and OPE provide more functionality. CryptDB uses three onions
for numeric values and two onions for string values. The frontend of
CryptDB anonymizes the schema as well, to prevent the server from seeing the
information carried by column or table names.
Figure 2.3: Onion layers of encryption and the classes of computation they allow.
(from [34, 36])
We now turn to the execution of read queries. Before executing
an SQL query over encrypted data, the frontend encrypts, anonymizes and
rewrites the query, then sends it to the untrusted DBMS server. Table 2.2 shows
the anonymized schema that is needed to rewrite a query. In addition, each query
constant is encrypted with the encryption scheme that matches the relevant layer of the
onion.
Table 2.2: Data layout at the server. When the front end creates a table with the
schema on the left, the table created at the DBMS server is the one shown on the
right (adapted from [36])

Students        Table1
ID   Name       C1-Onion1  C1-Onion2  C1-Onion3  C2-Onion1  C2-Onion2
30   Smith      xr32g1     xt43q1     xo89u2     xq22e4     xu4o98
SELECT ID
FROM Students
WHERE Name = 'Smith'
Lowering encryption of name to the DET layer is required. The frontend creates
the query
In this section graph database concepts (more specifically Neo4j) are presented
[39]. Graph databases have recently become very popular. Graph structures
with nodes and edges constitute a convenient data model that can represent all
kinds of scenarios. Querying graph databases may also be more efficient
than querying relational databases, especially for data traversal queries.
Several implementations of graph DBMS are available, including GraphDB,
Neo4j, OrientDB, to name a few [14, 23, 50]. A graph database consists of two
elements: a node and a relationship. The node represents an entity (a person,
place, thing), and the relationship represents how two nodes are connected. For
example, the two nodes (Blue) and (color) would have the relationship [is a type of]
pointing from (Blue) to (color).
Figure 2.4 shows some nodes (Employee) represented in a graph
data model. Each node (labeled "Employee") represents a single person and is
connected by relationships describing how the employees are related. As
shown, (Smith) knows both (Tom) and (Jones), and (Jones) knows (Tom).
The remainder of this section comprises two subsections. Firstly, Sub-section
2.5.1 presents the graph database (Neo4j). Secondly, Sub-section 2.5.2 presents
the Cypher query language that is used to query a Neo4j database.
Nodes
Relationships
• Relationships are used to connect two nodes directly, the source node and
the target node.
Labels
• Labels are used to give a common name to a group of nodes. A node might
have one or more labels.
Properties
• Properties are key-value pairs used to add features to nodes and
relationships.
Cypher is a declarative query language for querying a Neo4j database (it focuses
on expressing what to return from the graph rather than on how
to get the result). The syntax of Cypher gives a natural way to express the
patterns of nodes and relationships in the graph to be matched during query
execution. There is no need to describe how to select, insert, update
or delete our graph data [1, 31, 33, 46].
Cypher Clauses
Cypher Examples
To present the fundamental ideas of Cypher query execution, the example
graph in Figure 2.4 will be used. Several Cypher queries are considered
below:
Example 1.
Find Employee nodes in the graph that have a name of Tom. First, we need
to name a variable, which is then used as a reference to the Tom node. The
answer is Tom.
MATCH (n:Employee)-[:KNOWS]->()
WHERE n.name = 'Tom'
RETURN n
Example 2.
Find the age of Tom. Note that we need to use the same variable for both name and
age so that both refer to the same node, which is Tom. The answer is 33.
MATCH (n:Employee)-[:KNOWS]->()
WHERE n.name = 'Tom'
RETURN n.age
Example 3.
Find who knows Tom. In this case, we need to specify the direction of the
relationship: to find out who knows Tom, we need the incoming relationship direction.
The answer is Jones.
MATCH (n:Employee)<-[:KNOWS]-(m)
WHERE n.name = 'Tom'
RETURN m
Example 4.
Find who Tom knows. In this case, we need to specify the direction of the
relationship: to find out who Tom knows, we need the outgoing relationship direction.
The answer is Smith.
2.7 Summary
This chapter has provided the reader with a review of the existing work that
underpins the work presented in the thesis. The chapter commenced with a
review of cryptography concepts. The CryptDB system and its design were then
presented, together with a discussion of the SQL-aware encryption strategy
and of adjustable query-based encryption with onion encryption, as well as query
execution for both read and write queries. The central theme of the work presented
in this thesis is to apply the CryptDB system principles to graph databases. A
variety of graph databases fall within the domain of this approach; the
one used in this thesis is Neo4j. The chapter then went on to consider graph
databases, at which the work presented in this thesis is directed, and in particular the Neo4j
database. The significance of the latter
is that its Cypher query language is used for evaluation purposes with respect
to the CryptGraphDB approach proposed later in the thesis. The next chapter
describes the mechanism for querying encrypted graph databases.
Chapter 3
3.1 Introduction
This chapter presents the proposed CryptGraphDB approach for querying graph
databases. CryptGraphDB is a database system that protects the privacy of
the data in the graph database by executing Cypher queries over the encrypted
database; more details on this will be given in this chapter. Another key point
to remember is that CryptGraphDB deals with the threat of either an inquisitive
database administrator (DBA) or a database hacker who attempts to learn private
data (for example, details concerning online banking, medical records, billing
data or employment records) by spying on the DBMS server. Because the server can
see the database only in encrypted form and never receives the decryption key,
the DBA can learn nothing about the data. Figure 3.1 details CryptGraphDB's
architecture.
In this work, we aim to take the CryptDB principles [34] as they are implemented
for relational databases and transfer them to graph databases. We first elaborate
on the requirements for the CryptGraphDB component with respect to the Cypher query
language and the Neo4j database system. It has been found that SQL-aware encryption
schemes can be seamlessly re-used as Cypher-aware schemes, while at the same time
preserving the performance advantages of traversal graph queries over equivalent
relational ones. More details on this will be given in the next section.
Figure 3.1: The typical query flow in Graph CryptDB (adapted from [34])
CryptGraphDB was implemented in two different ways: (i) using Neo4j embedded
in Java applications, by including the Neo4j library jars in the build; and (ii) by
creating a set of User-Defined Functions (UDFs) to be called directly from a Cypher
query.
The structure of the remainder of this chapter is as follows. Section 3.2 presents
the proposed design. Section 3.3 introduces the implementation of the prototype.
Finally Section 3.4 presents a summary of the chapter and the main findings.
Chapter 3. Querying Encrypted Graph Databases 27
level of encryption that maps the same plain text values to the same encrypted
texts. As a result, we need to allow CryptGraphDB to adjust the encryption level
of each data item based on user queries. This could be done either in advance, if a
set of queries is fixed in advance, or during run time, in which case an adjustable
onion layered encryption should be used.
Figure 3.2: Onion layers of encryption and the groups of computation they perform
(adapted from [36]).
• Deterministic (DET). It leaks only which encrypted values correspond to the same
data value, and no more. DET was implemented to allow equality checks to
be performed.
node to the next one over the connecting relationships. Every hop over the path
will be equivalent to a join operation. As a result the JOIN layer in the onion layers
of encryption in [4] was omitted. More details on this topic can be found in [35].
Regardless of the above, under the natural translation of relational databases
to graph databases, such as by the workflow proposed in the Neo4j manual (export
a relational database instance to a CSV file and upload it to Neo4j as a graph DB
instance), one may show that the issue of unnecessary leaks remains. Furthermore,
the original Join-aware encryption scheme can be used as a "property-aware"
encryption scheme for Cypher query execution. More details on related strategies
of encryption adjustment will be given in the next section. Further improvement in
security protection in CryptGraphDB will come from the development of
Traversal-aware Encryption Adjustment applied dynamically during the execution of graph
traversal queries. More details on the development of such schemes, and an
analysis of the related trade-offs between efficiency and security protection, will
be given in the next section.
As reported above, the focus of CryptDB [34] is on allowing the DBMS to execute
relational queries over encrypted data in much the same way as it would
execute the same queries on the original data. Where a
query needs individual operations such as equality comparison or summation,
only modified versions of these operators are applied, and they operate on ciphertexts.
Existing encryption schemes are used to implement
encryption that supports SQL query processing.
In this scheme, if two columns are to be joined, they need to be encrypted with
the same key at level JOIN or OPE-JOIN. A separate encryption level is necessary
to allow equality joins between two columns. For inequality joins, OPE-JOIN
is used, which allows the server to perform them on the columns at that encryption
level.
Table 3.1 illustrates the issue with information leaks for simple encryption
adjustment schemes. The upper part represents two tables (left for Student)
and (right for Enroll), while the subsequent parts represent RND and DET layers
respectively.
Table 3.1: Data layout at the server when the frontend creates encrypted tables
using the schema at the top; the tables created at the server are given at the
bottom.

STUDENT                    ENROLL

Non-encrypted database
stuID  Name   Major        stuID  Name    ClassNumber
1      Smith  Math         1      Smith   MTH200
2      Tom    CSC          3      Mary    CSC201
3      Mary   CSC          4      Jones   CSC205

An encrypted database RND layer
stuID  Name   Major        stuID  Name    ClassNumber
X98e7  x5a8c  X4a87        qwe70  x9g6h   oi54f
X7a8q  x78as  gh66y        asd8q  xu33k   86uyt
Xq74f  x1cf8  tri13        qwe4a  x7j8u   dft63

An encrypted database DET layer
stuID  Name   Major        stuID  Name    ClassNumber
as99e  X88df  RtE54        as99e  X88df   khj33
uyt57  x3s4t  FM98w        nhy97  X6s5g   qw12l
nhy97  X6s5g  FM98w        opo44  x7rz7p  hg80u
SELECT Student.Name
FROM Student, Enroll
WHERE Student.Name = Enroll.Name
The outcome of executing the above query would be Smith and Mary. According
to the CryptDB mechanism, the data is initially encrypted at the RND layer, as
illustrated in the middle part of Table 3.1. The encryption of Name is then lowered
to the DET level, as shown in the lower part of Table 3.1. After that, a query to update the
table is required, where x88df and x6s5g are the encryptions of Smith and Mary
respectively. The results demonstrate two things. First, we can recognize that
there are two values that correspond to the same non-encrypted values: x88df and
This sub-section gives more detail on how CryptGraphDB executes Cypher queries
over encrypted data. The steps required to process a query in
CryptGraphDB are illustrated in Figure 3.3 and are as follows:
1. The application issues a query that is intercepted and rewritten by the proxy:
it anonymizes each label, node, and relationship name, and encrypts each
constant in the query with an encryption scheme that allows the required
operation.
3. The encrypted query is sent by the proxy to the DBMS server, where it is executed
using standard Cypher.
4. The DBMS server returns the encrypted result of the query to the proxy;
the proxy then decrypts the result and sends it back to the application.
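The rewriting performed by the proxy in step 1 can be sketched as follows. This is a deliberately small Java illustration under stated assumptions: the alias table (actor to L1, name to P1_Eq), the fixed demo key, the hex encoding with an "x" prefix and the class name ProxyRewriteSketch are all hypothetical, and AES/ECB stands in for the deterministic scheme. A real proxy would parse the Cypher query and pass constants as query parameters rather than concatenating strings.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class ProxyRewriteSketch {
    // Hypothetical anonymized names for one label and one property.
    static final Map<String, String> ALIASES = Map.of("actor", "L1", "name", "P1_Eq");

    // Hypothetical 16-byte demo key held only by the proxy.
    static final SecretKeySpec DET_KEY =
        new SecretKeySpec("demo-det-key-001".getBytes(StandardCharsets.UTF_8), "AES");

    // Deterministically encrypts a query constant and hex-encodes the result,
    // so that the server can compare it with stored DET-layer values.
    static String detToken(String constant) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, DET_KEY);
        byte[] ciphertext = cipher.doFinal(constant.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder("x");
        for (byte b : ciphertext) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Builds the query the server will see: anonymized label and property
    // names, and an encrypted constant in place of the plaintext value.
    static String rewrite(String label, String property, String constant) throws Exception {
        return "MATCH (n:" + ALIASES.get(label) + ") WHERE n." + ALIASES.get(property)
            + " = '" + detToken(constant) + "' RETURN n";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite("actor", "name", "Tom Hanks"));
    }
}
```

The server only ever receives the rewritten string, so neither the schema names nor the constant appear in plaintext on its side.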
Nodes:
(a:actor{name:'Tom Hanks'})
(m:movie{title:'The Matrix'})
(d:director{name:'Andy Wach'})
Relationships:
(a)-[:Acted_In{role:'Keneu'}]->(m)
(d)-[:Directed{In:'1999'}]->(m)
Figure 3.4: Three nodes schema at server-side (part a) along with the corresponding
plaintexts and data under CryptGraphDB (part b) along with the corresponding
ciphertexts. Ciphertexts shown are not full-length.
Initially, each data item is wrapped in onions of encryption, at most 3
onions for integer values or 2 onions for non-integer values, with RND and
HOM as the outermost layers. At this stage, the server (or rather an attacker who has taken
over the server, or a curious administrator) can learn nothing about the data
content other than the number of nodes, properties, and relationships.
Given a query of the form:
MATCH (a:actor)-->(m)<--(d:director)
WHERE a.name = "Tom Hanks"
RETURN m.title
Now, the P1-Eq and P3-Eq properties are decrypted to the DET layer:
Step 2. The proxy encrypts Tom Hanks to its Eq onion, DET layer, obtaining the
encrypted value x7bf8 via:
E_{key_{N1,P1,Eq,DET}}(Tom Hanks).
Here, x7bf8 is the encryption of Tom Hanks. The proxy should request the random
IV from property P1-IV in order to decrypt the RND ciphertext from P1-Eq.
Step 3. The proxy receives an encrypted RND level result x684 and decrypts it
using:
Finally, the proxy sends the decrypted result The Matrix to the application.
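The layering walked through above (a randomised RND layer wrapped around a deterministic DET layer, with a per-value IV) can be sketched as follows. This is a minimal illustration and not the thesis prototype: the function names and keys are hypothetical, and a SHA-256 keystream stands in for AES purely so that the sketch runs with the Python standard library. It is not secure and should not be used to protect data.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key and iv (toy stand-in for AES)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def _xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))


def det_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic layer: equal plaintexts give equal ciphertexts."""
    iv = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]  # synthetic IV
    return iv + _xor(plaintext, _keystream(key, iv, len(plaintext)))


def det_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    iv, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, iv, len(body)))


def rnd_encrypt(key: bytes, iv: bytes, inner: bytes) -> bytes:
    """Randomised layer: a fresh IV hides equality between values."""
    return _xor(inner, _keystream(key, iv, len(inner)))


def rnd_decrypt(key: bytes, iv: bytes, outer: bytes) -> bytes:
    return _xor(outer, _keystream(key, iv, len(outer)))


k_det, k_rnd = b"det-key-demo", b"rnd-key-demo"  # hypothetical keys
iv1, iv2 = os.urandom(16), os.urandom(16)        # stored per value (cf. P1-IV)

# Two nodes holding the same name, each stored as RND(DET(value)).
stored1 = rnd_encrypt(k_rnd, iv1, det_encrypt(k_det, b"Tom Hanks"))
stored2 = rnd_encrypt(k_rnd, iv2, det_encrypt(k_det, b"Tom Hanks"))

# Peeling the RND layer exposes DET ciphertexts, on which equality works.
peeled1 = rnd_decrypt(k_rnd, iv1, stored1)
peeled2 = rnd_decrypt(k_rnd, iv2, stored2)
```

At the RND layer the two stored values differ even though the plaintexts are equal; after the RND layer is removed, the two DET ciphertexts coincide, which is what allows the server to evaluate the equality condition.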
The first prototype of a fragment of the above CryptGraphDB design was imple-
mented using a Java API. The main focus here was to evaluate the efficiency of
execution of Cypher queries over an encrypted Neo4j database. Several ways for
integrating Neo4j with client applications have been proposed. These include
Neo4j Server REST extension, Neo4j server with JDBC, and Neo4j’s Embedded
Java API [45].
The embedded Java API provides for the tight integration of the client Java
program with the Neo4j server, that is embedding a server into an application. So
it does not allow for the actual separation of the CryptGraphDB component and
the server, as assumed by the design. However, it allows for quick prototyping,
where the client, the proxy and the server, are all parts of the same application,
and experimenting with the execution of the queries on the encrypted store.
For the first prototype, only one encryption layer, the DET layer, was implemented.
The Advanced Encryption Standard (AES) algorithm available through the Java
Cryptography Architecture (JCA) was used. The Neo4j Native Java API
was used to create a Neo4j database in the chosen path, as shown here:
Note that the following transaction needs to be included in order to start a Neo4j
database transaction.
MATCH (A:PEOPLE)-[:KNOWS]-(B:PEOPLE)
WHERE A.name= "John"
RETURN B.name
1. Each value in the graph is encrypted at the RND layer. Resolving the query
requires lowering the encryption of the relevant values to the DET level, as we
need to check equality; this is done by issuing an update query using SET
clauses.
2. The proxy encrypts "John" to its Equality onion, DET layer; then
the proxy generates the query and sends it to the DBMS:
MATCH (AA:PEOPLE1)-[:KNOWS1]-(BB:PEOPLE1)
WHERE AA.name1="c9Yz1Og1PdfVKBrVnOk46Q"
RETURN BB.name1;
3. The proxy receives the encrypted result, decrypts it, and finally sends it back
to the application.
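The rewriting performed by the proxy in step 2 can be illustrated with a small string-level sketch. The mapping table and the `det_encrypt` stub are hypothetical: a real proxy would keep its own name mapping and call a cipher, whereas here the ciphertext is simply looked up, using the value shown in the rewritten query above.

```python
import re

# Hypothetical proxy-side state: anonymised names and a stub DET "encryptor".
NAME_MAP = {"PEOPLE": "PEOPLE1", "KNOWS": "KNOWS1", "name": "name1",
            "A": "AA", "B": "BB"}
DET_TABLE = {"John": "c9Yz1Og1PdfVKBrVnOk46Q"}  # ciphertext taken from the text


def det_encrypt(value: str) -> str:
    """Stand-in for DET encryption; a real proxy would call a cipher here."""
    return DET_TABLE[value]


def rewrite(query: str) -> str:
    """Encrypt string constants, then anonymise labels, types and names."""
    constants = []

    def stash(match):
        # Replace each quoted constant by a placeholder so that the
        # identifier pass below cannot touch the ciphertext.
        constants.append(det_encrypt(match.group(1)))
        return f"\x00{len(constants) - 1}\x00"

    query = re.sub(r'"([^"]*)"', stash, query)
    query = re.sub(r"[A-Za-z_]\w*",
                   lambda m: NAME_MAP.get(m.group(0), m.group(0)), query)
    return re.sub(r"\x00(\d+)\x00",
                  lambda m: '"' + constants[int(m.group(1))] + '"', query)


plain = ('MATCH (A:PEOPLE)-[:KNOWS]-(B:PEOPLE) '
         'WHERE A.name = "John" RETURN B.name')
encrypted = rewrite(plain)
```

Cypher keywords such as MATCH and WHERE pass through unchanged because they are absent from the mapping; a production rewriter would parse the query rather than rely on regular expressions.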
The implemented prototype system was used to conduct the experiments and
study the efficiency of Cypher queries executed on an encrypted graph database.
For the first group of experiments an instance of a graph database with 250 nodes
and 200 relationships was manually created. Some nodes were intentionally left
orphaned, meaning they have neither incoming nor outgoing edges. Several types of
relationships were used, with path lengths ranging from a single binary relationship
to a path of length ten.
3.3.2 Queries
The set of queries was designed to test some of the common types of queries. For
example, traversals are important to define data objects (nodes) that originate
from or are affected by some starting object or node. Another popular operation
is searching for particular values within a specific property. The queries were
divided into two types: structural and data queries.
• S0: Find all orphan nodes, that is, all nodes in the graph with
no incoming edges and no outgoing edges.
• S3: Find all nodes with an incoming relationship, and count them.
• S4: Find all nodes with an outgoing relationship, and count them.
• S5: Find all nodes with both an incoming relationship and an outgoing
relationship, and count them.
Each query was run 12 times on the database and the execution times were collected.
All times are in milliseconds (ms). The longest and the shortest times were
dropped, and the remaining ten times were averaged. This was done to reduce
the effect of caching on the timings. The data values were chosen
randomly in advance, and the same values were used for both the
non-encrypted and the encrypted databases.
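The averaging procedure just described amounts to a trimmed mean. A minimal sketch is given below; the function name and the twelve sample timings are hypothetical and serve only to illustrate the calculation.

```python
def trimmed_mean(times_ms):
    """Average after dropping the single longest and single shortest run."""
    if len(times_ms) < 3:
        raise ValueError("need at least three measurements")
    ordered = sorted(times_ms)
    kept = ordered[1:-1]          # discard the minimum and the maximum
    return sum(kept) / len(kept)


# Twelve hypothetical wall-clock measurements (ms); the first run is a
# cold-start outlier of the kind the procedure is meant to discard.
runs = [9100.0, 5010.0, 5030.0, 5025.0, 5040.0, 5035.0,
        5020.0, 5045.0, 5015.0, 5050.0, 5030.0, 4990.0]
average = trimmed_mean(runs)
```

With twelve runs, ten measurements remain after trimming, matching the procedure used for the tables.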
A summary of the execution times for the structural queries, on both the
non-encrypted and encrypted instances of the same database, is given in Table 3.2.
For the structural queries, S0 − S6, the non-encrypted database was slightly
faster for most queries, as shown in Table 3.2 (upper part). Similarly, there was
only a small difference in the execution times of the data queries, C1 − C3, as
detailed in Table 3.3 (upper part). Overall, the slowdown and acceleration
demonstrated for some of the queries were insignificant.
Table 3.2: The structural queries on non-encrypted and encrypted databases with the
Java interface and the Neo4j interface (times in milliseconds).
Query               S0       S1       S2        S3       S4        S5       S6
Java interface
Non-encrypted DB    5030.8   5115.7   5256.6    5038.1   4931.1    5326     5455.5
Encrypted DB        5058.4   5092.5   5252.1    5081.8   4881.2    5338.3   5471.9
Slowdown            0.54%    -0.45%   -0.085%   0.86%    -1.012%   0.23%    0.3%
Neo4j interface
Non-encrypted DB    278.5    49       76.3      64.3     68.4      89.1     94.8
Encrypted DB        360.7    46.1     80.3      75.1     41        63       85.2
Slowdown            29.5%    -5.92%   5.24%     16.8%    -40.1%    -29.3%   -10.13%
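The Slowdown rows are consistent with the relative difference between the encrypted and non-encrypted times, with negative values indicating that the encrypted instance was faster. The formula is inferred from the tabulated values rather than stated in the text, and the function name below is hypothetical.

```python
def slowdown_percent(plain_ms, encrypted_ms):
    """Relative change of the encrypted time with respect to the plain time."""
    return (encrypted_ms - plain_ms) / plain_ms * 100.0


# Neo4j interface timings (ms) for three of the structural queries.
neo4j_plain = {"S0": 278.5, "S3": 64.3, "S4": 68.4}
neo4j_encrypted = {"S0": 360.7, "S3": 75.1, "S4": 41.0}

neo4j_slowdown = {
    q: round(slowdown_percent(neo4j_plain[q], neo4j_encrypted[q]), 1)
    for q in neo4j_plain
}
```

For example, S0 gives (360.7 − 278.5) / 278.5 ≈ 29.5%, and S4 gives a negative value because the encrypted run took less time.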
For the sake of comparison, we executed the same sets of structural queries,
again over encrypted and non-encrypted instances, using the native Neo4j interface
(Enterprise edition). The results can be seen in Table 3.2 (lower part). Additionally,
the results of the same group of data queries can be seen in Table 3.3 (lower
part). Unlike the case of the Java API, where the wall time was measured using
Java methods, here the processor time reported by Neo4j is shown. This explains
the one or two orders of magnitude difference between the Java interface and Neo4j
interface timings. While a slowdown is reported for the S0, S2 and S3 queries run
over the encrypted instance, the other queries were in fact accelerated. The results
look encouraging for using CryptDB-like approaches for graph databases. However,
more empirical evidence is required with respect to larger databases. The execution
times are compared in Figure 3.5.
Table 3.3: The data queries on non-encrypted and encrypted databases with the Java
interface and the Neo4j interface (times in milliseconds).
Query               C1       C2        C3
Java interface
Non-encrypted DB    5491.2   5248.8    5216.8
Encrypted DB        5478.2   5287.1    5510.4
Slowdown            -0.24%   0.73%     5.63%
Neo4j interface
Non-encrypted DB    63.8     71.8      79.2
Encrypted DB        59.5     40.7      53.8
Slowdown            -6.74%   -43.31%   -32.07%
Figure 3.5: Querying Non-Encrypted and Encrypted Graph Database with Java
Interface and Neo4j Interface.
3.4 Summary
In this chapter the proposed CryptGraphDB approach for executing queries over
encrypted graph databases has been presented. The design of CryptGraphDB
was outlined, and experiments with an initial prototype system implemented with
the Embedded Java API for the Neo4j graph DBMS were reported. It was shown that
the performance of CryptGraphDB was promising, which warrants
further development of the system. Future directions in the development of
encryption adjustment schemes specific to graph databases were also outlined,
such as property-aware and traversal-aware schemes. The implementation of
these schemes and the investigation of the related trade-offs between efficiency and
security is a topic for future research. In the next chapter a dynamic scheme
called traversal-aware encryption adjustment (TAEA) is presented.
Chapter 4
Traversal-Aware Encryption
Adjustment for Graph Databases
4.1 Introduction
In recent years there has been growing interest in querying encrypted data in
databases [6, 7, 9, 28, 40], to address data security concerns. Until very
recently most of the work was on relational databases. In [3] we extended CryptDB
to graph databases. The challenge is how best to transform the query into a
format appropriate for querying an encrypted database, in the context of data
processing methods for querying encrypted data. This approach is inspired by the
work in [34]. CryptDB provides a powerful mechanism for protecting
data against server-based attacks. One of the techniques used in CryptDB
and CryptGraphDB is layered encryption combined with encryption adjustment.
Both original approaches apply static adjustment before query execution.
In this chapter, the proposed Traversal-Aware Encryption Adjustment (TAEA)
mechanism for graph databases is presented. TAEA is directed at adjusting the
encryption layers in a dynamic context (whereas the CryptGraphDB approach
presented in the previous chapter, which uses simple encryption adjustment, was
directed at a static adjustment context). The TAEA mechanism adopts a data
processing approach to querying encrypted data in graph databases.
The main idea is that applying TAEA to graph database querying leads to
demonstrably fewer unnecessary leaks of information. The scheme is dynamic,
and the encryption layer adjustment happens not before the query execution,
but rather progresses gradually alongside the execution. In order to provide a
sensible trade-off between data security protection and data processing efficiency,
TAEA utilizes multi-layered encryption and encryption adjustment, which allow
the release of information about the data elements required for query execution
to be controlled to some extent. However, it is argued here that although the simple
adjustment policies (CryptGraphDB) require fewer queries and updates,
unnecessary data may be revealed when adjusting the encryption layers.
Instead, the proposed TAEA mechanism, as the name suggests, dynamically
adjusts encryption layers as query execution progresses. In this
way less information is revealed to any adversary watching the execution of the
query on the encrypted store. The TAEA method is fully described later in this
chapter, together with how it was implemented and its evaluation.
The remainder of this chapter is organised as follows: Section 4.2 demonstrates
the encryption layers and simple adjustment. Section 4.3 then presents the
proposed Traversal-Aware Encryption Adjustment method and its evaluation.
Finally Section 4.4 gives a summary of the chapter.
In this section, we briefly outline the concepts of encryption layers and encryption
adjustment in the context of querying encrypted databases. The onion layers of
encryption considered in [4, 34] allow data encryption levels to be changed on demand
in an efficient way. The main idea is to encrypt each data item in one or more
onions, where each layer of each onion enables some kind of functionality, as
explained in [4, 34]. In the beginning, each data item in the database is encrypted
in all onions of encryption, with the most secure encryption schemes as the
outermost layers. At this point, the server can learn nothing about the data other
than the number of nodes, properties, and data size, whilst the inner layers such as
OPE and DET provide more functionality. Depending on the requirements
of a particular query for data access, the level of encryption is adjusted before
query execution. Different cryptographic algorithms are available to be cascaded
into onion layers, as mentioned in sub-section 3.2.3.
MATCH (node:person)-[:knows]->()
WHERE node.name = "Tom"
RETURN node.age
The above Cypher query requires an equality check. To do this, the encryption
needs to pass from the RND layer, shown in the middle part of Table 4.1, to the DET
layer, shown in the right part. The values in both the middle and right parts of the
table are ciphertexts (ciphertexts shown are not full-length).
Table 4.1: Data layout at the server. When the application creates the table shown
in the left part, the table created at the server is the one shown in the middle and
right parts.
person
name    age   name at RND    age at RND     name at DET    age at DET
Tom     29    a4a895a87052   e6ba69bdf08c   UD82Pv8uGNi7   33TPfYgeYDKb
Smith   22    9d60b415e6e7   686097aa7a7a   j39IjDVyx/+    NMtqlsMp8Qaf
Step (1) Onion layer decryption is implemented using UDFs that run on the DBMS
server; more details on this will be given in the next chapter. In order to decrypt
the Equality onion of column 2 to the DET layer, the proxy creates the following
query using the DECRYPT_RND UDF:
Step (2) The proxy encrypts Tom to the DET layer encryption value UD82Pv8uGNi7;
then the proxy generates the query and sends it to the DBMS:
MATCH (node:person)-[:knows]->()
WHERE node.name = "UD82Pv8uGNi7"
RETURN node.age
Step (3) The proxy receives the encrypted RND level result e6ba69bdf08c and decrypts
it using:
MATCH (node1:Label1)-[:RELATIONSHIP*1..n]-(node2:Label2)
WHERE node1.propertyA = {Value} AND ...
RETURN node1.propertyA, node2.propertyB
B. Unbounded traversal
MATCH (node1:Label1)-[:RELATIONSHIP*]-(node2:Label2)
WHERE node1.propertyA = {Value} AND ...
RETURN node1.propertyA, node2.propertyB
In both cases, during the query execution, the paths starting at nodes with
particular name values and progressing along the specified relationships are
traversed. The execution may perform additional checks of some properties of
the encountered nodes if required by the conditions following AND in the above queries.
When executing such queries over an encrypted graph database in the original
CryptGraphDB, encryption layer adjustment may be required for the properties
of all nodes which may be encountered during traversal, if condition checks are
present in the query. Since it is not generally possible to identify, before query
execution, the nodes that will be traversed, the simple encryption adjustment is
done everywhere (on all nodes) where the properties required for the checks are
present.
In contrast, the adjustment in TAEA is not done everywhere, but just along the
query execution path. The scheme of traversal-aware encryption adjustment is
dynamic: the encryption adjustment happens not before the query execution,
but rather progresses gradually alongside the execution. The proposed scheme
follows the simple principles defined in [3]:
• The adjustment is performed to enable one step of traversal using all infor-
mation accumulated to this step, in particular, the set of nodes traversed so
far.
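The contrast between simple (static) adjustment and traversal-aware adjustment can be simulated on a small in-memory graph. The sketch below is an illustration under stated assumptions, not the thesis implementation: the graph is hypothetical and only loosely modelled on the running example, all names are invented for the sketch, and equality tests on plaintexts stand in for equality tests on DET ciphertexts.

```python
# Hypothetical graph: node id -> properties, and outgoing KNOWS edges.
nodes = {
    1: {"name": "Tom", "age": 29},
    2: {"name": "Smith", "age": 22},
    3: {"name": "Smith", "age": 35},
    4: {"name": "Lee", "age": 18},
    5: {"name": "Perry", "age": 38},
    6: {"name": "Sara", "age": 38},
}
knows = {1: [2, 3, 4], 5: [6]}


def static_adjust():
    """Simple adjustment: lower name and age to DET on every node up front."""
    return {(n, p) for n in nodes for p in ("name", "age")}


def taea_adjust_and_query(start_name, wanted_age):
    """Interleave adjustment with traversal; returns (result ids, DET set)."""
    at_det = {(n, "name") for n in nodes}      # needed to locate the start
    starts = [n for n in nodes if nodes[n]["name"] == start_name]
    reached = {m for n in starts for m in knows.get(n, [])}  # partial query
    at_det |= {(m, "age") for m in reached}    # adjust only along the path
    result = sorted(m for m in reached if nodes[m]["age"] == wanted_age)
    return result, at_det


result, taea_det = taea_adjust_and_query("Tom", 22)
static_det = static_adjust()
ages_exposed_static = sorted(n for n, p in static_det if p == "age")
ages_exposed_taea = sorted(n for n, p in taea_det if p == "age")
```

In this toy graph the static strategy lowers the age of all six nodes to DET, so the equal ages of nodes 5 and 6 become visible, whereas the traversal-aware strategy lowers only the ages of the three nodes reached from the start node.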
Figure 4.1: Data layout schema at the server of the graph database.
Non-encrypted mode
The search criteria in the WHERE clause consist of two conditions. First, executing
name = "Tom" yields three nodes to be traversed and checked,
{Smith,22}, {Smith,35} and {Lee,18} (these are reachable from the {Tom,29} node
in one step via the Knows relation). The second condition, age = 22, is then
checked against this intermediate result, giving {Smith,22} as the final result.
Initially, each property in the graph is wrapped in an onion of encryption with
RND as the outermost layer. At this point, the server can learn nothing about the
data content other than the number of nodes, properties, and relationships. To
execute the query over the encrypted store it is required to lower the encryption of
name and age to the DET layer (as we need equality checks). In this case, an UPDATE
query is required:
G83 are an encryption of Tom and 22, respectively. The results are decrypted and
returned to the user.
More details on this follow.
In step (1), the proxy sends UPDATE Database to the DBMS,
because all the properties are at the RND layer, as illustrated in Figure 4.2. The
DBMS decrypts the entire name and age properties to the DET layer:
DECRYPT_{P,Eq,RND}(X11) = D1
DECRYPT_{P,Eq,RND}(G71) = G68
DECRYPT_{P,Eq,RND}(X77) = D6
The proxy updates its internal state to record that the entire name and age
properties are now at the DET layer in the DBMS, as can be seen in Figure 4.3.
In step (2), the proxy encrypts Tom and 22 to their Equality onion, DET layer,
values D1 and G83, respectively. The proxy generates a query and sends it to the
DBMS:
In step (3), finally, the proxy sends the decrypted result, Smith and 22, back to the
application.
We notice that with this encryption adjustment procedure, after the query
execution, the equality of {name,age} values across all nodes becomes apparent.
For nodes that are not connected to the Tom node, this information on equality is
not related to the result of the query and is not strictly necessary for computing it.
Consider the example schema shown in Figure 4.1. Initially, each node and
property in the graph is encrypted using the RND scheme, as shown in Figure 4.2.
Return now to our running query example Q:
Figure 4.2: Encryption at the RND layer. Ciphertexts shown are not full-length.
Figure 4.3: Encryption at the DET layer. Ciphertexts shown are not full-length.
In order to execute this query we need to adjust the encryption of name to the
DET layer, by issuing this query:
Here the result variable is used as an alias for the result column name of Q1, the
property name1 corresponds to name, and D1 is the encryption of Tom. The outcome
shows that only three nodes are reachable via outgoing relationships from the Tom
node. Before processing the second part of the query Q, that is WHERE y.age = "22",
the encryption of the age property of the nodes in the result needs to be lowered to
the DET layer, as illustrated in Figure 4.4.
Finally, the proxy receives the encrypted results D6 and G83, decrypts them and
sends {Smith,22} back to the user.
This solution improves on the original encryption adjustment procedure by not
lowering every age value to the DET layer, where equal plaintexts become visible
as equal ciphertexts. In this graph, there are two nodes with the same age value
({Perry,38}); after executing the above query both are kept at the RND layer,
with ciphertexts G36 and G72.
4.3.2 Discussion
The advantage of the proposed approach is that it does not reveal more DET-layer
information than necessary. Figure 4.5 shows a significant difference in the
results of applying the original CryptGraphDB encryption adjustment strategy,
part (a), and the traversal-aware encryption adjustment strategy, part (b). When
the original CryptGraphDB strategy is applied, it reveals more information than
necessary, such as the age of other nodes that are not connected to the Tom node.
By contrast, when the traversal-aware encryption adjustment strategy is applied,
it reveals only the information required to execute the query: the TAEA approach
adjusts the age property to the DET layer only for nodes connected to Tom, while
keeping the remaining age properties at the RND layer. Our technique thus shows a
clear advantage by dynamically adjusting encryption layers as query execution
progresses. In this way, less information is revealed to a potential adversary
observing the execution of the query on the encrypted store.
The results are presented in Figure 4.5. Note that the adjusted values for
both approaches have been reproduced from Figure 4.1. From the figure, it can
clearly be seen that the TAEA approach reveals considerably less than the original
CryptGraphDB approach: TAEA adjusts only the property values required to
execute the query, whereas the original CryptGraphDB approach adjusts all values.
From Figure 4.5 part (a) it can be seen that all age values are
adjusted to the DET layer; therefore equal age values in different nodes are
revealed. For example, the shared ciphertext G49 indicates that two nodes have the
same age (the original value being 38). In contrast, we observe from Figure 4.5 part
(b) that the majority of age values are kept at the RND layer; not all are adjusted
to the DET layer. Returning to the nodes with age = 38, we notice that their values
are kept at the RND layer, as G36 and G72.
4.4 Summary
In this chapter, the Traversal-Aware Encryption Adjustment approach for graph
databases has been presented. This approach is a novel solution that supports
executing Cypher queries over encrypted graph databases. The approach offers
the dual advantages that: (i) it reveals only the information required to execute
the query and (ii) it adjusts encryption layers dynamically as query execution
progresses. The evaluation was conducted by considering the execution of a Cypher
query in three modes (non-encrypted, original CryptGraphDB adjustment, and
traversal-aware encryption adjustment). Comparisons were presented between
the proposed TAEA approach and the original CryptGraphDB adjustment. The
proposed approach produced the better results: we have shown that, when querying
encrypted graph databases, dynamic traversal-aware encryption adjustment
provides better security protection of database content than the static
adjustment performed before query execution. The next chapter presents the
implementation of the TAEA approach, together with an empirical evaluation of the
related trade-off between security and query execution efficiency.
Chapter 5
Towards Implementation of
TAEA
5.1 Introduction
In Chapters 3 and 4 the proposed CryptGraphDB and TAEA approaches were presented;
both were directed at querying encrypted graph databases and adjusting the
encryption layers alongside the query execution. This chapter considers the
implementation of the Traversal-Aware Encryption Adjustment (TAEA) mechanism that
was introduced in Chapter 4. More specifically, this chapter presents different
modes of Cypher query execution using the TAEA principles presented in the previous
chapter.
Recall from Chapter 3 that the original CryptDB approach has been transferred
to the context of graph databases. The basic idea is the same as in relational
CryptDB: the graph query is translated into an encrypted form, which is then
executed on the server without decrypting any data. The encrypted results are then
sent back to the user side, where they are finally decrypted. The proposed design is
implemented for the Neo4j graph DBMS with Cypher as the query language. It has been
confirmed that SQL-aware encryption schemes can be smoothly reused as
Cypher-aware encryption schemes, while keeping the performance benefit of graph
traversal queries. The efficiency of query execution was reported for different
types of queries on encrypted and non-encrypted Neo4j graph databases.
As discussed previously in Chapter 4, a traversal-aware encryption adjustment
The work presented in this section commences with an analysis of the proposed
Traversal-Aware Encryption Adjustment approach, as discussed in Chapter 4,
in the context of whether or not its operation could be improved
upon, and if so how this might be achieved. The idea of Traversal-Aware
Encryption Adjustment (TAEA) is quite simple. During the query execution, the paths
starting at nodes with specific name values and progressing along the specified
relationships are traversed. The execution may perform additional checks of some
properties of the encountered nodes. Therefore, the adjustment is not done
everywhere, but just along the query execution path. The scheme of traversal-aware
encryption adjustment is dynamic: the encryption adjustment happens not before the
query execution, but rather progresses gradually alongside the execution.
In this section, we describe an implementation of TAEA for a subset of
Cypher queries. We start with simple examples first. In general, the execution of
a query with TAEA over an encrypted graph data store requires the execution of
interlaced partial queries and encryption adjustment updates. While it is possible
to compose these partial queries and updates using the WITH construct of Cypher,
and thereby to execute the whole sequence automatically ("in one go"), for
simplicity we present here the required sequence of separate queries and updates.
The composition is discussed later in subsection 5.3.2.
1. Each value starts out encrypted at the most private encryption level, where
data is encrypted using the RND scheme.
2. To check the equality for the first part of the WHERE clause,
node1.propertyA = value1, we need to lower the encryption of propertyA
to the DET level. The proxy issues the query UPDATE Label1 SET
propertyA = DECRYPT_RND(propertyA) to the server, where DECRYPT_RND is a
user-defined function (UDF) implementing decryption,
which is discussed in sub-section 5.2.1.
MATCH (node1:label1)-[:Relationship]->(node2:label2)
WHERE node1.propertyA = {encrypted value1}
RETURN node2 AS result
4. Lower the encryption level of node2.propertyB to the DET layer for the nodes
returned by Q1, that is, the nodes reachable via outgoing relationships from the
start nodes.
MATCH (node1:label1)-[:Relationship]->(result:label2)
WHERE result.propertyB = {encrypted value2}
RETURN result
6. Finally, the proxy decrypts the results from the server and returns them to the
user.
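The interleaving of adjustment updates and partial queries in the steps above can be made concrete by generating the sequence of statements as strings. This is a sketch under assumptions: the function name is hypothetical, the decryptRND(value, ID(n)) call form is assumed for the server-side UDF, the exact update statements used in the thesis are not reproduced here, and the ciphertext strings are illustrative.

```python
def taea_statements(label1, rel, label2, prop_a, enc_a, prop_b, enc_b):
    """Ordered Cypher statements for a one-hop query with two conditions."""
    path = f"(node1:{label1})-[:{rel}]->(node2:{label2})"
    start = f'node1.{prop_a} = "{enc_a}"'
    return [
        # Step 2: lower propertyA to DET on all candidate start nodes.
        f"MATCH (n:{label1}) SET n.{prop_a} = decryptRND(n.{prop_a}, ID(n))",
        # Step 3 (Q1): find the nodes reachable from the start nodes.
        f"MATCH {path} WHERE {start} RETURN node2 AS result",
        # Step 4: lower propertyB to DET only on the nodes reached by Q1.
        f"MATCH {path} WHERE {start} "
        f"SET node2.{prop_b} = decryptRND(node2.{prop_b}, ID(node2))",
        # Step 5 (Q2): apply the second condition to the reached nodes.
        f'MATCH {path} WHERE {start} AND node2.{prop_b} = "{enc_b}" '
        f"RETURN node2",
    ]


statements = taea_statements("person", "KNOWS", "person",
                             "name", "UD82Pv8uGNi7", "age", "NMtqlsMp8Qaf")
```

The key point is the ordering: the second adjustment is restricted by the same pattern and condition as Q1, so only the properties of nodes on the traversed path are lowered before Q2 runs.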
In the case of a query Q consisting of multiple relationship steps and several
search criteria, as follows:
MATCH (node1:label)-[:Rel1]->(node2:label)-[:Rel2]->(node3:label)
WHERE node1.propertyA = {value1} AND node2.propertyB = {value2}
AND node3.propertyC = {value3}
RETURN node3
3. As Q has multiple links, we start with the first part R1, which is (node1)-
[:Rel1]->(node2), and execute the query Q1 to allow the initial search
node1.propertyA = encrypt(value1) for nodes of Q to be executed, when
the path required in the main query Q starts as:
MATCH (node1:label)-[:Rel1]->(node2:label)-[:Rel2]->(node3:label)
WHERE node1.propertyA = {encrypt(value1)}
RETURN node2 AS result
Here, result is used as an alias for the result column name of Q1, while
encrypt(value1) is the encryption of value1.
6. Finally, the proxy decrypts the results and sends them back to the user.
General Case
We now consider the general case of a simple traversal query of the form:
The following is the process for resolving a query Q of the above form using the
TAEA scheme:
3. Execute Q1, which is the first part of Q, when the path required starts as:
MATCH (node_1:label_1)-[:Relationship1]->(node_i:label_i)
WHERE node_1.property_1 = {encrypt(value1)}
RETURN node_i AS result
Here, result is used as an alias for the result column name of Q1 , while
encrypt(value1) is the encryption of value1.
MATCH (result)<-[:Relationship_i]-(node_k:label_k)
WHERE result.property_i = {encrypt(value_i)}
RETURN result_1
5. For Q3 , lower the encryption of property_k for nodes that have an incoming
relationship with result_1 of Q2 to DET, ...
6. Finally, proxy decrypts the results and sends them back to the user.
5.2.1 Implementation
Figure 5.1: Example data layout schema at the server of the graph database.
MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2
For this study, we considered the execution of the above query in three modes:
(1) non-encrypted; (2) encrypted with simple adjustment; and (3) with traversal-
aware encryption adjustment.
Non-encrypted
The search criteria in the WHERE clause consist of two conditions. Executing the
first, node1.name = "Tom", gives an output of three nodes to be traversed,
{Smith,22}, {Smith,35} and {Lee,18} (these are reached from the {Tom,29} node in one
step via the KNOWS relation). Next, the second part of the query, node2.age
= "22", is executed. Lastly, the {Smith,22} node is obtained as the final result.
Initially, each value in the graph is encrypted with the RND layer as the outermost
layer, as follows:
MATCH (n)
SET n.name = encryptRND(encryptDET(n.name), ID(n)),
    n.age = encryptRND(encryptDET(n.age), ID(n))
MATCH (n)
SET n.name = decryptRND(n.name, ID(n)),
    n.age = decryptRND(n.age, ID(n))
MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "UD82Pv8uGNi7" AND node2.age = "NMtqlsMp8Qaf"
RETURN node2
Table 5.1: Plaintext data, encryption at the RND layer and encryption at the DET
layer (ciphertexts shown are not full-length).
name    age   name at RND    age at RND     name at DET    age at DET
Tom     29    a4a895a87052   e6ba69bdf08c   UD82Pv8uGNi7   33TPfYgeYDKb
Smith   22    9d60b415e6e7   686097aa7a7a   j39IjDVyx/+    NMtqlsMp8Qaf
Tom     39    9b078f653478   21da9938c098   UD82Pv8uGNi7   Ss67Waxq2n+m
Lee     18    6cf77f7817b1   bcb86ac44437   RQpqwfEE8Kbm   fxEYkxe7g+P27L
Smith   35    e7a86cbc36ff   83e6b8ab0edc   j39IjDVyx/+    5K6xJRUEJ2s+
Jones   32    141a99a21cf4   dfd2e8d1dfa2   ax+/5Q23fEl4   z0sfDuU2mIP/
Perry   47    ca06d68f7c6b   c1051d53aae2   0nPCg1bAxh8R   oSLl00rhMbeZ
Sara    38    a5cb936cd7ed   7ed31f9f083d   Z+NQr9J7iSRi   V01kYVwG13GU
Perry   38    c32f8d5d66a1   49521d4f028e   0nPCg1bAxh8R   V01kYVwG13GU
Tom     40    5f2041a58089   56c26c25e4d    UD82Pv8uGNi7   rXhFoilgAFoO
Initially, each value in the database is encrypted with the most secure RND layer,
as listed in Table 5.1. The advantage is that the server can learn nothing about
the data values.
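What the server can and cannot see at each layer can be checked directly on the name columns of Table 5.1. The sketch below groups rows by ciphertext; the helper name is hypothetical, and the ciphertext strings are the truncated values shown in the table.

```python
from collections import defaultdict

# (name at RND, name at DET) pairs, in the row order of Table 5.1.
name_rows = [
    ("a4a895a87052", "UD82Pv8uGNi7"),   # Tom
    ("9d60b415e6e7", "j39IjDVyx/+"),    # Smith
    ("9b078f653478", "UD82Pv8uGNi7"),   # Tom
    ("6cf77f7817b1", "RQpqwfEE8Kbm"),   # Lee
    ("e7a86cbc36ff", "j39IjDVyx/+"),    # Smith
    ("141a99a21cf4", "ax+/5Q23fEl4"),   # Jones
    ("ca06d68f7c6b", "0nPCg1bAxh8R"),   # Perry
    ("a5cb936cd7ed", "Z+NQr9J7iSRi"),   # Sara
    ("c32f8d5d66a1", "0nPCg1bAxh8R"),   # Perry
    ("5f2041a58089", "UD82Pv8uGNi7"),   # Tom
]


def equality_classes(ciphertexts):
    """Row indices that share a ciphertext, i.e. what equality the server sees."""
    groups = defaultdict(list)
    for index, value in enumerate(ciphertexts):
        groups[value].append(index)
    return [rows for rows in groups.values() if len(rows) > 1]


visible_at_rnd = equality_classes([rnd for rnd, _ in name_rows])
visible_at_det = equality_classes([det for _, det in name_rows])
```

At the RND layer no two ciphertexts coincide, so no equality is visible; at the DET layer the three Tom rows, the two Smith rows and the two Perry rows each share a ciphertext.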
Returning to resolving the example query Q:
MATCH (node1:person)-[:KNOWS]->(node2:person)
WHERE node1.name = "Tom" AND node2.age = "22"
RETURN node2
Here the output variable is used as an alias for the result column name of
Q1, and UD82Pv8uGNi7 is the encryption of Tom. As a result of the first stage
of the query resolution, three nodes are found via outgoing relationships from the
node with n.name = "UD82Pv8uGNi7".
Before processing the second part of the query Q, WHERE node2.age = "22", we
need to lower the encryption level of the age property to the DET layer ONLY for
the nodes in the output variable.
Then we execute the query Q2 , implementing the next step of Q execution:
Bounded traversal
MATCH (node1)-[*1..3]->(node2)
WHERE node1.name = 'Smith' AND node2.age = '38'
RETURN node2
At the starting point, all values are held at the RND layer. We then move values to
the DET layer using the update UPDATE Label SET P = DECRYPT_RND(P), where
P corresponds to name. Thereafter, we perform the query Q1, processing the initial
search for nodes when the path required in the original query Q starts as:
MATCH (node1)-[*1..3]->(node2)
WHERE node1.name = "j39IjDVyx/+"
RETURN node2 AS output
Again the output variable is used as an alias for the result column name of Q1,
and j39IjDVyx/+ corresponds to the encryption of Smith. Executing Q1
shows that six nodes are reachable from the nodes satisfying the condition
node1.name = "j39IjDVyx/+". Before processing the second part of the query Q,
WHERE node2.age = "38", we need to lower the encryption level of the age property
of the nodes in the output variable to the DET layer.
Next we execute the query Q2 , implementing the next step of Q execution:
MATCH (node1)-[*1..3]->(output)
WHERE output.age = 'V01kYVwG13GU'
RETURN output
Unbounded traversal
Now, we need to see the effect when the path length between nodes is unbounded,
that is, when the variable-length path from node1 to node2 may contain any number
of relationships. With reference to the example graph in Figure 5.1, assume the
following query Q:
MATCH (node1)-[*]->(node2)
WHERE node1.name = 'Smith' AND node2.age = '38'
RETURN node2
To resolve the query the DET layer for name is required. We process the query
Q1 to allow the initial search for nodes to be executed when the path required in
the original query Q starts as:
MATCH (node1)-[*]->(node2)
WHERE node1.name = "j39IjDVyx/+"
RETURN node2 AS output
At this stage the output variable is used as an alias for the result column of
Q1, and j39IjDVyx/+ corresponds to the encryption of Smith. Executing
Q1 indicates that there are seven nodes in output when using the filter
node1.name = "j39IjDVyx/+".
Once the encryption level of the age property of the nodes in the output variable
has been lowered to the DET layer, we can process the second part of the query Q,
which is WHERE node2.age = "38". Next we execute Q2 to implement the next step
of Q:
MATCH (node1)-[*]->(output)
WHERE output.age = 'V01kYVwG13GU'
RETURN output
Here V01kYVwG13GU is the encryption of "38". The Proxy receives the encrypted
results of this query, {Z+NQr9J7iSRi,V01kYVwG13GU} and {0nPCg1bAxh8R,V01kYVwG13GU},
decrypts them to {Sara,38} and {Perry,38}, and returns them to the user.
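The three steps of the traversal walkthrough above (lower name, run Q1, lower age only for the nodes bound to output, run Q2) can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the toy graph is hypothetical and is not Figure 5.1, the DET layer is imitated by a keyed HMAC token, and the RND layer is imitated by a flag that forbids equality checks. The names det_token, make_node, lower_to_det and taea_query are introduced here for illustration only.

```python
import hashlib
import hmac
from collections import deque

KEY = b"demo-key"  # hypothetical key, for illustration only


def det_token(value: str) -> str:
    """Deterministic stand-in for DET encryption: equal inputs give equal tokens."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:12]


def make_node(name: str, age: str) -> dict:
    # Every property starts at the RND layer, where equality checks are refused.
    return {p: {"layer": "RND", "det": det_token(v)}
            for p, v in (("name", name), ("age", age))}


def lower_to_det(node: dict, prop: str) -> None:
    """Model the adjustment of one property value from the RND to the DET layer."""
    node[prop]["layer"] = "DET"


def equals(node: dict, prop: str, token: str) -> bool:
    if node[prop]["layer"] != "DET":
        raise ValueError(f"{prop} is still at the RND layer")
    return node[prop]["det"] == token


def reachable(edges: dict, start: int) -> set:
    """Nodes reachable from start over one or more outgoing edges (the [*] pattern)."""
    seen, queue = set(), deque(edges.get(start, []))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(edges.get(n, []))
    return seen


def taea_query(nodes: dict, edges: dict, name: str, age: str) -> list:
    # Step 1: lower name so that the start node can be located (query Q1).
    for node in nodes.values():
        lower_to_det(node, "name")
    output = set()
    for i, node in nodes.items():
        if equals(node, "name", det_token(name)):
            output |= reachable(edges, i)
    # Step 2: lower age only for the nodes bound to output.
    for i in output:
        lower_to_det(nodes[i], "age")
    # Step 3: evaluate the second condition on the adjusted nodes (query Q2).
    return sorted(i for i in output if equals(nodes[i], "age", det_token(age)))


nodes = {
    0: make_node("Smith", "40"),
    1: make_node("Sara", "38"),
    2: make_node("Tom", "29"),
    3: make_node("Perry", "38"),
    4: make_node("Alice", "38"),  # not reachable from Smith
}
edges = {0: [1], 1: [2], 2: [3]}
matches = taea_query(nodes, edges, "Smith", "38")
print(matches)
print(nodes[4]["age"]["layer"])
```

In this toy run the age of node 4 is never lowered, because it does not lie on any path starting at Smith, which is the behaviour the traversal-aware policy is meant to achieve.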
5.3.1 Datasets
Cypher writing clauses (clauses that write data to the database) were used to
create the required databases with a given number of nodes and relationships.
For instance, to create two nodes we use: CREATE (a:Person {name: "Tom", age: "29",
gender: "Male"}), (b:Person {name: "Smith", age: "22", gender: "Male"}) and to create a
relationship between node (a) and node (b) we use (a)-[:KNOWS]->(b).
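As an illustration of how such CREATE statements might be produced for a chosen number of nodes, the following sketch assembles one from a list of people and a list of KNOWS pairs. It is only an assumption about how a dataset script could look; the helper names create_person and create_statement are hypothetical, and the property values are assumed to contain no double quotes.

```python
def create_person(var: str, name: str, age: str, gender: str) -> str:
    """Render one node pattern with a Person label and a property map."""
    return f'({var}:Person {{name: "{name}", age: "{age}", gender: "{gender}"}})'


def create_statement(people: list, knows: list) -> str:
    """Build a single CREATE clause for all nodes and KNOWS relationships."""
    parts = [create_person(f"n{i}", *p) for i, p in enumerate(people)]
    parts += [f"(n{a})-[:KNOWS]->(n{b})" for a, b in knows]
    return "CREATE " + ",\n       ".join(parts)


statement = create_statement(
    people=[("Tom", "29", "Male"), ("Smith", "22", "Male")],
    knows=[(0, 1)],
)
print(statement)
```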
The system used for testing ran on Windows, version 10. It has an Intel Core 2
Duo CPU running at 3.40 GHz and has 16 GB of RAM. The benchmarking program
was the only application running when the results were created, but the machine
was connected to the Internet and standard system processes were running.
5.3.2 Queries
The approach was tested on datasets with varying numbers of nodes and
relationships. We considered a set of queries performed in three modes: (1) the
queries (Q1 − Q5) over non-encrypted data; (2) the queries (Q'1 − Q'5) over
encrypted data with simple adjustment; and (3) the queries (Q''1 − Q''5) over
encrypted data with traversal-aware encryption adjustment. This particular set of
queries was selected to test some commonly used queries in graph databases, as
listed below:
Q1 : Find all orphan nodes (no incoming edges and no outgoing edges).
MATCH (node)
WHERE not((node)-[ ]-())
RETURN node
MATCH (node1)-[:KNOWS]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '22'
RETURN node2.name, node2.age
MATCH (node1)-[:KNOWS]->(node2)-[:KNOWS]->(node3)
WHERE node1.name = 'Jones' AND node2.age = '47' AND node3.gender = 'Female'
RETURN node3.name, node3.age, node3.gender
MATCH (node1)-[:KNOWS*1..3]->(node2)
WHERE node1.name = 'Jones' AND node2.age = '38'
RETURN node2.name, node2.age
MATCH (node1)-[:KNOWS*]->(node2)
WHERE node1.name = 'Jones' AND node2.age = '38'
RETURN node2.name, node2.age
Q'1 : Find all orphan nodes (no incoming edges and no outgoing edges).
MATCH (node)
WHERE not((node)-[ ]-())
RETURN node
MATCH (node1)-[:KNOWS]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),
ID(node1)) AND node2.age = decryptRND(encryptRND(encryptDET('22'),
ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:KNOWS]->(node2)-[:KNOWS]->(node3)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'),ID(node1)),
ID(node1)) AND node2.age = decryptRND(encryptRND(encryptDET('47'),
ID(node2)),ID(node2)) AND node3.gender =
decryptRND(encryptRND(encryptDET('Female'),ID(node3)),ID(node3))
RETURN decryptDET(node3.name), decryptDET(node3.age), decryptDET(node3.gender)
MATCH (node1)-[:KNOWS*1..3]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'),
ID(node1)),ID(node1)) AND node2.age = decryptRND(encryptRND(encryptDET('47'),
ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:KNOWS*]->(node2)
WHERE node1.name = decryptRND(encryptRND(encryptDET('Jones'),ID(node1)),
ID(node1)) AND node2.age = decryptRND(encryptRND(encryptDET('38'),
ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
Q''1 : Find all orphan nodes (no incoming edges and no outgoing edges).
MATCH (node)
WHERE not((node)-[ ]-())
RETURN node
ID(node2))
return decryptDET(node2.name),decryptDET(node2.age)
MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Jones'),ID(node1)),
ID(node1))})-[:KNOWS*1..3]->(node2)
SET node2.age = decryptRND(node2.age,ID(node2))
WITH node2
MATCH (node1:Person)-[:KNOWS*1..3]->(node2)
WHERE node2.age = decryptRND(encryptRND(encryptDET('47'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1 {name: decryptRND(encryptRND(encryptDET('Jones'),
ID(node1)),ID(node1))})-[:KNOWS*]->(node2)
SET node2.age = decryptRND(node2.age,ID(node2))
WITH node2
MATCH (node1:Person)-[:KNOWS*]->(node2)
WHERE node2.age = decryptRND(encryptRND(encryptDET('38'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
As shown above, the execution of each of Q''2 − Q''5 requires the execution
of several queries/updates (unlike the single-query execution of the non-encrypted
versions). In order to make a fair comparison we composed the query/update parts
of Q''i by using WITH clauses. WITH allows the query parts to be chained
together, passing the outputs from one to be used as starting points or criteria in
the next. As the first condition in these queries is WHERE node1.name='value',
we need to adjust the encryption level of name to the DET layer to allow equality
checking. Take, for example, Q''2, as follows:
In step (1), we have a MATCH clause to determine the direction of the relationship
and its depth. In step (2), as all values are held in the RND layer, we need to decrypt
the name property to the DET layer in order to allow equality checking.
In step (3), we lower the encryption of the age property for nodes in the previous
step. In step (4), by using WITH we can pass the previous result so that it becomes
the starting criteria to the next part of the query. In step (5) we determine the
direction of the relationship. In step (6) we implement the second condition. In
step (7) we return the result in plain text format.
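The chaining described in steps (1) to (7) can be sketched as a small query builder. This is an illustrative assumption rather than the thesis code: it follows the shape of the chained queries listed earlier in this sub-section, where steps (1) and (2) share the first MATCH clause, and the helper names enc_literal and build_taea_query are hypothetical. The UDF names encryptDET, encryptRND, decryptRND and decryptDET are those used in the listings.

```python
def enc_literal(value: str, var: str) -> str:
    """DET-layer form of a constant, written as in the listings of this sub-section."""
    return f"decryptRND(encryptRND(encryptDET('{value}'),ID({var})),ID({var}))"


def build_taea_query(name: str, age: str, rel: str = "KNOWS", hops: str = "") -> str:
    """Assemble one chained query; hops is '', '*1..3', '*', and so on."""
    pattern = f"-[:{rel}{hops}]->"
    lines = [
        # Steps (1)-(2): direction and depth, plus the equality check on name.
        f"MATCH (node1 {{name: {enc_literal(name, 'node1')}}}){pattern}(node2)",
        # Step (3): lower age to the DET layer for the matched nodes only.
        "SET node2.age = decryptRND(node2.age, ID(node2))",
        # Step (4): pass the adjusted nodes on to the next query part.
        "WITH node2",
        # Step (5): state the relationship direction again.
        f"MATCH (node1:Person){pattern}(node2)",
        # Step (6): the second condition, now checkable at the DET layer.
        f"WHERE node2.age = {enc_literal(age, 'node2')}",
        # Step (7): return the result in plain text.
        "RETURN decryptDET(node2.name), decryptDET(node2.age)",
    ]
    return "\n".join(lines)


query = build_taea_query("Jones", "47", hops="*1..3")
print(query)
```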
5.3.3 Results
Each query was run over all the databases presented in Table 5.2, and the execution
times (in milliseconds) were recorded as detailed in Table 5.3. Table 5.3 reports
three groups of results: first, the retrieval times of queries (Q1 − Q5) over
non-encrypted data using different graph database sizes; second, those of queries
(Q'1 − Q'5) over encrypted databases using simple encryption adjustment; and third,
those of queries (Q''1 − Q''5) over encrypted databases using traversal-aware
encryption adjustment.
From the results presented in Table 5.3 it can clearly be seen that, for the
queries Q1, Q'1, and Q''1 that find orphan nodes, the execution time of Q'1 increases
across all five databases, albeit to widely varying degrees. The nodes are iterated
through, checking each node for the presence of edges. Query Q''1 shows
reasonable retrieval times compared with Q1; the increase in its execution time
on DB5 is expected as the database grows.
Table 5.3: Retrieval times of queries (Q1 − Q5) over non-encrypted data using
different graph database sizes, queries (Q'1 − Q'5) over encrypted databases using
simple encryption adjustment, and queries (Q''1 − Q''5) over encrypted databases
using traversal-aware encryption adjustment (time in milliseconds).
Query     DB1     DB2     DB3     DB4     DB5
Q1          1       1       4       4       4
Q'1        21      73     194     369     670
Q''1        4       7      23      39     253
Q2          2       4       2       2       2
Q'2        22      89     222     415   1,191
Q''2        8      25      55      98     793
Q3          2       4       2       2       2
Q'3        22      87     225     416   1,225
Q''3        9      25      51      96     742
Q4          4       3       2       2       2
Q'4        22      92     214     434   1,195
Q''4       12      23      54      93     743
Q5          3       4       4       4       4
Q'5        22      87     224     424   1,175
Q''5       10      20      53      90     893
Table 5.4 shows the retrieval times (in milliseconds) of encryption adjustment
for the RND layer and the DET layer, using simple adjustment and traversal-aware
encryption adjustment. Note that the decryption time has been taken into
account in the running times for the query Qi, such that Q'i = Qi + decryption time.
We observe from Table 5.4 that the retrieval times of encryption adjustment of the
DET layer by TAEA are lower than those of simple adjustment. This is because
traversal-aware encryption adjustment does not require encryption layer adjustment
everywhere in the database; it only adjusts the values on the paths required to
execute the query, whereas simple encryption adjustment adjusts all items in the
database.
Table 5.4: Retrieval times of encryption adjustment using simple adjustment and
traversal-aware encryption adjustment (time in milliseconds).
Database   RND layer   DET using simple adjustment   DET using TAEA
DB1               31                            20                6
DB2               34                            72                5
DB3              122                           192               19
DB4              188                           368               35
DB5            1,108                           645              238
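To make the comparison in Table 5.4 concrete, the ratio between the two DET adjustment times can be computed per database. The short calculation below uses only the values printed in the table; the variable names are chosen here for illustration.

```python
# DET-layer adjustment times from Table 5.4, in milliseconds.
det_simple = {"DB1": 20, "DB2": 72, "DB3": 192, "DB4": 368, "DB5": 645}
det_taea = {"DB1": 6, "DB2": 5, "DB3": 19, "DB4": 35, "DB5": 238}

# How many times longer simple adjustment takes than TAEA, rounded to one decimal.
ratio = {db: round(det_simple[db] / det_taea[db], 1) for db in det_simple}

for db, r in ratio.items():
    print(f"{db}: simple adjustment takes {r}x the TAEA time")
```

On these figures the ratio is largest for the mid-sized databases and smallest for DB1 and DB5.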
For the queries (Q2 − Q5) the execution time was clearly the lowest; this was expected
since the queries over non-encrypted databases do not require any encryption layer
adjustment. On the other hand, when comparing the first query set (Q'2 − Q'5)
with the second query set (Q''2 − Q''5), it is clear that the execution times of
(Q''2 − Q''5) are lower than those of (Q'2 − Q'5).
From an overall perspective, the retrieval time for non-encrypted databases
is small and roughly similar for all datasets. In the encrypted cases: (i) using
simple adjustment, the retrieval time increased once the decryption times were
taken into account; (ii) using traversal-aware encryption adjustment, the
execution time clearly grew with the size of the database, but remained in a
practically feasible range (under a second) for the largest dataset considered.
During query execution, the proposed approach of Aburawi et al. (Chapter 4)
performed better than the simple encryption adjustment approach reported earlier
in terms of adjusting the encryption layers from the RND layer to the DET layer.
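The "under a second" observation can be checked directly against the DB5 column of Table 5.3. The snippet below copies those values and reports the slowest query in each encrypted mode; it is a small arithmetic check, not part of the original experiments.

```python
# DB5 column of Table 5.3, in milliseconds.
db5_ms = {
    "simple": {"Q'1": 670, "Q'2": 1191, "Q'3": 1225, "Q'4": 1195, "Q'5": 1175},
    "taea": {"Q''1": 253, "Q''2": 793, "Q''3": 742, "Q''4": 743, "Q''5": 893},
}

# Slowest query per mode, and whether it stays below one second.
worst = {mode: max(times.values()) for mode, times in db5_ms.items()}
under_one_second = {mode: t < 1000 for mode, t in worst.items()}

print(worst)
print(under_one_second)
```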
5.4 Evaluation
This section presents the results obtained with respect to the evaluation of the
proposed TAEA-based approach. Experiments were conducted on five datasets, using
a variety of Cypher queries as described above. The first set of experiments
was conducted over the non-encrypted database, the second set over encrypted
graph databases using traversal-aware encryption adjustment, and the third set
over encrypted graph databases using simple encryption adjustment. The objectives
of the evaluation were as follows:
1. To determine the trade-off between data security protection and data pro-
cessing efficiency in both approaches.
Security
The considered case studies have shown the trade-off between the simple and
traversal-aware encryption adjustment policies. The simple policy requires fewer
queries and updates; on the other hand, the traversal-aware policy provides
better security, as it reveals less information to a possible server-side attacker.
With the latter policy, as observed above, not all age property values were adjusted
to the DET layer, just those required to allow the query execution to progress.
Performance
Implementability
Similar to the methods in [4, 3, 34] and unlike the methods in [15, 52] the proposed
mechanism does not need to change the inner structure of the DBMS because
it is implemented as a set of layers above the DBMS. In particular, the proposed
approach is compatible with a concurrency control for multi-user DBMS, but
related security aspects and performance evaluation in a multi-user environment
need to be addressed in future work.
5.5 Summary
In this chapter we reported on the implementation and evaluation of the
traversal-aware encryption adjustment mechanism for querying encrypted data in graph
databases. The fundamental idea of the approach was discussed in detail in
Chapter 4. To evaluate the proposed mechanism, the datasets described in sub-section
5.3.1 were used, with queries applied over the non-encrypted database and over the
encrypted database. In this manner, five datasets were created. The proposed
approach was tested using five types of Cypher query. The method provides better
security protection against server-side attacks while keeping good implementability
and reasonable performance of query execution. In the next chapter, an
implementation of the TAEA based on extended databases is presented.
Chapter 6
6.1 Introduction
6.2.1 Datasets
non-encrypted database
6.2.2 Queries
In order to investigate the proposed approach further, we considered the execution
of the queries listed below in three styles: (1) non-encrypted; (2) encrypted with
simple adjustment; and (3) encrypted with traversal-aware encryption adjustment.
Firstly, we executed the queries (Q1 − Q3) to find friends-of-friends connections
to a depth of eight degrees over the three styles stated above. The following
syntax is used to match nodes connected by a variable number of relationships:
-[:TYPE*min..max]->. Here, min is 1 by default while max is infinity. We then
executed the queries in two different ways, using a single query pattern and using
a multiple query pattern. As mentioned in the methods, the first step is to encrypt
all the values in the database to the RND layer, and then to adjust the layers
according to the approach used.
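The meaning of the min..max bounds can be illustrated with a small matcher over an edge list. This is a sketch under the assumption that, as is usual in Cypher, a relationship is not traversed twice within one matched path; the function name var_length_match and the example graph are hypothetical and are not taken from the thesis datasets.

```python
def var_length_match(edges, start, min_hops=1, max_hops=None):
    """End nodes of paths from start that use between min_hops and max_hops
    relationships, never reusing a relationship within a single path.
    max_hops=None models an unbounded upper limit."""
    found = set()

    def walk(node, used, depth):
        if depth >= min_hops:
            found.add(node)
        if max_hops is not None and depth == max_hops:
            return
        for idx, (src, dst) in enumerate(edges):
            if src == node and idx not in used:
                walk(dst, used | {idx}, depth + 1)

    walk(start, frozenset(), 0)
    return found


# Hypothetical FRIEND_OF chain: Tom -> Ann -> Bob -> Cat -> Dan
friend_of = [("Tom", "Ann"), ("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan")]

two_hops = var_length_match(friend_of, "Tom", 1, 2)   # models [:FRIEND_OF*1..2]
four_hops = var_length_match(friend_of, "Tom", 1, 4)  # models [:FRIEND_OF*1..4]
unbounded = var_length_match(friend_of, "Tom")        # models [:FRIEND_OF*]
print(sorted(two_hops), sorted(four_hops), sorted(unbounded))
```

Because each relationship is used at most once per path, the recursion also terminates on cyclic graphs when no upper bound is given.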
First step (for both approaches): encrypt all the values in the database
to the RND layer by using the following Cypher statement.
MATCH (n)
SET
n.name = encryptRND(encryptDET(n.name), ID(n)),
n.age = encryptRND(encryptDET(n.age), ID(n)),
n.gender = encryptRND(encryptDET(n.gender), ID(n))
match (n)
SET
n.name = decryptRND(n.name, ID(n)),
n.age=decryptRND(n.age, ID(n)),
n.gender = decryptRND(n.gender, ID(n))
Second step (for TAEA): decrypt only the values that are related to the Cypher
query (in this case name) from the RND layer to the DET layer as follows.
MATCH (n)
SET n.name = decryptRND(n.name, ID(n))
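The layering used in these statements can be imitated with a toy cipher to show why the equality conditions in the encrypted queries work: removing the RND layer from encryptRND(encryptDET(v), id) yields exactly encryptDET(v). The sketch below is an assumption made for illustration and is not the UDF implementation of the thesis. It uses an insecure XOR keystream built from HMAC-SHA256, hypothetical keys, and the node identifier as the per-node tweak; only the four function names are taken from the listings.

```python
import hashlib
import hmac

K_DET = b"det-key"  # hypothetical keys, for illustration only
K_RND = b"rnd-key"


def _stream(key: bytes, tweak: bytes, n: int) -> bytes:
    """Keystream of n bytes derived from key and tweak (toy construction)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, tweak + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def _xor(data: bytes, key: bytes, tweak: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _stream(key, tweak, len(data))))


def encryptDET(plain: str) -> str:
    # Same plaintext always gives the same ciphertext (no per-node tweak).
    return _xor(plain.encode(), K_DET, b"").hex()


def decryptDET(det_cipher: str) -> str:
    return _xor(bytes.fromhex(det_cipher), K_DET, b"").decode()


def encryptRND(det_cipher: str, node_id: int) -> str:
    # The node identifier varies the outer layer from node to node.
    return _xor(bytes.fromhex(det_cipher), K_RND, str(node_id).encode()).hex()


def decryptRND(rnd_cipher: str, node_id: int) -> str:
    return _xor(bytes.fromhex(rnd_cipher), K_RND, str(node_id).encode()).hex()


stored_1 = encryptRND(encryptDET("Tom"), 1)  # value of name on node 1
stored_2 = encryptRND(encryptDET("Tom"), 2)  # same name on node 2
adjusted = decryptRND(stored_1, 1)           # node 1 lowered to the DET layer
print(adjusted == encryptDET("Tom"), decryptDET(adjusted))
```

At the RND layer the two stored values differ although the names are equal, so equality can only be tested after the adjustment to the DET layer.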
MATCH (node1)-[:FRIEND_OF*1..2]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '35'
RETURN node2.name, node2.age
MATCH (node1)-[:FRIEND_OF*1..4]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '26'
RETURN node2.name, node2.age
MATCH (node1)-[:FRIEND_OF*1..6]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '40'
RETURN node2.name, node2.age
MATCH (node1)-[:FRIEND_OF]->(node2)
WHERE node1.name = 'Tom' AND node2.age = '22'
RETURN node2.name, node2.age
MATCH (node1)-[:FRIEND_OF]->(node2)-[:FRIEND_OF]->(node3)
WHERE node1.name = 'Jones' AND node2.age = '47' AND node3.gender = 'Female'
RETURN node3.name, node3.age, node3.gender
MATCH (node1)-[:FRIEND_OF*1..2]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('35'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:FRIEND_OF*1..4]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('26'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:FRIEND_OF*1..6]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('40'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:FRIEND_OF]->(node2)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Tom'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('22'),ID(node2)),ID(node2))
RETURN decryptDET(node2.name), decryptDET(node2.age)
MATCH (node1)-[:FRIEND_OF]->(node2)-[:FRIEND_OF]->(node3)
WHERE
node1.name = decryptRND(encryptRND(encryptDET('Jones'),ID(node1)),ID(node1))
AND
node2.age = decryptRND(encryptRND(encryptDET('47'),ID(node2)),ID(node2))
AND
node3.gender = decryptRND(encryptRND(encryptDET('Female'),ID(node3)),ID(node3))
RETURN decryptDET(node3.name), decryptDET(node3.age), decryptDET(node3.gender)
MATCH (x {name: decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..2]->(y)
SET y.age = decryptRND(y.age, ID(y))
WITH y
MATCH (x:Person)-[:FRIEND_OF*1..2]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('35'),ID(y)),ID(y))
RETURN decryptDET(y.name), decryptDET(y.age)
MATCH (x {name: decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..4]->(y)
SET y.age = decryptRND(y.age, ID(y))
WITH y
MATCH (x:Person)-[:FRIEND_OF*1..4]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('26'),ID(y)),ID(y))
RETURN decryptDET(y.name), decryptDET(y.age)
MATCH (x {name: decryptRND(encryptRND(encryptDET('Tom'),ID(x)),ID(x))})
-[:FRIEND_OF*1..6]->(y)
SET y.age = decryptRND(y.age, ID(y))
WITH y
MATCH (x:Person)-[:FRIEND_OF*1..6]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('40'),ID(y)),ID(y))
RETURN decryptDET(y.name), decryptDET(y.age)
WITH y
MATCH (x:Person)-[:FRIEND_OF]->(y)
WHERE y.age = decryptRND(encryptRND(encryptDET('22'),ID(y)),ID(y))
RETURN decryptDET(y.name), decryptDET(y.age)
6.2.3 Results
In this sub-section, the results obtained from the execution of the above queries
over databases of different sizes are presented. Execution times were collected after
executing the queries over two datasets and recorded in milliseconds, as presented in
Table 6.1. The table has three parts. First, the upper part provides
the execution times for non-encrypted data. Second, the middle part gives
the execution times for encrypted data using simple encryption adjustment. Third,
the lower part presents the execution times for encrypted data using
traversal-aware encryption adjustment.
Table 6.1 highlights that the larger the dataset, the longer it takes to find
matches. It can be easily observed from the upper part that the retrieval times
for non-encrypted databases are lower than those for the encrypted databases;
this was expected as the queries over non-encrypted databases do not need any
encryption layer adjustment.
To begin, execution times of queries Q'1 - Q'5 in ENC-S-DB1 (middle part) and
queries Q''1 - Q''5 in ENC-T-DB1 (lower part) showed similar retrieval time, with both
Non-encrypted database
Query         Q1      Q2      Q3      Q4      Q5
NON-DB1       193     95      150     41      31
NON-DB2       845     1075    1346    418     349
Encrypted database using simple adjustment
Query         Q'1     Q'2     Q'3     Q'4     Q'5
ENC-S-DB1     3682    3864    4306    3084    3282
ENC-S-DB2     9245    9315    9375    8741    8631
Encrypted database using TAEA
Query         Q''1    Q''2    Q''3    Q''4    Q''5
ENC-T-DB1     2984    3010    3058    2804    3077
ENC-T-DB2     9317    9378    9407    6978    6915
6.3 Evaluation
In this section the results obtained in the previous sub-section for the proposed
approach are discussed. The approach provides better security protection
against server-side attacks while keeping good implementability and reasonable
performance of query execution.
The TAEA approach shows a clear advantage over simple encryption adjustment.
Simple encryption adjustment does not require many update queries. However,
TAEA provides better security: as was noted above, not all age property or
gender property values were adjusted to the DET layer, just the values required
to allow the query execution to progress. To evaluate the proposed approach a
set of two databases was created, and the proposed approach was tested using
five types of Cypher queries. The evaluation was conducted by measuring the
execution time experimentally. In both ENC-S-DB2 and ENC-T-DB2, each query
shows a reasonable execution time relative to the size of the database.
Figure 6.1: Comparison between the proposed CryptGraphDB using simple adjustment
and TAEA on different database sizes.
The results presented in the tables also indicate that TAEA not only provides
better security, but is also in some cases more efficient for query execution over
large databases. The reasons for this were discussed in Chapter 5. Indeed,
the execution time of all queries using TAEA over ENC-T-DB1 is smaller than the
execution time of the same queries using the simple adjustment mechanism over
the same database (ENC-S-DB1). Furthermore, the execution time of queries Q''4
and Q''5 using TAEA over ENC-T-DB2 is shorter than the execution time of the
corresponding queries Q'4 and Q'5 using the simple adjustment mechanism over
ENC-S-DB2.
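The per-query comparison stated above can be reproduced from the values in Table 6.1. The following calculation is only a check on the printed figures; the dictionary layout and variable names are chosen here for illustration.

```python
# Execution times from Table 6.1 in milliseconds, ordered from query 1 to query 5.
simple = {
    "DB1": [3682, 3864, 4306, 3084, 3282],  # ENC-S-DB1
    "DB2": [9245, 9315, 9375, 8741, 8631],  # ENC-S-DB2
}
taea = {
    "DB1": [2984, 3010, 3058, 2804, 3077],  # ENC-T-DB1
    "DB2": [9317, 9378, 9407, 6978, 6915],  # ENC-T-DB2
}

# Time saved by TAEA per query (negative means TAEA was slower).
saved = {db: [s - t for s, t in zip(simple[db], taea[db])] for db in simple}
faster = {db: [d > 0 for d in saved[db]] for db in saved}

print(saved)
print(faster)
```

On the larger database TAEA is faster only for the fourth and fifth queries, and slightly slower for the first three.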
6.4 Summary
In this chapter, the proposed CryptGraphDB approach has been presented in both
cases, simple adjustment and TAEA. To evaluate the approach, two databases of
different sizes were used. The retrieval times were collected by running
a set of Cypher queries in different modes as reported previously. The reported
evaluation showed that not only does TAEA provide better security, but it can also
be more efficient over large databases for query execution. We observe from Table
6.1 that TAEA outperforms the simple encryption adjustment in most cases. The
next chapter concludes this thesis with a summary of the contributions and main
findings and some suggestions for future research directions.
Chapter 7
This chapter provides a summary of the work presented in this thesis, a review of
the main findings related to the research question and research issues identified
in Chapter 1, and some suggestions for future work directions.
7.1 Summary
In this thesis, a flexible mechanism for the execution of queries over encrypted
graph databases called CryptGraphDB has been proposed. It utilizes multi-layered
encryption and encryption adjustment to provide a reasonable trade-off between
data security protection and data processing efficiency. The CryptGraphDB
approach was considered in the context of both simple encryption adjustment, which
is directed at the static adjustment context, and traversal-aware encryption
adjustment (TAEA), which is directed at adjusting the encryption layers in the
dynamic context.
The thesis commenced, in Chapter 2, with a literature review of previous work
relevant to the work done in the thesis. The following chapters covered the proposed
CryptGraphDB approach. These chapters were all structured in a similar way,
starting with a description of the proposed approach and ending with a review of
the evaluation conducted for each approach. The proposed approach was tested
using a set of Cypher queries and different sizes of graph databases.
In more detail Chapter 3 presented the CryptGraphDB approach designed
to query the encrypted graph database. The fundamental idea was to use the
The main findings from the work presented in this thesis are provided in this
chapter. The original research question from Chapter 1 was "How to
query encrypted graph database?", with the further, more detailed question "How do we
encrypt graph database and apply encrypted queries on it using CryptGraphDB
approach?". In Chapter 1 a number of additional questions were also posed.
Before returning to the primary overriding research question, each of these will be
considered as follows:
1. What are the features of CryptDB that can best be adopted to be implemented
on graph databases? The answer to this is that we re-used the SQL-aware
encryption strategy to be implemented on graph databases as a Cypher-aware
encryption strategy. An existing encryption scheme was used to implement an
encryption that enables the processing of Cypher queries. Also, adjustable
3. Given a set of Cypher queries, how should these queries be implemented over
encrypted graph databases by using the proposed approach? A prototype
scheme has been introduced to evaluate the efficiency of traversal-aware
encryption adjustment. To build this prototype, a set of User-Defined Functions
(UDFs) were created and called in the same manner as any other Cypher
function, to allow the multi-layered encryption and encryption adjustment.
This was reported in detail in Chapters 5 and 6.
• In general, for small and medium-sized databases simple encryption adjustment
is more efficient than the traversal-aware scheme in some cases. For
large databases, traversal-aware adjustment can be more efficient, depending
on the data and the query.
The research described in this thesis has indicated a number of potential research
directions for the future. These research directions are briefly introduced in the
concluding section of this thesis as follows.
against input database-graphs. This will more easily extend the TAEA-
approach to bigger datasets and may result in a major publication in applied
databases (e.g., SIGMOD).
[3] Nahla Aburawi, Frans Coenen, and Alexei Lisitsa. Traversal-aware encryp-
tion adjustment for graph databases. Proceedings of 7th International Confer-
ence on Data Science, Technology and Applications (DATA 2018), 1(1):381–387,
2018.
[4] Nahla Aburawi, Alexei Lisitsa, and Frans Coenen. Querying encrypted graph
databases. In 4th International Conference on Information Systems Security
and Privacy, pages 447–451. Proceedings, 2018.
[5] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu.
Order preserving encryption for numeric data. In Proceeding SIGMOD ’04
Proceedings of the 2004 ACM SIGMOD international conference on Manage-
ment of data, pages 563–574. Proceedings, 2004.
[6] Jaafer Al-Saraireh. An efficient approach for query processing over encrypted
database. In Journal of Computer Science, pages 548–557. Journal, 2017.
[7] Shaukat Ali, Azhar Rauf, and Saeed Mahfooz. Update query over encrypted
data. In International Conference on Computer Networks and Information Tech-
nology, pages 279–282. Proceedings, 2011.
[8] Maryam Almarwani, Boris Konev, and Alexei Lisitsa. Flexible access control
and confidentiality over encrypted data for document-based database. In Pro-
[9] Arvind Arasu, Ken Eguro, Raghav Kaushik, and Ravishankar Ramamurthy.
Querying encrypted data. Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data, 3(1):1259–1261, 2014.
[10] Mihir Bellare, Alexandra Boldyreva, and Adam O’Neill. Deterministic and
efficiently searchable encryption. In Proceedings of the 27th International
Cryptology Conference (CRYPTO), pages 535–552. Proceedings, 2007.
[11] Dobre Blazhevski, Adrijan Bozhinovski, Biljana Stojchevska, and Veno Pa-
chovski. Modes of operation of the aes algorithm. In The 10th Conference for
Informatics and Information Technology, pages 212–216. Proceedings, 2013.
[12] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O’Neill. Order
preserving symmetric encryption. In Proceedings of the 28th Annual Interna-
tional Conference on the Theory and Applications of Cryptographic Techniques
(Eurocrypt), pages 224–241. Proceedings, 2009.
[13] Alexandra Boldyreva, Serge Fehr, and Adam O’Neill. On notions of security
for deterministic encryption, and efficient constructions without random ora-
cles. In Proceedings of the 28th International Cryptology Conference (CRYPTO),
pages 335–359. Proceedings, 2008.
[14] Rik Van Bruggen. Learning Neo4j. Packt Publishing Ltd., Birmingham, UK.,
1st edition, 2014.
[15] Melissa Chase and Seny Kamara. Structured encryption and controlled
disclosure. In Proceedings of Advances in Cryptology - ASIACRYPT, pages
577–594. Proceedings, 2010.
[16] Michael Cooney. Ibm touts encryption innovation; new technology performs
calculations on encrypted data without decrypting it. In Computer World,
2009.
[17] Carlo Curino, Raluca A. Popa, Evan P. C. Jones, Nirmesh Malviya, Eugene Wu,
Sam Madden, Hari Balakrishnan, and Nickolai Zeldovich. Relational cloud:
[18] Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. Searchable
symmetric encryption: improved definitions and efficient constructions. In
Proceedings of the 13th ACM conference on Computer and communications
security, pages 79–88. Proceedings, 2006.
[20] Anand Desai. New paradigms for constructing symmetric encryption schemes
secure against chosen-ciphertext attack. In Proceedings of the 20th Annual
International Conference on Advances in Cryptology, pages 394–412. Proceed-
ings, 2000.
[21] Swathi Edem, Gupta Vivek, and G. Sandhya Rani. Role of hash function in
cryptography. National Conference on Computer Security, Image Processing,
Graphics, Mobility and Analytics, 1(1), 2016.
[22] Niels Ferguson. Aes-cbc + elephant diffuser a disk encryption algorithm for
windows vista. IOSR Journal of Engineering (IOSRJEN), 3(8), 2006.
[26] Jose Guia, Valeria G. Soares, and Jorge Bernardino. Graph databases: Neo4j
analysis. In Proceedings of the 19th International Conference on Enterprise
Information Systems (ICEIS 2017), volume 1, pages 351–356. Proceedings,
2017.
[27] Florian Kerschbaum, Martin Härterich, Patrick Grofig, Mathias Kohler, An-
dreas Schaad, Axel Schröpfer, and Walter Tighzert. Optimal re-encryption
strategy for joins in encrypted databases. In Data and Applications Security
and Privacy, pages 195–210. Proceedings, 2013.
[28] Lianzhong Liu and Jingfen Gai. A method of query over encrypted data in
database. In International Conference on Computer Engineering and Technol-
ogy, pages 23–27. Proceedings, 2009.
[30] Harshini Selva Mohan and Abbadi Raji Reddy. Revised aes and its modes of
operation. In International Journal of Information Technology and Knowledge
Management, pages 31–36. Journal, 2012.
[34] Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari
Balakrishnan. Cryptdb: Protecting confidentiality with encrypted query
processing. 23rd ACM Symposium on Operating Systems Principles, 1(1):85–
100, 2011.
[35] Raluca Ada Popa and Nickolai Zeldovich. Cryptographic treatment of cryptdb’s
adjustable join. Technical Report MIT-CSAIL-TR-2012-006, Computer Science
and Artificial Intelligence Laboratory, 6(1):289–300, 2012.
[36] Raluca Ada Popa, Nickolai Zeldovich, and Hari Balakrishnan. Cryptdb: A
practical encrypted relational dbms. In Computer Science and Artificial Intelli-
gence Laboratory, Cambridge, MA. Technical Report MIT-CSAIL-TR-2011-005,
2011.
[39] Ian Robinson, Jim Webber, and Emil Eifrem. Graph Databases. O’Reilly
Media, Inc., United States of America, 1st edition, 2013.
[40] Eyad Saleh, Ahmad Alsa’deh, Ahmad Kayed, and Christoph Meinel. Processing
over encrypted data: Between theory and practice. In SIGMOD, pages 5–16.
Newsletter, 2016.
[41] Muhammad Sarfraz, Mohamed Nabeel, Jianneng Cao, and Elisa Bertino.
Dbmask: Fine-grained access control on encrypted relational databases.
In 5th ACM Conference on Data and Application Security and Privacy, pages
1–11. Proceedings, 2015.
[43] Dawn Xiaodong Song, David Wagner, and Adrian Perrig. Practical techniques
for searches on encrypted data. In IEEE Symposium of Security and Privacy,
pages 44–. Proceedings, 2000.
[48] Stephen Tu, M. Frans Kaashoek, Samuel Madden, and Nickolai Zeldovich.
Processing analytical queries over encrypted data. Proceedings of the 39th In-
ternational Conference on Very Large Data Bases (VLDB), 6(1):289–300, 2013.
[49] Chad Vicknair, Michael Macias, Zhendong Zhao, Xiaofei Nan, Yixin Chen,
and Dawn Wilkins. A comparison of a graph database and a relational
database. In Proceedings of the 48th Annual Southeast Regional Conference,
pages 42:1–42:6. Proceedings, 2010.
[50] Aleksa Vukotic, Nicki Watt, Tareq Abedrabbo, Dominic Fox, and Jonas Partner.
Neo4j in Action. Manning Publications Co., Shelter Island, NY, USA, 1st edition,
2015.
[51] Harsha Vyawahare, Pravin P. Karde, and Vilas M. Thakare. A hybrid database
approach using graph and relational database. In Proceedings of the 3rd IEEE
International Conference on Research in Intelligent and Computing in Engineer-
ing, (RICE), pages 01–04. Proceedings, 2018.
[52] Pengtao Xie and Eric Xing. Cryptgraph: Privacy preserving graph analytics
on encrypted graph. In arXiv:1409.5021. Journal, 2015.