Int'l Conf. Software Eng.

Research and Practice | SERP'17 | 143

Improving Cuckoo Hashing with Perfect Hashing


Moulika Chadalavada and Yijie Han
School of Computing and Engineering
University of Missouri at Kansas City
5100 Rockhill Road
Kansas City, MO 64110, USA.

Abstract - This paper aims to improve Cuckoo Hashing by using Perfect Hashing to store keys in memory according to their frequency. Perfect Hashing is fast and has a high hit ratio, whereas Cuckoo Hashing has high memory usage when allocating keys to its memory. Combining Cuckoo Hashing with Perfect Hashing therefore increases the key hit ratio.

Keywords: Tree, Hashing, Cuckoo Hashing, Perfect Hashing, Algorithms

1 Introduction
Many hashing techniques aim to store keys in memory so that keys can be accessed efficiently. In network applications, packet classification plays a prominent role [2], and one option for increasing throughput is to use algorithms based on hashing [3]. A hash table, or hash map, is a data structure that maps keys to values. It uses a hash function to compute an index into an array, from which the required value is retrieved. The main disadvantage of hash tables is that the hash function can map multiple keys to the same index, which results in collisions. Many hashing techniques have been introduced to handle collisions when allocating memory.

Cuckoo Hashing is a hash table scheme that provides high memory utilization and constant access time [4]. It mainly aims at reducing collisions and optimizing throughput. Several implementations of Cuckoo Hashing exist, such as the Serial, Parallel, Parallel Pipeline, and Parallel d-Pipeline implementations [1].

A perfect hash function is a hash function that maps the distinct elements of a set S to a set of integers with no collisions. However, the set of keys to be hashed must be provided in advance in order to create the hash function. In mathematical terms, it is a total injective function. Such a function is used to implement a lookup table with constant worst-case access time. Many hash functions resemble Perfect Hashing, but its main advantage is that no collision resolution needs to be implemented. In this paper, we use Perfect Hashing to improve Cuckoo Hashing in terms of memory utilization and of allocating memory based on the frequency of keys.

2 What is Cuckoo Hashing
Cuckoo Hashing uses a number d of hash tables, and an element x can be placed in tables 1, 2, ..., d at positions h_1(x), h_2(x), ..., h_d(x), where each h_i is a hash function [1]. The main difference between d-left hashing [5] and Cuckoo Hashing is that in d-left hashing, when all candidate positions are occupied, the new element cannot be inserted, which limits memory usage. In Cuckoo Hashing, the elements in occupied positions are moved to their alternative positions to make room for the new element. Many implementations of Cuckoo Hashing aim at increasing throughput [1]. In the Serial implementation the tables are accessed serially, and in the Parallel implementation the table is selected at random.

In the Pipeline architecture [1], a search accesses each memory sequentially; that is, when the current operation moves to memory-2, the next search operation can start accessing memory-1. In this pipeline implementation, when a search succeeds, the remaining memories need not be accessed. In the Parallel d-pipeline [1], each pipeline has a different entry point, which allows the user to insert an element into any table that is idle in that cycle. If an element is in the first pipeline and a match is found in Table-1, then in the next cycle an element will be inserted in the second pipeline to make use of Table-2.

3 What is Perfect Hashing
A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers with no collisions. Minimal Perfect Hashing [9] guarantees that n keys map to 0..n-1 with no collisions at all. Given a set of n keys, a static hash table of size m = O(n) can be constructed such that a search takes O(1) time in the worst case. A perfect hash function with a limited range of values can be used for efficient lookup operations by placing the keys from S in a lookup table indexed by the function's output. One can then test whether a key is present in S by looking at its cell of the table, and each lookup takes constant time in the worst case.
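The lookup-table idea above can be illustrated with a minimal sketch. It assumes a brute-force search over seeds of a SHA-256 based hash, which is a stand-in chosen for brevity and not the construction of [9]; the names h, find_perfect_seed, build_table and contains, and the sample key set, are hypothetical.

```python
import hashlib


def h(key: str, seed: int, m: int) -> int:
    """Seeded hash of key into the range 0..m-1 (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def find_perfect_seed(keys, m, max_seed=100_000):
    """Try seeds until every key in the static set gets its own slot."""
    for seed in range(max_seed):
        if len({h(k, seed, m) for k in keys}) == len(keys):
            return seed
    raise ValueError("no collision-free seed found; increase m or max_seed")


def build_table(keys, m):
    """Place each key in the cell indexed by its perfect hash value."""
    seed = find_perfect_seed(keys, m)
    table = [None] * m
    for k in keys:
        table[h(k, seed, m)] = k
    return seed, table


def contains(key, seed, table):
    """Membership test: one hash evaluation and one cell comparison."""
    return table[h(key, seed, len(table))] == key


# Assumed example key set S; the table has size m = 2n = O(n).
S = ["in", "is", "no", "or", "by"]
seed, table = build_table(S, m=2 * len(S))
```

Because the key itself is stored in the cell, a key outside S that happens to hash to an occupied cell is still rejected by the comparison.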

ISBN: 1-60132-468-5, CSREA Press ©



As discussed above, Perfect Hashing is a technique for building a hash table with no collisions. This is possible when all the keys are known in advance. Minimal perfect hashing means that the resulting hash table contains exactly one entry for each known key and no empty slots exist. To assign keys to slots, two levels of hash functions are used [9]. The first, H(key), hashes the key to a position in an intermediate array G. The second, F(d, key), uses the extra information from G to find the unique position of the key. This scheme always returns a value, and the value is correct if we know for sure that the key we are searching for is in the table. Otherwise, it returns bad information.

How can the intermediate value be found in Perfect Hashing? [9]

1. We place the keys into buckets according to the first hash function, H(key).

2. We then process the buckets largest first and try to place all the keys a bucket contains into empty slots. If that is unsuccessful, we keep trying with successively larger values of d. It sounds like this would take a long time, but it does not. Because we find the d values for the buckets with the most items early, they are likely to find empty spots. When we get to buckets with just one item, we can simply place each item into the next unoccupied spot. [5]

4 Use of Perfect Hashing to improve Cuckoo Hashing

4.1 Allocating Key to Memory
We show how to use perfect hashing to improve Cuckoo Hashing by considering the frequency of keys. We cannot anticipate all possible keys because the set of keys is huge. For example, if keys are limited to no more than 20 letters, then the set of keys has size 127^20. However, we can put the known frequently encountered keys into a set S and then map the keys in S to memory modules by using a perfect hashing function f. Such a perfect hashing function can be obtained in O(|S|^2 b) time [7], where b is the number of bits needed to represent a key in S. After f is obtained, f(x) for a key x can be computed in constant time [7]. In Cuckoo Hashing every key is assumed to have the same priority. Here we analyze the set S of frequently encountered keys and store high-frequency keys together in a memory module. Because there are few keys with high frequency and more keys with low frequency, we may, say, store keys with frequency above 50% in memory module 0, keys with frequency 20% to 50% in memory module 1, keys with frequency 5% to 20% in memory module 2, and keys with frequency less than 5% in memory module 3.

The architecture of our scheme is as follows. For an input key x, first compute its perfect hash value f(x). According to [7], the value of f(x) is within {0, 1, ..., |S|^2 - 1}. Thus, no matter whether x is in S or not, f(x) always returns a value in {0, 1, ..., |S|^2 - 1}. We then use the value of f(x) to index into a table T that stores the memory module number for each value of f(x). Thus, if x is in S, we find the correct memory module that stores x. Say x is in S and the memory module that stores x is M_a. Then T[f(x)] = a. Once we know the memory module M_a, we use a hash function h for M_a to find the location h(x) of x in M_a.

If x is not in S, we still first use f(x) = a to find a memory module M_a and then use h(x) to locate x in M_a. Three situations can happen here. The first situation is that h(x) = h(y) for some y in S, so h(x) and h(y) collide. We then know that x is a less frequent key. We can go to the memory module M_b that stores less frequent keys and hash and rehash x there to determine whether x is already in M_b or needs to be inserted into M_b. The second situation is that no key is at position h(x), or x itself is stored at position h(x) of M_a. This can happen because f(x) = f(y) for some y in S, so T[f(x)] = T[f(y)] = a and we go to the same memory module M_a for both of them, while h(x) ≠ h(y). If position h(x) is vacant, we store x at position h(x) of M_a. If x is already at position h(x) of M_a, then we have found x in M_a. The third situation is that h(x) = h(y) for some y not in S, where y already occupies position h(y) in M_a. In this situation x is again a less frequent key, and we need to go to the memory for less frequent keys to locate x.

Also note that f(x) has |S|^2 possible values, of which only |S| correspond to keys in S; the other |S|^2 - |S| values do not correspond to any key in S. These |S|^2 - |S| values of f(x) therefore correspond to less frequent keys, and we can set T[f(x)] for them to the memory that stores less frequent keys. In this way, the frequently occurring keys in S are identified in constant time. For keys not in S, the hash value may collide with that of a key in S. Since these keys occur less frequently, we can afford more hashing and rehashing time for them, whereas for the keys in S perfect hashing is fast and the hit ratio is high. In perfect hashing, all the keys in the set S are known. Initially, the hash f is computed for each key, and the hash table entry for that value records the frequency of the key and its memory module. Each key is assigned to a memory module via its hash table entry, and keys are stored in memory modules according to their frequency. The keys with the highest frequency are stored in Memory-1 and keys with lower frequency are stored in the next memory. Any non-frequent words can be stored in a separate memory.

For example, consider the following famous sentence by Frederick P. Brooks Jr.:

“There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.”
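The word frequencies of the quoted sentence can be checked with a short script. This is only an illustrative sketch: the tokenization rule (lower-cased alphabetic words, with hyphenated compounds kept whole) and the names SENTENCE and word_counts are assumptions, not part of the scheme.

```python
import re
from collections import Counter

SENTENCE = (
    "There is no single development, in either technology or "
    "management technique, which by itself promises even one "
    "order-of-magnitude improvement within a decade in "
    "productivity, in reliability, in simplicity."
)


def word_counts(text: str) -> Counter:
    """Count lower-cased words; 'order-of-magnitude' stays one word."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(words)


counts = word_counts(SENTENCE)
# The most frequent word and its count, e.g. for choosing its memory module.
top_word, top_count = counts.most_common(1)[0]
```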


In the above sentence, the most frequently occurring word is ‘in’, which has a count of 4; all other words have a count of 1. As per our problem statement, each word is hashed and a distinct hash value from {0, 1, 2, ..., n^2 - 1} is assigned to each word. An index table is maintained to store both the frequency and the memory location of each word. As ‘in’ is the more frequently occurring word, it is stored in Memory-1, and the other words, with lower frequency, are stored in Memory-2. Once the hashing, memory allocation and updating of the index table are completed for all words, looking up any word in memory becomes easy.

4.2 Adding new Key to Memory
All the keys in the set are known, so once memory allocation is completed any key can be looked up in memory based on its frequency. If a new, unknown key has to be stored in memory, hashing is performed first. If the key has a high frequency, it is looked up in the memory that stores high-frequency keys; if it has a lower frequency, it is looked up in the memory that stores low-frequency keys. If the new key is not a frequent key, it is stored in the memory that stores non-frequent keys.

Because of this mechanism, hashing, key storage, memory utilization and key lookup are performed very efficiently. Compared with Cuckoo Hashing, this mechanism is more efficient because hashing is fast and memory is allocated according to key frequency. To explain this mechanism with an example, consider the table below, which lists characters and their respective hashed frequencies.

Table 1: Perfect Hashing Indexing Table

Character   Hashed Frequency   Memory
a           0.1%               Memory-2
b           0.4%               Memory-2
c           4%                 Memory-1
d           5%                 Memory-1
e           0.3%               Memory-2

As per the above table, characters c and d have the highest frequencies, 4% and 5% respectively, so these two characters are stored in Memory-1. Characters a, b and e have lower frequencies, 0.1%, 0.4% and 0.3% respectively, so these three keys are stored in the next memory, Memory-2. The index table stores the hashed value of each character and its respective memory location.

5 Conclusions
In this paper, we explained how perfect hashing can be used to improve Cuckoo Hashing by using the frequency of keys. We also explained how this mechanism is used to increase the key hit ratio and to reduce memory usage. Key lookup in memory based on frequency is fast, and inserting a new key into memory also becomes easy with this mechanism.

6 References

[1] S. Pontarelli, P. Reviriego, and J. A. Maestro, “Parallel d-Pipeline: A Cuckoo Hashing Implementation for Increased Throughput,” IEEE Transactions on Computers, vol. 65, 326-331 (2016).

[2] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Network, vol. 15, no. 2, pp. 24–32, 2001.

[3] A. Kirsch, M. Mitzenmacher, and G. Varghese, “Hash-based techniques for high-speed packet processing,” in Algorithms for Next Generation Networks. London, U.K.: Springer, 2010, pp. 181–218.

[4] R. Pagh and F. F. Rodler, “Cuckoo hashing,” J. Algorithms, vol. 51, no. 2, pp. 122–144, 2004.

[5] A. Broder and M. Mitzenmacher, “Using multiple hash functions to improve IP lookups,” in Proc. 20th Annu. Joint Conf. IEEE Comput. Commun. Soc., 2001, vol. 3, pp. 1454–1463.

[6] R. Raman, “The Power of Collision: Randomized Parallel Algorithms for Chaining and Integer Sorting,” Proceedings of the Tenth Conference on Foundations of Software Technology and Theoretical Computer Science, 9-11 (1990).

[7] R. Raman, “Priority queues: small, monotone and trans-dichotomous,” Proc. 1996 European Symp. on Algorithms, Lecture Notes in Computer Science 1136, 121-137 (1996).

[8] https://en.wikipedia.org/wiki/Perfect_hash_function

[9] D. Belazzougui, F. C. Botelho, and M. Dietzfelbinger, “Hash, Displace, and Compress,” in A. Fiat and P. Sanders (eds), Algorithms - ESA 2009, Lecture Notes in Computer Science, vol. 5757. Springer, Berlin, Heidelberg, 2009.
