
L15 Maps and Hashes

Uploaded by Jessica Milner
Copyright © All Rights Reserved
Maps and Hashes

Associative Array
What is a Map
A map models a searchable collection of key-value entries

The main operations of a map are for searching, inserting, and deleting items

Multiple entries with the same key are not allowed

Applications:

● address book
● student-record database

You may also see a map called an associative array, a symbol table, or a dictionary
Entry ADT
An entry stores a key-value pair (k, v)

Methods:

● key(): return the associated key
● value(): return the associated value
● setKey(k): set the key to k
● setValue(v): set the value to v
Map ADT
find(k): if the map M has an entry with key k, return an iterator to it; otherwise, return the special iterator end

put(k, v): if there is no entry with key k, insert entry (k, v); otherwise, set its value to v. Return an iterator to the new or modified entry

erase(k): if the map M has an entry with key k, remove it from M

Other: size(), empty()

Iterators: begin() and end() return iterators to the first entry of M and to the position just past its last entry


Map Example
● find(k): return an entry with key k; otherwise, return the special iterator end
● put(k, v): if there is no entry with key k, insert entry (k, v); otherwise, set its value to v
● erase(k): if there is an entry with key k, remove it
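The same three operations map closely onto the C++ standard library's std::unordered_map. The sketch below is only an illustration of the ADT semantics: the alias AddressBook and the helpers mapPut, mapFind and mapErase are hypothetical names, and insert_or_assign requires C++17.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helpers mirroring the Map ADT on top of std::unordered_map.
using AddressBook = std::unordered_map<int, std::string>;

// put(k, v): insert (k, v) if k is absent, otherwise overwrite its value;
// returns an iterator to the new or modified entry.
AddressBook::iterator mapPut(AddressBook& m, int k, const std::string& v) {
    return m.insert_or_assign(k, v).first;
}

// find(k): iterator to the entry with key k, or m.end() if there is none.
AddressBook::iterator mapFind(AddressBook& m, int k) {
    return m.find(k);
}

// erase(k): remove the entry with key k if present; true if one was removed.
bool mapErase(AddressBook& m, int k) {
    return m.erase(k) == 1;
}
```

For example, mapPut(m, 5, "A") followed by mapPut(m, 5, "B") leaves a single entry whose value is "B", because duplicate keys are not allowed, and mapFind(m, 7) returns m.end() if key 7 was never inserted.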
Map ADT Interface
// map interface
template <typename K, typename V>
class Map {
public:
  class Entry;                              // a (key,value) pair
  class Iterator;                           // an iterator (and position)
  int size() const;                         // number of entries
  bool empty() const;                       // is the map empty?
  Iterator find(const K& k) const;          // find entry with key k
  Iterator put(const K& k, const V& v);     // insert or replace
  void erase(const K& k);                   // remove entry with key k (throws NonexistentElement if absent)
  void erase(const Iterator& p);            // erase entry at p
  Iterator begin();                         // iterator to first entry
  Iterator end();                           // iterator just past the last entry
};
How to Implement a Map
A simple implementation of a map uses an unsorted list

We store the items of the map in a list S (a doubly-linked list) in arbitrary order
Map via a List (Pseudocode): Find
Algorithm find(k) {
  for each p in [S.begin(), S.end()) {
    if p->key() == k {
      return p
    }
  }
  return S.end() // there is no entry with key equal to k
}

Map via a List (Pseudocode): Put
Algorithm put(k, v) {
  for each p in [S.begin(), S.end()) {
    if p->key() == k {
      p->setValue(v)
      return p
    }
  }
  p = S.insertBack((k, v)) // there is no entry with key k
  n = n + 1 // increment number of entries
  return p
}

Map via a List (Pseudocode): Erase
Algorithm erase(k) {
  for each p in [S.begin(), S.end()) {
    if p->key() == k {
      S.erase(p)
      n = n - 1 // decrement number of entries
      return // keys are unique, and p is no longer valid
    }
  }
}
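The pseudocode above can be turned into a runnable sketch with std::list as the doubly-linked list S. The class name ListMap and the use of std::pair as the entry type are assumptions made for illustration; each operation performs the same linear scan as the pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>

// Sketch of a map stored in an unsorted doubly-linked list (std::list).
// Every operation scans the list, so find, put and erase take O(n) time.
template <typename K, typename V>
class ListMap {
public:
    using Entry = std::pair<K, V>;                      // (key, value)
    using Iterator = typename std::list<Entry>::iterator;

    std::size_t size() const { return S.size(); }
    bool empty() const { return S.empty(); }
    Iterator end() { return S.end(); }

    // find(k): linear scan; returns end() if no entry has key k
    Iterator find(const K& k) {
        for (Iterator p = S.begin(); p != S.end(); ++p)
            if (p->first == k) return p;
        return S.end();
    }

    // put(k, v): overwrite the value if k exists, otherwise append (k, v)
    Iterator put(const K& k, const V& v) {
        Iterator p = find(k);
        if (p != S.end()) {
            p->second = v;
            return p;
        }
        S.push_back(Entry(k, v));
        return std::prev(S.end());
    }

    // erase(k): remove the (unique) entry with key k, if any
    void erase(const K& k) {
        Iterator p = find(k);
        if (p != S.end()) S.erase(p);
    }

private:
    std::list<Entry> S;  // entries in arbitrary (here: insertion) order
};
```

Since C++11, std::list::size() runs in constant time, so this sketch does not keep the separate counter n from the pseudocode.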
Problems with the Unsorted List Map
Performance:

● put() takes O(n) time, since we must first determine whether the key is already in the sequence
● find() and erase() take O(n) time, since in the worst case (the key is not present) we traverse the entire sequence looking for an item with the given key

The unsorted list implementation is effective only for maps of small size, or for maps in which puts are the most common operation

● Searches and removals should be rarely performed; a search is quick only when the key happens to lie early in the sequence
● Example: historical record of logins to a system
Hash Tables for Maps
Data Structures: Hash Tables

Learn the basics of Hash Tables, one of the most useful data structures for solving interview questions. This video is a part of HackerRank's Cracking The Coding Interview Tutorial with Gayle Laakmann McDowell. http://www.hackerrank.com/domains/tutorials/cracking-the-coding-interview?utm_source=videoutm_medium=youtubeutm_campaign=ctci
Maps with Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]

Example: h(x) = x mod N is a hash function for integer keys

The integer h(x) is called the hash value of key x

A hash table for a given key type consists of

● Hash function h
● Array (called table) of size N

When implementing a hash table, the goal is to store item (k, v) at index i = h(k)
Example of Hash Tables
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer

The hash table uses an array of size N = 10,000 and the hash function:

● h(x) = last four digits of x
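For a decimal integer, "last four digits" is the same as x mod 10,000, so this example is an instance of h(x) = x mod N. The sketch below assumes SSNs are stored as unsigned integers (a leading zero simply disappears) and deliberately ignores collisions: the names N, h, table and store are illustrative, and a second SSN ending in the same four digits overwrites the first.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint32_t N = 10000;  // table size from the example

// h(x) = last four digits of x, i.e. x mod 10,000.
std::uint32_t h(std::uint32_t ssn) {
    assert(ssn <= 999999999u);  // at most nine decimal digits
    return ssn % N;
}

// Illustrative table: one Name per cell, no collision handling yet.
std::vector<std::string> table(N);

void store(std::uint32_t ssn, const std::string& name) {
    table[h(ssn)] = name;  // same last four digits: the new name overwrites
}
```

For instance, the SSN 123456789 lands in cell 6789.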


Hash Bucket Array
A bucket array for a hash table is an array A of size N, where each cell of A is a
collection of key-value pairs (commonly called a “bucket”)

The integer N defines the capacity of the array.

If the keys are integers well distributed in the range [0, N − 1], this bucket array
is all that is needed.

An entry e with key k is simply inserted into the bucket A[k]
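A minimal sketch of this bare bucket array, assuming every key is already an integer in [0, N − 1] so no hashing is needed. The class name BucketArray and its put/bucket methods are hypothetical; each bucket is a std::list of (key, value) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Sketch of a bucket array A of capacity N for integer keys in [0, N - 1].
class BucketArray {
public:
    using Entry = std::pair<int, std::string>;  // (key, value)
    using Bucket = std::list<Entry>;            // a collection of entries

    explicit BucketArray(std::size_t N) : A(N) {}

    // The entry with key k goes straight into bucket A[k]. Only key k can
    // ever land there, so the bucket holds at most one entry: O(1) time.
    void put(int k, const std::string& v) {
        assert(k >= 0 && static_cast<std::size_t>(k) < A.size());
        Bucket& b = A[k];
        for (Entry& e : b)
            if (e.first == k) { e.second = v; return; }  // replace value
        b.push_back(Entry(k, v));                        // insert new entry
    }

    const Bucket& bucket(int k) const { return A.at(static_cast<std::size_t>(k)); }

private:
    std::vector<Bucket> A;  // capacity N
};
```

Note that a BucketArray of capacity N allocates N buckets even if only a few keys are stored.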


Hash Bucket Array
If keys are unique integers in the range [0, N − 1], then each bucket holds at most one entry.

● Thus, searches, insertions, and removals in the bucket array take O(1) time.

Drawbacks:

● The space used is proportional to N, so if there are few entries, much of that space is wasted
● Keys are required to be integers in the range [0, N − 1], which is often not the case.

We normally use a bucket array in conjunction with a “good” mapping from the keys to the integers
Hash Functions
A hash function is usually specified as the composition of two functions:

● Hash Code: h1: keys ↦ integers

● Compression Function: h2: integers ↦ [0, N − 1]

The hash code is applied first, and then the compression function is applied to the result

● That is: h(key) = h2(h1(key))

The goal of the hash function is to disperse the keys in a seemingly random way
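A sketch of the composition for string keys. As an assumption (the slides do not prescribe a particular hash code), std::hash<std::string> stands in for h1, and the division method y mod N serves as the compression function h2; the function names simply mirror the slide's notation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hash code h1: keys -> integers (std::hash is a stand-in choice).
std::size_t h1(const std::string& key) {
    return std::hash<std::string>{}(key);
}

// Compression function h2: integers -> [0, N - 1] (division method).
std::size_t h2(std::size_t y, std::size_t N) {
    assert(N > 0);
    return y % N;
}

// Full hash function: h(key) = h2(h1(key)).
std::size_t h(const std::string& key, std::size_t N) {
    return h2(h1(key), N);
}
```

Whatever h1 returns, h(key, N) always lies in [0, N − 1], so it can be used directly as an index into a bucket array of capacity N.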
Hash Code Function
Memory address:

● Use the memory address of the key object as an integer
● Good in general, except for numeric and string keys (equal keys stored in different objects would get different hash codes)

Integer cast:

● Reinterpret the bits of the key as an integer
● Suitable for keys whose length is less than or equal to the number of bits of the integer type

Component sum:

● We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits)
● We sum the components (ignoring overflows)
● Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type
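A sketch of the integer-cast and component-sum hash codes, assuming a 32-bit hash code, a 32-bit IEEE-754 float, and 32-bit components for a 64-bit key. The function names integerCast and componentSum are illustrative. Unsigned arithmetic in C++ wraps modulo 2^32, which is exactly "ignoring overflows".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Integer cast: reinterpret the bits of a 32-bit float as a 32-bit integer.
std::uint32_t integerCast(float key) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &key, sizeof bits);  // copy the raw bit pattern
    return bits;
}

// Component sum: split a 64-bit key into two 32-bit components and add them;
// any carry out of bit 31 is discarded (overflow is ignored).
std::uint32_t componentSum(std::uint64_t key) {
    std::uint32_t high = static_cast<std::uint32_t>(key >> 32);
    std::uint32_t low = static_cast<std::uint32_t>(key);
    return high + low;
}
```

A plain integer cast of a 64-bit key to 32 bits would keep only the low half and ignore the high half entirely; the component sum lets every bit of the key influence the hash code.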
Hash Compression Function
Division: h2(y) = y mod N

● The size N of the hash table is usually chosen to be a prime

● To minimize collisions, it is important to reduce the common factors between N and the hash codes y
● We explore why on the next slide

Multiply, Add and Divide (MAD): h2(y) = (ay + b) mod N

● a and b are nonnegative integers such that a mod N ≠ 0
● Otherwise, every integer would map to the same value, b mod N
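A sketch of both compression functions. As an implementation choice (not from the slides), MAD reduces its operands mod N before multiplying so the product fits in 64 bits; this assumes N < 2^32 and gives the same result as (ay + b) mod N by modular arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Division method: h2(y) = y mod N.
std::uint64_t divisionCompress(std::uint64_t y, std::uint64_t N) {
    assert(N > 0);
    return y % N;
}

// MAD method: h2(y) = (a*y + b) mod N, with a mod N != 0.
// Reducing a, y and b mod N first keeps the arithmetic within 64 bits.
std::uint64_t madCompress(std::uint64_t y, std::uint64_t a,
                          std::uint64_t b, std::uint64_t N) {
    assert(N > 0 && N < (1ULL << 32) && a % N != 0);
    return ((a % N) * (y % N) + b % N) % N;
}
```

For example, with a = 3, b = 4 and N = 11, the hash code y = 10 compresses to (30 + 4) mod 11 = 1.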
Division Compressions and Primes
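Hash function: h(k) = k % N, where N = 4
Keys K1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Keys K2: 0, 4, 8, 12, 16, 20, 24, 28, 32, 36

// K1 compressed
{
  0 => [0, 4, 8],
  1 => [1, 5, 9],
  2 => [2, 6],
  3 => [3, 7]
}

// K2 compressed
{
  0 => [0, 4, 8, 12, 16, 20, 24, 28, 32, 36],
  1 => [],
  2 => [],
  3 => []
}

Every K2 key is a multiple of 4, so it shares the factor 4 with N and lands in bucket 0.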
Division Compressions and Primes
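Hash function: h(k) = k % N, where N = 7
Keys K1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Keys K2: 0, 4, 8, 12, 16, 20, 24, 28, 32, 36

// K1 compressed
{
  0 => [0, 7],
  1 => [1, 8],
  2 => [2, 9],
  3 => [3],
  4 => [4],
  5 => [5],
  6 => [6]
}

// K2 compressed
{
  0 => [0, 28],
  1 => [8, 36],
  2 => [16],
  3 => [24],
  4 => [4, 32],
  5 => [12],
  6 => [20]
}

Because 7 is prime and shares no factor with 4, the K2 keys now spread over all seven buckets.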
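The two distributions above can be reproduced with a few lines of code. The helper compress is a hypothetical name; it groups nonnegative keys into N buckets using the division method h(k) = k % N.

```cpp
#include <cassert>
#include <vector>

// Group keys into N buckets using h(k) = k % N (keys assumed nonnegative).
std::vector<std::vector<int>> compress(const std::vector<int>& keys, int N) {
    assert(N > 0);
    std::vector<std::vector<int>> buckets(N);
    for (int k : keys) buckets[k % N].push_back(k);  // keep insertion order
    return buckets;
}
```

Calling compress on K2 with N = 4 puts all ten keys into bucket 0, while N = 7 leaves no bucket empty.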
Collision Handling
Collisions occur when different elements are
mapped to the same cell

Separate Chaining: let each cell in the table point to a linked list of entries that map there

Separate chaining is simple, but requires additional memory outside the table

Load Factor: if there are n entries of our map in a bucket array of capacity N, the load factor is n/N; with a good hash function we expect each bucket to hold about n/N entries
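A minimal sketch of separate chaining that also reports the load factor. To stay short it stores int keys only (values omitted) and uses h(k) = k mod N adjusted for negative keys; the class name ChainedTable and its methods are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining: each cell of the table points to a linked list
// holding every key that hashes to that cell.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t N) : table(N) { assert(N > 0); }

    void insert(int k) {
        std::list<int>& chain = table[index(k)];
        for (int x : chain) if (x == k) return;  // no duplicate keys
        chain.push_back(k);
        ++n;
    }

    bool contains(int k) const {
        for (int x : table[index(k)]) if (x == k) return true;
        return false;
    }

    // Load factor n / N: expected number of entries per bucket.
    double loadFactor() const { return static_cast<double>(n) / table.size(); }

    std::size_t chainLength(std::size_t i) const { return table.at(i).size(); }

private:
    // h(k) = k mod N, shifted so negative keys still land in [0, N - 1]
    std::size_t index(int k) const {
        long long N = static_cast<long long>(table.size());
        return static_cast<std::size_t>(((k % N) + N) % N);
    }

    std::vector<std::list<int>> table;  // N chains
    std::size_t n = 0;                  // number of stored keys
};
```

Inserting 10 distinct keys into a ChainedTable of capacity 13 gives a load factor of 10/13, while keys 1, 6 and 11 in a table of capacity 5 all share chain 1.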
Question:
What is the load factor for this array? What does this tell you about the hash table or hash function?

Answer:
n = 10 items and N = 13 slots, so the load factor is n/N = 10/13 ≈ 0.77
Collision Handling
Open addressing: the colliding item is placed in a
different cell of the table

One form of open addressing is linear probing

$A[(i + 1) \bmod N]$, where $i = h(k)$ is the occupied cell

Collisions are handled by placing the colliding item in the next (circularly) available table cell

The interval between probes is fixed, often simply 1

Each table cell inspected is referred to as a “probe”

Colliding items lump together, so future collisions cause longer sequences of probes
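A minimal sketch of linear probing, assuming nonnegative int keys, h(k) = k mod N, and no deletions (removal would need extra markers such as "deleted" cells, which this sketch omits). The class name LinearProbeTable is illustrative; an empty std::optional marks a free cell.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Linear probing: on a collision at cell i, try i+1, i+2, ... (mod N).
class LinearProbeTable {
public:
    explicit LinearProbeTable(std::size_t N) : A(N) { assert(N > 0); }

    // Returns the cell where k is stored, or -1 if the table is full.
    long insert(int k) {
        assert(k >= 0);
        std::size_t N = A.size();
        std::size_t i = static_cast<std::size_t>(k) % N;  // i = h(k)
        for (std::size_t j = 0; j < N; ++j) {
            std::size_t cell = (i + j) % N;               // j-th probe
            if (!A[cell] || *A[cell] == k) {              // free cell or same key
                A[cell] = k;
                return static_cast<long>(cell);
            }
        }
        return -1;  // all N cells probed: table is full
    }

    bool contains(int k) const {
        assert(k >= 0);
        std::size_t N = A.size();
        std::size_t i = static_cast<std::size_t>(k) % N;
        for (std::size_t j = 0; j < N; ++j) {
            std::size_t cell = (i + j) % N;
            if (!A[cell]) return false;      // a free cell ends the search
            if (*A[cell] == k) return true;
        }
        return false;
    }

private:
    std::vector<std::optional<int>> A;  // the table of N cells
};
```

With N = 7, inserting 3, 10 and 17 fills cells 3, 4 and 5. A later key 4, whose home cell is 4, must then probe past that cluster to cell 6, illustrating how colliding items lump together.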
Collision Handling
Other methods for collision handling include:

Quadratic probing:

● The interval between probes increases quadratically (hence the probe indices are described by a quadratic function, as opposed to linear probing, where they increase by a fixed interval)

$A[(i + f(j)) \bmod N]$, for $j = 0, 1, 2, \ldots, N$, where $f(j) = j^2$ and $i = h(k)$

Double hashing:

● The interval between probes is fixed for each record but is computed by another hash function

$A[(i + f(j)) \bmod N]$, for $j = 1, 2, \ldots, N$, where $f(j) = j \cdot f'(k)$ and $i = h(k)$
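A sketch that lists the cells each method probes, given the home cell i = h(k). The secondary hash f'(k) = q − (k mod q) for a prime q < N is a common textbook choice, not something these slides specify; it is never 0, which matters because a secondary value of 0 would make every probe revisit cell i. The function names are illustrative.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing: cells (i + j*j) mod N for j = 0, 1, ..., count - 1.
std::vector<int> quadraticProbes(int i, int N, int count) {
    assert(N > 0 && i >= 0);
    std::vector<int> cells;
    for (long long j = 0; j < count; ++j)
        cells.push_back(static_cast<int>((i + j * j) % N));
    return cells;
}

// Assumed secondary hash: f'(k) = q - (k mod q), q a prime smaller than N.
int secondaryHash(int k, int q) {
    assert(q > 0 && k >= 0);
    return q - k % q;  // always in [1, q], never 0
}

// Double hashing: cells (i + j*d) mod N for j = 1, ..., count, d = f'(k).
std::vector<int> doubleHashProbes(int i, int d, int N, int count) {
    assert(N > 0 && i >= 0 && d > 0 && d % N != 0);
    std::vector<int> cells;
    for (long long j = 1; j <= count; ++j)
        cells.push_back(static_cast<int>((i + j * d) % N));
    return cells;
}
```

For example, with N = 11 and home cell 3, quadratic probing visits 3, 4, 7, 1, ..., while double hashing with d = 5 visits 8, 2, 7, ... after the home cell. When N is prime, double hashing with any nonzero d eventually reaches every cell.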


HashTable Map: Interface
#include <list>
#include <vector>

template <typename K, typename V, typename H>
class HashMap {
public:
  typedef ::Entry<const K,V> Entry;        // a (key,value) pair (::Entry is the Entry class template)
  class Iterator;                          // an iterator/position

  HashMap(int capacity = 100);             // constructor
  int size() const;                        // number of entries
  bool empty() const;                      // is the map empty?
  Iterator find(const K& k);               // find entry with key k
  Iterator put(const K& k, const V& v);    // insert/replace
  void erase(const K& k);                  // remove entry with key k
  void erase(const Iterator& p);           // erase entry at p
  Iterator begin();                        // iterator to first entry
  Iterator end();                          // iterator to end entry
protected:
  typedef std::list<Entry> Bucket;         // a bucket of entries
  typedef std::vector<Bucket> BktArray;    // a bucket array
  // utility functions and iterator typedefs left out here (next slide)
private:
  int n;                                   // number of entries
  H hash;                                  // the hash function object
  BktArray B;                              // bucket array
};
HashTable Map: More Functions
// find utility
Iterator finder(const K& k);

// insert utility
Iterator inserter(const Iterator& p, const Entry& e);

// remove utility
void eraser(const Iterator& p);

// bucket iterator
typedef typename BktArray::iterator BItor;

// entry iterator
typedef typename Bucket::iterator EItor;

// bucket's next entry
static void nextEntry(Iterator& p) { ++p.ent; }

// end of bucket?
static bool endOfBkt(const Iterator& p) {
  return p.ent == p.bkt->end();
}
HashTable Map: Entry Class
// a (key, value) pair
template <typename K, typename V>
class Entry {
public:
  // constructor
  Entry(const K& k = K(), const V& v = V()) : _key(k), _value(v) {}

  const K& key() const { return _key; }          // get key
  const V& value() const { return _value; }      // get value
  void setKey(const K& k) { _key = k; }          // set key
  void setValue(const V& v) { _value = v; }      // set value
private:
  K _key;                                        // key
  V _value;                                      // value
};
HashTable Map: Iterator Class
// an iterator (& position)
class Iterator {
private:
  EItor ent;                  // which entry
  BItor bkt;                  // which bucket
  const BktArray* ba;         // which bucket array
public:
  Iterator(const BktArray& a, const BItor& b, const EItor& q = EItor())
    : ent(q), bkt(b), ba(&a) { }
  Entry& operator*() const;                  // get entry
  bool operator==(const Iterator& p) const;  // iterators equal?
  Iterator& operator++();                    // advance to next entry
  friend class HashMap;                      // give HashMap access
};
HashTable Map: Iterator ==
// are iterators equal?
template <typename K, typename V, typename H>
bool HashMap<K,V,H>::Iterator::operator==(const Iterator& p) const {
  // ba (Bucket Array) or bkt (Bucket) differ?
  if (ba != p.ba || bkt != p.bkt) return false;
  // both at the end?
  else if (bkt == ba->end()) return true;
  // else use entry to decide
  else return (ent == p.ent);
}
HashTable Map: Iterator ++
// advance to next entry
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator&
HashMap<K,V,H>::Iterator::operator++() {
  // next entry in bucket
  ++ent;
  // at end of bucket?
  if (endOfBkt(*this)) {
    // go to next bucket
    ++bkt;
    // find nonempty bucket
    while (bkt != ba->end() && bkt->empty()) { ++bkt; }
    // end of bucket array?
    if (bkt == ba->end()) return *this;
    // first nonempty entry
    ent = bkt->begin();
  }
  return *this; // return self
}
HashTable Map: Iterator *
// get entry
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Entry&
HashMap<K,V,H>::Iterator::operator*() const {
  return *ent;
}
HashTable Map: HashMap Functions
// constructor
template <typename K, typename V, typename H>
HashMap<K,V,H>::HashMap(int capacity) : n(0), B(capacity) { }

// number of entries
template <typename K, typename V, typename H>
int HashMap<K,V,H>::size() const {
  return n;
}

// is the map empty?
template <typename K, typename V, typename H>
bool HashMap<K,V,H>::empty() const {
  return size() == 0;
}
HashTable Map: Begin and End
// iterator to end
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::end() {
  return Iterator(B, B.end());
}

// iterator to front
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::begin() {
  // empty: return end
  if (empty()) return end();
  // otherwise search for an entry
  BItor bkt = B.begin();
  // find nonempty bucket
  while (bkt->empty()) { ++bkt; }
  // return first of bucket
  return Iterator(B, bkt, bkt->begin());
}
HashTable Map: Find
// find utility
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::finder(const K& k) {
  int i = hash(k) % B.size();          // get hash index i
  BItor bkt = B.begin() + i;           // the ith bucket
  Iterator p(B, bkt, bkt->begin());    // start of ith bucket
  // search for k
  while (!endOfBkt(p) && (*p).key() != k) { nextEntry(p); }
  return p;                            // return final position
}

// find key
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::find(const K& k) {
  Iterator p = finder(k);              // look for k
  if (endOfBkt(p)) {                   // didn't find it?
    return end();                      // return end iterator
  } else {
    return p;                          // return its position
  }
}
HashTable Map: Put
// insert utility
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::inserter(const Iterator& p, const Entry& e) {
  EItor ins = p.bkt->insert(p.ent, e);  // insert before p
  n++;                                  // one more entry
  return Iterator(B, p.bkt, ins);       // return this position
}

// insert/replace (k,v)
template <typename K, typename V, typename H>
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::put(const K& k, const V& v) {
  Iterator p = finder(k);               // search for k
  if (endOfBkt(p)) {                    // k not found?
    return inserter(p, Entry(k, v));    // insert at end of bucket
  } else {                              // found it?
    p.ent->setValue(v);                 // replace value with v
    return p;                           // return this position
  }
}
HashTable Map: Erase
// remove utility
template <typename K, typename V, typename H>
void HashMap<K,V,H>::eraser(const Iterator& p) {
  p.bkt->erase(p.ent);                  // remove entry from bucket
  n--;                                  // one fewer entry
}

// remove entry at p
template <typename K, typename V, typename H>
void HashMap<K,V,H>::erase(const Iterator& p) {
  eraser(p);
}

// remove entry with key k
template <typename K, typename V, typename H>
void HashMap<K,V,H>::erase(const K& k) {
  Iterator p = finder(k);               // find k
  if (endOfBkt(p)) {                    // not found?
    throw NonexistentElement("Erase of nonexistent");
  }
  eraser(p);                            // remove it
}
Question:
What is the worst-case running time for inserting n key-value entries into an initially empty map M that is implemented with a list?
Question:
Draw the 11-entry hash table that results from using the hash function h(k) = (3k + 3) mod 11 to hash the keys 12, 44, 13, 88, 23, 94, 11, 39, 5, 20, and 16
a. Collisions are handled by chaining
b. Collisions are handled by linear probing
c. Collisions are handled by quadratic probing, up to the point where the method fails
