CSL202: Discrete Mathematical Structures

(1)

Ragesh Jaiswal, CSE, IIT Delhi

(2)

(3)

How do we design a good hash function?

A set S of keys from a universeU ={0,1, ...,m−1} is supposed to be stored in a table of sizen with indices T ={0,1, ...,n−1}.

Assume collisions are resolved using auxiliary data structure.

What we need is a hash functionh :U →T with the following main requirements:

1 The hash function should minimize the number of collisions.

2 The space used should be proportional to the number of keys stored. (i.e.,n≈ |S|)

(4)

Claim 1: Ifm>n, then for anyh there exists a key setS such that h has collision w.r.t. S (i.e., ∃x,y ∈S,h(x) =h(y))

(5)

Claim 1: Ifm>n, then for anyh there exists a key setS such that h has collision w.r.t. S (i.e., ∃x,y ∈S,h(x) =h(y))

Claim 1.1: Any fixed hash functionh:U →T, must map at leastd^m_neelements of U to some index in the setT.

(6)

Claim 1: Ifm>n, then for anyh there exists a key setS such that h has collision w.r.t. S (i.e., ∃x,y ∈S,h(x) =h(y)) Claim 2: For any fixed key set S such that |S| ≤n, there exists a hash function such thath has no collisions w.r.t. S.

(7)

Collisions are resolved using auxiliary data structure.

Claim 1: Ifm>n, then for anyh there exists a key setS such that h has collision w.r.t. S (i.e., ∃x,y ∈S,h(x) =h(y)) Claim 2: For any fixed key set S such that |S| ≤n, there exists a hash function such thath has no collisions w.r.t. S. The issue is that the key setS is not known a-priori. That is, before using the data structure.

Question: How do we solve this problem then?

(8)

A setS of keys from a universeU={0,1, ...,m−1}is supposed to be stored in a table of sizenwith indicesT ={0,1, ...,n−1}.

What we need is a hash function h:U→T with the following main requirements:

Claim 1: If m>n, then for anyhthere exists a key set S such thath has collision w.r.t. S (i.e., ∃x,y ∈S,h(x) =h(y))

Claim 2: For any fixed key setS such that|S| ≤n, there exists a hash function such thath has no collisions w.r.t. S.

The issue is that the key setS is not known a-priori. That is, before using the data structure.

Randomlyselect a hash function from afamilyHof hash functions.

(9)

A setSof keys from a universeU={0,1, ...,m−1}is supposed to be stored in a table of sizenwith indicesT={0,1, ...,n−1}.

What we need is a hash functionh:U→T with the following main requirements:

The issue is that the key set Sis not known a-priori. That is, before using the data structure.

Randomlyselect a hash function from afamilyHof hash functions.

Definition (2-universality)

A hash function familyH is said to be 2-universal iff:

∀x,y ∈U,x 6=y,Pr_h←H[h(x) =h(y)]≤1 n.

(10)