Ragesh Jaiswal, CSE, IIT Delhi
How do we design a good hash function?
A set S of keys from a universe U = {0, 1, ..., m−1} is to be stored in a table of size n with indices T = {0, 1, ..., n−1}.
Assume collisions are resolved using an auxiliary data structure.
What we need is a hash function h : U → T with the following main requirements:
1. The hash function should minimize the number of collisions.
2. The space used should be proportional to the number of keys stored (i.e., n ≈ |S|).
Claim 1: If m > n, then for any h there exists a key set S such that h has a collision w.r.t. S (i.e., ∃ x, y ∈ S with x ≠ y and h(x) = h(y)).
Claim 1.1: Any fixed hash function h : U → T must map at least ⌈m/n⌉ elements of U to some single index in T.
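Claim 1.1 is a pigeonhole argument, and it can be checked by brute force on a small universe. The sketch below is illustrative only: the function name `find_colliding_keys` and the example hash `h` are assumptions, not part of the lecture.

```python
from collections import defaultdict


def find_colliding_keys(h, m, n):
    """Return two distinct keys in {0, ..., m-1} that h sends to the same index.

    Assumes m > n and that h maps every key into {0, ..., n-1}.
    """
    buckets = defaultdict(list)
    for key in range(m):
        buckets[h(key)].append(key)
    # Pigeonhole: the fullest bucket holds at least ceil(m / n) >= 2 keys.
    fullest = max(buckets.values(), key=len)
    assert len(fullest) >= -(-m // n)  # -(-m // n) is ceil(m / n)
    return fullest[0], fullest[1]


m, n = 20, 7


def h(x):
    # An arbitrary fixed hash function, chosen only for illustration.
    return (3 * x + 1) % n


x, y = find_colliding_keys(h, m, n)
```

Any key set S containing the returned pair forces a collision under this particular h, which is exactly what Claim 1 asserts.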
Claim 2: For any fixed key set S such that |S| ≤ n, there exists a hash function h that has no collisions w.r.t. S.
The issue is that the key set S is not known a priori, that is, before the data structure is used.
Question: How do we solve this problem then?
Randomly select a hash function from a family H of hash functions.
Definition (2-universality)
A hash function family H is said to be 2-universal iff:
\[
\forall x, y \in U,\ x \neq y:\quad \Pr_{h \leftarrow H}[h(x) = h(y)] \le \frac{1}{n}.
\]
Theorem: Consider hashing using a 2-universal hash function family. Over any sequence of t insert operations, the expected cost of each operation is at most 1 + t/n.
Proof sketch: Consider any key x. The expected number of other keys in location h(x) is at most t/n.
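To spell out the step behind this bound, here is a short derivation. It assumes (this is not stated on the slide) that collisions are resolved by chaining, so the cost of an operation on key x is 1 plus the number of other stored keys in location h(x), and that S denotes the at most t keys inserted so far.

\[
\mathbb{E}\Big[\,\big|\{y \in S \setminus \{x\} : h(y) = h(x)\}\big|\,\Big]
= \sum_{y \in S \setminus \{x\}} \Pr_{h \leftarrow H}[h(x) = h(y)]
\le \frac{|S|}{n} \le \frac{t}{n}.
\]

The equality is linearity of expectation over one indicator variable per key y, and the first inequality applies 2-universality to each pair (x, y). Adding 1 for examining x itself gives the bound 1 + t/n.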
Question: Can you think of a 2-universal hash function family?
Simple answer: The set of all functions from U to T.
Do you see any issues with using this hash function family?
The description of a hash function from this family is large: selecting one of the n^m functions takes about m log₂ n bits.
Question: Can we design a more compact hash function family?
A compact 2-universal hash function family:
Let p be a prime with m ≤ p ≤ 2m.
H = {h_{a,b} | a ∈ {1, ..., p−1}, b ∈ {0, ..., p−1}}, where h_{a,b}(x) = ((ax + b) mod p) mod n.
How many functions does H have?
Answer: p(p−1), one for each pair (a, b).
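A minimal sketch of sampling one function from this family is shown below. It assumes p is prime (the standard requirement for this construction) and picks the smallest prime p ≥ m by trial division, which is only sensible for small m. The helper names `is_prime`, `smallest_prime_at_least`, and `sample_hash` are illustrative, not from the lecture.

```python
import random


def is_prime(q):
    """Trial-division primality test; adequate for small q only."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True


def smallest_prime_at_least(m):
    """Return the smallest prime p with p >= m."""
    p = max(m, 2)
    while not is_prime(p):
        p += 1
    return p


def sample_hash(m, n, rng=random):
    """Draw h_{a,b} uniformly from H for universe size m and table size n."""
    p = smallest_prime_at_least(m)
    a = rng.randrange(1, p)  # a in {1, ..., p-1}
    b = rng.randrange(0, p)  # b in {0, ..., p-1}

    def h(x):
        return ((a * x + b) % p) % n

    return h, (a, b, p)


# Example: universe of size 1000, table of size 16, seeded for repeatability.
m, n = 1000, 16
h, (a, b, p) = sample_hash(m, n, random.Random(0))
```

Storing a function from H takes only the three integers a, b, and p, in contrast to the table of m entries needed for an arbitrary function from U to T.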
Theorem: H is 2-universal.
Proof sketch
Let g_{a,b}(x) = (ax + b) mod p. So, h_{a,b}(x) = g_{a,b}(x) mod n.
Consider any x, y ∈ {0, ..., p−1} such that x ≠ y.
Claim 1: If h_{a,b}(x) = h_{a,b}(y), then g_{a,b}(x) ≡ g_{a,b}(y) (mod n).
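Claim 2: For all α, β ∈ {0, ..., p−1}:
\[
\Pr_{a,b}\big[g_{a,b}(x) = \alpha \text{ and } g_{a,b}(y) = \beta\big] =
\begin{cases}
0 & \text{if } \alpha = \beta, \\
\dfrac{1}{p(p-1)} & \text{otherwise.}
\end{cases}
\]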
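The reasoning behind Claim 2 can be sketched as follows, assuming p is prime so that x − y has an inverse modulo p. For fixed x ≠ y and a target pair (α, β), the two conditions form a linear system in (a, b):

\[
\begin{aligned}
ax + b &\equiv \alpha \pmod{p}, \\
ay + b &\equiv \beta \pmod{p}
\end{aligned}
\quad\Longrightarrow\quad
a \equiv (\alpha - \beta)(x - y)^{-1} \pmod{p}, \qquad b \equiv \alpha - ax \pmod{p}.
\]

If α = β, this forces a = 0, which is excluded from the family, so the probability is 0. If α ≠ β, exactly one pair (a, b) among the p(p−1) equally likely choices satisfies the system, which gives probability 1/(p(p−1)).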
Claim 3: We have:
\[
\Pr_{a,b}\big[h_{a,b}(x) = h_{a,b}(y)\big]
= \frac{\big|\{(\alpha, \beta) : \alpha \neq \beta \text{ and } \alpha \equiv \beta \pmod{n}\}\big|}{p(p-1)}
\le \frac{1}{n}.
\]
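The bound in Claim 3 can be sanity-checked by exhaustive enumeration for small parameters. The sketch below assumes p is prime and uses illustrative values p = 13 and n = 4; the function name `max_collision_count` is hypothetical. It counts, for every pair x ≠ y, how many of the p(p−1) functions collide on that pair.

```python
from itertools import combinations


def max_collision_count(p, n):
    """Return (worst, total): the largest number of pairs (a, b) colliding on
    any key pair x != y in {0, ..., p-1}, and the family size p * (p - 1)."""
    total = p * (p - 1)
    worst = 0
    for x, y in combinations(range(p), 2):
        collisions = sum(
            1
            for a in range(1, p)
            for b in range(p)
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n
        )
        worst = max(worst, collisions)
    return worst, total


p, n = 13, 4
worst, total = max_collision_count(p, n)
print(worst, total)  # prints "30 156"
```

Here 30/156 ≈ 0.192 is below 1/n = 0.25. The count 30 agrees with Claim 3: among residues 0..12 modulo 4, one class has 4 elements and three classes have 3, giving 4·3 + 3·(3·2) = 30 ordered pairs (α, β) with α ≠ β and α ≡ β (mod 4).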