Shane Chu

Universal hash functions

The definition of universal hash function is as follows:

A family of hash functions H={h:UV}\mathcal H=\{h:U\rightarrow V\} is called universal if for any two distinct element x,yUx,y\in U, the probability that a randomly chosen hash function hHh\in\mathcal H that maps xx and yy to the same value is at most 1V\frac{1}{|V|}. Formally, H\mathcal H is universal if:

PhH[h(x)=h(y)]1V\underset{h\in \mathcal H}{\mathbb P}\left[h(x)=h(y)\right]\leq \frac{1}{|V|}

for all xyx\neq y. Here, the probability is taken over the uniform distribution of hh from H\mathcal H.

We focus on H\mathcal H whose range V={0,1,,M1}V=\{0,1,\dots,M-1\} as a set of integers. Constructing a universal hash function from such a family of functions is straightforward. Here's a way to do it:

h(x)=(r1x1++rkxk) mod Mh(x) = (r_1x_1+\cdots+r_kx_k)\,\,\text{mod}\,\,M

Done! We've now constructed a universal hash function hh with a collsion probability 1/M1/M. You can find a proof of this, e.g. in this lecture here by Arvim Blum).

In some cases, the prime number MM might be inconvenient to work with, especially if we want to restrict the range of the output to {0,,N1}\{0,\dots,N-1\}, where N<MN < M. To address this, we modify the hash function as follows:

h(x)=(r1x1++rkxk) mod M mod Nh(x) = (r_1x_1+\cdots+r_kx_k)\,\,\text{mod}\,\,M \,\, \text{mod}\,\, N

Now the collision probability increases from 1/M1/M to 1/N1/N. This happens because the reduction from {0,,M1}\{0,\dots,M-1\} to {0,,N1}\{0,\dots,N-1\} causes each value in {0,,N1}\{0,\dots,N-1\} to correspond to MN+1\lfloor \frac{M}{N} \rfloor + 1 values from the original range {0,,M1}\{0,\dots,M-1\}.

Here's a simple Julia code to construct universal hash function hh:

using Primes
# K: dimension for the items in the universe
# M: A sufficiently large prime number
# N: the output dimension when N > 1
function create_universal_hash(K::Int, M::Int; N = 1)
    @assert isprime(M) "M must be a prime number"
    r = [rand(0:M-1) for _ = 1:K]
    h(x) = (sum(r .* x) % M) % N
    return h
end
Contact me by E-mail | Github | Linkedin
This work is licensed under CC BY-SA 4.0. Last modified: January 15, 2025.
Website built with Franklin.jl and the Julia language.