The definition of universal hash function is as follows:
A family of hash functions is called universal if, for any two distinct elements , a randomly chosen hash function maps and to the same value with probability at most . Formally, is universal if:
for all . Here, the probability is taken over the uniform distribution of from .
We focus on families whose range is the set of integers . Constructing a universal hash function from such a family is straightforward. Here's one way to do it:
Fix a prime number
Assume that the keys in the universe can be encoded as vectors of positive integers
Uniformly choose numbers from the set
Define the hash function as
This is universal, with collision probability . You can find a proof of this, e.g., in this lecture by Avrim Blum.
In some cases, the prime number might be inconvenient to work with, especially if we want to restrict the range of the output to , where . To address this, we modify the hash function as follows:
Now the collision probability increases from to . This happens because the reduction from to causes each value in to correspond to values from the original range .
Here's some simple Julia code to construct the universal hash function :
using Primes
# K: dimension for the items in the universe
# M: A sufficiently large prime number
# N: the output dimension when N > 1
function create_universal_hash(K::Int, M::Int; N = 1)
@assert isprime(M) "M must be a prime number"
r = [rand(0:M-1) for _ = 1:K]
h(x) = (sum(r .* x) % M) % N
return h
end