The definition of universal hash function is as follows:
A family of hash functions is called universal if for any two distinct element , the probability that a randomly chosen hash function that maps and to the same value is at most . Formally, is universal if:
for all . Here, the probability is taken over the uniform distribution of from .
We focus on whose range as a set of integers. Constructing a universal hash function from such a family of functions is straightforward. Here's a way to do it:
Fix a prime number
Assume that the keys in the universe can be encoded as vectors of positive integers
Uniformly choose numbers from the set
Define the hash function as
Done! We've now constructed a universal hash function with a collsion probability . You can find a proof of this, e.g. in this lecture here by Arvim Blum).
In some cases, the prime number might be inconvenient to work with, especially if we want to restrict the range of the output to , where . To address this, we modify the hash function as follows:
Now the collision probability increases from to . This happens because the reduction from to causes each value in to correspond to values from the original range .
Here's a simple Julia code to construct universal hash function :
using Primes
# K: dimension for the items in the universe
# M: A sufficiently large prime number
# N: the output dimension when N > 1
function create_universal_hash(K::Int, M::Int; N = 1)
@assert isprime(M) "M must be a prime number"
r = [rand(0:M-1) for _ = 1:K]
h(x) = (sum(r .* x) % M) % N
return h
end