My research deals with Perfect Hash Families, and this article will deal with some of the background in (possible) future posts that I make. No background is strictly necessary for this post other than general mathematical knowledge.
A perfect hash family (PHF) is a 4-tuple which is a set of functions with and , such that for any -subset of this set, there is at least one function such that restricted to is one-to-one. In matrix form, this represents an matrix over symbols such that for any columns, there is at least 1 row among these columns that has all distinct entries.
If for a given set of columns we have such a condition (i.e., one row is all distinct), then we say that these columns are covering. Also, we say the PHF is covering if all -subsets of columns are covering.
We want to find the smallest possible PHF for the given parameters, in terms of the number of rows. We denote this question (the smallest number of rows) by .
These objects were introduced by Mehlhorn in 1984 for fast retrieval of frequently used information. Currently, they are used in the construction of Covering Arrays as well as in combinatorial cryptography.
Theorem 1 if , and otherwise.
Proof: If , the array consists of 1 row of all distinct symbols. If and there is only 1 row, then there must be one set of columns with a duplicate entry.
Theorem 2 .
Proof: If , then the PHF covers if and only if every pair of columns is different. Therefore, we can list columns containing symbols. If each is of length , then we have different columns, and solving for gives the desired result.
Theorem 3 (Symbol Increase) for any .
Proof: Increasing the number of symbols does not change whether or not the PHF covers.
Theorem 4 (Column Decrease) for any .
Proof: Removing columns does not change whether or not the PHF covers.
Theorem 5 for any .
Proof: Add new columns to the PHF, each one consisting entirely of a new symbol. The PHF certainly remains covering on any of the previous columns, and covers when involving any of the new columns because they introduce new symbols.
1. Recursive Constructions
The power in generating large combinatorial objects lies in being able to create them using smaller “ingredients” – we describe a few of them here. The first two lead to the best bounds, in general, for very large PHFs.
Theorem 6 (Blackburn Composition) .
Proof: Call the first PHF , and the other . Replace all occurrences of the th symbol of with the th column of to form . has the required size, but why is it a PHF? Select any columns of . In with the same columns, at least 1 row contains distinct values. In , that row includes distinct columns of . Since was a PHF, at least 1 row in must contain distinct columns. Therefore, is a PHF.
Theorem 7 (Atici-Magliveras-Stinson-Wei Product)
Theorem 8 (Martirosyan Row Decrease)
Proof: Let be the PHF where we want to remove a row. Pick any row . Denote to be the number of times symbol appears in row . Without loss of generality, suppose is the largest over all symbols, and is the least (ordered in the usual way).
Delete all columns of that have symbols in row . Now remove row completely. We get an array (with potentially fewer symbols, but this is not guaranteed). However, we can guarantee is a PHF of strength .
What is the value of ? Since we have that:
. Therefore, , and the result is proved.
2. Probabilistic Constructions
The previous section discusses actually making a PHF, but here we take a different approach: we see that a PHF of given parameters must exist, but the proofs are nonconstructive (unfortunately).
Let be a random array over symbols. Choose a set of columns . What is the probability that is not covered? It is the probability that no rows in have all distinct entries. What is the probability that a given row is all distinct? This probability is
Let this probability be . Therefore, the probability that a given row is not all distinct is , and the probability that all rows are not distinct is .
So what is the expected number of -sets of columns that are not covering? By union bound, we know that since there are columns, that this expected number is
All we need to do is to solve for when this number is strictly less than 1. If we do, then we guarantee that there must be some PHF with those parameters. Taking logarithms, and upper bounds, we have that:
Theorem 9 .
Deng, Stinson, and Wei showed an improved bound using the Lovász Local Lemma:
Theorem 10 .
Stinson, van Trung, and Wei improved on this (called “expurgation”):
Theorem 11 .
Note that improvements are only in the numerator in all cases.