New Idea for Probabilistic Upper Bounds on PHFN

One famous result of Stinson, van Trung, and Wei is that PHFN(k,v,t) <= c*(log(2k choose t)-log(k)) where c is a constant independent of k, but dependent on v, t. The proof essentially shows that if we start with 2k columns, and have the expected number of unseparated t-sets be strictly smaller than k, then we can delete one column from each of these unseparated t-sets, and are guaranteed then to have at least k columns. The math continues almost identically to the basic method in Perfect Hash Families (a brief overview).

But note that this method does not rely on having exactly 2k columns – note that if we started with any number of columns m > k, and delete m-k columns, we are still ok. So therefore PHFN(k,v,t) <= c*(log(m choose t) – log(k-m)) where c is the same constant as before.

The question then is to determine what is the “best” choice of m here, since it is a discrete value. One could use Stirling’s approximation and find the minimum of such a function by taking the derivative of it with respect to m.

The function is: m log(m) – t log(t) – (m-t)log(m-t) – log(m-k), and the derivative is log(m) – log(m-t) – 1/(m-k). WolframAlpha cannot find a value of m that has this derivative equal to 0. We could note that t log(t) is much smaller than mlog(m), and approximate t log (t) by 0, but this does not help.

If someone can identify how to find the minimum of this function, that would be of great use!


Another neat problem – Number of Distinct Languages recognized by DFAs/NFAs

A cute problem came to my attention recently: let DFA(n,s) be the number of distinct languages recognized by DFAs of n states and s symbols, and NFA(n,s) the same but for NFAs.

It is clear that DFA(1,s) = 2, and NFA(1,s) = 2^s.

But what is the answer in general? It is evident through simple analysis that DFA(2,2) <= 33 (it actually can be shown to be <= 27).

I used a Python program to figure out the answer for NFA(2,2); it is quite large, but at most 800, and DFA(3,1) is at most 400.

Neat Problem – Longest Sandpile Spill?

Consider an n * n matrix of entries in {0, 1, 2, 3, 4, 5, 6}. If there is an element of value at least 4, then a sandpile topple occurs: that element’s value decreases by 4, and the 4 adjacent elements are increased by 1. For elements on the edges/corners, there is no wrap-around. The process continues until there are no elements in the matrix of value at least 4.

So clearly, the process will terminate in a finite number of steps. Call s(n) to be the maximum number of iterations over all n * n matrices of entries in {0, 1, 2, 3, 4, 5, 6}. What is s(n) in general?

Clearly, s(1) = 1. For n at least 2, this is not straightforward. However, I conjecture that every optimal matrix has only 5’s and 6’s, and there is an optimal matrix of only 6’s.

I have written a Python program that computes s(n), given n. What I have found so far:

  • s(2) = 8
  • s(3) = 33
  • s(4) = 82
  • s(5) >= 178
  • s(6) >= 302
  • s(7) >= 529
  • s(8) >= 856
  • s(9) >= 1355

It seems difficult to compute recursive upper bounds because having multiple “ingredients” influence each other (because sand is “propagated” into the other ingredients). Also, the OEIS does not contain this sequence.

Some extensions:

  • Instead of {0, 1, 2, 3, 4, 5, 6}, why not extend to {0, 1, 2, …, v-1} for v at least 7?
  • Instead of sand falling to the adjacent 4 neighbors, why not the 4 corners as well?

Perfect Hash Family Tables

Perfect Hash Families are very extensive in combinatorial design theory literature, and have many applications to secure frame proof codes, to lookup of frequently used information, and even to my own research! There is a list of the “smallest” PHFs, located here, but has not been updated since 2006.

I am in the process of creating a “new” set of tables, not only taking into account constructions of mine (as in the last post), but also those that were published since 2006 (and there are many).

Here is a sneak peek! screen-shot-2017-03-05-at-8-25-47-pm

New Bounds on Perfect Hash Families

I was able to improve a perfect hash family bound, in that PHFN(9, 4, 3) = 3 , whereas the previous known bound (I believe) was PHFN(9, 4, 3) \le 4.

The output of my program that generated it:

a(1,1,1) a(1,2,3) a(1,3,2) a(1,4,4) a(1,5,2) a(1,6,1) a(1,7,4) a(1,8,3) a(1,9,3) a(2,2,3) a(2,1,4) a(2,3,1) a(2,4,1) a(2,5,2) a(2,6,1) a(2,7,3) a(2,8,4) a(2,9,2) a(3,1,1) a(3,2,4) a(3,3,4) a(3,4,3) a(3,5,1) a(3,6,2) a(3,7,1) a(3,8,3) a(3,9,2)

where a(R,C,V) means at row R, column C, is value V.

I’m working on this as a class project – hopefully I can get better bounds for CPHFs as well!

SoDA Talk

I will be giving a 2-talk series on my recent research work to the Software Developers’ Association at ASU on February 9 and 14th, 7:30 PM, in CAVC351. I’ll update this if there are any changes to day/time/location. It will be about new work with Covering Arrays but mostly on the optimizations in designing the suite in C++.