Imagine there is a population/database/dictionary and we would like to distinguish its elements.

So for each element, let us somehow encode its individual features (e.g. using a hash function) as a bit sequence - the most dense way is to use sequence of uncorrelated P(0)=P(1)=1/2 bits.

We can now create minimal prefix tree required to distinguish these sequences, like in the figure below.

For such ensemble of random trees of given number of leaves (n), we can calculate Shannon entropy (H_n) - the amount of information it contains.

It turns out that it asymptotically grows with at average 2.77544 bits per element (1/2+(1+\gamma)lg(e)).

The calculations can be found here:

http://arxiv.org/abs/1206.4555
Is it the minimal cost of distinguishability/individuality?

How to understand it?

ps. related question: can anyone find D(n)=\sum_{i=0}^{\infty} 1-(1-2^{-i})^n ?