32.2 Weighted uniform sampling

Most computer programming languages provide a primitive which allows us to sample uniformly a random integer in the range [1,N][1,N] for any given natural number NN.

Theorem 32.2.1

Let S={n1,,nt}S=\{n_{1},...,n_{t}\} be a set of natural numbers and N=n1+n2++ntN=n_{1}+n_{2}+...+n_{t}. Then there exists an O(n)O(n) algorithm to sample an nSn\in S according to its weighting in the set. That is, if XX is the random variable for which of the objects, then

Pr[X=t]=nt/Ni\Pr[X=t]=n_{t}/N_{i} (32.1)

The algorithm is this

  • sample k[1,N]k\in[1,N]

  • let x1x\leftarrow 1

  • while n1+n2++nx<kn_{1}+n_{2}+...+n_{x}<k increment XX by 1

  • return xx

To prove that this algorithm satisfies the condition, we can check how many different values of kk are associated with each nin_{i}.

First we can examine an easier case, before tackling the more general one.

For n1n_{1} we know that if 1k<n11\leq k<n_{1} that the value output will be n1n_{1} (as n1<n1n_{1}<n_{1} so we will return x=1x=1) and there are n1n_{1} integer values in the range 1k<n11\leq k<n_{1}.

In the more general case that for nin_{i} we will output x=ix=i provided that kn1++ni1k\geq n_{1}+...+n_{i-1} and k<n1++nik<n_{1}+...+n_{i}, as these are the values of kk for which i1i-1 is the greatest value for which n1+n2++ni1<nin_{1}+n_{2}+...+n_{i-1}<n_{i}, so we will increment xx a total of i1i-1 times, and thus we will return x=1+i1=ix=1+i-1=i. As we are after, there are nin_{i} values in this interval.