32.3 Target shooting ‣ Chapter 32 Randomized algorithms ‣ Part IV Algorithms ‣ A calculus of the absurd‣ Chapter 32 Randomized algorithms ‣ Part IV Algorithms ‣ A calculus of the absurd

Here is a general problem; suppose that we a set $U$ and a set $S\subseteq U$ , and that we would like to estimate what proportion of elements of $S$ are in $U$ .

Of course, if we know the values of $|U|$ and $|S|$ then the answer is just

R:=\frac{|S|}{|U|}

(32.2)

but oftentimes we don’t, or to compute the actual sets would be impractical. One idea is to sample elements from $U$ and check if they are in $S$ , something like this

•

randomly sample $u_{1},u_{2},...,u_{N}$ from $U$ (uniformly and independently)
•

calculate $a_{N}=(1/N)\times\sum_{1\leq i\leq N}I(u_{i})$ where $I(u)=1$ if $u\in S$ and $I(u)=0$ otherwise.

clearly $a_{N}$ gives us a rough idea of what $R$ (which above we defined as $\frac{|S|}{|U|}$ ) is. But just how good?

Let $\varepsilon>0$ . Let $X$ be the output of our algorithm, and we seek to derive a bound on

\Pr[|X-R|\geq\varepsilon|S|/|U|]

(32.3)

that is, the probability that $X$ is more than a factor of $\varepsilon$ away from the true value of $|S|/|U|$ .

Note that our algorithm $X$ is equivalent to

(1/N)\times\sum_{1\leq i\leq N}I(u_{i})

(32.4)

and as each $I(u_{i})\sim\text{Bern}(R)$ , we know that

\sum_{1\leq i\leq N}I(u_{i})\sim B(N,|S|/|U|)

(32.5)

and therefore we can use a Chernoff bound (as we have a binomially distributed discrete random variable). It follows that

E[X]=\frac{1}{N}N\frac{|S|}{|U|}=\frac{|S|}{|U|}

(32.6)

and therefore

\displaystyle\Pr[|X-R|\geq\varepsilon|S|/|U|]

\displaystyle=\Pr[|X-E[X]|\geq\varepsilon E[X]

(32.7)

then we can let $Z=NX$ which means that $Z\sim B(N,|S|/|U|)$ and is slightly more pleasant to work with. This means that

$\displaystyle\Pr[\|X-E[X]\|\geq\varepsilon E[X]$	$\displaystyle=\Pr[\|Z-E[Z]\|\geq\varepsilon E[Z]]\text{ (multiplying by $N$)}$	(32.9)
	$\displaystyle=\Pr[\{Z-E[Z]\geq\varepsilon E[Z]\}\cup\{Z-E[Z]\leq-E[Z]\}]$	(32.10)
	$\displaystyle\leq\Pr[Z\geq(1+\varepsilon)E[Z]]+\Pr[Z\leq(1-\varepsilon)E[Z]]$	(32.11)
	$\displaystyle\leq 2\times\exp\left(-\frac{1}{3}\varepsilon^{2}E[Z]\right)$	(32.12)
	$\displaystyle=2\times\exp\left(-\frac{1}{3}\varepsilon^{2}N\frac{\|S\|}{\|U\|}\right)$	(32.13)

and through our choice of $N$ we can reduce the probability of failure arbitrarily. If we let $\delta>0$ then we can make

2\times\exp\left(-\frac{1}{3}\varepsilon^{2}N\frac{|S|}{|U|}\right)\leq\delta

(32.14)

through choosing $N=-3\frac{|U|}{|S|}\frac{1}{\varepsilon^{2}}\times\ln(\delta/2)$ (we essentially want to cancel everything out so that we get $e^{\ln(\delta/2)}=\delta/2$ and then the $2$ cancels and we are left with $\delta$ ).

Thus we can make the failure probability arbitrarily small.