# 32.3 Target shooting

Here is a general problem; suppose that we a set $U$ and a set $S\subseteq U$, and that we would like to estimate what proportion of elements of $S$ are in $U$.

Of course, if we know the values of $|U|$ and $|S|$ then the answer is just

$R:=\frac{|S|}{|U|}$ (32.2)

but oftentimes we don’t, or to compute the actual sets would be impractical. One idea is to sample elements from $U$ and check if they are in $S$, something like this

• randomly sample $u_{1},u_{2},...,u_{N}$ from $U$ (uniformly and independently)

• calculate $a_{N}=(1/N)\times\sum_{1\leq i\leq N}I(u_{i})$ where $I(u)=1$ if $u\in S$ and $I(u)=0$ otherwise.

clearly $a_{N}$ gives us a rough idea of what $R$ (which above we defined as $\frac{|S|}{|U|}$) is. But just how good?

Let $\varepsilon>0$. Let $X$ be the output of our algorithm, and we seek to derive a bound on

$\Pr[|X-R|\geq\varepsilon|S|/|U|]$ (32.3)

that is, the probability that $X$ is more than a factor of $\varepsilon$ away from the true value of $|S|/|U|$.

Note that our algorithm $X$ is equivalent to

$(1/N)\times\sum_{1\leq i\leq N}I(u_{i})$ (32.4)

and as each $I(u_{i})\sim\text{Bern}(R)$, we know that

$\sum_{1\leq i\leq N}I(u_{i})\sim B(N,|S|/|U|)$ (32.5)

and therefore we can use a Chernoff bound (as we have a binomially distributed discrete random variable). It follows that

$E[X]=\frac{1}{N}N\frac{|S|}{|U|}=\frac{|S|}{|U|}$ (32.6)

and therefore

 $\displaystyle\Pr[|X-R|\geq\varepsilon|S|/|U|]$ $\displaystyle=\Pr[|X-E[X]|\geq\varepsilon E[X]$ (32.7)

then we can let $Z=NX$ which means that $Z\sim B(N,|S|/|U|)$ and is slightly more pleasant to work with. This means that

 $\displaystyle\Pr[|X-E[X]|\geq\varepsilon E[X]$ $\displaystyle=\Pr[|Z-E[Z]|\geq\varepsilon E[Z]]\text{ (multiplying by N)}$ (32.9) $\displaystyle=\Pr[\{Z-E[Z]\geq\varepsilon E[Z]\}\cup\{Z-E[Z]\leq-E[Z]\}]$ (32.10) $\displaystyle\leq\Pr[Z\geq(1+\varepsilon)E[Z]]+\Pr[Z\leq(1-\varepsilon)E[Z]]$ (32.11) $\displaystyle\leq 2\times\exp\left(-\frac{1}{3}\varepsilon^{2}E[Z]\right)$ (32.12) $\displaystyle=2\times\exp\left(-\frac{1}{3}\varepsilon^{2}N\frac{|S|}{|U|}\right)$ (32.13)

and through our choice of $N$ we can reduce the probability of failure arbitrarily. If we let $\delta>0$ then we can make

$2\times\exp\left(-\frac{1}{3}\varepsilon^{2}N\frac{|S|}{|U|}\right)\leq\delta$ (32.14)

through choosing $N=-3\frac{|U|}{|S|}\frac{1}{\varepsilon^{2}}\times\ln(\delta/2)$ (we essentially want to cancel everything out so that we get $e^{\ln(\delta/2)}=\delta/2$ and then the $2$ cancels and we are left with $\delta$).

Thus we can make the failure probability arbitrarily small.