We’re writing a test case that samples a “random” black box. Given n numbers, how many times would we need to sample the black box to have n unique responses with a 99.9% confidence?

Question

We’re writing a test case that samples a “random” black box. Given n numbers, how many times would we need to sample the black box to have n unique responses with a 99.9% confidence?

For example, if we were rolling a six-sided die, how many times would we need to roll it to see every number 99.9% of the time?

Answer 1

Finding the number of samples needed to see all n unique responses with 99.9% confidence is an instance of the Coupon Collector's Problem.

In the Coupon Collector's Problem, you imagine collecting coupons from a set of n different types. Each sample from the black box yields one coupon, which may be a type you have already seen. The goal is to collect all n different types of coupons.

The expected number of samples needed to collect all n unique coupons can be calculated using the formula:

E(n) = n * (1 + 1/2 + 1/3 + ... + 1/n)

The sum in parentheses is the n-th harmonic number H(n), so E(n) = n * H(n), which grows roughly like n * (ln n + 0.577) for large n. Note that this is only the average number of samples, not a guarantee.
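As a quick illustration of the expected-value formula, here is a minimal Python sketch. The function name `expected_samples` is made up for this example, and it assumes all n responses are equally likely.

```python
from fractions import Fraction

def expected_samples(n: int) -> Fraction:
    """Expected number of draws to see all n equally likely outcomes: n * H(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # H(n) = 1 + 1/2 + ... + 1/n, kept exact with Fraction
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return n * harmonic

print(float(expected_samples(6)))  # 14.7
```

For a six-sided die this gives 14.7 rolls on average.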

The expected value is only an average, so stopping after E(n) samples would often leave at least one response unseen. To reach a 99.9% confidence level, you need an upper bound on the number of samples. A simple estimate uses a normal approximation:

CI = E(n) + Z * sqrt(V(n))

Where CI is the upper confidence bound on the number of samples (round it up to a whole number), Z is the z-score corresponding to the desired confidence level (about 3.090 for a one-sided 99.9% bound; 3.290 is the two-sided value), and V(n) is the variance of the number of samples needed to collect all n unique coupons.

The variance V(n) is given by:

V(n) = n^2 * (1 + 1/2^2 + 1/3^2 + ... + 1/n^2) - n * (1 + 1/2 + 1/3 + ... + 1/n)

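Plugging these values into the formula gives an estimate of the number of samples needed to see all n unique responses with 99.9% confidence. Treat it as a rough figure: the number of samples needed has a distribution that is skewed to the right, so a normal approximation tends to underestimate the 99.9% point.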
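The mean-plus-z-sigma estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the name `normal_estimate` is hypothetical, the responses are assumed equally likely, the variance uses the standard closed form for the coupon collector, n^2 * (1 + 1/2^2 + ... + 1/n^2) - n * (1 + 1/2 + ... + 1/n), and the default z of 3.090 is the one-sided 99.9% point (pass 3.29 for the two-sided value).

```python
import math

def normal_estimate(n: int, z: float = 3.090) -> int:
    """Rough sample count: mean + z * standard deviation, rounded up.

    Assumes n equally likely outcomes and a normal approximation.
    """
    harmonic = sum(1 / k for k in range(1, n + 1))
    harmonic_sq = sum(1 / k**2 for k in range(1, n + 1))
    mean = n * harmonic
    # Standard coupon collector variance: n^2 * sum(1/k^2) - n * H(n)
    variance = n**2 * harmonic_sq - n * harmonic
    return math.ceil(mean + z * math.sqrt(variance))

print(normal_estimate(6))          # 34
print(normal_estimate(6, z=3.29))  # 36
```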

For example, to see every face of a six-sided die with 99.9% confidence, set n = 6. The expected value is E(6) = 6 * (1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7 rolls. Computing V(6) and then CI, rounded up to a whole number, gives the estimated number of rolls needed.
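For small n, the estimate can be cross-checked against an exact calculation. The sketch below swaps in a different technique, inclusion-exclusion over the set of faces that are still missing, to compute the probability that all n equally likely outcomes have appeared within m draws, and then searches for the smallest m that reaches the target. The function names are hypothetical and a fair die is assumed.

```python
from fractions import Fraction
from math import comb

def prob_all_seen(n: int, m: int) -> Fraction:
    """P(all n equally likely outcomes appear within m draws)."""
    # Inclusion-exclusion: the k-th term covers draws that avoid k chosen outcomes
    return sum(
        (-1) ** k * comb(n, k) * Fraction(n - k, n) ** m
        for k in range(n + 1)
    )

def min_samples(n: int, confidence: Fraction = Fraction(999, 1000)) -> int:
    """Smallest m with P(all n outcomes seen within m draws) >= confidence."""
    m = n  # fewer than n draws can never show all n outcomes
    while prob_all_seen(n, m) < confidence:
        m += 1
    return m

print(min_samples(6))  # 48
```

For a fair six-sided die the exact answer is 48 rolls, noticeably more than a mean-plus-z-sigma estimate suggests, so the exact figure is the safer choice for a test case.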

This method assumes that the black box returns each of the n responses independently and with equal probability, as with a fair die. If some responses are rarer than others, the formulas above no longer apply, and more samples are generally needed.
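If the response probabilities are known but unequal, the same inclusion-exclusion idea still works by summing over every subset of outcomes that might be missing. The following is a minimal sketch under that assumption. The names and the loaded-die probabilities are invented for illustration, and because it enumerates all 2^n subsets it is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations

def prob_all_seen_weighted(probs, m):
    """P(every outcome appears within m draws), given per-outcome probabilities."""
    total = 0
    for size in range(len(probs) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(probs, size):
            # Probability that all m draws avoid every outcome in this subset
            total += sign * (1 - sum(subset)) ** m
    return total

def min_samples_weighted(probs, confidence=Fraction(999, 1000)):
    """Smallest m that reaches the requested confidence."""
    m = len(probs)
    while prob_all_seen_weighted(probs, m) < confidence:
        m += 1
    return m

fair = [Fraction(1, 6)] * 6
# Hypothetical loaded die: one face is half as likely as on a fair die
loaded = [Fraction(1, 12)] + [Fraction(11, 60)] * 5

print(min_samples_weighted(fair))
print(min_samples_weighted(loaded))
```

With these made-up numbers the loaded die needs more rolls than the fair one, because the rarest face dominates the waiting time.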