Question

Say I have a set of values, e.g. A, B, C, D, and each of them has a probability. If I want to generate these values randomly based on their assigned percentages, why do I need to convert the assigned percentages into cumulative percentages first and generate the values based on the cumulative percentages?

Answer 1

To sample values A, B, C, D based on their assigned probabilities, you need to ensure that each value is drawn with the appropriate frequency over many trials. Given that random number generation typically produces uniform results over a continuous range (e.g., between 0 and 1), you’ll need a way to map these uniformly generated values to your non-uniformly distributed values A, B, C, D.

This is where cumulative probabilities come into play. By converting individual probabilities into cumulative probabilities, we create "bins" or "intervals" on the probability range [0, 1) that correspond to each value. Here's how you can calculate cumulative probabilities:

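1. Optionally, sort the values by their probability. Any fixed order gives correct results; sorting from most to least likely only reduces the average number of comparisons when the intervals are checked one by one.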
2. Start with a cumulative probability of 0.
3. For each value, add its assigned probability to the running total of cumulative probability. The cumulative probability for a value is the sum of its own probability and all the probabilities that came before it.

For example, suppose we have 4 values with the following probabilities:
- A: 20% (0.2)
- B: 30% (0.3)
- C: 40% (0.4)
- D: 10% (0.1)

The cumulative probabilities would be:
- A: 0.2 (0.2 + 0)
- B: 0.5 (0.3 + 0.2)
- C: 0.9 (0.4 + 0.5)
- D: 1.0 (0.1 + 0.9)
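
As a minimal sketch of the running-total calculation above, assuming the four example probabilities are stored in a Python dict (the names `probabilities`, `values`, and `cumulative` are just illustrative):

```python
from itertools import accumulate

# Probabilities from the example above, kept in the same order (A, B, C, D).
probabilities = {"A": 0.2, "B": 0.3, "C": 0.4, "D": 0.1}

values = list(probabilities)
# accumulate() yields the running totals: p_A, p_A + p_B, p_A + p_B + p_C, ...
cumulative = list(accumulate(probabilities.values()))

for value, total in zip(values, cumulative):
    print(f"{value}: {total:.1f}")
```

The last running total should come out at 1.0 (up to floating-point rounding); if it does not, the assigned probabilities do not sum to 100%.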

Now, let’s use these cumulative probabilities to generate values randomly:

1. Generate a random number (r) in the range [0, 1).
2. If r is less than the cumulative probability of value A (0.2 in this case), then A is selected.
3. If r is at least the cumulative probability of A but less than the cumulative probability of B (0.5), then B is selected.
4. Continue in this manner for C (r at least 0.5 but less than 0.9) and D (r at least 0.9).
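
The selection steps above could be sketched in Python roughly as follows. This is one possible implementation, not the only one: the helper name `pick` is hypothetical, and it uses a binary search (`bisect_right`) instead of checking the intervals one by one. In this sketch a draw that lands exactly on a boundary goes to the later value; since such a draw is vanishingly unlikely, the convention does not affect the resulting frequencies in practice.

```python
import random
from bisect import bisect_right
from itertools import accumulate

values = ["A", "B", "C", "D"]
probabilities = [0.2, 0.3, 0.4, 0.1]
cumulative = list(accumulate(probabilities))  # running totals, ending near 1.0


def pick(r):
    """Map a number r in [0, 1) to the value whose segment contains it."""
    # Index of the first cumulative probability that is strictly greater than r.
    index = bisect_right(cumulative, r)
    # Guard against rounding leaving the last total slightly below 1.
    return values[min(index, len(values) - 1)]


print(pick(random.random()))
```

Calling `pick(random.random())` repeatedly draws one value per call; passing a fixed number such as `pick(0.25)` makes it easy to see which segment a given r falls into.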

By using the cumulative probabilities, you partition the [0, 1) interval into segments that each span a range equal to the probability of the corresponding value. When you generate a uniform random number, it will fall into one of these segments, thus randomly selecting a value with the correct probability.
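
To check this claim empirically, a small simulation can count how often each value is selected over many trials. The sketch below assumes the example probabilities; the seed and the number of trials are arbitrary choices.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

values = ["A", "B", "C", "D"]
probabilities = [0.2, 0.3, 0.4, 0.1]
cumulative = list(accumulate(probabilities))

rng = random.Random(12345)  # arbitrary seed so the run is repeatable
trials = 100_000

counts = Counter()
for _ in range(trials):
    # Find the segment of [0, 1) that the uniform draw falls into.
    index = bisect_right(cumulative, rng.random())
    counts[values[min(index, len(values) - 1)]] += 1

frequencies = {v: counts[v] / trials for v in values}
for value, p in zip(values, probabilities):
    print(f"{value}: expected {p:.2f}, observed {frequencies[value]:.3f}")
```

The observed frequencies should land close to the assigned probabilities. Python's standard library offers the same kind of weighted selection through `random.choices(values, weights=probabilities)`, which, according to its documentation, converts the weights to cumulative weights internally.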

Without cumulative probabilities, the individual percentages alone do not tell you where on [0, 1) each value's segment starts and ends, so a single uniform random number cannot be mapped directly to an outcome. You would need some other method to turn the uniform random number into outcomes with their assigned probabilities.