How can you use a matrix to find the optimal strategy for each player in the prisoner's dilemma, and calculate the probability of each player choosing that option?

Question

╔═════════╦═════════╦════════╗
║         ║ Confess ║ Deny   ║
╠═════════╬═════════╬════════╣
║ Confess ║ 3y/3y   ║ 10y/1y ║
╠═════════╬═════════╬════════╣
║ Deny    ║ 1y/10y  ║ 2y/2y  ║
╚═════════╩═════════╩════════╝
The column player confesses with probability p and denies with probability 1 − p.

When I try to calculate the probability for the row player, I set up:
3p+1(1−p)=10p+2(1−p)

However, I then get p = −1/6, which is not a valid probability. What did I do wrong?

Answer 1

To find the optimal strategy for each player in the prisoner's dilemma, and the probability with which each player chooses it, we can compare the players' expected payoffs (here, expected prison sentences).

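In the given matrix, each cell shows the prison sentences for one combination of strategies chosen by the row player (prisoner A) and the column player (prisoner B). Because the entries are years in prison, each player prefers the smaller number. The question's equation reads the row player's sentence as 3 years (partner confesses) or 1 year (partner denies) for confessing, and 10 or 2 years for denying. This is the usual prisoner's dilemma, in which a player who confesses against a partner who denies gets the short sentence. Under that reading, the second number in each off-diagonal cell is the row player's sentence, and the game is symmetric, so prisoner B faces the same numbers.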
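The reading of the matrix can be encoded and checked for dominance in a few lines. This is a minimal Python sketch, assuming the question's reading of the sentences (3 or 1 years for confessing, 10 or 2 years for denying, against a partner who confesses or denies); the names SENTENCE, MOVES and strictly_dominates are hypothetical helpers, not part of the original posts.

```python
# Sentences in years (lower is better), ASSUMING the question's reading:
# SENTENCE[(my_move, partner_move)] -> years I serve.
# The game is symmetric, so the same table serves both prisoners.
SENTENCE = {
    ("Confess", "Confess"): 3,
    ("Confess", "Deny"): 1,
    ("Deny", "Confess"): 10,
    ("Deny", "Deny"): 2,
}
MOVES = ("Confess", "Deny")


def strictly_dominates(move: str, other: str) -> bool:
    """True if `move` gives a strictly shorter sentence than `other`
    against every possible move of the partner."""
    return all(
        SENTENCE[(move, partner)] < SENTENCE[(other, partner)]
        for partner in MOVES
    )


print(strictly_dominates("Confess", "Deny"))  # True
print(strictly_dominates("Deny", "Confess"))  # False
```

Because sentences are costs, "better" means a smaller number, which is why the comparison uses `<`.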

A player is only willing to randomize if both options give the same expected sentence, so the calculation looks for the opponent's probability that makes a player indifferent. Following the question, let P be the probability that the column player (B) confesses, so B denies with probability 1-P. Similarly, let Q be the probability that the row player (A) confesses, so A denies with probability 1-Q.

For prisoner A:
Expected payoff for confessing = 3P + 1(1-P)
Expected payoff for denying = 10P + 2(1-P)

A is willing to mix between confessing and denying only if these two expected sentences are equal, so set them equal and solve:

3P + 1(1-P) = 10P + 2(1-P)

Expanding the equation:
3P + 1 - P = 10P + 2 - 2P

Combining like terms:
2P + 1 = 8P + 2

Subtracting 8P + 1 from both sides:
-6P = 1

Dividing both sides by -6:
P = -1/6

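This is the same value the question arrived at, so the algebra there was correct. The result simply lies outside [0, 1]: no probability P makes A indifferent. The reason is that confessing is better for A whatever B does (3 years instead of 10 if B confesses, 1 year instead of 2 if B denies). Equivalently, A's expected sentence from confessing, 2P + 1, is shorter than from denying, 8P + 2, for every P between 0 and 1. Confessing is a strictly dominant strategy for A, so A confesses with probability 1.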
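To double-check the algebra, note that the indifference equation has the general form a·P + b·(1-P) = c·P + d·(1-P), whose solution is P = (d - b) / ((a - b) - (c - d)). A small Python sketch (the function name indifference_probability is hypothetical) solves it with exact fractions, using the sentences from the question's equation:

```python
from fractions import Fraction


def indifference_probability(a, b, c, d):
    """Solve a*P + b*(1 - P) = c*P + d*(1 - P) for P exactly.

    a, b: sentences for confessing when the partner confesses / denies.
    c, d: sentences for denying when the partner confesses / denies.
    Returns None when the coefficient of P vanishes (no unique solution).
    """
    coefficient = (a - b) - (c - d)
    if coefficient == 0:
        return None
    return Fraction(d - b, coefficient)


p = indifference_probability(3, 1, 10, 2)
print(p)            # -1/6
print(0 <= p <= 1)  # False: not a valid probability
```

Using `Fraction` rather than floats keeps the result exact, so the out-of-range value is reported as -1/6 rather than a rounded decimal.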

Prisoner B's choice is analyzed the same way, now in terms of Q, the probability that A confesses (so A denies with probability 1-Q).

For prisoner B:
Expected payoff for confessing = 3Q + 1(1-Q)
Expected payoff for denying = 10Q + 2(1-Q)

Setting these expected payoffs equal to each other:
3Q + 1(1-Q) = 10Q + 2(1-Q)

Expanding and combining like terms:
2Q + 1 = 8Q + 2

Solving:
Q = -1/6

As for A, this is not a valid probability. Confessing gives B a shorter sentence whatever A does (3 years instead of 10 if A confesses, 1 year instead of 2 if A denies), so B has no reason to randomize and confesses with probability 1.
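When the indifference equation has no valid solution, comparing the expected sentences directly is more informative. This sketch (hypothetical helper names, same assumed reading of the matrix as in the question) tabulates B's expected sentences for Q = 0, 0.1, ..., 1 and confirms that confessing is always shorter. Algebraically, the gap is 1 + 6Q, which is positive for every Q in [0, 1]; the grid is only an illustration of that.

```python
from fractions import Fraction


def expected_sentences(q):
    """Return B's expected sentences (confess, deny) in years when A confesses
    with probability q. ASSUMES the question's reading: confessing costs
    3 or 1 years, denying 10 or 2 years (partner confesses / denies)."""
    confess = 3 * q + 1 * (1 - q)
    deny = 10 * q + 2 * (1 - q)
    return confess, deny


def sentence_gap(q):
    """Extra years B expects from denying rather than confessing (= 1 + 6q)."""
    confess, deny = expected_sentences(q)
    return deny - confess


# Exact grid q = 0, 1/10, ..., 1.
grid = [Fraction(k, 10) for k in range(11)]
print(all(sentence_gap(q) > 0 for q in grid))  # True
print(min(sentence_gap(q) for q in grid))      # 1  (at q = 0)
```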

Therefore both prisoners' optimal strategy is to confess with probability 1 (and deny with probability 0). Because confessing strictly dominates denying for each player, (Confess, Confess), with 3 years each, is the unique Nash equilibrium, even though both prisoners would be better off if both denied (2 years each). That tension is what makes this a prisoner's dilemma.

In summary, nothing went wrong in the question's algebra: setting the expected payoffs equal only finds a mixed strategy when one exists. A solution outside [0, 1], such as P = -1/6, signals that one option is strictly dominant, so each player plays it with certainty.
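The equilibrium claim can also be checked by brute force: a pair of moves is a pure-strategy Nash equilibrium when each move is a best response to the other. A minimal sketch, again assuming the question's reading of the sentences and a symmetric game (YEARS, is_best_response and pure_nash_equilibria are hypothetical names):

```python
from itertools import product

MOVES = ("Confess", "Deny")

# Years served by the player choosing `mine` when the partner plays `theirs`.
# ASSUMES the question's reading; the game is symmetric for both prisoners.
YEARS = {
    ("Confess", "Confess"): 3,
    ("Confess", "Deny"): 1,
    ("Deny", "Confess"): 10,
    ("Deny", "Deny"): 2,
}


def is_best_response(mine, theirs):
    """True if no other move gives a strictly shorter sentence against `theirs`."""
    return all(YEARS[(mine, theirs)] <= YEARS[(alt, theirs)] for alt in MOVES)


def pure_nash_equilibria():
    """All (row move, column move) pairs where each move is a best response."""
    return [
        (row, col)
        for row, col in product(MOVES, repeat=2)
        if is_best_response(row, col) and is_best_response(col, row)
    ]


print(pure_nash_equilibria())  # [('Confess', 'Confess')]
```

Because confessing strictly dominates for both players, no mixed-strategy equilibrium exists either, so this single pure equilibrium is the complete answer.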