Fisher's Exact Test p-value

Question

Fisher's Exact Test provides a method based on the hypergeometric distribution to test hypotheses of the form:

H_0: \pi_{\text{treatment}} = \pi_{\text{control}}, i.e. treatment has no effect on the rate of occurrence of a targeted outcome.

H_A: \pi_{\text{treatment}} < \pi_{\text{control}}, i.e. treatment lowers (or raises, or changes) the rate of occurrence of a targeted outcome.

In the mammography study, the targeted outcome is death due to breast cancer.

We define the test statistic T to be the number of targeted outcomes in the treatment group. Under the null hypothesis that the treatment has no effect, T follows a hypergeometric distribution \text {hypergeometric}(N,K,n), with parameters:

N: size of the experiment, i.e. total number of individuals in both treatment and control groups

K: size of the treatment group

n: total number of targeted outcomes.
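For reference, the standard hypergeometric pmf written in these parameters is given below. This is the textbook form rather than something stated explicitly in the question, but substituting N=62000, K=31000 and n=102 reproduces the terms of the p-value sum used for the mammography study.

\displaystyle \mathbf{P}_{H_0}(T = t) = \frac{\binom{K}{t}\binom{N-K}{n-t}}{\binom{N}{n}}, \qquad \max(0,\, n-(N-K)) \leq t \leq \min(n, K).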

Recall that the p-value is defined to be the probability, under the null hypothesis, of obtaining an observation as extreme or more extreme than the one observed, in the direction of the alternative hypothesis. In a Fisher's exact test, this corresponds to the probability under a tail of the hypergeometric pmf.

Application to the mammography study

In the mammography study, the hypotheses of our Fisher's exact test are:

H_0: \pi_{\text{treatment}} = \pi_{\text{control}}, i.e. treatment has no effect on the death rate due to breast cancer;

H_A: \pi_{\text{treatment}} < \pi_{\text{control}}, i.e. treatment lowers the death rate due to breast cancer.

The test statistic T for this test is the number of breast cancer deaths in the treatment group, which is distributed as hypergeometric with parameters N=62000, K=31000 and n=102, as we discussed on the previous page.

The p-value is then the sum of probabilities of obtaining a value of T that is as extreme or more extreme than 39, in the direction of the alternative hypothesis. That is,

\displaystyle p = \mathbf{P}_{H_0}(T \leq 39) = \sum_{t = 0}^{39} \frac{\binom{31000}{t}\binom{31000}{102 - t}}{\binom{62000}{102}}.

From this, based on the significance level \alpha, we can either

reject the null hypothesis if p \leq \alpha, or

fail to reject the null hypothesis if p > \alpha.
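As a rough sketch, the tail sum above can be evaluated numerically with `scipy.stats.hypergeom`. The variable names and the significance level of 0.05 are assumptions made for illustration; the question does not fix a value of \alpha. Note that SciPy orders the hypergeometric parameters as (population size, number of marked items, number of draws), which differs from the (N, K, n) naming used here only in notation.

```python
from scipy.stats import hypergeom

# Parameters quoted in the question.
N_total = 62000      # N: everyone in the treatment and control groups
K_treatment = 31000  # K: size of the treatment group
n_deaths = 102       # n: total number of breast cancer deaths
t_observed = 39      # value of T appearing in the p-value sum

# hypergeom.cdf(k, M, n, N) gives P(T <= k) for a population of size M
# containing n marked items, from which N items are drawn.
# Here the marked items are the treatment-group members and the
# "draws" are the individuals who died.
p_value = hypergeom.cdf(t_observed, N_total, K_treatment, n_deaths)

alpha = 0.05  # assumed significance level, not given in the question
reject_null = p_value <= alpha

print(f"p-value = {p_value:.4f}")
print("reject H_0" if reject_null else "fail to reject H_0")
```

Swapping the roles of the last two arguments (102 marked items, 31000 draws) describes the same distribution, so either ordering should give the same p-value.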

Contingency Table

Data for Fisher's exact test can typically be presented in a contingency table, which shows how the targeted outcomes are divided between the treatment and control groups, as well as the sizes of these groups.

In the mammography study, the contingency table looks like the following.
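The table itself is not reproduced in this text. A minimal sketch of how its cells could be rebuilt from the numbers quoted earlier follows; it assumes that 39 (the upper limit of the p-value sum) is the observed number of deaths in the treatment group, and the row and column layout is a choice made here for illustration.

```python
import numpy as np

# Values quoted in the question.
total_size = 62000
treatment_size = 31000
total_deaths = 102
treatment_deaths = 39  # assumed to be the observed value of T

# Remaining cells follow from the row and column totals.
control_size = total_size - treatment_size
control_deaths = total_deaths - treatment_deaths

# Rows: treatment, control. Columns: death, no death.
table = np.array([
    [treatment_deaths, treatment_size - treatment_deaths],
    [control_deaths, control_size - control_deaths],
])

print(table)
print("row totals:", table.sum(axis=1))
print("column totals:", table.sum(axis=0))
```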

Fisher's exact test practice
Reproduce the Fisher's exact test for the mammography study.

What is the p-value given the observed data?
(Please enter the value with a precision of at least 4 decimal places. Hint: Use the fisher_exact function in SciPy with the appropriate values.)

Answer 1

To calculate the p-value using the Fisher's exact test for the mammography study, we need to use the contingency table provided. However, the contingency table is not specified in the question, so we cannot answer this without more information. The provided information does mention the parameters for the hypergeometric distribution, but without the actual data, we cannot calculate the p-value.

Answer 2

To calculate the p-value for the Fisher's exact test using the observed data, you can use the `fisher_exact` function in SciPy with the appropriate values. The observed data for the mammography study is not given in the provided information.
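That said, the question does quote the group sizes (31000 each), the total number of deaths (102) and the value 39 in the p-value sum, so the counts can plausibly be recovered. The sketch below assumes that 39 is the observed number of deaths in the treatment group, which leaves 63 deaths in the control group; the table layout and variable names are illustrative.

```python
from scipy.stats import fisher_exact

# Rows: treatment, control. Columns: death, no death.
# Counts are inferred from N=62000, K=31000, n=102 and T=39.
table = [
    [39, 31000 - 39],
    [63, 31000 - 63],
]

# alternative="less" tests whether the odds of death are lower in the
# first row (treatment), matching H_A: pi_treatment < pi_control.
odds_ratio, p_value = fisher_exact(table, alternative="less")

print(f"odds ratio = {odds_ratio:.4f}")
print(f"one-sided p-value = {p_value:.4f}")
```

With the treatment group in the first row and deaths in the first column, the one-sided `"less"` alternative corresponds to the lower tail \mathbf{P}_{H_0}(T \leq 39) described in the question.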