Which of the following data modeling scenarios strictly require a generalized linear model rather than a Gaussian linear model? (Choose all that apply.)

Question

Which of the following data modeling scenarios strictly require a generalized linear model rather than a Gaussian linear model? (Choose all that apply.)

Note: While it is true that one can use a Gaussian linear model to fit any data (without paying attention to whether it is appropriate or not), in this problem we should use a GLM when it is more appropriate under a given scenario.

We observe data Y_i ∈ {0,1,...,n_i} as a function of X_i's that take on integers n_i > 0 and we wish to model the proportions Y_i / n_i.

We observe Y_i ∈ ℝ that we know are non-linearly related to the explanatory variables X_i.

The dependent variable Y > 0 has a discrete distribution whose expectation we wish to relate to the explanatory variable X.

Answer 1

- The first scenario: We observe data Y_i ∈ {0,1,...,n_i} as a function of X_i's that take on integers n_i > 0 and we wish to model the proportions Y_i / n_i. In this scenario, a generalized linear model (GLM) is more appropriate because the response Y_i is discrete and bounded, so the proportions Y_i / n_i lie in [0, 1]. A GLM (for example, a binomial model with a logit link) keeps the fitted means inside that range and lets the variance depend on the mean, neither of which a Gaussian linear model does.

- The second scenario: We observe Y_i ∈ ℝ that we know are non-linearly related to the explanatory variables X_i. Here the response is real-valued, so Gaussian noise remains plausible, and non-linearity in X_i alone does not force a GLM: a Gaussian linear model only has to be linear in its parameters, so the regressors can be transformed (for example, with polynomial terms) while the model stays linear. A GLM is therefore not strictly required in this scenario.

Therefore, of these two scenarios, only the first requires the use of a generalized linear model (GLM) over a Gaussian linear model.
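For the first scenario, the difference between the two model types can be sketched numerically. The example below is only an illustration under stated assumptions: the data are made up, a single covariate is used, and the GLM is taken to be a binomial model with a logit link fitted by Newton's method. The helper names `fit_ols` and `fit_binomial_glm` are hypothetical and do not come from any library.

```python
import math

# Hypothetical data: y_i successes out of n_i trials at covariate value x_i.
x = [-3, -2, -1, 0, 1, 2, 3]
n = [10] * 7
y = [0, 1, 2, 5, 8, 9, 10]
props = [yi / ni for yi, ni in zip(y, n)]

def fit_ols(x, t):
    """Least-squares line t ~ a0 + a1 * x (Gaussian linear model)."""
    m = len(x)
    xbar, tbar = sum(x) / m, sum(t) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (ti - tbar) for xi, ti in zip(x, t))
    a1 = sxy / sxx
    return tbar - a1 * xbar, a1

def fit_binomial_glm(x, y, n, iters=25):
    """Binomial GLM with logit link: logit(E[Y_i / n_i]) = b0 + b1 * x_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p            # score contribution
            w = ni * p * (1.0 - p)     # binomial variance, depends on the mean
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

a0, a1 = fit_ols(x, props)
b0, b1 = fit_binomial_glm(x, y, n)

ols_fit = [a0 + a1 * xi for xi in x]
glm_fit = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

print("OLS fitted proportions:", [round(v, 3) for v in ols_fit])
print("GLM fitted proportions:", [round(v, 3) for v in glm_fit])
```

On this toy data the straight-line fit produces fitted "proportions" below 0 and above 1 at the extreme covariate values, while the logit link keeps every fitted value strictly inside (0, 1).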

Answer 2

The scenarios that require one to strictly use a generalized linear model (GLM) over a Gaussian linear model are:

1. We observe data Y_i ∈ {0,1,...,n_i} as a function of X_i's that take on integers n_i > 0 and we wish to model the proportions Y_i / n_i. In this case, a GLM is more appropriate because the proportions are discrete and confined to [0, 1], which Gaussian noise on the whole real line cannot capture.

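2. The dependent variable Y > 0 has a discrete distribution whose expectation we wish to relate to the explanatory variable X. Again, the response is discrete (and here strictly positive), so a Gaussian model for Y is a poor fit. A GLM instead relates E[Y] to X through a link function and pairs it with a distribution suited to discrete data.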

Therefore, both scenario 1 and scenario 3 require the use of a GLM.
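As an illustration of scenario 3, the sketch below assumes a Poisson response with a log link. That choice is an assumption made for the example; the question only says that Y is discrete and positive. The data are made up, and the helper name `fit_poisson_glm` is hypothetical.

```python
import math

# Hypothetical count data observed at covariate values x_i.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 15]

def fit_poisson_glm(x, y, iters=50):
    """Poisson GLM with log link: log(E[Y_i]) = b0 + b1 * x_i (Newton's method)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)       # fitted mean, always positive
            r = yi - mu                       # score contribution
            g0 += r
            g1 += r * xi
            h00 += mu                         # Poisson variance equals the mean
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Gaussian linear model (least-squares line) for comparison.
m = len(x)
xbar, ybar = sum(x) / m, sum(y) / m
a1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
a0 = ybar - a1 * xbar

b0, b1 = fit_poisson_glm(x, y)
ols_fit = [a0 + a1 * xi for xi in x]
glm_fit = [math.exp(b0 + b1 * xi) for xi in x]

print("OLS fitted means:", [round(v, 3) for v in ols_fit])
print("GLM fitted means:", [round(v, 3) for v in glm_fit])
```

On this toy data the least-squares line gives a negative fitted mean at x = 0, which is impossible for a positive response, whereas the log link keeps every fitted mean positive.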