An airline company wishes to build a predictor for whether or not a passenger will show up on time for a flight (a yes/no answer), based on predictive features such as: (1) how many times the passenger missed their flight in the past, (2) the time that the ticket was purchased and (3) the predicted amount of traffic for that day.

Let [mathjaxinline]\mathbf X[/mathjaxinline] be the vector of predictive features and let [mathjaxinline]Y[/mathjaxinline] be the yes/no response we wish to predict for future passengers.

Remark: In this problem, we consider reasons for which a more generalized version of regression – as opposed to simple linear regression – might be more appropriate.

In the same setting as the previous problem (and in the context of the discussion of the solution), which of the following are true statements about [mathjaxinline]\mu[/mathjaxinline] and the pair [mathjaxinline](\mathbf X,Y)[/mathjaxinline]? Choose all that apply.

The range of values of Y is bounded.

The range of values of [mathjaxinline]\mu[/mathjaxinline] is strictly positive.

Based on the range of values of Y, it is harder to assume that the noise is Gaussian.

Mathematically, linear regression is impossible to compute for Yes/No responses.

Mathematically, linear regression is impossible to compute for integer-valued features (e.g. [mathjaxinline]X_1[/mathjaxinline] = the number of missed flights).

The correct statements are:

The range of values of Y is bounded.

Based on the range of values of Y, it is harder to assume that the noise is Gaussian.

Explanation:

1. The range of values of Y is bounded: This statement is true. Y is a yes/no answer, so once it is encoded numerically (for example, 1 for "on time" and 0 otherwise) it takes only two values.

2. The range of values of [mathjaxinline]\mu[/mathjaxinline] is strictly positive: This statement is not guaranteed. [mathjaxinline]\mu[/mathjaxinline] represents the conditional expectation of Y given [mathjaxinline]\mathbf X[/mathjaxinline], which for a 0/1 response is the probability of a "yes". It therefore lies between 0 and 1: it is never negative, but nothing in the setup rules out the value 0.

3. Based on the range of values of Y, it is harder to assume that the noise is Gaussian: This statement is true. Because Y takes only two values, the noise [mathjaxinline]Y - \mu[/mathjaxinline] also takes only two values for any given [mathjaxinline]\mathbf X[/mathjaxinline], whereas Gaussian noise is continuous and unbounded. The Gaussian noise model that usually accompanies linear regression is therefore hard to justify.

4. Mathematically, linear regression is impossible to compute for Yes/No responses: This statement is false. Least squares only needs numeric responses, so coding the answer as 0/1 makes the fit perfectly computable. The issue is appropriateness, not computability: a fitted linear function is not confined to the interval from 0 to 1, so it can be a poor model for a probability. Methods such as logistic regression are usually more suitable.

5. Mathematically, linear regression is impossible to compute for integer-valued features: This statement is false. Linear regression places no continuity requirement on the predictors, so an integer-valued feature such as [mathjaxinline]X_1[/mathjaxinline] = the number of missed flights can be used directly. A generalized linear model may still be more appropriate here, but the reason is the binary response, not the integer features.
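Whether least squares can actually be computed with a 0/1 response and an integer-valued feature is easy to check numerically. The following is a minimal sketch on synthetic data, not the airline's data: the variable names, the sample size, and the logistic form used to simulate the response are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data below is synthetic
n = 500

# Hypothetical features mirroring the problem statement.
missed = rng.integers(0, 6, size=n)         # X_1: integer count of past missed flights (0..5)
lead_days = rng.uniform(0.0, 60.0, size=n)  # X_2: days between purchase and departure
traffic = rng.uniform(0.0, 1.0, size=n)     # X_3: predicted traffic level

# Assumed data-generating model (illustrative only): a logistic probability
# of showing up on time, from which the 0/1 response is drawn.
logit = 2.0 - 0.8 * missed - 1.5 * traffic
p_on_time = 1.0 / (1.0 + np.exp(-logit))
y = rng.binomial(1, p_on_time)              # Y: 1 = on time, 0 = not on time

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), missed, lead_days, traffic])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values of the linear model; count how many leave [0, 1].
fitted = X @ beta
outside = int(np.sum((fitted < 0.0) | (fitted > 1.0)))

print("coefficients:", beta)
print("fitted values outside [0, 1]:", outside)
```

The fit returns ordinary finite coefficients, so nothing prevents the computation. The count printed on the last line indicates how often the linear fit produces a value that cannot be read as a probability, which is the practical motivation for a generalized model.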

In the given setting, let's address each of the statements and determine which ones are true.

1) The range of values of Y is bounded.
- This statement is true. Since the desired feature we wish to predict is whether or not a passenger shows up on time for a flight (a yes/no answer), the range of values for Y is bounded to two possibilities: "yes" or "no".

2) The range of values of μ is strictly positive.
- This statement is not guaranteed by the given information. The term μ refers to the conditional expectation of Y given X. With Y coded as 0 or 1, μ is the probability that the passenger shows up on time, so it lies between 0 and 1: it cannot be negative, but the setup alone does not exclude the value 0.

3) Based on the range of values of Y, it is harder to assume that the noise is Gaussian.
- This statement is true. Because Y takes only two values, the noise Y − μ takes only two values for any given X. Gaussian noise is continuous and unbounded, so it cannot describe a binary outcome variable like Y.

4) Mathematically, linear regression is impossible to compute for Yes/No responses.
- This statement is not true. Once the responses are coded as 0 and 1, the least squares fit can be computed as usual. The problem is that the model is a poor match for a binary response, which is why approaches like logistic regression or classification methods are more appropriate.

5) Mathematically, linear regression is impossible to compute for integer-valued features (e.g. X₁ = the number of missed flights).
- This statement is not true. Linear regression can handle integer-valued features without any issue; the difficulty in this problem comes from the binary response Y, not from the features.

Therefore, the true statements are:
- The range of values of Y is bounded.
- Based on the range of values of Y, it is harder to assume that the noise is Gaussian.
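
The question of whether the noise can be Gaussian can be made concrete with a short calculation. It is a sketch under one assumption not stated in the problem: Y is coded as 1 for "on time" and 0 otherwise, so that [mathjaxinline]\mu(\mathbf x) = \mathbb E[Y \mid \mathbf X = \mathbf x] = \mathbb P(Y = 1 \mid \mathbf X = \mathbf x)[/mathjaxinline]. Writing the noise as [mathjaxinline]\varepsilon = Y - \mu(\mathbf X)[/mathjaxinline], its conditional distribution given [mathjaxinline]\mathbf X = \mathbf x[/mathjaxinline] is

[mathjax]\varepsilon = \begin{cases} 1 - \mu(\mathbf x) & \text{with probability } \mu(\mathbf x), \\ -\mu(\mathbf x) & \text{with probability } 1 - \mu(\mathbf x), \end{cases} \qquad \mathbb E[\varepsilon \mid \mathbf X = \mathbf x] = 0, \qquad \mathrm{Var}(\varepsilon \mid \mathbf X = \mathbf x) = \mu(\mathbf x)\,\bigl(1 - \mu(\mathbf x)\bigr).[/mathjax]

The noise has mean zero, but it is supported on two points and its variance changes with [mathjaxinline]\mathbf x[/mathjaxinline], so it does not match a Gaussian [mathjaxinline]\mathcal N(0, \sigma^2)[/mathjaxinline] with a fixed variance.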