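First, y is influenced or determined by many factors, rarely by just one. Including more relevant covariates therefore tends to make the explanation/prediction of y more accurate. (In-sample fit such as $R^2$ never decreases when a covariate is added, but irrelevant covariates do not necessarily improve out-of-sample prediction.)
Second, to deal with confounders, we need to control for certain covariates, that is, include them in the set of independent variables.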
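Both points can be illustrated with a small simulation. The data-generating process below is a hypothetical assumption chosen for illustration: a confounder `z` drives both `x` and `y`, and the true effect of `x` on `y` is 2.0. Leaving `z` out biases the slope on `x`. The population value of the short-model slope is $2 + 3 \cdot 0.8 / 1.64 \approx 3.46$. Adding `z` as a covariate should bring the slope back near 2.0 and raise the in-sample $R^2$. The helper `ols` is a hypothetical name; the fit uses plain least squares via `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (assumption for illustration):
# z confounds x and y; the true effect of x on y is 2.0.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns (coefficients, in-sample R^2)."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta_hat, r2

b_short, r2_short = ols(y, x)   # confounder z omitted
b_long, r2_long = ols(y, x, z)  # confounder z controlled for

print(f"y ~ x     : slope on x = {b_short[1]:.2f}, R^2 = {r2_short:.3f}")
print(f"y ~ x + z : slope on x = {b_long[1]:.2f}, R^2 = {r2_long:.3f}")
```

The exact printed numbers depend on the random draw. With this sample size, the short-model slope should land near 3.5 and the long-model slope near 2.0.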
Sample regression model:
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \epsilon_i, \qquad i = 1, 2, \cdots, n $$
Matrix form:
$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon} $$
That is:
$$
\underbrace{\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1K}\\ 1 & x_{21} & x_{22} & \cdots & x_{2K}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \vdots\\ \beta_K \end{bmatrix}}_{\boldsymbol{\beta}}
+
\underbrace{\begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n \end{bmatrix}}_{\boldsymbol{\epsilon}}
$$
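To make the matrix form concrete, the sketch below builds the design matrix $X$ and checks that $X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ reproduces the observation-by-observation sums of the sample regression model. $X$ consists of a leading column of ones for the intercept followed by the $K$ covariates. The sizes `n`, `K` and the coefficient values are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 5, 3  # hypothetical sizes: n observations, K covariates

covariates = rng.normal(size=(n, K))    # x_{ik}, shape (n, K)
beta = np.array([0.5, 1.0, -2.0, 3.0])  # (beta_0, beta_1, ..., beta_K), hypothetical values
eps = rng.normal(scale=0.1, size=n)     # epsilon_i

# Design matrix X: a column of ones (intercept) followed by the K covariates.
X = np.column_stack([np.ones(n), covariates])  # shape (n, K + 1)

# Matrix form: y = X beta + eps
y_matrix = X @ beta + eps

# Scalar form: y_i = beta_0 + sum_k beta_k * x_{ik} + eps_i, one observation at a time
y_scalar = np.array([
    beta[0] + sum(beta[k] * covariates[i, k - 1] for k in range(1, K + 1)) + eps[i]
    for i in range(n)
])

print(X.shape, beta.shape, y_matrix.shape)  # (5, 4) (4,) (5,)
print(np.allclose(y_matrix, y_scalar))      # True
```

The column of ones is what turns $\beta_0$ into an intercept. This is why $X$ has $K+1$ columns while there are only $K$ covariates.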
Assumptions:
Interpretation of regression coefficients:
Standardized coefficients: