1. Motivation

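First, many factors influence or determine y, never just one. Including more covariates therefore generally makes the explanation/prediction of y more accurate.

Second, to address confounders, we need to control for some covariates, that is, include them in the set of independent variables.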

2. Model

Sample regression model:

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \epsilon_i, \quad i = 1, 2, \cdots, n $$

Matrix form:

$$ \bold{y} = X\bold{\beta} + \bold{\epsilon} $$

That is:

$$ \underbrace{\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}}_{\bold{y}} = \underbrace{\begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1K}\\ 1 & x_{21} & x_{22} & \cdots & x_{2K}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \vdots\\ \beta_K \end{bmatrix}}_{\bold{\beta}} + \underbrace{\begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n \end{bmatrix}}_{\bold{\epsilon}} $$
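
The matrix form maps directly onto array code. The sketch below is illustrative only: it builds the $n \times (K+1)$ design matrix $X$ by prepending a column of ones, then estimates $\bold{\beta}$ by least squares. The notes have not introduced an estimator at this point, so the use of least squares, the helper name `fit_ols`, and the noise-free toy data are all assumptions made for the example.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares estimate of beta in y = X beta + eps (hypothetical helper).

    x: (n, K) array of covariates, without the intercept column.
    y: (n,) array of responses.
    Returns the design matrix X of shape (n, K + 1) and the estimate of beta.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient plays the role of beta_0.
    X = np.column_stack([np.ones(x.shape[0]), x])
    # lstsq returns the beta that minimizes the squared error ||y - X beta||^2.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta_hat

# Toy data (made up for illustration): n = 5 observations, K = 2 covariates,
# generated without noise from beta = (1.0, 2.0, -0.5).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1]

X, beta_hat = fit_ols(x, y)
print(X.shape)
print(beta_hat)
```

Because the toy data contain no error term, the estimate should recover the generating coefficients up to floating-point precision; with real data it would only approximate them.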

Assumptions:

Linearity
Independence
Normality
Equal variance
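
These four assumptions are often written compactly in terms of the conditional mean and the error terms. The formulation below is a common textbook summary rather than something stated in these notes, and the error variance symbol $\sigma^2$ is introduced here only for illustration:

$$ E[y_i \mid x_{i1}, \cdots, x_{iK}] = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK}, \qquad \epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) $$

The first part expresses linearity; "i.i.d." covers independence, $N(\cdot)$ covers normality, and the single $\sigma^2$ shared by all $i$ covers equal variance.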

Interpretation of regression coefficient:

$\beta_0$: the value of mean(y) when all covariates equal 0.
$\beta_k$: the change in mean(y) when $x_k$ increases by one unit while the other covariates are held constant. This is called the "partial effect" or "net effect".
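
A small numerical check can make these two interpretations concrete. The sketch below assumes a hypothetical coefficient vector with $K = 2$; the values and the helper name `mean_y` are invented for illustration and are not estimates from any data.

```python
import numpy as np

# Hypothetical coefficients (beta_0, beta_1, beta_2), chosen only for illustration.
beta = np.array([1.0, 2.0, -0.5])

def mean_y(x, beta):
    """mean(y) at covariate values x: beta_0 + beta_1 * x_1 + ... + beta_K * x_K."""
    return beta[0] + np.dot(beta[1:], x)

# Interpretation of beta_0: mean(y) when every covariate is 0.
at_zero = mean_y(np.zeros(2), beta)

# Interpretation of beta_1: raise x_1 by one unit while holding x_2 fixed.
base = np.array([3.0, 4.0])
bumped = base + np.array([1.0, 0.0])
partial_effect_x1 = mean_y(bumped, beta) - mean_y(base, beta)

print(at_zero)            # equals beta_0
print(partial_effect_x1)  # equals beta_1, whatever the base point is
```

Because the model is linear, the difference does not depend on the chosen base point, which is why $\beta_k$ can be read as a single "net" effect of $x_k$.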

Standardized coefficients: