1. Model evaluation
A good model:
- is not overly complicated.
- meets the four conditions of the linear regression model.
- allows you to answer your research question of interest.
Some criteria for evaluation:
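- Adjusted R² (larger is better)
- MSE (smaller is better)
  - Mean Square Error
  - SSE/(n - #params)
  - Equivalent to adjusted R² for ranking models fitted to the same response, since adjusted R² = 1 - MSE/(SST/(n-1))
- RSE (smaller is better)
  - Residual Standard Error
  - sqrt(SSE/(n - #params)), i.e. sqrt(MSE)
  - Strictly speaking, the standard error of an individual residual e_i is RSE × sqrt(1 - h_ii), where h_ii is the leverage of observation i
- Information criteria (smaller is better; p is the number of parameters, including the intercept)
  - AIC = -2ln(L) + 2p
  - BIC = -2ln(L) + p·ln(n)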
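As a rough illustration of the criteria above, here is a minimal sketch that computes them from SSE, SST, n and p. The function name `evaluation_criteria` and the example numbers are made up for illustration. The log-likelihood formula assumes Gaussian errors, and p counts only the regression coefficients (including the intercept); some software also counts the error variance as a parameter, which shifts AIC and BIC by a constant.

```python
import math


def evaluation_criteria(sse, sst, n, p):
    """Return common evaluation criteria for a fitted linear model.

    sse: residual sum of squares
    sst: total sum of squares
    n:   number of observations
    p:   number of regression parameters, including the intercept
    """
    mse = sse / (n - p)                     # Mean Square Error
    rse = math.sqrt(mse)                    # Residual Standard Error
    adj_r2 = 1.0 - mse / (sst / (n - 1))    # adjusted R^2
    # Assumption: Gaussian errors, so the maximized log-likelihood is
    # ln(L) = -(n/2) * (ln(2*pi) + ln(SSE/n) + 1).
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    aic = -2 * log_lik + 2 * p
    bic = -2 * log_lik + p * math.log(n)
    return {"adj_r2": adj_r2, "mse": mse, "rse": rse, "aic": aic, "bic": bic}


# Hypothetical numbers, for illustration only: the larger model lowers
# SSE slightly but pays for three extra parameters.
small = evaluation_criteria(sse=120.0, sst=500.0, n=50, p=3)
large = evaluation_criteria(sse=118.0, sst=500.0, n=50, p=6)

for name in ("adj_r2", "mse", "rse", "aic", "bic"):
    print(f"{name:7s} small={small[name]:9.4f} large={large[name]:9.4f}")
```

With these made-up inputs every criterion prefers the smaller model: the small drop in SSE does not make up for the extra parameters.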
2. Model building (variable selection)
2.1 Backward elimination
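Start with all candidate regressors and drop the one with the largest p-value at each step, until every regressor remaining in the model has a sufficiently small p-value. The procedure is as follows: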
- Select a significance level ATR (alpha-to-remove, also written SLS, the significance level to stay)
- Fit the full model with all possible regressors
- Read the summary and find the regressor with the highest p-value:
  - If p > ATR: remove the regressor, refit the model, and repeat this step
  - If p ≤ ATR: stop and keep the current model
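The loop described above can be sketched as follows. This is only an illustration: `backward_elimination`, `fit_pvalues`, and `toy_fit` are hypothetical names. The fitting step is passed in as a callback so that the sketch does not depend on any particular regression library. The toy callback returns fixed p-values, which is unrealistic because real p-values change each time the model is refitted.

```python
def backward_elimination(regressors, fit_pvalues, alpha_to_remove=0.05):
    """Drop the least significant regressor until all p-values <= ATR.

    fit_pvalues is a hypothetical callback: given a list of regressor
    names, it fits the model and returns a dict {name: p-value}.
    """
    current = list(regressors)
    while current:
        pvalues = fit_pvalues(current)      # refit with the current set
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha_to_remove:
            break                           # all remaining p-values are small enough
        current.remove(worst)               # drop the worst regressor, then refit
    return current


# Toy stand-in for a real fit (assumption: p-values stay fixed when
# other regressors are removed, which does not hold in practice).
TOY_PVALUES = {"x1": 0.001, "x2": 0.40, "x3": 0.03, "x4": 0.12}


def toy_fit(names):
    return {name: TOY_PVALUES[name] for name in names}


selected = backward_elimination(["x1", "x2", "x3", "x4"], toy_fit,
                                alpha_to_remove=0.05)
print(selected)  # ['x1', 'x3']
```

In real use the callback would refit the regression on the data each time and return the coefficient p-values from the fit summary. The intercept is normally left out of the candidate list so that it is never removed.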
2.2 Forward selection
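At each step, add the candidate regressor with the smallest p-value; stop when even the smallest p-value is too large and no remaining regressor can enter the model. The procedure is as follows: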