Transforming y and/or x has the potential to remedy a number of model problems. We try a transformation and then check to see if the transformation eliminated the problems with the model. If it doesn't help, we try another transformation, and so on. We continue this cyclical process until we've built a model that is appropriate and we can use it.
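The try-then-check cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated (exponential growth with multiplicative noise), and the helper `curvature_check` is a hypothetical, crude stand-in for inspecting a residual plot, not a standard diagnostic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (hypothetical) data: y grows exponentially in x with multiplicative noise.
x = np.linspace(1.0, 10.0, 400)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.1, size=x.size))


def curvature_check(x, response):
    """Fit a straight line, then correlate the residuals with (x - mean)^2.

    A value far from 0 suggests leftover curvature, i.e. the straight-line
    model is not appropriate on this scale.
    """
    slope, intercept = np.polyfit(x, response, deg=1)  # highest degree first
    resid = response - (intercept + slope * x)
    return np.corrcoef(resid, (x - x.mean()) ** 2)[0, 1]


# Step 1: fit on the original scale and check.
curv_raw = curvature_check(x, y)

# Step 2: the check fails, so try a log transformation of y and check again.
curv_log = curvature_check(x, np.log(y))

print(f"curvature on raw scale: {curv_raw:.2f}")
print(f"curvature on log scale: {curv_log:.2f}")
```

If the second check had still shown a pattern, the next step in the cycle would be to try a different transformation and repeat.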

1. Log transformation

Some rules of thumb ↓

2. Polynomial regression

$$ y = \beta_0 + \beta_1 x + \beta_2x^2 + \beta_3x^3 + \cdots $$

Remarks:

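  1. Hierarchical principle: if a higher-order term is kept in the model, every lower-order term must be kept as well, whether or not it is significant, because the higher-order term is hard to interpret without the lower-order ones.
  2. Multicollinearity: mitigate it by mean-centering x before forming the powers.
  3. Interpretation: the change in E(y) for a one-unit increase in x is no longer constant; it depends on the value of x.
  4. Hypothesis testing: first test all power terms of the predictor jointly to decide whether the variable belongs in the model; if they are jointly significant, then test the higher-order terms to decide whether they are needed.
  5. An application: piecewise polynomials (i.e. splines)
    1. Motivation: the relationship between y and x is different for different ranges of x
    2. Hence, divide the range of x into segments and fit an appropriate f in each segment
      1. step 1: choose the cut points (i.e. knots)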

      2. step 2:

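Remarks 2 and 3 on polynomial regression can be checked numerically. The sketch below assumes a quadratic model and simulated data (the coefficients 2.0, 0.5, -0.03 and the range of x are made up for illustration). It shows that x and x² are almost perfectly correlated when x is far from zero, that mean-centering removes that correlation without changing the fitted values, and that the estimated slope b1 + 2·b2·x differs from one value of x to another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: predictor far from zero, quadratic mean function.
x = np.linspace(10.0, 20.0, 201)
y = 2.0 + 0.5 * x - 0.03 * x**2 + rng.normal(0.0, 0.1, size=x.size)


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


# Remark 2: correlation between the linear and quadratic terms, before and after centering.
xc = x - x.mean()
corr_raw = np.corrcoef(x, x**2)[0, 1]
corr_centered = np.corrcoef(xc, xc**2)[0, 1]

beta_raw, fitted_raw = fit_quadratic(x, y)
beta_centered, fitted_centered = fit_quadratic(xc, y)

# Centering is only a reparameterization: the fitted curve is unchanged.
same_fit = np.allclose(fitted_raw, fitted_centered)


# Remark 3: the effect of x on E(y) is the derivative b1 + 2*b2*x, which depends on x.
def marginal_effect(beta, x0):
    return beta[1] + 2.0 * beta[2] * x0


effect_at_12 = marginal_effect(beta_raw, 12.0)
effect_at_18 = marginal_effect(beta_raw, 18.0)

print(f"corr(x, x^2) raw: {corr_raw:.4f}, centered: {corr_centered:.4f}")
print(f"same fitted values: {same_fit}")
print(f"slope at x=12: {effect_at_12:.3f}, slope at x=18: {effect_at_18:.3f}")
```

Centering changes the individual coefficients and their standard errors, not the fitted curve, so it is a remedy for numerical and interpretive problems rather than a different model.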

3. Dummy variables

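If a categorical (nominal) variable has M groups, first choose a baseline group, then construct M-1 dummy variables for the remaining groups. For example, suppose the variable race takes three values: Caucasian, African American, and Asian. Taking Asian as the baseline, we introduce x1 and x2: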