7.1 Overview

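The tidyverse is a collection of R packages for transforming and visualizing data. Its core packages include dplyr, which provides the data-transformation verbs used below, and ggplot2, which handles visualization. To install it (once), load it, and take a first look at a dataset: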

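install.packages("tidyverse")  # one-time installation
library(tidyverse)             # load the core tidyverse packages (run in every session)
glimpse(some_dataset)          # compact overview of a data frame; replace some_dataset with a real one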
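To see what glimpse() reports without downloading a real dataset, here is a minimal sketch on a small made-up data frame. The data frame `toy` and its values are hypothetical, and only dplyr (one of the core tidyverse packages, which re-exports glimpse()) is loaded.

```r
library(dplyr)  # dplyr re-exports glimpse()

# Hypothetical toy data frame; the columns and values are made up
toy <- tibble(
  country   = c("A", "B", "C"),
  year      = c(2007L, 2007L, 2007L),
  gdpPercap = c(1000.5, 2500, 800.25)
)

# Prints the dimensions, then one line per column with its name,
# type, and first few values; the data frame is returned invisibly
result <- glimpse(toy)
```

Because glimpse() returns its input unchanged, it can also sit in the middle of a %>% chain to inspect intermediate results.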

7.2 The filter verb

filter: keeps only the observations (rows) of a dataset that satisfy the given conditions.

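# gapminder is a data frame from the gapminder package
library(gapminder)
# Keep the observations where year is 2007 AND country is "United States"
# (comma-separated conditions are combined with AND)
gapminder %>%
	filter(year == 2007, country == "United States")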

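# Note the use of %in%: it keeps rows whose name matches ANY value in the vector
# babynames is a data frame from the babynames package
library(babynames)
selected_names <- babynames %>%
  # Filter for the names Steven, Thomas, and Matthew
  filter(name %in% c("Steven", "Thomas", "Matthew"))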
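The filter patterns above can be checked on a tiny made-up table. This is a sketch assuming a hypothetical `toy` tibble standing in for gapminder; it contrasts comma-separated conditions (AND), an explicit `&`, `%in%`, and `|` (OR).

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder (values are made up)
toy <- tibble(
  country = c("United States", "United States", "Canada", "Canada"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(10, 20, 30, 40)
)

# Comma-separated conditions are combined with AND
us_2007 <- toy %>%
  filter(year == 2007, country == "United States")

# The same filter written with an explicit &
us_2007_and <- toy %>%
  filter(year == 2007 & country == "United States")

# %in% keeps rows whose value matches ANY element of the vector
north_america <- toy %>%
  filter(country %in% c("United States", "Canada", "Mexico"))

# | (OR): keep rows that are from 2007 OR from Canada
either <- toy %>%
  filter(year == 2007 | country == "Canada")
```

Here `us_2007` has a single row, `north_america` keeps all four rows (no row is from Mexico, but that does not matter for %in%), and `either` keeps three rows.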

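The pipe %>% takes the value on its left and passes it as the first argument to the function call on its right, so gapminder %>% filter(year == 2007) is equivalent to filter(gapminder, year == 2007).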
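A minimal sketch of that equivalence, using a hypothetical `toy` tibble: each piped call gives exactly the same result as the nested call, but a chain reads left to right instead of inside out.

```r
library(dplyr)  # dplyr re-exports %>% from magrittr

# Hypothetical toy data (values are made up)
toy <- tibble(year = c(2002, 2007, 2007), gdpPercap = c(1, 3, 2))

# Piped form: toy becomes the first argument of filter()
piped <- toy %>% filter(year == 2007)

# Nested form: the same call written without the pipe
nested <- filter(toy, year == 2007)

# A longer chain and its nested equivalent
chained      <- toy %>% filter(year == 2007) %>% arrange(desc(gdpPercap))
nested_chain <- arrange(filter(toy, year == 2007), desc(gdpPercap))
```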

7.3 The arrange verb

arrange: sorts the observations (rows) by the values of one or more variables.

# Sort by gdpPercap in ascending order (the default)
gapminder %>%
	arrange(gdpPercap)

# Sort by gdpPercap in descending order
gapminder %>%
	arrange(desc(gdpPercap))

# First keep only the 2007 data,
# then sort it by gdpPercap in descending order
gapminder %>%
	filter(year == 2007) %>%
	arrange(desc(gdpPercap))
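The sorting rules can be checked on a small table. This sketch uses a hypothetical `toy` tibble with made-up values and also shows one case the examples above do not: passing several variables to arrange(), where later variables break ties in earlier ones.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  year      = c(2007, 2002, 2007, 2007),
  gdpPercap = c(500, 900, 1500, 1000)
)

# Ascending order is the default
asc <- toy %>% arrange(gdpPercap)

# desc() reverses the order
desc_order <- toy %>% arrange(desc(gdpPercap))

# Filter first, then sort the remaining rows
top_2007 <- toy %>%
  filter(year == 2007) %>%
  arrange(desc(gdpPercap))

# Several variables: sort by year, then break ties by gdpPercap (descending)
by_year <- toy %>% arrange(year, desc(gdpPercap))
```

The resulting country orders are A, B, D, C (`asc`), C, D, B, A (`desc_order`), C, D, A (`top_2007`) and B, C, D, A (`by_year`).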

7.4 The mutate verb

mutate: adds new variables (columns) or changes existing ones.

# Modify: divide the existing pop column by 1,000,000 (population in millions)
gapminder %>%
	mutate(pop = pop / 1000000)

# Add: a new column gdp, computed by multiplying the existing columns gdpPercap and pop
gapminder %>%
	mutate(gdp = gdpPercap * pop)
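Both kinds of change can go in a single mutate() call. The expressions are evaluated in order, and each one sees the columns as modified by the expressions before it, so the order matters when a column is both used and overwritten. A minimal sketch on a hypothetical `toy` tibble with made-up values:

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy <- tibble(
  country   = c("A", "B"),
  pop       = c(2000000, 5000000),
  gdpPercap = c(1000, 300)
)

result <- toy %>%
  mutate(
    gdp = gdpPercap * pop,  # new column, computed from pop BEFORE rescaling
    pop = pop / 1000000     # overwrite the existing column (now in millions)
  )
```

Swapping the two lines would compute gdp from the rescaled pop, giving a value a million times too small. New columns are appended at the end, while overwritten columns keep their position.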

7.5 The select verb

select: keeps only particular variables (columns) from a dataset.
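A minimal sketch of select() on a hypothetical `toy` tibble (columns and values are made up), showing three common ways to choose columns: by name, by dropping with a minus sign, and by a contiguous range.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy <- tibble(
  country   = c("A", "B"),
  continent = c("X", "Y"),
  year      = c(2007, 2007),
  pop       = c(10, 20),
  gdpPercap = c(100, 200)
)

# Keep columns by name, in the order given
picked <- toy %>% select(country, year, pop)

# A minus sign drops a column and keeps the rest
dropped <- toy %>% select(-continent)

# first:last selects a contiguous range of columns
range_cols <- toy %>% select(year:gdpPercap)
```

Unlike filter(), select() never changes the number of rows; it only changes which columns are kept.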