The tidyverse is a collection of R packages for transforming and visualizing data. Some core packages:
ggplot2 (you can use it to visualize your data)
dplyr (you can use it to manipulate your data)
tibble (a modern reimagining of the data frame)
install.packages("tidyverse")
library(tidyverse)
glimpse(some_dataset)  # some_dataset is a placeholder for your own data frame
filter verb
filter: extracts only particular observations from a dataset (keeps the rows that satisfy the given conditions)
# gapminder is a data frame (from the gapminder package)
# Keep the observations where year is 2007 and country is "United States"
gapminder %>%
  filter(year == 2007, country == "United States")
# Note the use of %in%: it keeps rows whose name matches any value in the vector
selected_names <- babynames %>%
  # Filter for the names Steven, Thomas, and Matthew
  filter(name %in% c("Steven", "Thomas", "Matthew"))
%>% means "take whatever is before it and feed it into the next step".
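To make the pipe description concrete, here is a minimal sketch comparing a piped call with the equivalent nested call. The data frame `scores` and its columns are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical data frame, made up for this illustration
scores <- data.frame(name = c("a", "b", "c"), score = c(3, 1, 2))

# Piped form: scores is fed in as the first argument of filter()
piped <- scores %>%
  filter(score > 1)

# Equivalent call written without the pipe
nested <- filter(scores, score > 1)

# Both forms produce the same result
identical(piped, nested)
```

The piped form tends to read better once several steps are chained, because the steps appear in the order they run.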
arrange verb
arrange: sorts the observations with respect to one or more variables
# Sort in ascending order by a column
gapminder %>%
  arrange(gdpPercap)
# Sort in descending order by a column
gapminder %>%
  arrange(desc(gdpPercap))
# First keep only the 2007 data,
# then sort by gdpPercap in descending order
gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(gdpPercap))
mutate verb
mutate: adds a new variable or changes an existing variable
# Change: divide the pop column by 1,000,000
gapminder %>%
  mutate(pop = pop / 1000000)
# Add: create a new column gdp by multiplying the existing columns gdpPercap and pop
gapminder %>%
  mutate(gdp = gdpPercap * pop)
select verb
select: extracts only particular variables from a dataset (keeps a subset of the columns)
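As a rough sketch of how select might be used, the example below keeps or drops columns by name. The data frame `toy` is a hypothetical stand-in that borrows a few gapminder column names; its values are made up.

```r
library(dplyr)

# Hypothetical stand-in for a few gapminder-style rows (values are made up)
toy <- data.frame(
  country = c("A", "B"),
  year = c(2007, 2007),
  pop = c(10, 20),
  gdpPercap = c(1000, 2000)
)

# Keep only the country and pop columns
kept <- toy %>%
  select(country, pop)

# Drop a single column by prefixing its name with a minus sign
dropped <- toy %>%
  select(-gdpPercap)
```

The same calls should work on gapminder itself, for example `gapminder %>% select(country, year, pop)`.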