Changing Variables

Dr. Mine Dogucu

glimpse(lapd)
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…

Goal:

Create a new variable called base_pay_k that represents base_pay in thousands of dollars.

lapd %>%
  mutate(base_pay_k = base_pay / 1000)
## # A tibble: 14,824 x 4
## job_class_title employment_type base_pay base_pay_k
## <chr> <chr> <dbl> <dbl>
## 1 Police Detective II Full Time 119322. 119.
## 2 Police Sergeant I Full Time 113271. 113.
## 3 Police Lieutenant II Full Time 148116 148.
## 4 Police Service Representative II Full Time 78677. 78.7
## 5 Police Officer III Full Time 109374. 109.
## 6 Police Officer II Full Time 95002. 95.0
## # … with 14,818 more rows
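
As a minimal sketch of the same mutate() pattern, the block below uses a small made-up tibble (the name `pay` and its values are hypothetical, not part of lapd). It assumes dplyr is installed and shows that the new column is added alongside the existing ones.

```r
library(dplyr)

# Hypothetical toy data standing in for lapd (values are made up)
pay <- tibble(
  job = c("A", "B", "C"),
  base_pay = c(50000, 62000, 120500)
)

# mutate() returns a new data frame with the extra column
pay_k <- pay %>%
  mutate(base_pay_k = base_pay / 1000)

pay_k$base_pay_k
```

The original `pay` still has two columns; only `pay_k` carries base_pay_k.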
glimpse(lapd)
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…

Goal:

Create a new variable called base_pay_level with four levels: "Less than 0", "No Income", "Less than Median, Greater than 0", and "Greater than Median". We will use $62,474 as the median (from the previous lecture).


Let's first check whether anyone earns exactly the median value.

lapd %>%
  filter(base_pay == 62474)
## # A tibble: 0 x 3
## # … with 3 variables: job_class_title <chr>, employment_type <chr>,
## # base_pay <dbl>
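
One caveat with checks like this: == compares doubles exactly, so a value produced by arithmetic can fail the comparison even when it looks equal. The sketch below, on made-up numbers unrelated to lapd, contrasts == with dplyr's near(), which compares within a small tolerance. Whether this matters for base_pay depends on how the values were recorded, which the slides do not say.

```r
library(dplyr)

# Hypothetical values; 0.1 + 0.2 is not stored as exactly 0.3
toy <- tibble(base_pay = c(62474, 0.1 + 0.2, 70000))

# Exact comparison misses the computed value
exact <- toy %>%
  filter(base_pay == 0.3)

# near() allows for floating-point rounding error
close <- toy %>%
  filter(near(base_pay, 0.3))

nrow(exact)
nrow(close)
```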
lapd %>%
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median"))
## # A tibble: 14,824 x 4
## job_class_title employment_type base_pay base_pay_level
## <chr> <chr> <dbl> <chr>
## 1 Police Detective II Full Time 119322. Greater than Median
## 2 Police Sergeant I Full Time 113271. Greater than Median
## 3 Police Lieutenant II Full Time 148116 Greater than Median
## 4 Police Service Representative II Full Time 78677. Greater than Median
## 5 Police Officer III Full Time 109374. Greater than Median
## 6 Police Officer II Full Time 95002. Greater than Median
## # … with 14,818 more rows
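
The conditions in this case_when() do not cover a base_pay of exactly 62474, which is why it is worth checking for such rows. As a sketch on hypothetical toy values, the block below shows that a row matching none of the conditions gets NA.

```r
library(dplyr)

# Made-up values; the fourth one equals the median exactly
toy <- tibble(base_pay = c(-100, 0, 30000, 62474, 90000))

toy_levels <- toy %>%
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median"
  ))

# The row with base_pay == 62474 matches no condition, so it is NA
toy_levels$base_pay_level
```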

We can't really see what we have created, so let's select only the new variable.

lapd %>%
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median")) %>%
  select(base_pay_level)
## # A tibble: 14,824 x 1
## base_pay_level
## <chr>
## 1 Greater than Median
## 2 Greater than Median
## 3 Greater than Median
## 4 Greater than Median
## 5 Greater than Median
## 6 Greater than Median
## # … with 14,818 more rows
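
Printing the first few rows shows only one level. One possible way to see all of them is to tabulate the new variable with dplyr's count(). This is a sketch on hypothetical toy data, not the slides' own approach.

```r
library(dplyr)

# Made-up values covering all four levels
toy <- tibble(base_pay = c(-100, 0, 30000, 45000, 90000))

level_counts <- toy %>%
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median"
  )) %>%
  count(base_pay_level)  # one row per level, with its frequency in column n

level_counts
```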

We can use pipes with ggplot too! Note that the data is piped into ggplot() with %>%, but plot layers are still added with +.

lapd %>%
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median")) %>%
  select(base_pay_level) %>%
  ggplot(aes(x = base_pay_level)) +
  geom_bar()

glimpse(lapd)
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…

Goal:

Make job_class_title and employment_type factor variables.

lapd %>%
  mutate(employment_type = as.factor(employment_type),
         job_class_title = as.factor(job_class_title))
## # A tibble: 14,824 x 3
## job_class_title employment_type base_pay
## <fct> <fct> <dbl>
## 1 Police Detective II Full Time 119322.
## 2 Police Sergeant I Full Time 113271.
## 3 Police Lieutenant II Full Time 148116
## 4 Police Service Representative II Full Time 78677.
## 5 Police Officer III Full Time 109374.
## 6 Police Officer II Full Time 95002.
## # … with 14,818 more rows

as.factor() - converts a vector to a factor
as.numeric() - converts a vector to numeric
as.integer() - converts a vector to integer
as.double() - converts a vector to double
as.character() - converts a vector to character
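
The conversions listed above can be tried on small vectors in base R. The values below are made up for illustration. The last two lines show a common pitfall: as.numeric() on a factor returns the internal level codes, not the labels.

```r
x_chr <- c("10", "20", "3.5")

x_num <- as.numeric(x_chr)         # character to numeric
x_int <- as.integer(c(1.9, 2.1))   # drops the fractional part
x_fct <- as.factor(c("Full Time", "Part Time", "Full Time"))
x_back <- as.character(x_fct)      # factor back to character

levels(x_fct)

# Pitfall: a factor is stored as integer codes with labels
f <- as.factor(c("10", "20", "10"))
codes <- as.numeric(f)                 # level codes, not the labels
values <- as.numeric(as.character(f))  # the labels as numbers
```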


Once again, we did not "save" anything into lapd. While we are still working on data cleaning, it makes sense not to "save" the intermediate data frames. Once we see the final data frame we want, we can "save" it (i.e., overwrite lapd).
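
To make the difference concrete, here is a small sketch on a hypothetical tibble named `toy` (not lapd): a pipeline without assignment only prints its result, while `<-` overwrites the object.

```r
library(dplyr)

toy <- tibble(base_pay = c(50000, 120000))  # made-up values

# Without assignment, the result is printed and then discarded
toy %>%
  mutate(base_pay_k = base_pay / 1000)

names_before <- names(toy)  # still only "base_pay"

# With assignment, toy is overwritten by the new data frame
toy <- toy %>%
  mutate(base_pay_k = base_pay / 1000)

names(toy)
```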

lapd <-
  lapd %>%
  mutate(employment_type = as.factor(employment_type),
         job_class_title = as.factor(job_class_title),
         base_pay_level = case_when(
           base_pay < 0 ~ "Less than 0",
           base_pay == 0 ~ "No Income",
           base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
           base_pay > 62474 ~ "Greater than Median"))