class: title-slide

<br>
<br>
.pull-right[

# Indicator Variables

## Dr. Mine Dogucu

]

---

#### Data `babies` in `openintro` package

```
## Rows: 1,236
## Columns: 8
## $ case      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ bwt       <int> 120, 113, 128, 123, 108, 136, 138, 132, 120, 143, 140, 144,…
## $ gestation <int> 284, 282, 279, NA, 282, 286, 244, 245, 289, 299, 351, 282, …
## $ parity    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ age       <int> 27, 33, 28, 36, 23, 25, 33, 23, 25, 30, 27, 32, 23, 36, 30,…
## $ height    <int> 62, 64, 64, 69, 67, 62, 62, 65, 62, 66, 68, 64, 63, 61, 63,…
## $ weight    <int> 100, 135, 115, 190, 125, 93, 178, 140, 125, 136, 120, 124, …
## $ smoke     <int> 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1,…
```

---

class: middle

<div align = "center">

| y | Response    | Birth weight | Numeric     |
|---|-------------|--------------|-------------|
| x | Explanatory | Smoke        | Categorical |

---

## Notation

`\(y_i = \beta_0 +\beta_1x_i + \epsilon_i\)`

`\(\beta_0\)` is the y-intercept

`\(\beta_1\)` is the slope

`\(\epsilon_i\)` is the error/residual

`\(i = 1, 2, \ldots, n\)` is the identifier for each point

---

```r
library(openintro)  # provides the babies data
library(broom)      # provides tidy()
model_s <- lm(bwt ~ smoke, data = babies)
tidy(model_s)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   123.       0.649    190.   0.
## 2 smoke          -8.94     1.03      -8.65 1.55e-17
```

--

`\(\hat {y}_i = b_0 + b_1 x_i\)`

`\(\hat {\text{bwt}_i} = b_0 + b_1 \text{ smoke}_i\)`

`\(\hat {\text{bwt}_i} = 123 + (-8.94\text{ smoke}_i)\)`

---

class: middle

.pull-left[

#### Expected bwt for a baby with a non-smoker mother

`\(\hat {\text{bwt}_i} = 123 + (-8.94\text{ smoke}_i)\)`

`\(\hat {\text{bwt}_i} = 123 + (-8.94\times 0)\)`

`\(\hat {\text{bwt}_i} = 123\)`

`\(E[\text{bwt}_i \mid \text{smoke}_i = 0] = b_0\)`

]

--

.pull-right[

#### Expected bwt for a baby with a smoker mother

`\(\hat {\text{bwt}_i} = 123 + (-8.94\text{ smoke}_i)\)`

`\(\hat {\text{bwt}_i} = 123 + (-8.94\times 1)\)`

`\(\hat {\text{bwt}_i} = 114.06\)`

`\(E[\text{bwt}_i \mid \text{smoke}_i = 1] = b_0 + b_1\)`

]

---

```r
confint(model_s)
```

```
##                 2.5 %     97.5 %
## (Intercept) 121.77391 124.320430
## smoke       -10.96413  -6.911199
```

Note that the confidence interval for the "slope" does not contain 0 and all the values in the interval are negative.
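
---

With a 0/1 indicator as the only explanatory variable, `\(b_0\)` is the mean response in the group coded 0 and `\(b_0 + b_1\)` is the mean response in the group coded 1. The sketch below checks this on a tiny made-up data frame. The names `toy`, `toy_model`, `b`, and `group_means` are hypothetical, and the values are not from `babies`.

```r
# Toy data, made up for illustration only (NOT the babies data)
toy <- data.frame(
  bwt   = c(120, 124, 128, 110, 114, 118),
  smoke = c(0, 0, 0, 1, 1, 1)   # 0 = non-smoker, 1 = smoker
)

# Same model form as on the slides: numeric response, 0/1 indicator
toy_model <- lm(bwt ~ smoke, data = toy)
b <- coef(toy_model)

# Mean of bwt within each smoke group (names are "0" and "1")
group_means <- tapply(toy$bwt, toy$smoke, mean)

# Intercept vs. mean of the group coded 0
all.equal(unname(b["(Intercept)"]), unname(group_means["0"]))

# Intercept + slope vs. mean of the group coded 1
all.equal(unname(b["(Intercept)"] + b["smoke"]), unname(group_means["1"]))
```

The "slope" `\(b_1\)` is therefore the difference between the two group means, which is why its sign tells us which group has the lower expected birth weight.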