class: title-slide <br> <br> .pull-right[ # Samples and Populations ## Dr. Mine Dogucu ] --- ## Research question Do UCI students who exercise regularly have higher GPA? --- ## Population Each research question aims to examine a __population__. Population for this research question is UCI students. --- ## Survey - Do you exercise at least once every week? - What is your GPA? --- ## Sampling When it is impossible or costly to get information from all of the __population__ we take a __sample__. Since it would be almost impossible to give the survey to ALL UCI students, we can give it to a sample of students. --- ## Convenience Sampling - Stand in front of Langson Library - Give the survey to 100 UCI students -- This could introduce __bias__. It is possible that those in front of the library - may study more and thus may have higher GPA. - may be more active than those who study at home/dorm. --- ## Simple Random Sample The goal is to have a sample that is __representative__ of the population. We can use simple random sampling to select the sample. Using simple random sampling technique anybody in the population has an equal chance of being selected to the sample. [Random Number Generator](https://www.google.com/search?channel=cus2&client=firefox-b-1-d&q=random+number+generator) --- ## Simple Random Sample The researcher can - reach out to the registrar to get student emails. - randomly select 100 students. - email them the survey All of the 100 selected students respond. Population: All UCI students Sample: 100 students who have responded --- ## Nonresponse Bias It is unlikely that 100 students will respond. Assume that 86 respond. It is possible that those 14 who did not respond - may be busy exercising and did not have the time to respond. - may be busy studying and did not have the time to respond. This kind of bias is called __nonresponse bias__. --- ## Example A scientist is interested in counting the number of different species of bacteria in San Diego Creek. She takes a bucket of water from San Diego Creek and counts the different specifies of bacteria. The bacteria in the bucket make up the __sample__ and the bacteria in San Diego Creek make up the __population__. Note that this is not a simple random sample! --- ## Example In customer satisfaction surveys, the intent is to get opinion on all customers (population). The sample is based on self-selection. Those who want to provide opinion provide it. Not a random sample! <img src="https://cdn.pixabay.com/photo/2017/01/13/18/56/feedback-1977986_960_720.jpg" width="40%" style="display: block; margin: auto;" /> Image by <a href="https://pixabay.com/users/Tumisu-148124/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=1977986">Tumisu</a> from <a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=1977986">Pixabay</a> --- ## Practice A cell phone factory wants to do quality control of their products to ensure that their customers do not get malfunctioning products. They randomly select 3% of the phones that they produce and manually check if there are any problems or not. All the cell phones in the world make up the population and the cell phones that this factory produces make up the sample. **True or False**? --- ## Generalizibility Why do we care to have a sample that is **representative** of the population? -- We want the findings of studies to be **generalizable** to the population.