11.22.22 Random Sampling
This week's project is to explore random sampling and see whether it can reveal something interesting about the dataset. MySQL's rand() function returns a floating-point value v in the range 0 <= v < 1. A seed can be supplied, as in rand(1000), in which case the function returns the same sequence of values each time the query is run.
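The effect of a seed can be sketched outside MySQL as well. The snippet below is only an analogy in Python: it uses Python's own generator, so the values will not match what MySQL's rand(1000) returns, and the helper name seeded_sequence is made up for illustration.

```python
import random

def seeded_sequence(seed, n):
    """Return n pseudo-random floats in [0, 1) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return [rng.random() for _ in range(n)]

first = seeded_sequence(1000, 5)
second = seeded_sequence(1000, 5)   # same seed as `first`
other = seeded_sequence(1001, 5)    # different seed

print(first == second)  # the same seed reproduces the same sequence
print(first == other)   # a different seed gives a different sequence
```

A fixed seed is useful when a sample has to be reproduced later, for example to rerun a query and get the same rows.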
The assignment is to take a random sample of checkouts or checkins, or of any other data where the result is a sample, and to identify patterns in it.
In theory, each element in the collection has an equal probability of being selected. The interest is then in finding out to what degree patterns surface even though the sampling is random. Are the results similar when random sampling is applied to different media types?
What about sampling error? Investopedia (https://www.investopedia.com/terms/s/samplingerror.asp) describes sampling error as the deviation of a sampled value from the true population value.
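A minimal sketch of that definition, assuming the statistic of interest is a mean: the sampling error is the sample mean minus the population mean. The population here is synthetic (made-up loan durations in days), not data from the library database, and sampling_error is a hypothetical helper.

```python
import random
import statistics

def sampling_error(population, sample_size, seed=None):
    """Sample mean minus population mean for one simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # without replacement
    return statistics.mean(sample) - statistics.mean(population)

# Synthetic population: invented loan durations between 1 and 60 days.
rng = random.Random(0)
population = [rng.randint(1, 60) for _ in range(10_000)]

for n in (10, 100, 1000, 10_000):
    print(n, round(sampling_error(population, n, seed=1), 3))
```

When the sample is the whole population the error is zero; for smaller samples it typically shrinks as the sample size grows, though any single draw can deviate.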
wk9 - 11.22.22 Random Sampling
George Legrady
legrady@mat.ucsb.edu
Re: wk9 - 11.22.22 Random Sampling
Random sampling is a method for estimating characteristics of a whole population by randomly drawing a subset from it. For this week's assignment, I am interested in the questions below:
* Using random sampling, can any pattern be found within the sampled data? Does the pattern differ from that of the whole population?
* Does the pattern sampled from CDs differ from that in DVDs or books?
* For each media type, sample three times in a row and compare the differences.
* Note that bibNumber and itemNumber follow a linear pattern within the lower itemNumber range. If we randomly sample bibNumber and itemNumber within the same range, will the same pattern appear?
Report: Below are results and visualizations
Re: wk9 - 11.22.22 Random Sampling
In this report, I repeatedly draw samples of increasing size to demonstrate that a larger sample size yields a more representative sample. I use all checkouts in 2018 as my population and the item type distribution as the main measure of how well each sample represents the original data. I achieved acceptable results with sample sizes of 50,000 and 100,000 (1.5% and 3% of the population).
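The idea can be sketched as follows. This is not the code from the attached report: the population is synthetic, the item type names and their weights are invented, and total variation distance is just one possible way to score how closely a sample's distribution matches the population's.

```python
import random
from collections import Counter

def distribution(items):
    """Relative frequency of each distinct value in `items`."""
    counts = Counter(items)
    total = len(items)
    return {key: count / total for key, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Synthetic population: hypothetical item types with made-up proportions.
rng = random.Random(42)
item_types = ["book", "dvd", "cd", "other"]
weights = [0.60, 0.25, 0.10, 0.05]
population = rng.choices(item_types, weights=weights, k=200_000)
pop_dist = distribution(population)

# Draw samples of increasing size and score each against the population.
for n in (100, 1_000, 10_000, 100_000):
    sample = rng.sample(population, n)
    print(n, round(total_variation(pop_dist, distribution(sample)), 4))
```

Smaller distances mean the sample's item type mix is closer to the population's; the distance tends to fall as the sample size rises.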
- Attachments
- itemtypes_sample_100000.csv (190 Bytes)
- itemtypes_sample_50000.csv (186 Bytes)
- itemtypes_sample_20000.csv (177 Bytes)
- itemtypes_sample_10000.csv (170 Bytes)
- itemtypes_sample_5000.csv (170 Bytes)
- itemtypes_sample_2000.csv (146 Bytes)
- itemtypes_sample_1000.csv (128 Bytes)
- itemtypes_sample_500.csv (90 Bytes)
- itemtypes_sample_100.csv (84 Bytes)
- itemtypes_all_data.csv (194 Bytes)
- Assignment 7.pdf (193.84 KiB)
Re: wk9 - 11.22.22 Random Sampling
Abstract
This week's assignment calls for us to explore the technique of random sampling in the Seattle Library database. For this project, I decided to test the implementation of a blind book date event at the Seattle Public Library. Blind book dates have become popular in recent years as a way for readers to set aside any bias toward attractive book covers and focus on the work itself. The idea is that the books are covered or blocked, either physically or virtually, so that the patron cannot see the title and can only read a description of the book. Patrons then choose from a random sample of profiles and make their pick, all without knowing the name of their fated book! This project focuses on simulating such an event using randomization.
Below is the PDF titled "Week 9 Random Sampling", along with its queries, which are also included in the PDF itself.
An updated PDF containing the test for randomness is attached (titled "UPDATED..").
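A rough sketch of how such a simulation could look, separate from the queries in the attached PDFs: the catalog below is a placeholder with invented titles and descriptions, and draw_blind_dates is a hypothetical helper, not something taken from the project.

```python
import random

def draw_blind_dates(catalog, k, seed=None):
    """Randomly pick k books and return (profiles, answer_key).

    Each profile exposes only a number and a description; the answer key
    maps the profile number back to the hidden title.
    """
    rng = random.Random(seed)
    picks = rng.sample(catalog, k)  # sampling without replacement
    profiles = [
        {"profile": i, "description": book["description"]}
        for i, book in enumerate(picks, start=1)
    ]
    answer_key = {i: book["title"] for i, book in enumerate(picks, start=1)}
    return profiles, answer_key

# Placeholder catalog: titles and descriptions are invented for illustration.
catalog = [
    {"title": f"Placeholder Title {n}", "description": f"Placeholder description {n}"}
    for n in range(1, 21)
]

profiles, answer_key = draw_blind_dates(catalog, k=5, seed=2022)
for p in profiles:
    print(p["profile"], p["description"])  # what the patron sees
```

The patron chooses a profile number, and only then is the title looked up in answer_key.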
- Attachments
- UPDATED Week 9_ Random Sampling .pdf (8.41 MiB)
- Week 9_ Random Sampling.pdf (7.32 MiB)
- Week9QueryD - Week9QueryD (1).pdf (17.68 KiB)
- Week9QueryE - Week9QueryE.pdf (23.54 KiB)
- Week9QueryC - Week9QueryC (1).pdf (59.7 KiB)
- Week 9 Query A - Week9QueryB (1).pdf (55.32 KiB)
Last edited by nataliadubon on Tue Nov 29, 2022 3:19 am, edited 1 time in total.
Re: wk9 - 11.22.22 Random Sampling
I will look at a random sample of the check-ins and checkouts at the Seattle Public Library (SPL). I will randomly select 2% of the returns at the SPL, filtered by four distinct item types, and record the month of each. I will then compare the results of the queries and identify any patterns that arise.
Here is a PDF with the queries, analysis, and conclusion of the assignment:
Here are the output CSV files: