wk9 - 11.22.22 Random Sampling

glegrady

wk9 - 11.22.22 Random Sampling

Post by glegrady » Sun Oct 30, 2022 4:41 pm

11.22.22 Random Sampling

The project this week is to explore random sampling to see whether it can reveal something interesting about the dataset. MySQL's rand() function returns a random floating-point value v in the range 0 <= v < 1.0. If a constant integer seed is passed, such as rand(1000), the function produces a repeatable sequence, so the same query returns the same random values each time it is run.
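In MySQL a seeded sample is commonly drawn with an idiom like ORDER BY RAND(1000) LIMIT 100. The same idea can be sketched in Python with the standard random module: a generator created from a fixed seed yields the same sample every time. This is only an analogy; Python's generator does not reproduce MySQL's sequence, and the checkout IDs below are hypothetical.

```python
import random

def seeded_sample(items, k, seed):
    """Draw k distinct items uniformly at random; the same seed gives the same sample."""
    rng = random.Random(seed)  # fresh generator per call, so results are repeatable
    return rng.sample(items, k)

# Hypothetical checkout IDs standing in for rows of a checkouts table
checkout_ids = list(range(1, 1001))

first = seeded_sample(checkout_ids, 5, seed=1000)
second = seeded_sample(checkout_ids, 5, seed=1000)
print(first == second)  # same seed -> identical sample

# Like rand(), random() returns a float in the half-open interval [0.0, 1.0)
u = random.Random(1000).random()
print(0.0 <= u < 1.0)
```

Changing the seed (or omitting it) gives a different sample, which is useful for repeating an experiment several times.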

The assignment is to draw a random sample of checkouts, check-ins, or any other records where the result is a sample, and to identify patterns in it.

Theoretically, each element in the collection has an equal probability of being selected. The question, then, is to what degree patterns can surface even though the sampling is random. Are the results similar or different when random sampling is done on different media types?
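A quick way to see what "equal probability" means in practice is to repeat a simple random sample many times and count how often each element is picked. A minimal sketch, assuming a hypothetical collection of 20 items and samples of size 5: every item should be included about k/N = 5/20 = 0.25 of the time.

```python
import random
from collections import Counter

def inclusion_frequencies(population, k, trials, seed=0):
    """Estimate how often each element lands in a simple random sample of size k."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(population, k))  # each draw picks k distinct items
    return {item: counts[item] / trials for item in population}

population = list(range(20))  # hypothetical collection of 20 items
freqs = inclusion_frequencies(population, k=5, trials=20_000)

# Under simple random sampling every item should appear with probability close to 0.25
for item, f in sorted(freqs.items()):
    print(item, round(f, 3))
```

Individual samples can still look "patterned"; it is only across many repetitions that the frequencies even out.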

What about sampling error? Investopedia (https://www.investopedia.com/terms/s/samplingerror.asp) describes sampling error as the deviation between a value computed from a sample and the true population value.
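Sampling error can be measured directly when the whole population is available, as it is with the library data. A minimal sketch, assuming a hypothetical population of loan durations (made-up values, not SPL data): compare each sample mean to the population mean and average the absolute error over many samples of the same size. The error typically shrinks as the sample grows.

```python
import random
import statistics

def sampling_error(population, n, seed):
    """Difference between a sample mean and the true population mean."""
    sample = random.Random(seed).sample(population, n)
    return statistics.mean(sample) - statistics.mean(population)

# Hypothetical population: loan durations in days for 10,000 checkouts
pop_rng = random.Random(42)
loan_days = [pop_rng.randint(1, 60) for _ in range(10_000)]

mean_abs_error = {}
for n in (10, 100, 1000):
    # Average absolute error over 200 independent samples of size n
    errors = [abs(sampling_error(loan_days, n, seed)) for seed in range(200)]
    mean_abs_error[n] = statistics.mean(errors)
    print(n, round(mean_abs_error[n], 3))
```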
George Legrady
legrady@mat.ucsb.edu

shaokang

Re: wk9 - 11.22.22 Random Sampling

Post by shaokang » Sun Nov 20, 2022 8:02 pm

Random sampling is a method for estimating characteristics of a whole population by randomly selecting a subset of it. For this week’s assignment, I am interested in the following questions:

* Using random sampling, is there any pattern within the sampled data? Does it differ from the pattern in the whole population?
* Does the pattern sampled from CDs differ from that of DVDs or books?
* For each media type, sample three times in a row and compare the differences between the samples.
* Note that bibNumber and itemNumber follow a linear pattern within the lower itemNumber range. If we randomly sample bibNumber and itemNumber within the same range, will the same pattern appear?

Report:
Week 09 Random Sampling.pdf
(588.68 KiB)
Below are the results and visualizations:
bibDist_03.png
bibDist_01.csv
(13.13 KiB)
bibDist_03.csv
(13.15 KiB)
bibDist_01.png
bibDist_02.png
bibDist_02.csv
(13.12 KiB)
CD_100_03.csv
(3.63 KiB)
CD_100_02.csv
(3.44 KiB)
CD_100_01.csv
(3.55 KiB)
Book_100_02.csv
(4.62 KiB)
Book_100_01.csv
(4.91 KiB)
Book_100_03.csv
(4.69 KiB)
Book_all.csv
(9.68 MiB)
DVD_100_01.csv
(3.74 KiB)
DVD_100_02.csv
(3.48 KiB)
DVD_100_03.csv
(3.85 KiB)
DVD_all.csv
(789.2 KiB)
CD_all.csv
(1.01 MiB)

ilianikiforov

Re: wk9 - 11.22.22 Random Sampling

Post by ilianikiforov » Mon Nov 21, 2022 9:37 pm

In this report, I repeatedly draw samples of increasing size to demonstrate that larger samples are more representative of the population. I use all checkouts in 2018 as the population and use the item type distribution as the main measure of how well a sample represents the original data. I was able to achieve acceptable results with sample sizes of 50,000 and 100,000 (1.5% and 3% of the population, respectively).
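The post doesn't say which metric was used to compare the distributions, so the sketch below swaps in total variation distance (half the sum of absolute differences between category shares; 0 means identical) as one plausible choice. It assumes a synthetic population of 200,000 items with made-up item-type weights, not the 2018 checkout data, and draws one sample at each of several sizes.

```python
import random
from collections import Counter

def distribution(values):
    """Share of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {key: n / total for key, n in counts.items()}

def total_variation_distance(p, q):
    """Half the sum of absolute differences between two category distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical population of item types (weights are illustrative only)
pop_rng = random.Random(2018)
item_types = ["book", "dvd", "cd", "vhs", "other"]
weights = [0.70, 0.18, 0.08, 0.03, 0.01]
population = pop_rng.choices(item_types, weights=weights, k=200_000)
population_dist = distribution(population)

tvd_by_size = {}
for n in (100, 1_000, 10_000, 50_000):
    sample = random.Random(n).sample(population, n)  # simple random sample without replacement
    tvd_by_size[n] = total_variation_distance(distribution(sample), population_dist)
    print(n, round(tvd_by_size[n], 4))
```

Repeating each size with several seeds, as the report does, shows the spread of the distance as well as its typical value.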
Attachments
itemtypes_sample_100000.csv
(190 Bytes)
itemtypes_sample_50000.csv
(186 Bytes)
itemtypes_sample_20000.csv
(177 Bytes)
itemtypes_sample_10000.csv
(170 Bytes)
itemtypes_sample_5000.csv
(170 Bytes)
itemtypes_sample_2000.csv
(146 Bytes)
itemtypes_sample_1000.csv
(128 Bytes)
itemtypes_sample_500.csv
(90 Bytes)
itemtypes_sample_100.csv
(84 Bytes)
itemtypes_all_data.csv
(194 Bytes)
Assignment 7.pdf
(193.84 KiB)

nataliadubon

Re: wk9 - 11.22.22 Random Sampling

Post by nataliadubon » Tue Nov 22, 2022 12:01 pm

Abstract
This week’s assignment calls for us to explore random sampling in the Seattle Public Library database. For this project, I decided to test the implementation of a blind book date event at the Seattle Public Library. Blind book dates have become popular in recent years as a way for readers to set aside their bias toward pretty book covers and focus on the work itself. The books are physically or virtually covered so that the patron cannot see the title and can only read the book's description. The patron is then shown a random sample of these description "profiles" and picks one, all without knowing their fated book’s title! This project creates a simulation of such an event using randomization.
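The core of the simulation can be sketched in a few lines. A minimal sketch, assuming a small hypothetical catalog of (title, description) pairs (placeholder titles, not real SPL records): draw a random sample of books, show only their descriptions, and reveal the title once a profile is chosen. The patron's choice is stood in for by another random pick.

```python
import random

# Hypothetical catalog of (title, description) pairs
CATALOG = [
    ("Title A", "A lighthouse keeper finds a message in a bottle."),
    ("Title B", "Two rival chefs open restaurants on the same street."),
    ("Title C", "A retired detective takes one last case in Seattle."),
    ("Title D", "A botanist catalogs plants on a remote island."),
    ("Title E", "A family saga spanning three generations of musicians."),
    ("Title F", "An astronaut must repair her ship alone."),
]

def blind_date_profiles(catalog, k, rng):
    """Draw k distinct books at random and return the picks plus their descriptions only."""
    picks = rng.sample(catalog, k)
    profiles = [description for _, description in picks]
    return picks, profiles

def reveal(picks, choice_index):
    """After the patron chooses a profile, reveal the matching title."""
    return picks[choice_index][0]

rng = random.Random(2022)
picks, profiles = blind_date_profiles(CATALOG, k=3, rng=rng)
for i, text in enumerate(profiles, start=1):
    print(f"Profile {i}: {text}")

chosen = rng.randrange(len(profiles))  # stand-in for the patron's choice
print("Your blind date is:", reveal(picks, chosen))
```

In a real run the catalog would come from a database query rather than a hard-coded list.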

Below is the PDF titled "Week 9 Random Sampling", along with its queries, which are also included in the PDF itself.

AN UPDATED PDF CONTAINING THE TEST FOR RANDOMNESS IS ATTACHED (TITLED "UPDATED..")
Attachments
UPDATED Week 9_ Random Sampling .pdf
(8.41 MiB)
Week 9_ Random Sampling.pdf
(7.32 MiB)
Week9QueryD - Week9QueryD (1).pdf
(17.68 KiB)
Week9QueryE - Week9QueryE.pdf
(23.54 KiB)
Week9QueryC - Week9QueryC (1).pdf
(59.7 KiB)
Week 9 Query A - Week9QueryB (1).pdf
(55.32 KiB)
Last edited by nataliadubon on Tue Nov 29, 2022 3:19 am, edited 1 time in total.

briannagriffin

Re: wk9 - 11.22.22 Random Sampling

Post by briannagriffin » Fri Nov 25, 2022 1:22 pm

I will take a look at a random sample of the check-ins and checkouts at the Seattle Public Library. I will randomly select 2% of the returns from the SPL, filtered by 4 distinct item types, and look at the month of each selected return. I will then compare the results of the queries and identify any patterns that arise.
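The post's actual queries are in the PDF; one common way to take a roughly 2% random sample in MySQL is a filter like WHERE RAND() < 0.02, which keeps each row independently with probability 0.02. The sketch below mirrors that idea in Python, assuming hypothetical check-in records (made-up item types and dates, not SPL data), and then counts the sampled returns per item type and month. Note that this yields approximately, not exactly, 2% of the rows.

```python
import random
from collections import Counter
from datetime import date, timedelta

# Hypothetical check-in records: (item_type, check-in date)
gen = random.Random(11)
ITEM_TYPES = ["book", "cd", "dvd", "vhs"]
start = date(2018, 1, 1)
checkins = [
    (gen.choice(ITEM_TYPES), start + timedelta(days=gen.randrange(365)))
    for _ in range(100_000)
]

def bernoulli_sample(records, rate, seed):
    """Keep each record independently with probability `rate`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]

sample = bernoulli_sample(checkins, rate=0.02, seed=2022)

# Count sampled check-ins per (item type, month)
by_type_month = Counter((item_type, d.month) for item_type, d in sample)
for item_type in ITEM_TYPES:
    months = [by_type_month[(item_type, m)] for m in range(1, 13)]
    print(item_type, months)
```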

Here is a pdf with the queries, analysis, and conclusion of the assignment:
Week 9 _ Random Sampling Assignment.pdf
(172.68 KiB)
Here are the output CSV files:
cin_book.csv
(108 Bytes)
cin_CD.csv
(105 Bytes)
cin_DVD.csv
(105 Bytes)
cin_record.csv
(55 Bytes)
cin_videoVHS.csv
(93 Bytes)
checkins_comparison.csv
(941 Bytes)
