wk6 - 11.01.22 MidTerm Presentation

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm


Post by glegrady » Fri Sep 16, 2022 7:57 am

11.01.22 MidTerm Presentation

The presentation focuses on frequent pattern mining.


. What patterns emerge in terms of what circulates at what hour of the day?
. What are the temporal patterns throughout the day, days of the week, months, years?
. Is there a correlation between checkout and return times and topics?
. What are co-occurrence patterns through frequency-pattern algorithm searches?
. Prediction analysis: If certain things circulate over certain periods, what are the chances of

--
. Are there correlations between topics and items that disappear?
. What is the short-term and long-term performance of titles, topics, media, etc.?
. What is an object’s life expectancy in relation to the subject’s performance based on their ID?
. Sequential history: when something is returned, what items are then checked out?

--

Frequent Pattern Mining (AKA Association Rule Mining) is an analytical process that finds frequent patterns, associations, or causal structures from data sets found in various kinds of databases such as relational databases, transactional databases, and other data repositories. Given a set of transactions, this process aims to find the rules that enable us to predict the occurrence of a specific item based on the occurrence of other items in the transaction.

Let’s look at an example of Frequent Pattern Mining. First, we will want to understand the terminology used in this type of analysis. While there are numerous metrics and factors used in this technique, for this example, we will only consider two factors namely, Support and Confidence.

Support: The support of a rule x -> y (where x and y are each items/events etc.) is defined as the proportion of transactions in the data set that contain both x and y. So, Support (x -> y) = (no. of transactions that contain both x and y) / (total no. of transactions).

Confidence: The confidence of a rule x -> y is defined as Support (x -> y) / Support (x). So, it is the ratio of the number of transactions that include both the antecedent (x) and the consequent (y) to the number of transactions that include the antecedent (x).

In the table below, Support (milk->bread) = 0.4 means that milk and bread are purchased together in 40% of all transactions. Confidence (milk->bread) = 0.5 means that out of every 100 transactions containing milk, 50 also contain bread.
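The two definitions can be checked with a few lines of Python. This is only a minimal sketch: the transactions below are hypothetical, chosen so that they reproduce the 0.4 and 0.5 figures quoted above, and they are not the data from the attached chart. The function names `support` and `confidence` are my own.

```python
# Hypothetical transactions, chosen to reproduce the figures quoted above;
# this is NOT the data from the attached chart.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"milk"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

milk_bread_support = support({"milk", "bread"}, transactions)
milk_bread_confidence = confidence({"milk"}, {"bread"}, transactions)
print(milk_bread_support, milk_bread_confidence)
```

Here milk appears in 4 of the 5 transactions and milk with bread in 2 of them, so the support is 2/5 and the confidence is 2/4.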

The attached drawing comes from this website: https://www.dataversity.net/frequent-pa ... -analysis/#
Attachments
chart.jpg
George Legrady
legrady@mat.ucsb.edu

shaokang
Posts: 8
Joined: Fri Sep 23, 2022 10:07 am

Re: wk6 - 11.01.22 MidTerm Presentation

Post by shaokang » Tue Nov 01, 2022 2:06 am

Frequent-pattern algorithms are an analytical process that finds frequent patterns or associations in data sets. For example, grocery store transaction data might show a frequent pattern that people usually buy chips and beer together. With this tool, I would like to check whether there is any pattern in album checkouts; specifically, which albums do people like to check out together? To take a deeper dive into the CD checkout data, the questions below are also considered:

* What do people like to check out together with a specific singer's (in this case, Adele's / Taylor Swift's) CD?

* FP-Grow (Problem)
* Generally, what do people like to check out together?
* How long did each CD last in the library?
* New purchases by year compared with their popularity
Week 5 Frequency Mining.pdf
(803.77 KiB) Downloaded 46 times
The SQL queries, Python code, link, and a snapshot of the results are all included in the PDF.
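For readers who want to try the "checked out together" question without opening the PDF, here is a rough sketch. It is not the code from the attachment and it is not FP-Growth: it is plain brute-force pair counting with a minimum support threshold. The baskets are hypothetical, and how a "basket" is built from the library data (for example, items checked out within the same session) is an assumption that the post does not specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set stands for albums checked out together.
baskets = [
    {"25", "1989", "Lover"},
    {"25", "1989"},
    {"1989", "Lover"},
    {"25", "21"},
]

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for every pair of items meeting min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(baskets, min_support=0.5)
for pair, s in pairs.items():
    print(pair, s)
```

Counting every pair is fine for a small sample like this one; on a full export the number of candidate pairs grows quickly, which is the situation FP-Growth is designed for.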

Some of the data
fearless.csv
(95.64 KiB) Downloaded 38 times
CD_2022_09.csv
(180.46 KiB) Downloaded 39 times
1989.csv
(41.3 KiB) Downloaded 36 times
lover.csv
(9.58 KiB) Downloaded 39 times
reputation.csv
(11.23 KiB) Downloaded 36 times
adele.csv
(204.68 KiB) Downloaded 38 times
sep_2022.csv
(11.72 MiB) Downloaded 43 times

briannagriffin
Posts: 11
Joined: Fri Sep 23, 2022 10:04 am

Re: wk6 - 11.01.22 MidTerm Presentation

Post by briannagriffin » Tue Nov 01, 2022 10:48 am

This week, I am focusing on the first book and movie of The Hunger Games trilogy. I have found that both the book and the DVD of the movie have been popular checkouts at the Seattle Public Library over time. Because of this, I will be analyzing the popularity and trends in checkouts of each over time. I am also looking at the difference between tracking by barcode and by itemNumber for the novel and movie during a set period of time, in order to see if there are any discrepancies in the library's tracking and data categorization system. Finally, I want to look at the lives of the many copies of the Hunger Games books. For each copy, I want to compare the first time it was checked out with the last time it was checked out, measuring the time in between to see how long the physical copy has been in use.
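The "life of a copy" measurement can be sketched in Python as follows. This is only an illustration and not the SQL from the attached queries: the barcodes and dates are hypothetical, and `lifespans` is a helper name I made up.

```python
from datetime import date

# Hypothetical checkout records: (barcode, checkout date).
checkouts = [
    ("A001", date(2012, 3, 1)),
    ("A001", date(2015, 6, 15)),
    ("A002", date(2013, 1, 10)),
    ("A001", date(2013, 9, 30)),
    ("A002", date(2013, 1, 20)),
]

def lifespans(records):
    """Map each barcode to (first checkout, last checkout, days in between)."""
    first, last = {}, {}
    for barcode, day in records:
        first[barcode] = min(day, first.get(barcode, day))
        last[barcode] = max(day, last.get(barcode, day))
    return {b: (first[b], last[b], (last[b] - first[b]).days) for b in first}

spans = lifespans(checkouts)
for barcode, (first_day, last_day, days) in spans.items():
    print(barcode, first_day, last_day, days)
```

A copy with a single checkout gets a lifespan of 0 days under this definition, so those copies may need to be reported separately.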

Here is a pdf of the analysis along with results that I found:
Frequency Pattern Mining Assignment.pdf
(306.64 KiB) Downloaded 44 times
Here are my queries:
SQL Queries_ Week 6.pdf
(52.09 KiB) Downloaded 36 times
Following, here are the output CSV files and supplemental graphs that I created:
Week6_DATA.xlsx
(1.65 MiB) Downloaded 33 times

ilianikiforov
Posts: 8
Joined: Tue Oct 04, 2022 10:24 am

Re: wk6 - 11.01.22 MidTerm Presentation

Post by ilianikiforov » Tue Nov 01, 2022 2:23 pm

In this report, I dive deeper into the exploration of seasonal trends in the SPL, this time using daily data from March 2015. I compare variables such as total checkouts, checkouts by type, diversity of titles, diversity of types, and the weight of the top-performing item each day against temperature and precipitation. I use both visualization and a correlation matrix. The conclusion: there seem to be no seasonal trends based on daily weather in the SPL, contrary to last week's findings.
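Each cell of a correlation matrix like the one mentioned above is a Pearson coefficient between two daily series. A minimal sketch of that calculation, with hypothetical numbers rather than the values in the attached CSV files:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily values, for illustration only.
temperature_c = [8.0, 10.5, 12.0, 9.5, 11.0]
daily_checkouts = [5200, 5150, 5300, 5100, 5250]

r = pearson(temperature_c, daily_checkouts)
print(round(r, 3))
```

A value near 0 suggests no linear relationship, while values near 1 or -1 suggest a strong one. With only one month of daily data, a weak coefficient says little about seasonality across the year.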
Attachments
top_items_per_day.csv
(34.95 KiB) Downloaded 37 times
distinct_items_and_types_by_day.csv
(385 Bytes) Downloaded 34 times
cout_march_2015_by_type.csv
(4.21 KiB) Downloaded 41 times
cout_march_2015.csv
(281 Bytes) Downloaded 44 times
Assignment 4.pdf
(406.34 KiB) Downloaded 39 times

nataliadubon
Posts: 15
Joined: Tue Mar 29, 2022 3:30 pm

Re: wk6 - 11.01.22 MidTerm Presentation

Post by nataliadubon » Tue Nov 01, 2022 2:37 pm

For my project this week, I would like to continue what I started last week, but dive deeper into pattern recognition and hopefully create a better machine learning model for prediction. Regression is a method of modeling a target value based on independent predictors. It is mostly used for forecasting and for examining the relationship between variables (association). Regression techniques differ mainly in the number of independent variables and in the type of relationship between the independent and dependent variables.

The idea is to apply linear regression to multiple sets of data, as I started to do last week. A least-squares fit is a standard way of determining the trend over the hour for each ItemNumber, and it lets me select only the items with a positive trend.

For this week's student forum on patterns, I decided to explore the Seattle Public Library dataset to find statistical correlations between the progression of time and a subject's total corresponding checkouts. For this, I chose all items that relate to Data Science (as last week we found that Data Science had the greatest upward trend). I essentially want to find out whether I can build a linear statistical model with enough predictive power to answer my question about this correlation.

All queries are cited alongside the descriptions/analysis and can also be found in their own section further below. Note that I have chosen to use both SQL and R for this week's assignment because of some limitations I find SQL to have in comparison to R when running statistical methods.
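The "fit a line per item and keep the positive trends" step might look roughly like the sketch below. It is written in Python rather than the R used in the attached PDF, the item names and counts are hypothetical, and `slope` is my own helper that computes the ordinary least-squares slope by hand.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical checkout totals per item over four time periods.
periods = [2018, 2019, 2020, 2021]
totals = {
    "item_a": [3, 5, 8, 12],
    "item_b": [9, 7, 6, 2],
}

# Keep only the items whose fitted slope is positive.
slopes = {item: slope(periods, counts) for item, counts in totals.items()}
rising = {item: s for item, s in slopes.items() if s > 0}
print(rising)
```

A positive slope alone does not show that the trend is statistically significant; with so few points per item, checking the p-value or the R-squared of each fit would be a sensible next step.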

The queries and files are attached to the pdf below!
Attachments
Week 5_ Finding Patterns within Library Data (1).pdf
(412.5 KiB) Downloaded 40 times
