wk10 - 11.29.22 Final Project

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Post by glegrady » Fri Sep 16, 2022 8:01 am

11.29.22 Final Project

The final project is a project of your choice. You can present anything of interest to you that deals with data analytics. The database can be the one we have been using or any other database. It is also an opportunity to recommend a topic that was not covered but could be important to add. Finally, please rank the assignments from most to least important.
George Legrady
legrady@mat.ucsb.edu

shaokang
Posts: 8
Joined: Fri Sep 23, 2022 10:07 am

Post by shaokang » Mon Nov 28, 2022 10:44 pm

Final Project: Clustering & Dimensionality Reduction
Clustering and dimensionality reduction are two effective approaches in data analysis. Clustering is often used to see whether the data contain any grouping patterns, while dimensionality reduction is helpful for visualizing high-dimensional data.

In this project, I collect three-dimensional data for each title in the database: `number of copies`, `number of checkouts`, and `average borrow duration`. Using clustering and dimensionality reduction algorithms, I am able to find some patterns in the data and also visualize it in a 2D plane.
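
For readers who want a concrete picture of this kind of pipeline, here is a minimal sketch using only the Python standard library. It is not the code from the attached notebook: the choice of k-means and PCA, the helper names (`standardize`, `kmeans`, `top_components`), and the six sample rows are all assumptions made up for illustration.

```python
import math
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeans(rows, k, iters=50):
    """Lloyd's algorithm with farthest-first initialization; returns labels."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(c) for c in zip(*members)]
    return labels

def top_components(rows, n_components=2, iters=200, seed=0):
    """PCA directions via power iteration with deflation (rows must be centered)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

# Made-up rows: (number of copies, number of checkouts, average borrow duration in days)
titles = [(1, 5, 20.0), (2, 8, 22.0), (1, 6, 19.0),
          (30, 900, 6.0), (25, 850, 7.0), (28, 950, 5.5)]

scaled = standardize(titles)
labels = kmeans(scaled, k=2)
components = top_components(scaled, n_components=2)
# 2D coordinates for plotting: project each row onto the two components.
coords = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
          for row in scaled]
```

Standardizing first matters here because checkout counts span a much larger range than copy counts or durations, and both k-means and PCA are sensitive to scale. The resulting `coords` can be passed to any scatter-plot tool and colored by `labels`.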

The analysis is mainly written in Python in a Jupyter notebook. See the attached project.ipynb.zip (zipped because uploading .ipynb files is not allowed).
project.ipynb.zip
(160.66 KiB) Downloaded 138 times
data.csv
(2.82 MiB) Downloaded 119 times
Final-Clustering & Dimensionality Reduction.pdf
(1.01 MiB) Downloaded 122 times

nataliadubon
Posts: 15
Joined: Tue Mar 29, 2022 3:30 pm

Post by nataliadubon » Tue Nov 29, 2022 2:16 am

Abstract
For most of this course, we primarily focused on analyzing data from the past and keeping it there. For this project, I thought I could focus more on future predictions as another means of exploring a topic that hasn’t been assigned yet. During week 5, I did a similar project focusing on trends, but this time I plan on solely focusing on prediction using week 8’s data set (outliers) with some tweaking. The goal is to focus on the future versus the past.

The attached PDF is labeled "Week 10 Future Predictions" and is accompanied by its respective queries.

Ranking assignments (not including midterm and final):
1. Discover patterns
2. Outliers
3. Random Sampling
4. New MySQL commands
5. 2nd MySQL project
6. 1st MySQL project
Attachments
Pandemic - Week8QueryC (1).pdf
(43.75 KiB) Downloaded 147 times
2021 Dataset - Week10_QueryA.pdf
(52.55 KiB) Downloaded 117 times
Week 10_ Future Predictions.pdf
(400.81 KiB) Downloaded 140 times

briannagriffin
Posts: 11
Joined: Fri Sep 23, 2022 10:04 am

Post by briannagriffin » Tue Nov 29, 2022 11:37 am

For my final project, I will use Python, R, and Tableau to analyze a data set that I found online. The data set is from Kaggle and originally contained 5 different CSV files. The context of the data is Udemy courses. Udemy is an online platform on which you can take courses in a variety of subjects. These courses are either free or paid. I will first clean the data set, then perform some statistical analysis and linear modeling, and finally visualize some of the results and findings. My ranking of past projects for the course is also included in the PDF below.

Here are the pdfs containing my project write up and code:
Final Project - MAT 265.pdf
(956.53 KiB) Downloaded 124 times
r_script_finalproject.pdf
(95.16 KiB) Downloaded 128 times
final_project_statistics.pdf
(152.24 KiB) Downloaded 128 times

ilianikiforov
Posts: 8
Joined: Tue Oct 04, 2022 10:24 am

Post by ilianikiforov » Tue Nov 29, 2022 2:17 pm

In this report, I try to model the amount of time that passes between check-out and return using several variables that I constructed and a sample of 2813 observations. The linear regression showed that adult items, CDs, and DVDs tend to be returned faster. However, the regression method with my dataset failed several important diagnostics, so I conclude that these preliminary findings should be tested using a different method, more appropriate for this data.
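
To illustrate how a regression on an indicator variable reads, here is a minimal sketch in plain Python. It is not the model from the attached report: it uses a single hypothetical 0/1 predictor (`is_dvd`), a hypothetical helper `ols_fit`, and made-up loan durations, whereas the report uses several constructed variables and a real sample.

```python
def ols_fit(x, y):
    """Closed-form ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: 1 if the item is a DVD, 0 otherwise, and days until return.
is_dvd = [0, 0, 0, 0, 1, 1, 1, 1]
days_out = [21.0, 18.0, 25.0, 20.0, 7.0, 9.0, 6.0, 10.0]

intercept, slope = ols_fit(is_dvd, days_out)

# Residuals are the starting point for most regression diagnostics
# (for example, plotting them against fitted values).
residuals = [y - (intercept + slope * x) for x, y in zip(is_dvd, days_out)]
```

With a single 0/1 predictor, the intercept is the mean duration of the baseline group and the slope is the difference between the two group means, so a negative slope corresponds to "returned faster".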

My ranking of assignments:
1) Patterns
2) New commands
3) Outliers
4) 2nd project
5) 1st project
6) sampling
Attachments
final_analysis.pdf
(2.66 MiB) Downloaded 135 times
Final.pdf
(345.5 KiB) Downloaded 110 times
checkouts_sample_custom_variables.csv
(124.95 KiB) Downloaded 119 times
