11.29.22 Final Project
The final project is a project of your choice. You can present anything of interest to you that deals with data analytics. The database can be the one we have been using or any other database. It is also an opportunity to recommend a topic that was not covered but could be important to add. Finally, please rank the assignments from most important to least important.
wk10 - 11.29.22 Final Project
George Legrady
legrady@mat.ucsb.edu
Re: wk10 - 11.29.22 Final Project
Final Project: Clustering & Dimensionality Reduction
Clustering and dimensionality reduction are two effective approaches used in data analysis. Clustering is often used to see whether there is any grouping pattern in the data, while dimensionality reduction is helpful for visualizing high-dimensional data.
In this project, I collect three-dimensional data for each title in the database: `Number of copies`, `Number of checkouts`, and `average borrow duration`. Using clustering and dimensionality reduction algorithms, I am able to find some patterns in the data and also visualize the data in a 2D plane.
The analysis is mainly written in a Jupyter notebook (in Python). See the attached .ipynb.zip file (uploading .ipynb files directly is not allowed).
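For readers without the notebook, here is a minimal sketch of the general idea, not the code from the attachment. It assumes k-means for the clustering step and PCA (via power iteration) for the 2D projection; the post does not say which algorithms were actually used. The six toy rows and the function names `standardize`, `kmeans`, and `pca_project` are invented for illustration.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-first start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def pca_project(points, n_components=2, iters=500):
    """Project onto the leading principal axes (power iteration + deflation)."""
    n, d = len(points), len(points[0])
    means = [sum(c) / n for c in zip(*points)]
    centered = [[x - m for x, m in zip(p, means)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    axes = []
    for _comp in range(n_components):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(a * b for a, b in zip(p, ax)) for ax in axes] for p in centered]

# Invented toy rows: (number of copies, number of checkouts, avg borrow days).
rows = [(1, 10, 20), (2, 12, 22), (1, 8, 19),
        (10, 200, 5), (12, 220, 6), (11, 210, 4)]
scaled = standardize(rows)
labels, centroids = kmeans(scaled, k=2)
coords = pca_project(scaled)  # one (x, y) pair per title, ready for a scatter plot
print(labels)
```

Standardizing first matters here because checkout counts are on a much larger scale than copy counts or durations and would otherwise dominate the distance calculation. In practice a library such as scikit-learn would likely replace the hand-written functions.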
-
- Posts: 15
- Joined: Tue Mar 29, 2022 3:30 pm
Re: wk10 - 11.29.22 Final Project
Abstract
For most of this course, we primarily focused on analyzing data from the past and keeping it there. For this project, I thought I could focus more on future predictions as another means of exploring a topic that hasn’t been assigned yet. During week 5, I did a similar project focusing on trends, but this time I plan on solely focusing on prediction using week 8’s data set (outliers) with some tweaking. The goal is to focus on the future versus the past.
The attached PDF is labeled "Week 10 Future Predictions" and is accompanied by its respective queries.
Ranking assignments (not including midterm and final):
1. Discover patterns
2. Outliers
3. Random Sampling
4. New MySQL commands
5. 2nd MySQL project
6. 1st MySQL project
- Attachments
-
- Pandemic - Week8QueryC (1).pdf
- (43.75 KiB) Downloaded 465 times
-
- 2021 Dataset - Week10_QueryA.pdf
- (52.55 KiB) Downloaded 446 times
-
- Week 10_ Future Predictions.pdf
- (400.81 KiB) Downloaded 449 times
-
- Posts: 11
- Joined: Fri Sep 23, 2022 10:04 am
Re: wk10 - 11.29.22 Final Project
For my final project, I will be using Python, R, and Tableau to analyze a data set that I found online. The data set is from Kaggle and originally contained 5 different CSV files. The context of the data is Udemy courses. Udemy is an online platform on which you can take courses in a variety of subjects. These courses are either free or paid. I will first clean the data set, then analyze it, perform some statistical analysis and linear modeling, and visualize some of the results and findings. The ranking of past projects for the course is also included in the PDF below.
Here are the pdfs containing my project write up and code:
-
- Posts: 8
- Joined: Tue Oct 04, 2022 10:24 am
Re: wk10 - 11.29.22 Final Project
In this report, I try to model the amount of time that passes between check-out and return using several variables that I constructed and a sample of 2813 observations. The linear regression showed that adult items, CDs, and DVDs tend to be returned faster. However, the regression method with my dataset failed several important diagnostics, so I conclude that these preliminary findings should be tested using a different method, more appropriate for this data.
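To make the setup concrete, here is a rough sketch of a linear regression of loan duration on item-type indicator variables. It is not the model from the attached report: the eight observations, the variable names `is_cd` and `is_dvd`, and the helper `ols` are all made up for illustration, and the real analysis used more variables and 2813 observations.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Invented observations: (is_cd, is_dvd, days between check-out and return).
# Rows with both indicators at 0 stand for the baseline item type (e.g. books).
data = [(0, 0, 20), (0, 0, 22), (0, 0, 18), (0, 0, 24),
        (1, 0, 9), (1, 0, 11),
        (0, 1, 6), (0, 1, 8)]

X = [[1.0, cd, dvd] for cd, dvd, _ in data]  # leading 1.0 is the intercept
y = [days for _, _, days in data]

beta = ols(X, y)
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("intercept, is_cd, is_dvd:", [round(b, 3) for b in beta])
print("residuals:", [round(r, 3) for r in residuals])
```

A negative coefficient on an indicator means that item type is returned faster than the baseline, on average. The residuals are the starting point for diagnostics of the kind mentioned above, for example checking whether their spread differs across groups or whether they look far from normal.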
My ranking of assignments:
1) Patterns
2) New commands
3) Outliers
4) 2nd project
5) 1st project
6) sampling
- Attachments
-
- final_analysis.pdf
- (2.66 MiB) Downloaded 441 times
-
- Final.pdf
- (345.5 KiB) Downloaded 423 times
-
- checkouts_sample_custom_variables.csv
- (124.95 KiB) Downloaded 437 times