Proj 1 - Data Exploration Concept & MySQL Query
Posted: Fri Dec 23, 2022 7:33 am
Proj 1 - Data Exploration Concept & MySQL Query
This is the 1st of 3 assignments to be realized in the MAT 259 Data Visualization course: https://www.mat.ucsb.edu/~g.legrady/aca ... 3w259.html
----------------------------------
PROJECT ASSIGNMENT: The assignment is to come up with an interesting MySQL query with results exploring the Seattle Public Library (SPL) database. Click on the red POST REPLY to post your pdf of your assignment.
In the posting provide:
a) A one paragraph description of your SQL data search project which cna be repeated in the attached pdf file
b) Add a PDF that documents the work you did. The PDF should have your name in the title
c) Add the results of your query as a CSV file(s).
----------------------------------
The PDF should include the following:
. A one-paragraph description of your query search
. Concept description
. MySQL Query
. The data and data results
. Discussion/Analysis of results
----------------------------------
WHAT SHOULD THE QUERY BE ABOUT?
. The query can be about a topic of interest to you that might be represented in the library database
. The query can explore the structure of how the database is organized, how things are classified
. The query can reveal anomalies, errors, outliers in either the data or how the data is organized, or how things are classified
. Consider that this is a warm-up exercise to produce data to be used to create a 3D visualization.
The 3D visualization project will need 5 columns of values:
1) A value for horizontal position
2) A value for vertical position
3) A value for depth position
4) A value for scaling or color for each cell in the visualization
5) Possibly a string value for labeling each indivisual cell
----------------------------------
SOME PREVIOUS STUDENT EXAMPLES
. PREDICTION: https://vislab.mat.ucsb.edu/2020/p1/Gua ... rName.html
. ITEMNUMBER: https://vislab.mat.ucsb.edu/2019/p1/Jia ... index.html
. PROBLEMS WITH DATA: https://vislab.mat.ucsb.edu/2019/p1/Ale ... eport.html
. CORRELATION: https://vislab.mat.ucsb.edu/2022/p1/Lij ... Cheng.html
. INTEREST IN A TOPIC: https://vislab.mat.ucsb.edu/2021/p1/Lar ... hetty.html
REVIEW OTHER TOPICS in previous first assignements: https://vislab.mat.ucsb.edu/courses.html
Additional examples from the 2022 Fall M265 class: viewforum.php?f=87
----------------------------------
EVALUATION CRITERIA
. The topic should be relevant or interesting. Describe why.
. The query should result in extensive data saved as a csv file.
. Consider the whole database, going beyond the Dewey numeric classification
----------------------------------
SCHEDULE
January 10, 2023, Tues - Course Overview & Introduction to MySQL
--
January 12, 2023, Thurs - Review & Test MySQL examples
https://www.mat.ucsb.edu/~g.legrady/aca ... wMysql.pdf (Examples for each of the SPL metadata)
https://www.mat.ucsb.edu/~g.legrady/aca ... MySQL2.pdf
https://www.mat.ucsb.edu/~g.legrady/aca ... tudent.pdf
--
January 17, 2023, Tues - Review of the MySQL Assignment
--
January 19, 2023, Thurs - Further examples of previous project documentations and In Lab work production
--
January 24, 2023, Thurs - In Class result presentations, Introduction to Processing
----------------------------------
FURTHER DISCUSSION: Data Exploration & Knowledge Discovery through a large multivariate dataset
The first assignment is to explore a large database consisting of multivariate data with the intent to discover and extract patterns with MySQL that may reveal something of interest. SQL is the standardized language used to access the database: https://www.mysqltutorial.org/what-is-mysql/
We have access to a unique database consisting of checkouts of books, cds, dvd's documented by the hour since January 1, 2006 that represents the aggregated cultural interests of downtown Seattle, but also of the larger national interests. The database currently consists of over 101 million checkouts and returns.
----------------------------------
THE CONCEPT: In any database, there lies hidden knowledge. What does a database contain, and what can MySQL queries reveal? Your first assignment is to find something of interest based on your own interests and skillsets. Here are some options:
1) Patterns, Probability & Prediction
. What is the performance of a topic(s), a title(s), media over time, or by volume, or by trends?
. What can be predicted based on previous performance?
. Can it be predicted how a sequence of data change over time?
2) The Database Organizational Structure
. What anomalies, errors, outliers, illogical classification methods, etc. may be revealed within the organizational and classification of how items are encoded
(All databases have outliers, anomalies, errors in the system as its impossible to precisely classify all things within a structured form)
3) Additional Irregularities in the Data Situation
. 2020 has had an unprecedented impact on the database as the library was closed for over 5 months between March to September. Nonetheless items were circulating.
. Electronic books have been in the collection since 2009 but these are not recorded in the database we receive. Nonetheless they can be reviewed through other means:
4) Data Analytics Query Methods
. Explore statistical methods or algorithms to retrieve or process data
. Are there any machine-learning opportunities in analyzing the data?
----------------------------------
The database consists of multivariate data. For each checkout there exists the following metadata:
Ordinal (In a numeric sequence)
ID: Assigned by the database to keep track of each entry
ItemNumber: Assigned by the library when an object enters the system
Dewey Classification (Dewey numeric) The item's dewey classification if it is recorded as a Dewey (non-fiction) item
Interval Scale (Time-Stamp)
Check-out/check-in in minutes, hour, day, month, year
Categorical (Not necessarily numerically orderable)
BibNumber: Each title has a specific number, copies of titles all have same number. Defined by the LIbrary of Congress
Barcode: Each item has a unique number on RFID sticker
CallNumber: by which to locate items on shelves - Ordinal if Dewey, otherwise categorical. Multiple copies of same item may share same call number but have different barcodes and itemNumber
CollCode: What the item is and where its located: https://data.seattle.gov/Community/Libr ... /6vkj-f5xf
Semantic (Text-based)
Title: Each Item has a title
ItemType: books, cds, dvds, music sheets, etc.
Subjects: Keywords (arbitrary labeling). These are located in a separate database:
----------------------------------
The library uses the Dewey Decimal system by which to organize non-fiction item but the majority of the items in the library collection do not have Dewey classification labels. For instance, music, movies, seem to be distributed in both Dewey and non-Dewey ways of classification. The most popular Dewey tend to be comic books, cookbooks, health, travel books, etc. A daily insight to the Dewey performance can be tracked at: http://128.111.26.109/parsing/index.php
---------------------------------
Specific labeling of your Documents
Please make sure to label your documents like csv files by the name of your project, or your name so we can identify where they come from
This is the 1st of 3 assignments to be realized in the MAT 259 Data Visualization course: https://www.mat.ucsb.edu/~g.legrady/aca ... 3w259.html
----------------------------------
PROJECT ASSIGNMENT: The assignment is to come up with an interesting MySQL query with results exploring the Seattle Public Library (SPL) database. Click on the red POST REPLY to post your pdf of your assignment.
In the posting provide:
a) A one paragraph description of your SQL data search project which cna be repeated in the attached pdf file
b) Add a PDF that documents the work you did. The PDF should have your name in the title
c) Add the results of your query as a CSV file(s).
----------------------------------
The PDF should include the following:
. A one-paragraph description of your query search
. Concept description
. MySQL Query
. The data and data results
. Discussion/Analysis of results
----------------------------------
WHAT SHOULD THE QUERY BE ABOUT?
. The query can be about a topic of interest to you that might be represented in the library database
. The query can explore the structure of how the database is organized, how things are classified
. The query can reveal anomalies, errors, outliers in either the data or how the data is organized, or how things are classified
. Consider that this is a warm-up exercise to produce data to be used to create a 3D visualization.
The 3D visualization project will need 5 columns of values:
1) A value for horizontal position
2) A value for vertical position
3) A value for depth position
4) A value for scaling or color for each cell in the visualization
5) Possibly a string value for labeling each indivisual cell
----------------------------------
SOME PREVIOUS STUDENT EXAMPLES
. PREDICTION: https://vislab.mat.ucsb.edu/2020/p1/Gua ... rName.html
. ITEMNUMBER: https://vislab.mat.ucsb.edu/2019/p1/Jia ... index.html
. PROBLEMS WITH DATA: https://vislab.mat.ucsb.edu/2019/p1/Ale ... eport.html
. CORRELATION: https://vislab.mat.ucsb.edu/2022/p1/Lij ... Cheng.html
. INTEREST IN A TOPIC: https://vislab.mat.ucsb.edu/2021/p1/Lar ... hetty.html
REVIEW OTHER TOPICS in previous first assignements: https://vislab.mat.ucsb.edu/courses.html
Additional examples from the 2022 Fall M265 class: viewforum.php?f=87
----------------------------------
EVALUATION CRITERIA
. The topic should be relevant or interesting. Describe why.
. The query should result in extensive data saved as a csv file.
. Consider the whole database, going beyond the Dewey numeric classification
----------------------------------
SCHEDULE
January 10, 2023, Tues - Course Overview & Introduction to MySQL
--
January 12, 2023, Thurs - Review & Test MySQL examples
https://www.mat.ucsb.edu/~g.legrady/aca ... wMysql.pdf (Examples for each of the SPL metadata)
https://www.mat.ucsb.edu/~g.legrady/aca ... MySQL2.pdf
https://www.mat.ucsb.edu/~g.legrady/aca ... tudent.pdf
--
January 17, 2023, Tues - Review of the MySQL Assignment
--
January 19, 2023, Thurs - Further examples of previous project documentations and In Lab work production
--
January 24, 2023, Thurs - In Class result presentations, Introduction to Processing
----------------------------------
FURTHER DISCUSSION: Data Exploration & Knowledge Discovery through a large multivariate dataset
The first assignment is to explore a large database consisting of multivariate data with the intent to discover and extract patterns with MySQL that may reveal something of interest. SQL is the standardized language used to access the database: https://www.mysqltutorial.org/what-is-mysql/
We have access to a unique database consisting of checkouts of books, cds, dvd's documented by the hour since January 1, 2006 that represents the aggregated cultural interests of downtown Seattle, but also of the larger national interests. The database currently consists of over 101 million checkouts and returns.
----------------------------------
THE CONCEPT: In any database, there lies hidden knowledge. What does a database contain, and what can MySQL queries reveal? Your first assignment is to find something of interest based on your own interests and skillsets. Here are some options:
1) Patterns, Probability & Prediction
. What is the performance of a topic(s), a title(s), media over time, or by volume, or by trends?
. What can be predicted based on previous performance?
. Can it be predicted how a sequence of data change over time?
2) The Database Organizational Structure
. What anomalies, errors, outliers, illogical classification methods, etc. may be revealed within the organizational and classification of how items are encoded
(All databases have outliers, anomalies, errors in the system as its impossible to precisely classify all things within a structured form)
3) Additional Irregularities in the Data Situation
. 2020 has had an unprecedented impact on the database as the library was closed for over 5 months between March to September. Nonetheless items were circulating.
. Electronic books have been in the collection since 2009 but these are not recorded in the database we receive. Nonetheless they can be reviewed through other means:
4) Data Analytics Query Methods
. Explore statistical methods or algorithms to retrieve or process data
. Are there any machine-learning opportunities in analyzing the data?
----------------------------------
The database consists of multivariate data. For each checkout there exists the following metadata:
Ordinal (In a numeric sequence)
ID: Assigned by the database to keep track of each entry
ItemNumber: Assigned by the library when an object enters the system
Dewey Classification (Dewey numeric) The item's dewey classification if it is recorded as a Dewey (non-fiction) item
Interval Scale (Time-Stamp)
Check-out/check-in in minutes, hour, day, month, year
Categorical (Not necessarily numerically orderable)
BibNumber: Each title has a specific number, copies of titles all have same number. Defined by the LIbrary of Congress
Barcode: Each item has a unique number on RFID sticker
CallNumber: by which to locate items on shelves - Ordinal if Dewey, otherwise categorical. Multiple copies of same item may share same call number but have different barcodes and itemNumber
CollCode: What the item is and where its located: https://data.seattle.gov/Community/Libr ... /6vkj-f5xf
Semantic (Text-based)
Title: Each Item has a title
ItemType: books, cds, dvds, music sheets, etc.
Subjects: Keywords (arbitrary labeling). These are located in a separate database:
----------------------------------
The library uses the Dewey Decimal system by which to organize non-fiction item but the majority of the items in the library collection do not have Dewey classification labels. For instance, music, movies, seem to be distributed in both Dewey and non-Dewey ways of classification. The most popular Dewey tend to be comic books, cookbooks, health, travel books, etc. A daily insight to the Dewey performance can be tracked at: http://128.111.26.109/parsing/index.php
---------------------------------
Specific labeling of your Documents
Please make sure to label your documents like csv files by the name of your project, or your name so we can identify where they come from