CS 6665: Data Mining
Spring 2022, 10:30 am to 11:45 am on TR, Huntsman Hall 160
The information described here has not been finalized yet. This page will be updated frequently.
Course Description
Data mining aims to find useful patterns in large data sets. This course covers data mining algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and scalable data mining algorithms.
Prerequisites
- Solid programming skills (Python is preferred)
- Basic probability and statistics
- Basic linear algebra
Course Material
- [MMDS] Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. Available online.
- [MML] Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press. Available online.
Class Schedule
Date | Topic | Reading |
---|---|---|
Jan 11 | Introduction to Data Mining | MMDS CH.1 |
Jan 13 | MapReduce | MMDS CH.2 |
Jan 18 | Matrix Multiplication by MapReduce (Optional) | MMDS CH.2 |
Jan 20 | Spark | MMDS CH.2 |
Jan 25 | Frequent Itemset Mining | MMDS CH.6.1-6.3; CH.6.4 (Optional) |
Jan 27 | Locality-Sensitive Hashing | MMDS CH.3.1-3.4 |
Feb 1 | Locality-Sensitive Hashing | MMDS CH.3.5-3.6 (Optional) |
Feb 3 | Clustering | MMDS CH.7.1-7.3 |
Feb 8 | Hierarchical clustering and K-means | MMDS CH.7.1-7.3 |
Feb 10 | BFR and CURE | MMDS CH.7.3-7.4 |
Feb 15 | EM algorithm (Optional) | MML CH.11.1-11.3 |
Feb 17 | Gaussian Mixture Models (Optional) | MML CH.11.1-11.3 |
Feb 22 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4 |
Feb 24 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4 |
Mar 1 | SVM | MML CH.12.1, 12.2; MMDS CH.12.3 |
Mar 3 | SVM | MML CH.12.1, 12.2; MMDS CH.12.3 |
Mar 8 | Spring Break | |
Mar 10 | Spring Break | |
Mar 15 | Decision Tree | MMDS CH.12.5 |
Mar 17 | PageRank | MMDS CH.5.1-5.2 |
Mar 22 | Midterm | |
Mar 24 | Dimensionality Reduction | MMDS CH.11.3 |
Mar 29 | Recommender Systems | MMDS CH.9.1-9.3 |
Mar 31 | Mining Data Streams | MMDS CH.4.1-4.3 |
Apr 5 | Mining Data Streams | MMDS CH.4.4-4.7 |
Apr 7 | Mining Data Streams | MMDS CH.4.4-4.7 |
Apr 12 | Trustworthy AI | |
Apr 14 | Course Project Presentation | |
Apr 19 | Course Project Presentation | |
Apr 21 | Course Project Presentation | |
Apr 26 | Canceled | |
Grading
- Homework (35%)
  - Six programming assignments
- Course Project (35%)
  - Students are required to participate in one Kaggle competition.
  - The project will be evaluated based on technical soundness, presentation, and the final report.
- Midterm Exam (30%)
- Class Attendance
  - Class attendance is not mandatory but recommended.
Course Topics
- Data mining overview
- MapReduce and Spark
- Frequent itemset mining
- Finding similar items
- Clustering
- Classification
- Mining data streams
- Dimensionality reduction
- Recommender systems
- Computational advertising
- PageRank
- Anomaly detection