The information on this page has not been finalized and will be updated frequently.
Data mining aims to find useful patterns in large data sets. This course covers algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and large-scale data mining algorithms.
Prerequisites:
- Solid programming skills (Python preferred)
- Basic probability and statistics
- Basic linear algebra
Textbooks:
- [MMDS] Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. Available online.
- [MML] Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for machine learning. Cambridge University Press. Available online.
| Date | Topic | Reading / Notes |
|---|---|---|
| Jan 10 | Introduction to Data Mining | MMDS CH.1 |
| Jan 12 | MapReduce | MMDS CH.2 |
| Jan 17 | Matrix Multiplication by MapReduce (Optional) | MMDS CH.2 |
| Jan 19 | Spark | MMDS CH.2 |
| Jan 24 | Frequent Itemset Mining | MMDS CH.6.1-6.3; CH.6.4 (Optional) |
| Jan 26 | Locality-Sensitive Hashing | MMDS CH.3.1-3.4 |
| Jan 31 | Locality-Sensitive Hashing | MMDS CH.3.5-3.6 (Optional) |
| Feb 2 | Locality-Sensitive Hashing | MMDS CH.7.1-7.3 |
| Feb 7 | Clustering | MMDS CH.7.1-7.3 |
| Feb 9 | Hierarchical Clustering and k-Means | MMDS CH.7.1-7.3 |
| Feb 14 | BFR and CURE | MMDS CH.7.3-7.4 |
| Feb 16 | EM Algorithm (Optional) | MML CH.11.1-11.3 |
| Feb 21 | Gaussian Mixture Models (Optional) | MML CH.11.1-11.3 |
| Feb 23 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4 |
| Feb 28 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4 |
| Mar 2 | Course Project Proposal Presentation | |
| Mar 7 | Spring Break | |
| Mar 9 | Spring Break | |
| Mar 14 | SVM | MML CH.12.1-12.2; MMDS CH.12.3 |
| Mar 16 | SVM | MML CH.12.1-12.2; MMDS CH.12.3 |
| Mar 21 | SVM | MML CH.12.1-12.2; MMDS CH.12.3 |
| Mar 23 | Decision Trees | MMDS CH.12.5 |
| Mar 28 | PageRank | MMDS CH.5.1-5.2 |
| Apr 4 | Dimensionality Reduction | MMDS CH.11.3 |
| Apr 6 | Recommender Systems | MMDS CH.9.1-9.3 |
| Apr 11 | Trustworthy AI | |
| Apr 13 | Course Project Presentation | |
| Apr 18 | Course Project Presentation | |
| Apr 20 | Course Project Presentation | |
| Apr 25 | Q&A (no lecture) | |
| Apr 27 | Final Exam | 3:00 PM – 4:30 PM |
- Homework (35%)
  - Four programming assignments
- Course Project (30%)
  - Students are required to participate in one Kaggle competition.
  - The project will be evaluated on technical soundness, presentation, and the final report.
- Final Exam (35%)
- Class Attendance
  - Class attendance is not mandatory but is recommended.
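The three graded components (35% + 30% + 35% = 100%) combine into a single course score. The sketch that follows is a hypothetical illustration, not the official grading formula: it assumes every score is on a 0-100 scale and that the four programming assignments count equally toward the homework component. The names `final_score` and `WEIGHTS` are illustrative only.

```python
# A minimal sketch of a weighted course score. The weights mirror the
# grading breakdown; the 0-100 scale and equal weighting of the four
# homework assignments are assumptions, not stated course policy.
WEIGHTS = {"homework": 0.35, "project": 0.30, "final_exam": 0.35}


def final_score(homework_scores, project, final_exam):
    """Return the weighted course score on a 0-100 scale (hypothetical helper)."""
    if len(homework_scores) != 4:
        raise ValueError("expected scores for four programming assignments")
    for score in [*homework_scores, project, final_exam]:
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range: {score}")
    # Assumption: each assignment counts equally toward the homework component.
    homework = sum(homework_scores) / len(homework_scores)
    return (
        WEIGHTS["homework"] * homework
        + WEIGHTS["project"] * project
        + WEIGHTS["final_exam"] * final_exam
    )


if __name__ == "__main__":
    # Homework average 85 -> 0.35*85 + 0.30*88 + 0.35*76 = 82.75
    print(round(final_score([90, 80, 100, 70], project=88, final_exam=76), 2))
```

Because the combination is linear, raising any one component by one point raises the course score by that component's weight (for example, 0.30 points for the project).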
Topics:
- Data mining overview
- MapReduce and Spark
- Frequent itemset mining
- Finding similar items
- Mining data streams
- Dimensionality reduction
- Recommender systems
- Anomaly detection