CS 6665: Data Mining

Spring 2023, 3:00 pm to 4:15 pm on TR, Old Main 117

The information described here has not been finalized yet. This page will be updated frequently.

Course Description

Data mining aims to find useful patterns in large data sets. This course covers data mining algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and scalable data mining algorithms.

Prerequisites

  • Solid programming skills (Python is preferred)
  • Basic probability and statistics
  • Basic linear algebra

Course Material

  • [MMDS] Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. Available online.
  • [MML] Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Available online.

Class Schedule

Date | Topic | Reading
Jan 10 | Introduction to Data Mining | MMDS CH.1
Jan 12 | MapReduce | MMDS CH.2
Jan 17 | Matrix Multiplication by MapReduce (Optional) | MMDS CH.2
Jan 19 | Spark | MMDS CH.2
Jan 24 | Frequent Itemset Mining | MMDS CH.6.1-6.3; CH.6.4 (Optional)
Jan 26 | Locality-Sensitive Hashing | MMDS CH.3.1-3.4
Jan 31 | Locality-Sensitive Hashing | MMDS CH.3.5-3.6 (Optional)
Feb 2 | Locality-Sensitive Hashing | MMDS CH.7.1-7.3
Feb 7 | Clustering | MMDS CH.7.1-7.3
Feb 9 | Hierarchical Clustering and K-means | MMDS CH.7.1-7.3
Feb 14 | BFR and CURE | MMDS CH.7.3-7.4
Feb 16 | EM Algorithm (Optional) | MML CH.11.1-11.3
Feb 21 | Gaussian Mixture Models (Optional) | MML CH.11.1-11.3
Feb 23 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4
Feb 28 | k-NN and Naive Bayes | MMDS CH.12.1, 12.4
Mar 2 | Course Project Proposal Presentation |
Mar 7 | Spring Break |
Mar 9 | Spring Break |
Mar 14 | SVM | MML CH.12.1, 12.2; MMDS CH.12.3
Mar 16 | SVM | MML CH.12.1, 12.2; MMDS CH.12.3
Mar 21 | SVM | MML CH.12.1, 12.2; MMDS CH.12.3
Mar 23 | Decision Tree | MMDS CH.12.5
Mar 28 | PageRank | MMDS CH.5.1-5.2
Mar 30 | ChatGPT |
Apr 4 | Dimensionality Reduction | MMDS CH.11.3
Apr 6 | Recommender Systems | MMDS CH.9.1-9.3
Apr 11 | Trustworthy AI |
Apr 13 | Course Project Presentation |
Apr 18 | Course Project Presentation |
Apr 20 | Course Project Presentation |
Apr 25 | Q&A (no lecture) |
Apr 27 | Final Exam | 3:00 PM – 4:30 PM

Grading

  • Homework (35%)
    • Four programming assignments
  • Course Project (30%)
    • Students are required to participate in one Kaggle competition.
    • The project will be evaluated on technical soundness, presentation, and the final report.
  • Final Exam (35%)

  • Class Attendance
    • Class attendance is recommended but not mandatory.

Course Topics

  1. Data mining overview
  2. MapReduce and Spark
  3. Frequent itemset mining
  4. Finding similar items
  5. Clustering
  6. Classification
  7. Mining data streams
  8. Dimensionality reduction
  9. Recommender systems
  10. PageRank
  11. Anomaly detection