CS 6665: Data Mining

Spring 2020, 1:30 pm to 2:45 pm on TR in GEOL 302

Course Description

Data mining aims to find useful patterns in large data sets. This course covers data mining algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and scalable data mining algorithms.

Prerequisites

  • Solid programming skills (Python is preferred)
  • Basic probability and statistics
  • Basic linear algebra

Course Material

Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. Available online.

Class Schedule

Date    Topic                                 Reading
Jan 6   Introduction to Data Mining           MMDS CH.1
Jan 9   Map-Reduce                            MMDS CH.2
Jan 14  Introduction to Spark                 MMDS CH.2
Jan 16  Frequent Itemsets Mining              MMDS CH.6.1-6.4
Jan 21  Locality-Sensitive Hashing            MMDS CH.3.1-3.4
Jan 23  Locality-Sensitive Hashing            MMDS CH.3.5-3.6
Jan 28  Clustering                            MMDS CH.7.1-7.4
Jan 30  Clustering                            MMDS CH.7.1-7.4
Feb 4   KNN & Naive Bayes                     MMDS CH.12.1,12.4
Feb 6   Decision Tree                         MMDS CH.12.5
Feb 11  Logistic Regression                   MMDS CH.12.2
Feb 13  Logistic Regression                   MMDS CH.12.3
Feb 18  SVM                                   MMDS CH.12.3
Feb 20  Mining Data Streams                   MMDS CH.4.1-4.3
Feb 25  Mining Data Streams                   MMDS CH.4.4-4.7
Feb 27  Course Project Proposal Presentation
Mar 3   Spring Break
Mar 5   Spring Break
Mar 10  PageRank                              MMDS CH.5.1-5.2
Mar 12  Dimensionality Reduction              MMDS CH.11.1-11.3
Mar 17  Cancelled
Mar 19  Recommender Systems                   MMDS CH.9.1-9.3
Mar 24  Recommender Systems                   MMDS CH.9.4-9.5
Mar 26  Deep Learning                         MMDS CH.13
Mar 31  Deep Learning                         MMDS CH.13
Apr 2   Deep Learning                         MMDS CH.13
Apr 7   Course Project Presentation
Apr 9   Course Project Presentation
Apr 14  Course Project Presentation

Grading

  • Homework (30%)
    • Six assignments involving problem-solving and programming
  • Team Project (30%)
    • Students are required to participate in one Kaggle competition as a team (up to 3 students per team).
    • The team project will be evaluated on technical soundness, presentation, and the final report.
  • Final Exam (40%)

Course Topics

  1. Data mining overview
  2. MapReduce and Spark
  3. Frequent itemset mining
  4. Finding similar items
  5. Clustering
  6. Mining data streams
  7. Dimensionality reduction
  8. Recommender systems
  9. Computational advertising
  10. PageRank
  11. Machine learning
  12. Anomaly detection
  13. Deep learning