CS 6665: Data Mining

Spring 2020, 1:30 pm to 2:45 pm on TR in GEOL 302

Course Description

Data mining aims to find useful patterns in large data sets. This course covers data mining algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and scalable data mining algorithms.

Prerequisites

  • Solid programming skills (Python is preferred)
  • Basic probability and statistics
  • Basic linear algebra

Course Material

Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. Available online.

Class Schedule

Date    Topic                                 Reading
Jan 6   Introduction to Data Mining           MMDS CH.1
Jan 9   Map-Reduce                            MMDS CH.2
Jan 14  Introduction to Spark                 MMDS CH.2
Jan 16  Frequent Itemsets Mining              MMDS CH.6.1-6.4
Jan 21  Locality-Sensitive Hashing            MMDS CH.3.1-3.4
Jan 23  Locality-Sensitive Hashing            MMDS CH.3.5-3.6
Jan 28  Clustering                            MMDS CH.7.1-7.4
Jan 30  Clustering                            MMDS CH.7.1-7.4
Feb 4   KNN & Naive Bayes                     MMDS CH.12.1, 12.4
Feb 6   Decision Tree                         MMDS CH.12.5
Feb 11  Logistic Regression                   MMDS CH.12.2
Feb 13  Logistic Regression                   MMDS CH.12.3
Feb 18  SVM                                   MMDS CH.12.3
Feb 20  Mining Data Streams                   MMDS CH.4.1-4.3
Feb 25  Mining Data Streams                   MMDS CH.4.4-4.7
Feb 27  Course Project Proposal Presentation
Mar 3   Spring Break
Mar 5   Spring Break
Mar 10  PageRank                              MMDS CH.5.1-5.2
Mar 12  Dimensionality Reduction              MMDS CH.11.1-11.3
Mar 17  Cancelled
Mar 19  Recommender Systems                   MMDS CH.9.1-9.3
Mar 24  Recommender Systems                   MMDS CH.9.4-9.5
Mar 26  Deep Learning                         MMDS CH.13
Mar 31  Deep Learning                         MMDS CH.13
Apr 2   Deep Learning                         MMDS CH.13
Apr 7   Course Project Presentation
Apr 9   Course Project Presentation
Apr 14  Course Project Presentation

Grading

  • Homework (30%)
    • Six assignments involving problem-solving and programming
  • Team Project (30%)
    • Students are required to participate in one Kaggle competition as a team of up to 3 students.
    • The team project will be evaluated on technical soundness, the presentation, and the final report.
  • Final Exam (40%)

Course Topics

  1. Data mining overview
  2. MapReduce and Spark
  3. Frequent itemset mining
  4. Finding similar items
  5. Clustering
  6. Mining data streams
  7. Dimensionality reduction
  8. Recommender systems
  9. Computational advertising
  10. PageRank
  11. Machine learning
  12. Anomaly detection
  13. Deep learning