CS 6665: Data Mining

Spring 2021, Tuesdays and Thursdays, 10:30 am to 11:45 am, via web broadcast

The information described here has not been finalized yet. This page will be updated frequently.

Course Description

Data mining aims to find useful patterns in large data sets. This course covers data mining algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand basic and scalable data mining algorithms.

Prerequisites

  • Solid programming skills (Python is preferred)
  • Basic probability and statistics
  • Basic linear algebra

Course Material

Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. Available online.

Grading

  • Homework (30%)
    • Five programming assignments
  • Course Project (40%)
    • Students are required to participate in one Kaggle competition.
    • The project will be evaluated on technical soundness, presentation, and the final report.
  • Midterm Exam (30%)

  • Class Attendance
    • Class attendance is not mandatory but recommended.

Course Topics

  1. Data mining overview
  2. MapReduce and Spark
  3. Frequent itemset mining
  4. Finding similar items
  5. Clustering
  6. Mining data streams
  7. Dimensionality reduction
  8. Recommender systems
  9. Computational advertising
  10. PageRank
  11. Machine learning
  12. Anomaly detection