CS 6665: Data Mining
Spring 2020, Tuesdays and Thursdays (TR), 1:30 pm to 2:45 pm, GEOL 302
Course Description
Data mining aims to find useful patterns in large data sets. This course covers algorithms for analyzing large amounts of data, including association rule mining, finding similar items, clustering, data stream mining, recommender systems, how search engines rank pages, and recent techniques for large-scale machine learning. The goal of this class is for students to understand both basic and large-scale data mining algorithms.
Prerequisites
- Solid programming skills (Python preferred)
- Basic probability and statistics
- Basic linear algebra
Course Material
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets (MMDS). Cambridge University Press. Available online.
Class Schedule
Date | Topic | Reading |
---|---|---|
Jan 6 | Introduction to Data Mining | MMDS CH.1 |
Jan 9 | Map-Reduce | MMDS CH.2 |
Jan 14 | Introduction to Spark | MMDS CH.2 |
Jan 16 | Frequent Itemsets Mining | MMDS CH.6.1-6.4 |
Jan 21 | Locality-Sensitive Hashing | MMDS CH.3.1-3.4 |
Jan 23 | Locality-Sensitive Hashing | MMDS CH.3.5-3.6 |
Jan 28 | Clustering | MMDS CH.7.1-7.4 |
Jan 30 | Clustering | MMDS CH.7.1-7.4 |
Feb 4 | KNN & Naive Bayes | MMDS CH.12.1, 12.4 |
Feb 6 | Decision Tree | MMDS CH.12.5 |
Feb 11 | Logistic Regression | MMDS CH.12.2 |
Feb 13 | Logistic Regression | MMDS CH.12.3 |
Feb 18 | SVM | MMDS CH.12.3 |
Feb 20 | Mining Data Streams | MMDS CH.4.1-4.3 |
Feb 25 | Mining Data Streams | MMDS CH.4.4-4.7 |
Feb 27 | Course Project Proposal Presentation | |
Mar 3 | Spring Break | |
Mar 5 | Spring Break | |
Mar 10 | PageRank | MMDS CH.5.1-5.2 |
Mar 12 | Dimensionality Reduction | MMDS CH.11.1-11.3 |
Mar 17 | Cancelled | |
Mar 19 | Recommender Systems | MMDS CH.9.1-9.3 |
Mar 24 | Recommender Systems | MMDS CH.9.4-9.5 |
Mar 26 | Deep Learning | MMDS CH.13 |
Mar 31 | Deep Learning | MMDS CH.13 |
Apr 2 | Deep Learning | MMDS CH.13 |
Apr 7 | Course Project Presentation | |
Apr 9 | Course Project Presentation | |
Apr 14 | Course Project Presentation | |
Grading
- Homework (30%)
  - Six assignments involving problem solving and programming
- Team Project (30%)
  - Students must participate in one Kaggle competition in teams of up to 3 students.
  - The team project will be evaluated on its technical soundness, the presentation, and the final report.
- Final Exam (40%)
Course Topics
- Data mining overview
- MapReduce and Spark
- Frequent itemset mining
- Finding similar items
- Clustering
- Mining data streams
- Dimensionality reduction
- Recommender systems
- Computational advertising
- PageRank
- Machine learning
- Anomaly detection
- Deep learning