CS 5665: Introduction to Data Science

Fall 2020, 1:30 pm to 2:45 pm on TR via WebBroadcast

The information described here has not been finalized yet. This page will be updated frequently.

Course Description

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning and big data [link]. In recent years, deep learning approaches have achieved very high performance on a variety of data analysis tasks. This course introduces the basic deep learning approaches. Its goal is for students to learn how to use deep learning to solve real-world data analysis problems, especially in computer vision and natural language processing.

Topics include: Linear Regression, Logistic Regression, Feed-forward Neural Network, Convolutional Neural Network, Recurrent Neural Network, Transformers, Generative Adversarial Networks.

Prerequisites

  • Solid Python programming skills
    • All class assignments will be in Python.
  • Basic probability and statistics
    • Understand the basics of probability, Gaussian distributions, mean, standard deviation, etc.
  • Basic calculus, linear algebra
    • Be comfortable taking derivatives and understanding matrix/vector notation and operations (e.g., matrix multiplication).
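
As a rough self-check of the prerequisites above, the following minimal sketch touches each item: a mean and standard deviation, a matrix multiplication, and a derivative. The helper names `matmul` and `numerical_derivative` and the sample values are illustrative assumptions, not course material. Students who can read and predict the results of this snippet are probably prepared.

```python
import statistics


def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Basic statistics: mean and population standard deviation of made-up samples.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(samples)
std = statistics.pstdev(samples)

# Linear algebra: product of two 2x2 matrices.
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])

# Calculus: d/dx of x^2 at x = 3 should be close to 2 * 3 = 6.
slope = numerical_derivative(lambda x: x ** 2, 3.0)

print(mean, std, product, round(slope, 4))
```

The pure-Python matrix product is written out only to make the index arithmetic visible; in practice a numerical library would be used instead.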

Course Material

  • Aston Zhang, Zachary C. Lipton, Mu Li and Alexander J. Smola. (2020). Dive into Deep Learning. Available Online.
  • Ian Goodfellow, Yoshua Bengio and Aaron Courville. (2016). Deep Learning. Available Online.

The following textbooks and websites are useful as additional references:

  • Eli Stevens and Luca Antiga. (2020). Deep Learning with PyTorch. Available Online.
  • Michael Nielsen. Neural Networks and Deep Learning. Available Online.
  • Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong. (2020). Mathematics for Machine Learning. Available Online.
    • Chapters 5, 6 and 7 are useful for understanding vector calculus and continuous optimization.

Grading

  • Homework (50%)
    • Six programming assignments
  • Course Project (30%)
    • Course projects are completed individually
    • Online poster sessions will be hosted
  • Final Exam (20%)
  • Bonus

Attendance

  • Attendance is encouraged but not mandatory

Class Schedule

Date    Topic                             Reading
Sep 1   Introduction
Sep 3   Introduction
Sep 8   Linear Algebra Recap              d2l.ai Ch.2
Sep 10  Linear Regression                 d2l.ai Ch.3.1, 3.2
Sep 15  Linear Regression                 d2l.ai Ch.3.1, 3.2
Sep 17  Linear Regression                 d2l.ai Ch.3.1, 3.2
Sep 22  Bias and Variance
Sep 24  Perceptron
Sep 29  Logistic Regression
Oct 1   Multiclass Classification         d2l.ai Ch.3.4
Oct 6   Multilayer Neural Network         d2l.ai Ch.4.1, 4.2
Oct 8   Multilayer Neural Network         d2l.ai Ch.4.7
Oct 13  Deep Learning Packages
Oct 15  Convolutional Neural Network-I    d2l.ai Ch.6
Oct 20  Convolutional Neural Network-II   d2l.ai Ch.7
Oct 22  Bag of Tricks-I
Oct 27  Bag of Tricks-II
Oct 29  Autoencoder
Nov 3   Text Data Processing Basics
Nov 5   Word Embeddings
Nov 10  Recurrent Neural Networks-I
Nov 12  Recurrent Neural Networks-II
Nov 17  Neural Machine Translation
Nov 19  Transformer