CS 5665: Introduction to Data Science

Fall 2022, 3:00 pm to 4:15 pm on TR in Old Main 326

Course Description

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data. In recent years, deep learning approaches have achieved very high performance on a variety of data analysis tasks. This course introduces the basic deep learning approaches. The goal is for students to learn how to use deep learning to solve real-world data analysis problems, especially in computer vision and natural language processing.

Topics include: Linear Regression, Logistic Regression, Feed-forward Neural Network, Convolutional Neural Network, Recurrent Neural Network.

Prerequisites

  • Solid Python programming skills
    • All class assignments will be in Python.
  • Basic probability and statistics
    • Understand the basics of probability, Gaussian distributions, mean, standard deviation, etc.
  • Basic calculus, linear algebra
    • Be comfortable taking derivatives and working with matrix/vector notation and operations (e.g., matrix multiplication).
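
As a rough self-check of the background listed above, the following sketch touches each prerequisite once: a matrix multiplication, a derivative, and a mean and standard deviation. It is only an illustration and not part of the official course material; the helper names `matmul` and `numerical_derivative` and the sample values are made up for this example.

```python
import statistics


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Each output entry is the dot product of a row of a and a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Linear algebra: a (2x3) matrix times a (3x2) matrix gives a (2x2) matrix.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [1, 1]]
product = matmul(A, B)

# Calculus: derivative of f(x) = x^2 at x = 3; the analytic answer is 2x = 6.
slope = numerical_derivative(lambda x: x ** 2, 3.0)

# Statistics: mean and population standard deviation of a small sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(sample)
std = statistics.pstdev(sample)

print(product, round(slope, 4), mean, std)
```

If each of these steps can be worked out by hand and matches what the script prints, the math background is likely sufficient for the assignments.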

Course Material

  • Aston Zhang, Zachary C. Lipton, Mu Li and Alexander J. Smola. (2020). Dive into Deep Learning. Available Online.
  • Ian Goodfellow, Yoshua Bengio and Aaron Courville. (2016). Deep Learning. Available Online.

The following textbooks/websites are useful as additional references:

  • Eli Stevens, Luca Antiga, Thomas Viehmann. (2020). Deep Learning with PyTorch. Available Online.
  • Michael Nielsen. Neural Networks and Deep Learning. Available Online.
  • Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong. (2020). Mathematics for Machine Learning. Available Online.
    • Chapters 5, 6, and 7 are useful for understanding vector calculus and continuous optimization.

Grading

  • Homework (50%)
    • Five programming assignments
  • Course Project (30%)
    • Students complete course projects in groups
    • Online poster sessions will be hosted
  • Final Exam (20%)

Attendance

  • Attendance is not mandatory but is encouraged

Schedule

Date    Topic
Aug 30  Introduction
Sep 1   Introduction
Sep 6   Supervised Learning
Sep 8   Linear Regression
Sep 13  Linear Regression
Sep 15  Linear Regression
Sep 20  Linear Regression
Sep 22  Bias and Variance
Sep 27  Regularization
Sep 29  Perceptron
Oct 4   Logistic Regression
Oct 6   Logistic Regression
Oct 11  Multiclass Classification
Oct 13  Multilayer Neural Network
Oct 18  Multilayer Neural Network
Oct 20  Deep Learning Package
Oct 25  Convolutional Neural Network-I
Oct 27  Convolutional Neural Network-II
Nov 1   Bag of Tricks-I
Nov 3   Bag of Tricks-II
Nov 8   Auto-encoder
Nov 10  Text Data Processing and Word Embeddings
Nov 15  Recurrent Neural Networks-I
Nov 17  Recurrent Neural Networks-II
Nov 22  Canceled
Nov 24  Thanksgiving
Nov 29  Project Presentation
Dec 1   Project Presentation
Dec 6   Final Review
Dec 8   Q&A