CS 5665: Introduction to Data Science

Fall 2021, Tuesdays and Thursdays (TR), 1:30 pm to 2:45 pm, in Old Main 115 and via WebBroadcast

Course Description

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data. In recent years, deep learning approaches have achieved very high performance on various data analysis tasks. This course introduces the basic deep learning approaches. The goal is for students to learn how to use deep learning to solve real-world data analysis problems, especially in computer vision and natural language processing.

Topics include: Linear Regression, Logistic Regression, Feed-forward Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, and Transformers.

Prerequisites

Solid Python programming skills
- All class assignments will be in Python.
Basic probability and statistics
- Understand the basics of probability, Gaussian distributions, mean, standard deviation, etc.
Basic calculus, linear algebra
- Be comfortable taking derivatives and working with matrix/vector notation and operations (e.g., matrix multiplication).

Course Material

Aston Zhang, Zachary C. Lipton, Mu Li and Alexander J. Smola. (2020). Dive into Deep Learning. Available Online.
Ian Goodfellow, Yoshua Bengio and Aaron Courville. (2016). Deep Learning. Available Online.

The following textbooks/websites are useful as additional references:

Eli Stevens, Luca Antiga, Thomas Viehmann. (2020). Deep Learning with PyTorch. Available Online.
Michael Nielsen. Neural Networks and Deep Learning. Available Online.
Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong. (2020). Mathematics for Machine Learning. Available Online.
- Chapters 5, 6, and 7 are useful for understanding vector calculus and continuous optimization.

Grading

Homework (50%)
- Six programming assignments
Course Project (30%)
- Students complete the course project in groups
Final Exam (20%)
Bonus

Attendance

Attendance is not mandatory but is encouraged.

Schedule

Date	Topic
Aug 31	Introduction
Sep 2	Introduction
Sep 7	Linear Regression
Sep 9	Linear Regression
Sep 14	Linear Regression
Sep 16	Linear Regression
Sep 21	Bias and Variance
Sep 23	Perceptron
Sep 28	Logistic Regression
Sep 30	Logistic Regression
Oct 5	Multiclass Classification
Oct 7	Multilayer Neural Network
Oct 12	Multilayer Neural Network
Oct 14	Deep Learning Package
Oct 19	Convolutional Neural Network-I
Oct 21	Convolutional Neural Network-II
Oct 26	Bag of Tricks-I
Oct 28	Bag of Tricks-II
Nov 2	Auto-encoder
Nov 4	Text Data Processing and Word Embeddings
Nov 9	Recurrent Neural Networks-I
Nov 11	Recurrent Neural Networks-II
Nov 16	LSTM and GRU
Nov 18	Machine Translation
Nov 23	Canceled
Nov 30	Project Presentation
Dec 2	Project Presentation
Dec 7	Project Presentation
Dec 9	Final Review

Shuhan Yuan