# CS 5665: Introduction to Data Science

Fall 2021

## Course Description

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data. In recent years, deep learning approaches have achieved very high performance on various data analysis tasks. This course introduces the basic deep learning approaches. The goal is for students to learn how to use deep learning to solve real-world data analysis problems, especially in the fields of computer vision and natural language processing.

**Topics include:** Linear Regression, Logistic Regression, Feed-forward Neural Network, Convolutional Neural Network, Recurrent Neural Network, Transformers.

## Prerequisites

- Solid Python programming skills
  - All class assignments will be in Python.
- Basic probability and statistics
  - Understand the basics of probability, Gaussian distributions, mean, standard deviation, etc.
- Basic calculus and linear algebra
  - Be comfortable taking derivatives and understanding matrix/vector notation and operations (e.g., matrix multiplication).

To prospective students whose major is not computer science: Solid Python programming skills are required because both the homework and the course project are coding-based. If you do not feel comfortable with programming, I would not recommend that you register for this course.
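As a rough self-check of the prerequisites above, the following minimal sketch touches each listed area in plain Python: matrix multiplication, a derivative, and mean and standard deviation. It is not official course material. The helper names `matmul` and `numerical_derivative` and the sample data are hypothetical and chosen only for illustration.

```python
import statistics


def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Linear algebra: a (2x3) matrix times a (3x2) matrix gives a 2x2 matrix.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [1, 1]]
product = matmul(A, B)

# Calculus: the derivative of f(x) = x^2 is 2x, so the slope at x = 3 is about 6.
slope = numerical_derivative(lambda x: x ** 2, 3.0)

# Statistics: mean and population standard deviation of a small made-up sample.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(data)
std = statistics.pstdev(data)

print(product, round(slope, 4), mean, std)
```

If each line of this sketch reads comfortably and you could predict its results by hand, the listed background is probably in reasonable shape.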

## Course Material

- Aston Zhang, Zachary C. Lipton, Mu Li and Alexander J. Smola. (2020). Dive into Deep Learning. Available Online.
- Ian Goodfellow, Yoshua Bengio and Aaron Courville. (2016). Deep Learning. Available Online.

The following textbooks/websites are useful as additional reference:

- Eli Stevens, Luca Antiga, Thomas Viehmann. (2020). Deep Learning with PyTorch. Available Online.
- Michael Nielsen. Neural Networks and Deep Learning. Available Online.
- Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong. (2020). Mathematics for Machine Learning. Available Online.
  - Chapters 5, 6, and 7 are useful for understanding vector calculus and continuous optimization.

## Grading

- Homework (50%)
  - Five programming assignments
- Course Project (30%)
  - Students do course projects as groups
- Final Exam (20%)
  - Bonus

## Attendance

- Attendance is not mandatory but encouraged

## Tentative Schedule

| Date | Topic |
|---|---|
| Aug 31 | Introduction |
| Sep 2 | Introduction |
| Sep 7 | Linear Algebra Recap |
| Sep 9 | Linear Regression |
| Sep 14 | Linear Regression |
| Sep 16 | Linear Regression |
| Sep 21 | Bias and Variance |
| Sep 23 | Perceptron |
| Sep 28 | Logistic Regression |
| Sep 30 | Multiclass Classification |
| Oct 5 | Multilayer Neural Network |
| Oct 7 | Multilayer Neural Network |
| Oct 12 | Deep Learning Packages |
| Oct 14 | Convolutional Neural Network-I |
| Oct 19 | Convolutional Neural Network-II |
| Oct 21 | Bag of Tricks-I |
| Oct 26 | Bag of Tricks-II |
| Oct 28 | Autoencoder |
| Nov 2 | Text Data Processing Basis |
| Nov 4 | Word Embeddings |
| Nov 9 | Recurrent Neural Networks-I |
| Nov 11 | Recurrent Neural Networks-II |
| Nov 16 | Neural Machine Translation |
| Nov 18 | Transformer |
| Nov 23 | Transformer |
| Nov 30 | Project Presentation |
| Dec 2 | Project Presentation |
| Dec 7 | Final Review |
| Dec 9 | Canceled |