CS 4650: Natural Language Processing
Georgia Tech (Spring 2021)
Course Information
This undergraduate-level course provides an introduction to modern natural language processing using machine learning approaches. Content includes linguistics fundamentals (syntax, semantics, distributional properties of language), machine learning models (classifiers, sequence taggers, deep learning models), key algorithms for inference, and applications to a range of problems. In-person attendance is not required for the course.
- Class Meets (Eastern Time)
  - Mondays and Wednesdays, 3:30-4:45pm, Bluejeans
- Piazza (discussion, announcements, QA)
  - piazza.com/gatech/spring2021/cs4650
- Canvas (quiz, etc.)
  - gatech.instructure.com/courses/175774
- Gradescope (homework submission, grading)
  - gradescope.com/courses/222248
- Online Office Hours (Eastern Time)
  - Jingfeng Yang: Thursday 10:00-11:00pm
  - Kaige Xie: Wednesday 6:00-7:00pm
  - Sarmishta Velury: Monday 6:00-7:00pm
Prerequisites
While natural language processing is super cool, it relies on many modern machine learning algorithms and involves a lot of math and programming. To be successful in the class, on the math side, you should feel comfortable with probability, linear algebra, and calculus. For example, some of the lectures use partial derivatives and the multivariable chain rule. If you are not yet familiar with calculus, the best course of action is to take Math 2550, Math 2551, or Math 2561 first, then come back in a later semester to take CS 4650. On the programming side, assignments will be in Python; you should understand basic computer science concepts (e.g., recursion), data structures (e.g., trees, graphs), and key algorithms (e.g., search, sorting). The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” Ideally, you have also taken CS 3600, the intro-level “Machine Learning” class.
Schedule
Tentative Schedule: https://docs.google.com/spreadsheets/d/1zUnHII9tZ8STB8jknxSJuJL-3pILO8sV_JtN1fbRiIM/edit?usp=sharing
Subject to change as the term progresses.
Date | Topic | Optional Reading |
Jan 14 | First day of Class. No Class; Background Test Out, Background Test Template, Due on Jan 21 | |
Jan 18 | No Class - MLK National Holiday | |
Jan 20 | Course Overview; Slides | |
Jan 25 | Text Classification (Naive Bayes); Slides, Notes | |
Jan 27 | Logistic Regression, Perceptron, SVM; Slides, Notes, HW1 Out, HW1 Template, Due on Feb 12th 11:59pm | |
Feb 1 | Multiclass Classification; Slides, Notes | |
Feb 3 | Neural Networks I (Feedforward); Slides | |
Feb 8 | Neural Networks II (Backpropagation); Slides (updated), Notes | |
Feb 10 | Sequence Models I (HMM/Viterbi); Slides | |
Feb 15 | PyTorch Tutorial; Slides | |
Feb 17 | Sequence Models II (CRF); Slides, HW2 Out, HW2 Template, Due on March 2nd 11:59pm | |
Feb 22 | Word Embeddings; Slides | |
Feb 24 | Word Embeddings (cont'); Slides | |
Mar 1 | Recurrent Neural Networks; Slides, Notes | |
Mar 3 | Convolutional Neural Networks; Slides, HW3 Out, HW3 Template, HW3 Programming, Due on March 26th 11:59pm | |
Mar 8 | Neural CRFs; Slides | |
Mar 10 | Course Project or HW5; Slides | |
Mar 15 | Statistical Machine Translation; Slides | |
Mar 17 | No Class - Joint Office Hour | |
Mar 22 | Statistical Machine Translation (cont') | |
Mar 24 | No Class - Wellness Day | |
Mar 29 | Sequence-to-Sequence Model; Slides | |
Mar 31 | Copy/Pointer Network and Transformer; Slides, HW4 Out, HW4 Template, Due on April 14th 11:59pm | |
Apr 5 | Pretraining Language Models; Slides | |
Apr 7 | Question Answering; Slides | |
Apr 12 | Question Answering (cont') | |
Apr 14 | Information Extraction; Slides, HW5 Out, HW5 QA, Due on May 1st 11:59pm | |
Grading
- 5% Background Test (individual)
  This background test is designed to help you determine whether you have enough math and programming background to succeed in this class.
- 55% Homework Assignments (individual)
  - Homework 1: 15% (written + 1st programming)
  - Homework 2: 10% (written)
  - Homework 3: 15% (written + 2nd programming)
  - Homework 4: 15% (written + 3rd programming)
- 20% Final Project (group of 1-3) or the 4th programming homework (individual)
  The final project is an open-ended assignment, with the goal of gaining experience applying the techniques presented in class to real-world datasets. Students should work in groups of 1-3 (groups of 2-3 preferred, 1 is possible). It is a good idea to discuss your planned project with the instructor to get feedback. The final project report should be 4 pages. The report should describe the problem you are solving, the data being used, the technique you are applying, and the baseline you compare against.
  Alternatively, you may choose to complete the 4th programming homework individually, instead of the group final project.
- 20% Participation
  You will receive credit for engaging in class discussion and for asking and answering questions related to the homework on the Piazza discussion board.
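As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grade calculator: the component names, the 0-100 score scale, and the function name `weighted_total` are assumptions made for this example; only the percentages come from the list above.

```python
# Percent weights copied from the Grading list; the dictionary keys are
# hypothetical labels chosen for this sketch.
WEIGHTS = {
    "background_test": 5,
    "hw1": 15,
    "hw2": 10,
    "hw3": 15,
    "hw4": 15,
    "project_or_4th_programming_hw": 20,
    "participation": 20,
}


def weighted_total(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)
    into an overall score on the same scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = {name: 90 for name in WEIGHTS}
    example["hw2"] = 70
    print(weighted_total(example))
```

The weights are stored as integer percentages so that checking they sum to 100 is exact; the four homework weights alone add up to the 55% listed for homework.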
Policies
Late Policy:
Students may submit at most 2 homework assignments late, each by up to 3 days. Each late day extends the deadline by 24 hours. Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit. Please email your homework to the instructor in case there are any technical issues with submission. Late submissions of the final project and quizzes will not be accepted.
No late penalties apply for medical reasons or emergencies. Please see the GT Catalog for rules about contacting the office of the Dean of Students.
FAQs
- The class is full. Can I still get in?
  Sorry. The course admins in CoC control this process. Please talk to them.
- I am graduating this Fall and I need this class to complete my degree requirements. What should I do?
  Talk to the advisor or graduate coordinator for your academic program. They are keeping track of your degree requirements and will work with you if you need a specific course.
- I have a question. What is the best way to reach the course staff?
  Registered students - your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email the instructor.
Textbooks
- (J+M) Jurafsky and Martin, Speech and Language Processing, 3rd edition (Dec 2020 draft)
- (E) Jacob Eisenstein, Natural Language Processing (2018)