NLP: CS-4650/7650
MW 3:00-4:15pm, Kendeda 152
Course Information
This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way we will cover machine learning techniques that are especially relevant to natural language processing.
Slides, materials, and project information for this iteration of the course are borrowed from Jacob Eisenstein, Yulia Tsvetkov and Robert Frederking at CMU, Dan Jurafsky at Stanford, David Bamman at UC Berkeley, Noah Smith at UW, and Kai-Wei Chang at UCLA.
- Class Meets
  - Mondays and Wednesdays, 3:00-4:15pm; Kendeda Building 152
- Piazza
  - piazza.com/gatech/spring2020/cs7650cs4650
- Staff Mailing List
  - cs4650-7650-s20-staff@googlegroups.com
- Office Hours
  - Ian Stewart: 8-10pm ET on Tuesdays
  - Jiaao Chen: 2-4pm ET on Thursdays
  - Nihal Singh: 9-11am ET on Fridays
  - Jingfeng Yang: 10pm-11:59pm ET on Mondays
  - Please check Piazza/Canvas for the BlueJeans link
- Online Instruction FAQs
Schedule
Note: This schedule is tentative and subject to change.
Date | Topic | Optional Reading |
W1: Jan 6 | Introduction to NLP [Slides] | |
W1: Jan 8 | Text Classification [Slides, HW1 Out] | |
W2: Jan 13 | Neural Networks for Text Classification [Slides] | |
W2: Jan 15 | Language Modeling I [Slides, HW2 Out] | |
W3: Jan 20 | MLK Day: No class | |
W3: Jan 22 | Course Project, Pytorch Tutorial [Slides, Pytorch Slides, DL Slides (optional)] | |
W4: Jan 27 | Language Modeling II [Slides] | |
W4: Jan 29 | Vector Semantics [Slides] | |
W5: Feb 3 | Word Embedding [Slides, HW3 Out] | |
W5: Feb 5 | Sequence Labeling: POS & HMM [Slides] | |
W6: Feb 10 | No Class : AAAI | |
W6: Feb 12 | Sequence Labeling: Viterbi & Forward Alg [Slides] | |
W7: Feb 17 | Context Free Grammar [Slides] | |
W7: Feb 19 | Constituency Parsing [Slides] | |
W8: Feb 24 | Midterm Review [Slides] | |
W8: Feb 26 | Midterm | |
W9: Mar 2 | Dependency Parsing Syntax [Slides] | |
W9: Mar 4 | Dependency Parsing (by Yuval Pinter) [Slides, Upcoming Project Deadline Info] | |
W10: Mar 9 | Project Feedback [Sign-up] | |
W10: Mar 11 | Computational Ethics [Slide] | |
W11: Mar 16 | Spring Break | |
W11: Mar 18 | Spring Break | |
W12: Mar 23 | Online Instruction Testing / No Class | |
W12: Mar 25 | Online Instruction Testing / No Class | |
W13: Mar 30 | Question Answering [Slide] | |
W13: Apr 1 | Information Extraction [Slide] | |
W14: Apr 6 | Conversational Agents [Slide] | |
W14: Apr 8 | Machine Translation I [Slide] | |
W15: Apr 13 | Machine Translation II [Slide] | |
W15: Apr 15 | Generation [Slide] | |
W16: Apr 20 | Computational Social Science [Slide] | |
Grading
- 45% Homework Assignments
- Homework 1: 6%
- Homework 2: 13%
- Homework 3: 13%
- Homework 4: 13%
- 15% Midterm Exam
- No make-up exam except in an emergency
- 30% (+2% bonus) Course Project
- Project proposal (2 pages): 5%
- Midway report (3 pages): 10%
- Final report (7 pages): 15%
- Video Presentation (5 min): Bonus 2%
- 10% Online Instruction
- Awarded to every student enrolled
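As a rough illustration of how the weights above add up, here is a minimal Python sketch. The component names, the convention that each score is a fraction between 0 and 1, and the choice to add the video bonus on top of the 100 base points are assumptions made for this example; it is not an official grading script.

```python
# Weights (percent of the final grade) copied from the grading breakdown.
WEIGHTS = {
    "hw1": 6,
    "hw2": 13,
    "hw3": 13,
    "hw4": 13,
    "midterm": 15,
    "proposal": 5,
    "midway_report": 10,
    "final_report": 15,
    "online_instruction": 10,  # awarded to every enrolled student
}
VIDEO_BONUS = 2  # optional video presentation

# Sanity check: the non-bonus components cover exactly 100%.
assert sum(WEIGHTS.values()) == 100


def final_percentage(scores, video_score=0.0):
    """Combine per-component scores into a final percentage.

    scores maps a component name to a fraction in [0, 1]; components
    missing from the mapping count as 0 (an assumption of this sketch).
    video_score is the fraction of the bonus earned.
    """
    base = sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base + VIDEO_BONUS * video_score
```

For example, a student with full marks on every component and on the video would reach 102 under these assumptions.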
Policies
Late Policies:
Each student has a total of four late days to use when turning in homework assignments; each late day extends the deadline by 24 hours. There are no restrictions on how the late days can be used (e.g., all four could be used on one homework). Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit.
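The late-day rule can be sketched in Python as follows. This is only an illustration under stated assumptions: a partial day is rounded up to a full late day, and a submission that needs more late days than remain receives no credit and consumes none. The function names and the dates in the usage note are hypothetical.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 4  # from the late policy


def late_days_needed(deadline, submitted):
    """Number of 24-hour extensions needed (assumes partial days round up)."""
    if submitted <= deadline:
        return 0
    return math.ceil((submitted - deadline) / timedelta(hours=24))


def apply_late_policy(submissions, budget=TOTAL_LATE_DAYS):
    """Decide which homeworks receive credit.

    submissions is a list of (name, deadline, submitted) tuples in the
    order they were turned in. Returns (credit, remaining), where credit
    maps each name to True or False and remaining is the unused budget.
    """
    remaining = budget
    credit = {}
    for name, deadline, submitted in submissions:
        needed = late_days_needed(deadline, submitted)
        if needed <= remaining:
            remaining -= needed
            credit[name] = True
        else:
            # Assumption: a no-credit submission does not use up late days.
            credit[name] = False
    return credit, remaining
```

For instance, with a hypothetical deadline of `datetime(2020, 1, 20, 23, 59)`, a submission 25 hours later would need two late days under the rounding assumption.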
Class Policies:
Attendance will not be taken, but you are responsible for knowing what happens in every class. The instructor will try to post slides and notes online, and to share announcements, but there are no guarantees. So if you cannot attend class in person, make sure you check in with someone who was there.
Respect your classmates and your instructor by avoiding distractions. This means arriving on time, turning off your cell phone, and saving side conversations for after class.
Multiple studies have shown that using a laptop in class – even for taking notes – reduces students’ educational attainment. We suggest trying pen and paper for a few weeks to see if it helps you concentrate. Whatever technology you decide to use, it is your responsibility to ensure that it does not distract your classmates or the instructor.
Prerequisites
The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.
Furthermore, this course assumes:
- Good coding ability, corresponding to at least that of a third- or fourth-year undergraduate CS major. Assignments will be in Python.
- Background in basic probability, linear algebra, and calculus.
- Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.
People sometimes want to take the course without having all of these prerequisites. Frequent cases are:
- Junior CS students with strong programming skills but limited theoretical and mathematical background,
- Non-CS students with strong mathematical background but limited programming experience.
Students in the first group tend to struggle on the exam and with the lectures, and students in the second group tend to struggle with the problem sets. My advice is to get the background material first, and then take this course.
Project
This semester-long project will involve one to three students and should focus on natural language processing, either on core NLP methods or on using NLP in support of an empirical research question. The project will consist of four components:
- Project proposal (March 13th) Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages, excluding references)
- Midway report (April 3rd) By the middle of the project, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (3 pages, excluding references)
- Video Presentation (April 23rd) (Optional) At the end of the semester, teams can present their work in a video/demo. This should be submitted together with the final report.
- Final report (April 23rd) The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (7 pages, excluding references)
All reports should use the ACL 2020 style files for either LaTeX or Microsoft Word.
FAQs
- The class is full. Can I still get in?
Sorry. The course admins in CoC control this process. Please talk to them.
- I am graduating this Fall and I need this class to complete my degree requirements. What should I do?
Talk to the advisor or graduate coordinator for your academic program. They are keeping track of your degree requirements and will work with you if you need a specific course.
- Can I audit this class or take it pass/fail?
No. Due to the large demand for this class, we will not be allowing audits or pass/fail. Letter grades only. This is to make sure students who want to take the class for credit can.
- Can I simply sit in the class (no credits)?
In general, we welcome members of the Georgia Tech community (students, staff, and/or faculty) to sit in. Out of courtesy, we would appreciate it if you let us know beforehand (via email or in person). If the classroom is full, we would ask that you please allow registered students to attend.
- I have a question. What is the best way to reach the course staff?
Registered students: your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email us at the class mailing list, cs4650-7650-s20-staff@googlegroups.com.