Schedule and Readings
The course schedule is subject to change.
Dates and Deadlines
Three types of dates are listed on the schedule.
NYU deadlines: Important dates from the NYU academic calendar
Assignment Released: On Gradescope and/or GitHub
Assignment Due: On Wednesdays at 9:30 AM unless stated otherwise
Reading Assignments
All readings are available online free of charge. Some may require you to be on the campus Wi-Fi or VPN, or to be logged into your NYU Google Drive account.
Research Papers
In this course, we will be reading and discussing many research papers on the topic of language understanding. Readings to be discussed each week are shown in the schedule below. You may complete the readings at your own pace, though you are expected to be familiar with their contents when they come up in class. Your understanding of the readings will be assessed via multiple-choice reading quizzes released periodically on Gradescope.
Textbooks
Many of the readings, especially those assigned during the first half of the semester, will come from the following textbooks and tutorials.
Tutorial Lecture handouts by Sophie, or tutorials from AI blogs
SLP Speech and Language Processing, 3rd Edition Draft by Dan Jurafsky and James H. Martin
D2L Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola
Ling1 Linguistic Fundamentals for Natural Language Processing by Emily M. Bender
Ling2 Linguistic Fundamentals for Natural Language Processing II by Emily M. Bender and Alex Lascarides
EOL Essentials of Linguistics, 2nd Edition by Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi
Textbook readings are designed to teach you basic technical skills required to understand and participate in NLP research. They cover established techniques and practices that are known and used by most top-tier NLP researchers and practitioners.
Schedule
Week 1, Jan. 23/24
What Is Meaning?
We introduce the concept of meaning in natural language, taking inspiration from linguists, philosophers, and data scientists. We learn about the word2vec model of semantics and examine in what sense and to what extent it models the “meaning” of individual words.
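The linear-analogy trick covered in this week's lab can be sketched in a few lines of numpy. The 3-dimensional vectors below are invented for illustration; real word2vec embeddings are learned from data and have hundreds of dimensions:

```python
import numpy as np

# Toy embedding table; these vectors are invented for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector arithmetic (b - a + c)."""
    target = emb[b] - emb[a] + emb[c]
    # Exclude the query words themselves, as is standard practice.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman"))  # prints "queen" with these toy vectors
```

The same arithmetic, run over real word2vec embeddings, is what produces the famous "king − man + woman ≈ queen" result discussed in lecture.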
- Lecture
- Lexical semantics, the distributional hypothesis, word embeddings, CBOW
- Slides, Zoom Recording
- Lab
- Linear analogies, visualization of embedding spaces
- Colab Notebook, Zoom Recording
- Reading
- Tutorial Week 1 Handout and Semantle, a word-guessing game based on word2vec embeddings
- SLP Chapters 5 and 6
- D2L Chapter 2 (skip Section 2.5), Section 4.1, and Sections 15.1–15.7
- Ling1 Chapter 2
- Ling2 Chapters 3 and 4
- EOL Chapter 4, Sections 5.1–5.4, and Section 7.5
- Dates
- HW 0 Released; HW 0 Due ASAP
Week 2, Jan. 30/31
Deep Learning
We learn how to optimize an arbitrary machine learning objective using the stochastic gradient descent algorithm (SGD) and its more popular variant, Adam. We also learn how automatic differentiation is implemented in the PyTorch software library.
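To preview how reverse-mode automatic differentiation works under the hood, here is a minimal scalar "tape" in plain Python, used to run a few SGD steps on a toy quadratic. This is a micrograd-style sketch for intuition, not PyTorch's actual implementation:

```python
class Value:
    """A scalar that records how it was computed, so gradients can be
    propagated backwards through the computation graph (reverse mode)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # Each parent is paired with the local derivative d(self)/d(parent).
        self._parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self, grad=1.0):
        # Chain rule: accumulate, then push the gradient to each parent.
        # (Recursing over all paths is inefficient in general but fine here.)
        self.grad += grad
        for parent, local in self._parents:
            parent.backward(grad * local)

# Minimize the toy objective f(w) = (w - 3)^2 with vanilla SGD.
w = Value(0.0)
lr = 0.1
for _ in range(50):
    loss = (w + (-3.0)) * (w + (-3.0))
    loss.backward()        # fills in w.grad = 2 * (w.data - 3)
    w.data -= lr * w.grad  # one SGD step
    w.grad = 0.0           # reset the accumulated gradient
print(round(w.data, 3))    # converges toward the minimizer w = 3
```

In PyTorch the same loop would use `loss.backward()` and an `optim.SGD` (or `optim.Adam`) step; the lab notebook shows the library version.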
- Lecture
- SGD, Adam, automatic differentiation, PyTorch, neural networks, multi-layer perceptrons
- Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
- Lab
- Sentiment classification using a multi-layer perceptron
- Colab Notebook, Zoom Recording
- Reading
- Tutorial Week 2 Handout (to be released), Olah’s (2015) tutorial on backpropagation, and Sanderson’s (2017) video series on deep learning (optional)
- SLP Chapter 7
- D2L Chapter 1, Section 2.5, Chapter 5, Chapter 12 (skip Sections 12.7–12.9), Section 16.1, and Section 19.1 (the rest of the chapter is optional)
- Dates
- HW 1 Released Mon 1/29; EC 1 Released Mon 1/29; RQ 1 Released Fri 2/2
Week 3, Feb. 6/7
The Meaning of a Text
We extend word2vec’s distributional method to sequences of words, allowing us to obtain embeddings that represent sentences, paragraphs, and even entire documents. We learn about the BERT model and the Transformer encoder architecture that underlies it. We use the Turing test to evaluate BERT’s understanding of natural language.
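The core computation of the Transformer encoder can be previewed with a minimal numpy sketch of scaled dot-product attention. This is a single head on random toy inputs; real models like BERT add learned projections, multiple heads, feed-forward layers, and deep stacking:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 4, 8                              # 4 tokens, 8-dimensional states
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape, weights.shape)          # (4, 8) (4, 4)
```

Each output row is a weighted average of the value vectors, with weights determined by query–key similarity; this is the mechanism the Alammar tutorials below walk through visually.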
- Lecture
- Transformers, transfer learning, foundation models, BERT
- Slides, Zoom Recording
- Lab
- Natural language inference with BERT
- Colab Notebook, Zoom Recording
- Reading
- Tutorial Alammar’s (2018a) tutorial on Transformers, and Alammar’s (2018b) tutorial on BERT
- SLP Chapter 10
- D2L Chapter 11 (skip Sections 11.2, 11.4, and 11.8), Sections 15.8–15.10, and Sections 16.4–16.7 (skip Section 16.5)
- Ling2 Chapter 11 and Chapter 13, #89–90
- Turing (1950), on the imitation game (Turing test)
- The GLUE (Wang et al., 2019a) and SuperGLUE (Wang et al., 2019b) benchmarks
- Dates
- Add/Drop Deadline Sun 2/4; HW 1 Due Sat 2/10; EC 1 Due Sat 2/10
Week 4, Feb. 13/14
Passing the Turing Test?
BERT’s ability to identify relationships between sentences is impressive. How does BERT do it? We examine what strategies BERT uses to beat the NLI Turing test, and we analyze what information is represented in BERT’s hidden representations.
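Probing, one of the analysis methods covered this week, means training a small supervised classifier on a model's hidden states to test whether some property is decodable from them. Here is a minimal sketch using synthetic stand-ins for BERT's hidden states (the data and the "property" are constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for hidden states: 200 vectors of dimension 16, built so that
# one coordinate encodes a binary "linguistic property" by construction.
X = rng.standard_normal((200, 16))
y = (X[:, 3] > 0).astype(float)  # the property lives in coordinate 3

# A linear probe: logistic regression trained by gradient descent.
w = np.zeros(16)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    w -= lr * X.T @ (p - y) / len(y)    # gradient of the log loss

accuracy = ((X @ w > 0) == (y > 0)).mean()
print(f"probe accuracy: {accuracy:.2f}")
# High accuracy suggests the property is linearly decodable from the states.
```

As the Belinkov (2022) reading below discusses, a successful probe shows the information is *present*, not that the model actually *uses* it, which is where causal methods like those in Hao and Linzen (2023) come in.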
- Lecture
- Interpretability and model analysis, targeted challenge benchmarks, attention visualization, probing
- Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
- Lab
- Slides, Zoom Recording
- Reading
- Gururangan et al. (2018) and McCoy et al. (2019), on gaming the system for NLI
- Kocijan et al. (2022), The Defeat of the Winograd Schema Challenge
- Rogers et al. (2020), a survey on what we know about BERT
- Hao and Linzen (2023), on probing and causality
- Jain and Wallace (2019), Wiegreffe and Pinter (2019), and Belinkov (2022): critical takes on interpretability techniques and findings
- Dates
- HW 2 Released Mon 2/12; EC 2 Released Tues 2/13; RQ 2 Released Fri 2/16
Week 5, Feb. 20/21
Doing Research in NLP
As you begin exploring project ideas, we will learn the basic scientific methodology of NLP. We discuss how to formulate a research question and design an experiment to answer it. We also learn about the peer review and publishing process for conferences, workshops, and journals affiliated with the Association for Computational Linguistics (ACL).
- Lecture
- NLP subfields, experimental methodology, structure of a research paper, the peer review cycle, the ACL
- Slides, Zoom Recording
- Lab
- How to use HPC
- Slides, Zoom Recording from Last Year (Unfortunately, we forgot to record this lab!)
- Reading
- Tutorial Lectures from Sam Bowman (slides), Chris Potts (slides), and Graham Neubig (slides and video); Ribeiro’s (2022) tutorial on brainstorming
- ACL 2024 and ACL Rolling Review (ARR) call for papers
- Resnik and Lin (2010), Dror et al. (2018) and Ulmer et al. (2022), on NLP experimental methods
- Other papers to be added (examples of NLP project types)
- Dates
- HW 2 Due Sat 2/24; EC 2 Due Sat 2/24
Week 6, Feb. 27/28
Language Modeling and General Intelligence
Language modeling—the task of predicting what word comes after a string of text from a document—is one of the most fundamental techniques in NLP. Recent advances have shown that highly competent language models exhibit a number of general AI abilities, even if they are not trained on any objective other than word prediction. We explore the technique of prompting, which attempts to solve NLP tasks using the emergent abilities of large language models.
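The word-prediction task itself can be illustrated with the simplest possible language model, a bigram count model. The toy corpus is invented for illustration; modern LLMs replace the count table with a Transformer trained on trillions of words, but the objective is the same:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigrams[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # ('cat', 0.666...): "cat" follows "the" in 2 of 3 cases
```

Prompting, covered in lecture, exploits exactly this objective: a well-chosen prompt makes the task's answer the most probable continuation.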
- Lecture
- Large language models, prompting and few-shot learning, scaling and inverse scaling
- Slides, Zoom Recording
- Lab
- Testing GPT-4’s color perception
- Zoom Recording
- Reading
- Tutorial Alammar’s (2019) tutorials on GPT-2 and GPT-3
- The GPT-2 paper (Radford et al., 2019), on language modeling as a general intelligence problem
- The GPT-3 paper (Brown et al., 2020) and Wei et al. (2022a), on prompting
- Srivastava et al. (2022) and Wei et al. (2022b), on scale and emergent abilities
- Lin et al. (2022), McKenzie et al. (2023), and Wei et al. (2023), on inverse scaling
- Dates
- RQ 3 Released Fri 3/1
Week 7, Mar. 5/6
Following Instructions and Being Helpful
We introduce language model alignment, a technique for turning language models into instruction-following assistants like ChatGPT. We learn about reinforcement learning from human feedback (RLHF), the most popular algorithm for alignment. We also explore the intellectual origins of alignment in the Effective Altruism movement.
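To make the alignment objectives concrete, here is a minimal numpy sketch of the per-pair DPO loss from Rafailov et al. (2023). The log-probabilities below are invented placeholders; in practice they come from the policy being trained and a frozen reference model:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(logp_w - ref_w) - (logp_l - ref_l)])."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Placeholder log-probabilities (invented): the policy favors the chosen
# response more strongly than the reference model does, so the loss falls
# below log(2), its value at a zero margin.
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-7.0)
print(float(loss))
```

Unlike RLHF, which fits a separate reward model and then optimizes against it with reinforcement learning, DPO minimizes this loss directly on preference pairs; the two readings below contrast the pipelines.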
- Lecture
- Alignment, RLHF, DPO, InstructGPT, AI safety
- Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
- Lab
- Mini-proposal requirements and peer review
- Slides, Zoom Recording
- Reading
- Tutorial Week 7 Handout (to be released)
- Miles (2014), Lewis-Kraus (2022), and Levitz (2022), on Effective Altruism and alignment
- Bommasani et al. (2021), motivating alignment
- Askell et al. (2021), on alignment and the HHH criteria
- The InstructGPT paper (Ouyang et al., 2022) on RLHF and Rafailov et al. (2023) on direct preference optimization (DPO)
- Dates
- HW 3 Released Mon 3/5
Week 8, Mar. 12/13
What Is Language?
We learn about what language is and how to think about it objectively and scientifically. We critically examine popular assumptions about language, which are often driven by politics and prejudice rather than linguistics. We consider ways in which the techniques covered in this course do not apply equally to all languages.
- Lecture
- Descriptive vs. prescriptive grammar, languages vs. dialects, linguistic nationalism and prejudice, generative grammar, typology and low-resource NLP
- Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
- Lab
- Slides, Zoom Recording
- Reading
- Tutorial David R. Mortensen’s slides on low-resource NLP
- Ling1 Chapter 1
- EOL Chapters 1, 2, 6, and 14
- Hobsbawm (1996) on language and nationalism; Davis (2000), Introduction and Chapter 8, on the creation of Italy and the Italian language
- Dates
- RQ 4 Released Fri 3/15
Spring Break, Mar. 18–22
No Class
Week 9, Mar. 26/27
Knowledge, Beliefs, and Memorization
What does a language model “know” or “believe”? We explore recent attempts to define these concepts and create Turing tests for them. We examine how Transformer language models come to acquire and memorize information, and survey attempts to improve the factuality of language model outputs.
- Lecture
- Factuality, beliefs, question answering, retrieval-augmented models, FactScore
- Slides, Zoom Recording
- Lab
- Zoom Recording
- Reading
- Petroni et al. (2019) and Min et al. (2023), on knowledge and factuality
- Hase et al. (2021), Arora et al. (2023), and Scherrer et al. (2023), on beliefs and values
- Roberts et al. (2020) and Dai et al. (2021): how Transformers store knowledge
- Lewis et al. (2020) and McCoy et al. (2021), on memorization
- Tian et al. (2023), Yu et al. (2023), Vu et al. (2023), and Du et al. (2023): improving factuality in LLMs
- Dates
- Project Mini-Proposal Due 3/30
Week 10, Apr. 2/3
Artificial vs. Human Intelligence
Humans acquire, generate, and process language in a very particular way. We explore ways in which humans and language models are remarkably similar in their language processing capabilities, yet also incredibly different. We review recent attempts to train language models in a way that mimics the developmental process of human language acquisition.
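Surprisal, the central quantity in the reading-time literature below, is just the negative log-probability of a word in its context. A minimal sketch, under an invented toy next-word distribution:

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(p)

# A toy next-word distribution for some context (probabilities invented).
p_next = {"mat": 0.5, "floor": 0.25, "moon": 0.0625}
for word, p in p_next.items():
    print(f"{word}: {surprisal(p):.1f} bits")
# mat: 1.0 bits, floor: 2.0 bits, moon: 4.0 bits.
# Surprisal theory predicts that higher-surprisal (less predictable)
# words take longer for humans to read.
```

The Wilcox, Oh and Schuler, and Huang readings test how well language-model surprisal estimates predict human reading times.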
- Lecture
- Surprisal theory, generalization, inductive bias, poverty of the stimulus, language acquisition
- Slides, Zoom Recording
- Lab
- History of the Transformer architecture
- Slides, Zoom Recording
- Reading
- Wilcox et al. (2023), Oh and Schuler (2023), and Huang et al. (2023), on surprisal theory
- Hupkes et al. (2020), Kim and Linzen (2020), and Kim et al. (2022), on compositionality
- Mueller et al. (2022) and Mueller et al. (2023), on hierarchical generalization
- Linzen (2020), Yedetore et al. (2023) and Warstadt et al. (2023), on pre-training vs. human language acquisition
- Dates
- HW 3 Due
Week 11, Apr. 9/10
Logic and Reasoning
Can language models engage in complex, multi-step logical reasoning? We survey various Turing tests for reasoning skills, as well as ways to help language models reason better. Guest lecturer Will Merrill, an NYU PhD student in Data Science, will tell us about theoretical limitations imposed by the Transformer architecture on language model reasoning abilities.
- Lecture
- Circuit complexity, AC0 and TC0, expressive power of the Transformer
- Slides, Zoom Recording
- Lab
- Linguistic evaluation of language models
- Colab Notebook, Zoom Recording
- Reading
- Kojima et al. (2022), Saparov and He (2023), Saparov et al. (2023), and Press et al. (2023), on multi-hop reasoning
- Li et al. (2018) and Kim and Schuster (2023), on world models and entity state tracking
- Weiss et al. (2021), Merrill et al. (2022), and Merrill and Sabharwal (2023), on the computational expressive power of Transformers
- Yang et al. (2023) and Trinh et al. (2024): enhancing the reasoning capabilities of language models
- Dates
- RQ 5 Released Fri 4/12
Week 12, Apr. 16/17
Writing and Communication
Communication is key if you want to participate in the scientific community of NLP. As you work on your final paper drafts, we will learn how to communicate your research findings effectively in writing and in your presentations.
- Lecture
- Writing skills, how to give a talk
- Slides, Zoom Recording
- Lab
- Practice talks
- Zoom Recording, Example Talk Video (Jason Wei, on CoT prompting)
- Reading
- Tutorial Sam Bowman’s slides and Schwartz’s (2023) tutorial on writing
- ACL guidelines for formatting and citations
Week 13, Apr. 23/24
NLP, Technology, and Society
As NLP takes on an increasingly prominent role in modern life, concerns regarding the social impact of NLP are more pertinent than ever. As you finish up your projects, we highlight the potential for natural language technology to facilitate illicit or malicious activities, reinforce prejudice and discrimination, and contribute to climate change. We learn strategies adopted by the scientific community for doing research responsibly.
- Lecture
- Ethics, energy usage, bias and fairness, copyright
- Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
- Lab
- Extra office hours with Cara and Sophie
- Reading
- Bommasani et al. (2022), Section 5, and Bender et al. (2021), on the risks of language models
- Strubell et al. (2019) and Kang et al. (2023), on efficiency and energy usage
- Blodgett et al. (2020), Stanczak and Augenstein (2021), and Pessach and Shmueli (2022), on bias and fairness
- Karamolegkou et al. (2023), on copyright
- Dates
- Pass/Fail Deadline Tues 4/23; Withdrawal Deadline Tues 4/23; Project Full Proposal Due 4/27; RQ 6 Released Fri 4/26
Week 14, Apr. 30/May 1
What Is Understanding?
We have now learned many ways to teach a model about “meaning.” But do models truly understand natural language? We conclude the course with a discussion of the notion of “understanding,” and highlight the limitations of current techniques.
- Lecture
- Understanding, behavioralism, grounding, consciousness
- Slides, Zoom Recording
- Lab
- TBD
- Reading
- Pfungst (1907/1911), on Clever Hans
- Bender and Koller (2020) and NYU debate on grounding
- Chalmers (2023) and Butlin et al. (2023), on consciousness
- Dates
- Final Paper Draft Due 5/4