Schedule and Readings

The course schedule is subject to change.

Dates and Deadlines

Three types of dates are listed on the schedule.

NYU deadlines: Important dates from the NYU academic calendar
Assignment Released: On Gradescope and/or GitHub
Assignment Due: On Wednesdays at 9:30 AM unless stated otherwise

Reading Assignments

All readings are available online free of charge. Some may require you to be on the campus Wi-Fi or VPN, or to be logged into your NYU Google Drive account.

Research Papers

In this course, we will be reading and discussing many research papers on the topic of language understanding. Readings to be discussed each week are shown in the schedule below. You may work through the readings at your own pace, though you are expected to be familiar with their contents when they come up in class. Your understanding of the readings will be assessed via multiple-choice reading quizzes released periodically on Gradescope.

Textbooks

Many of the readings, especially those assigned during the first half of the semester, will come from the following textbooks and tutorials.

Tutorial: Lecture handouts by Sophie, or tutorials from AI blogs
SLP: Speech and Language Processing, 3rd Edition Draft by Dan Jurafsky and James H. Martin
D2L: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola
Ling1: Linguistic Fundamentals for Natural Language Processing by Emily M. Bender
Ling2: Linguistic Fundamentals for Natural Language Processing II by Emily M. Bender and Alex Lascarides
EOL: Essentials of Linguistics, 2nd Edition by Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi

Textbook readings are designed to teach you basic technical skills required to understand and participate in NLP research. They cover established techniques and practices that are known and used by most top-tier NLP researchers and practitioners.

Schedule

Week 1, Jan. 23/24

What Is Meaning?

We introduce the concept of meaning in natural language, taking inspiration from linguists, philosophers, and data scientists. We learn about the word2vec model of semantics and examine in what sense and to what extent it models the “meaning” of individual words.
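
For a concrete feel for the linear-analogy exercise in the lab, here is a minimal sketch using the gensim library and a small pretrained GloVe embedding set; both are stand-ins for the course's own word2vec materials, and you would need gensim installed to run it.

    import gensim.downloader as api

    # Illustrative sketch only: a small GloVe model stands in for word2vec here.
    vectors = api.load("glove-wiki-gigaword-50")  # downloaded on first use

    # "man is to king as woman is to ?" -- the classic linear-analogy arithmetic
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

    # Cosine similarity between two word embeddings
    print(vectors.similarity("cat", "dog"))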

Lecture
Lexical semantics, the distributional hypothesis, word embeddings, CBOW
Slides, Zoom Recording
Lab
Linear analogies, visualization of embedding spaces
Colab Notebook, Zoom Recording
Reading
Tutorial Week 1 Handout and Semantle, a word-guessing game based on word2vec embeddings
SLP Chapters 5 and 6
D2L Chapter 2 (skip Section 2.5), Section 4.1, and Sections 15.1–15.7
Ling1 Chapter 2
Ling2 Chapters 3 and 4
EOL Sections 5.1–5.4, Chapter 4, and Section 7.5
Dates
HW 0 Released HW 0 Due ASAP

Week 2, Jan. 30/31

Deep Learning

We learn how to optimize an arbitrary machine learning objective using the stochastic gradient descent algorithm (SGD) and its more popular variant, Adam. We also learn how automatic differentiation is implemented in the PyTorch software library.
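
As a rough illustration of these ideas (not the homework code), the sketch below fits a tiny multi-layer perceptron with PyTorch, using autograd for the gradients and Adam for the updates; the data is random and purely illustrative.

    import torch
    import torch.nn as nn

    # Illustrative sketch only: toy data, tiny MLP.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(64, 10), torch.randn(64, 1)  # random placeholder data
    for step in range(100):
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = loss_fn(model(x), y)  # forward pass
        loss.backward()              # automatic differentiation (backpropagation)
        optimizer.step()             # Adam parameter update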

Lecture
SGD, Adam, automatic differentiation, PyTorch, neural networks, multi-layer perceptrons
Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
Lab
Sentiment classification using a multi-layer perceptron
Colab Notebook, Zoom Recording
Reading
Tutorial Week 2 Handout (to be released), Olah’s (2015) tutorial on backpropagation, and Sanderson’s (2017) video series on deep learning (optional)
SLP Chapter 7
D2L Chapter 1, Section 2.5, Chapter 5, Chapter 12 (skip Sections 12.7–12.9), Section 16.1, and Section 19.1 (the rest of the chapter is optional)
Dates
HW 1 Released Mon 1/29 EC 1 Released Mon 1/29 RQ 1 Released Fri 2/2

Week 3, Feb. 6/7

The Meaning of a Text

We extend word2vec’s distributional method to sequences of words, allowing us to obtain embeddings that represent sentences, paragraphs, and even entire documents. We learn about the BERT model and the Transformer encoder architecture that underlies it. We use the Turing test to evaluate BERT’s understanding of natural language.
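
As a minimal sketch of what a text embedding from BERT looks like in practice (an illustration, not the lab notebook), assuming the Hugging Face transformers library is installed:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One common convention: take the hidden state of the [CLS] token
    # as a representation of the whole sentence.
    sentence_embedding = outputs.last_hidden_state[:, 0, :]
    print(sentence_embedding.shape)  # torch.Size([1, 768])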

Lecture
Transformers, transfer learning, foundation models, BERT
Slides, Zoom Recording
Lab
Natural language inference with BERT
Colab Notebook, Zoom Recording
Reading
Tutorial Alammar’s (2018a) tutorial on Transformers, and Alammar’s (2018b) tutorial on BERT
SLP Chapter 10
D2L Chapter 11 (skip Sections 11.2, 11.4, and 11.8), Sections 15.8–15.10, and Sections 16.4–16.7 (skip Section 16.5)
Ling2 Chapter 11 and Chapter 13, #89–90
Turing (1950), on the imitation game (Turing test)
The GLUE (Wang et al., 2019a) and SuperGLUE (Wang et al., 2019b) benchmarks
Dates
Add/Drop Deadline Sun 2/4 HW 1 Due Sat 2/10 EC 1 Due Sat 2/10

Week 4, Feb. 13/14

Passing the Turing Test?

BERT’s ability to identify relationships between sentences is impressive. How does it do it? We examine the strategies BERT uses to beat the NLI Turing test, and we analyze what information is represented in its hidden states.
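
To make the probing idea concrete, here is a minimal sketch of a linear probe trained on frozen hidden states; the features and labels below are random placeholders, so treat this as a template rather than a working analysis.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Placeholder stand-ins for frozen BERT hidden states and a linguistic label.
    hidden_states = np.random.randn(1000, 768)
    labels = np.random.randint(0, 2, size=1000)

    X_train, X_test, y_train, y_test = train_test_split(hidden_states, labels, test_size=0.2)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # High probe accuracy suggests the property is linearly decodable from the
    # representations, though (as the readings discuss) decodability alone does
    # not show that the model actually uses that information.
    print(probe.score(X_test, y_test))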

Lecture
Interpretability and model analysis, targeted challenge benchmarks, attention visualization, probing
Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
Lab
Slides, Zoom Recording
Reading
Gururangan et al. (2018) and McCoy et al. (2019), on gaming the system for NLI
Kocijan et al. (2022), The Defeat of the Winograd Schema Challenge
Rogers et al. (2020), a survey on what we know about BERT
Hao and Linzen (2023), on probing and causality
Jain and Wallace (2019), Wiegreffe and Pinter (2019), and Belinkov (2022): critical takes on interpretability techniques and findings
Dates
HW 2 Released Mon 2/12 EC 2 Released Tues 2/13 RQ 2 Released Fri 2/16

Week 5, Feb. 20/21

Doing Research in NLP

As you begin exploring project ideas, we will learn the basic scientific methodology of NLP. We discuss how to formulate a research question and design an experiment to answer it. We also learn about the peer review and publishing process for conferences, workshops, and journals affiliated with the Association for Computational Linguistics (ACL).

Lecture
NLP subfields, experimental methodology, structure of a research paper, the peer review cycle, the ACL
Slides, Zoom Recording
Lab
How to use HPC
Slides, Zoom Recording from Last Year (Unfortunately, we forgot to record this lab!)
Reading
Tutorial Lectures from Sam Bowman (slides), Chris Potts (slides), and Graham Neubig (slides and video); Ribeiro’s (2022) tutorial on brainstorming
ACL 2024 and ACL Rolling Review (ARR) call for papers
Resnik and Lin (2010), Dror et al. (2018), and Ulmer et al. (2022), on NLP experimental methods
Other papers to be added (examples of NLP project types)
Dates
HW 2 Due Sat 2/24 EC 2 Due Sat 2/24

Week 6, Feb. 27/28

Language Modeling and General Intelligence

Language modeling—the task of predicting what word comes after a string of text from a document—is one of the most fundamental techniques in NLP. Recent advances have shown that highly competent language models exhibit a number of general AI abilities, even if they are not trained on any objective other than word prediction. We explore the technique of prompting, which attempts to solve NLP tasks using the emergent abilities of large language models.
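
As a minimal sketch of few-shot prompting (with GPT-2 standing in for a much larger model, so do not expect a good completion), assuming the Hugging Face transformers library:

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # small stand-in model

    prompt = (
        "Translate English to French.\n"
        "sea otter => loutre de mer\n"
        "cheese => fromage\n"
        "peppermint =>"
    )
    print(generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"])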

Lecture
Large language models, prompting and few-shot learning, scaling and inverse scaling
Slides, Zoom Recording
Lab
Testing GPT-4’s color perception
Zoom Recording
Reading
Tutorial Alammar’s (2019) tutorials on GPT-2 and GPT-3
The GPT-2 paper (Radford et al., 2019), on language modeling as a general intelligence problem
The GPT-3 paper (Brown et al., 2020) and Wei et al. (2022a), on prompting
Srivastava et al. (2022) and Wei et al. (2022b), on scale and emergent abilities
Lin et al. (2022), McKenzie et al. (2023), and Wei et al. (2023), on inverse scaling
Dates
RQ 3 Released Fri 3/1

Week 7, Mar. 5/6

Following Instructions and Being Helpful

We introduce language model alignment, a technique for turning language models into instruction-following assistants like ChatGPT. We learn about reinforcement learning from human feedback (RLHF), the most popular algorithm for alignment. We also explore the intellectual origins of alignment in the Effective Altruism movement.
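
For orientation, here is a minimal sketch of the DPO objective from Rafailov et al. (2023) written out in PyTorch; the log-probabilities are placeholder numbers rather than real model outputs.

    import torch
    import torch.nn.functional as F

    beta = 0.1  # controls how strongly the policy is kept close to the reference model

    # Placeholder summed log-probabilities of the preferred (chosen) and dispreferred
    # (rejected) responses under the policy being trained and the frozen reference model.
    policy_chosen, policy_rejected = torch.tensor(-12.0), torch.tensor(-15.0)
    ref_chosen, ref_rejected = torch.tensor(-13.0), torch.tensor(-14.0)

    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    loss = -F.logsigmoid(logits)  # small when the policy prefers the chosen response
    print(loss.item())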

Lecture
Alignment, RLHF, DPO, InstructGPT, AI safety
Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
Lab
Mini-proposal requirements and peer review
Slides, Zoom Recording
Reading
Tutorial Week 7 Handout (to be released)
Miles (2014), Lewis-Kraus (2022), and Levitz (2022), on Effective Altruism and alignment
Bommasani et al. (2021), motivating alignment
Askell et al. (2021), on alignment and the HHH criteria
The InstructGPT paper (Ouyang et al., 2022) on RLHF and Rafailov et al. (2023) on direct preference optimization (DPO)
Dates
HW 3 Released Mon 3/5

Week 8, Mar. 12/13

What Is Language?

We learn about what language is and how to think about it objectively and scientifically. We critically examine popular assumptions about language, which are often driven by politics and prejudice rather than linguistics. We consider ways in which the techniques covered in this course do not apply equally to all languages.

Lecture
Descriptive vs. prescriptive grammar, languages vs. dialects, linguistic nationalism and prejudice, generative grammar, typology and low-resource NLP
Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
Lab
Slides, Zoom Recording
Reading
Tutorial David R. Mortensen’s slides on low-resource NLP
Ling1 Chapter 1
EOL Chapters 1, 2, 6, and 14
Hobsbawm (1996), on language and nationalism; Davis (2000), Introduction and Chapter 8, on the creation of Italy and the Italian language
Dates
RQ 4 Released Fri 3/15

Spring Break, Mar. 18–22

No Class

Week 9, Mar. 26/27

Knowledge, Beliefs, and Memorization

What does a language model “know” or “believe”? We explore recent attempts to define these concepts and create Turing tests for them. We examine how Transformer language models come to acquire and memorize information, and survey attempts to improve the factuality of language model outputs.
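
A minimal sketch of cloze-style knowledge probing in the spirit of Petroni et al. (2019), for illustration only and assuming the Hugging Face transformers library:

    from transformers import pipeline

    # Query a masked language model with a fill-in-the-blank fact.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in unmasker("The capital of France is [MASK].", top_k=3):
        print(prediction["token_str"], round(prediction["score"], 3))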

Lecture
Factuality, beliefs, question answering, retrieval-augmented models, FactScore
Slides, Zoom Recording
Lab
Zoom Recording
Reading
Petroni et al. (2019) and Min et al. (2023), on knowledge and factuality
Hase et al. (2021), Arora et al. (2023), and Scherrer et al. (2023), on beliefs and values
Roberts et al. (2020) and Dai et al. (2021): how Transformers store knowledge
Lewis et al. (2020) and McCoy et al. (2021), on memorization
Tian et al. (2023), Yu et al. (2023), Vu et al. (2023), and Du et al. (2023): improving factuality in LLMs
Dates
Project Mini-Proposal Due 3/30

Week 10, Apr. 2/3

Artificial vs. Human Intelligence

Humans acquire, generate, and process language in a very particular way. We explore ways in which humans and language models are remarkably similar in their language processing capabilities, yet also incredibly different. We review recent attempts to train language models in a way that mimics the developmental process of human language acquisition.
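
Surprisal theory links human reading times to the quantity -log2 p(word | context) under a language model. Here is a minimal sketch of computing per-token surprisal with GPT-2 (an illustrative model choice, not necessarily the one used in the readings):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    ids = tokenizer("The children went outside to play.", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits

    # Surprisal of each token given the tokens before it (the first token is skipped).
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs[0, :-1].gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    surprisals = -token_log_probs / torch.log(torch.tensor(2.0))  # nats -> bits
    for token, s in zip(tokenizer.convert_ids_to_tokens(ids[0, 1:]), surprisals):
        print(token, round(s.item(), 2))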

Lecture
Surprisal theory, generalization, inductive bias, poverty of the stimulus, language acquisition
Slides, Zoom Recording
Lab
History of the Transformer architecture
Slides, Zoom Recording
Reading
Wilcox et al. (2023), Oh and Schuler (2023), and Huang et al. (2023), on surprisal theory
Hupkes et al. (2020), Kim and Linzen (2020), and Kim et al. (2022), on compositionality
Mueller et al. (2022) and Mueller et al. (2023), on hierarchical generalization
Linzen (2020), Yedetore et al. (2023), and Warstadt et al. (2023), on pre-training vs. human language acquisition
Dates
HW 3 Due

Week 11, Apr. 9/10

Logic and Reasoning

Can language models engage in complex, multi-step logical reasoning? We survey various Turing tests for reasoning skills, as well as ways to help language models reason better. Guest lecturer Will Merrill, an NYU PhD student in Data Science, will tell us about theoretical limitations imposed by the Transformer architecture on language model reasoning abilities.
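
One of the simplest ways to nudge language models toward better reasoning is zero-shot chain-of-thought prompting in the style of Kojima et al. (2022). The sketch below shows the prompt format only; GPT-2 is a stand-in that will not actually produce sound reasoning.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # small stand-in model
    question = ("Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
                "How many tennis balls does he have now?")
    prompt = f"Q: {question}\nA: Let's think step by step."
    print(generator(prompt, max_new_tokens=60, do_sample=False)[0]["generated_text"])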

Lecture
Circuit complexity, AC⁰ and TC⁰, expressive power of the Transformer
Slides, Zoom Recording
Lab
Linguistic evaluation of language models
Colab Notebook, Zoom Recording
Reading
Kojima et al. (2022), Saparov and He (2023), Saparov et al. (2023), and Press et al. (2023), on multi-hop reasoning
Li et al. (2018) and Kim and Schuster (2023), on world models and entity state tracking
Weiss et al. (2021), Merrill et al. (2022), and Merrill and Sabharwal (2023), on the computational expressive power of Transformers
Yang et al. (2023) and Trinh et al. (2024): enhancing the reasoning capabilities of language models
Dates
RQ 5 Released Fri 4/12

Week 12, Apr. 16/17

Writing and Communication

Communication is key if you want to participate in the scientific community of NLP. As you work on your final paper drafts, we will learn how to communicate your research findings effectively in writing and in your presentations.

Lecture
Writing skills, how to give a talk
Slides, Zoom Recording
Lab
Practice talks
Zoom Recording, Example Talk Video (Jason Wei, on CoT prompting)
Reading
Tutorial Sam Bowman’s slides and Schwartz’s (2023) tutorial on writing
ACL guidelines for formatting and citations

Week 13, Apr. 23/24

NLP, Technology, and Society

As NLP takes on an increasingly prominent role in modern life, concerns regarding the social impact of NLP are more pertinent than ever. As you finish up your projects, we highlight the potential for natural language technology to facilitate illicit or malicious activities, reinforce prejudice and discrimination, and contribute to climate change. We learn strategies adopted by the scientific community for doing research responsibly.

Lecture
Ethics, energy usage, bias and fairness, copyright
Slides, Zoom Recording, Extra Slides (that we didn’t get to in class)
Lab
Extra office hours with Cara and Sophie
Reading
Bommasani et al. (2022), Section 5, and Bender et al. (2021), on the risks of language models
Strubell et al. (2019) and Kang et al. (2023), on efficiency and energy usage
Blodgett et al. (2020), Stanczak and Augenstein (2021), and Pessach and Shmueli (2022), on bias and fairness
Karamolegkou et al. (2023), on copyright
Dates
Pass/Fail Deadline Tues 4/23 Withdrawal Deadline Tues 4/23 Project Full Proposal Due 4/27 RQ 6 Released Fri 4/26

Week 14, Apr. 30/May 1

What Is Understanding?

We have now learned many ways to teach a model about “meaning.” But do models truly understand natural language? We conclude the course with a discussion of the notion of “understanding,” and highlight the limitations of current techniques.

Lecture
Understanding, behavioralism, grounding, consciousness
Slides, Zoom Recording
Lab
TBD
Reading
Pfungst (1907/1911), on Clever Hans
Bender and Koller (2020) and the NYU debate on grounding
Chalmers (2023) and Butlin et al. (2023), on consciousness
Dates
Final Paper Draft Due 5/4

Final Exam Period, May 8–14

Final Project Presentations

You will give a talk of no more than 5 minutes on your final project. More information to come.

Dates
Final Paper Due Tues 5/14 Reading Quizzes Due Tues 5/14