Natural Language Understanding and Computational Semantics
DS-GA 1012, Spring 2024
New York University
Since at least the proposal of the Turing test, building computational systems that can communicate with humans using natural language has been a central goal for AI research. Understanding real, naturally occurring human language is the key to reaching this goal. This course surveys recent successes in language understanding and prepares students to do original research in this area, culminating in a substantial final project.
Course Staff
Instructor
- Sophie Hao she/her (NYU email: sophie.hao)
Section Leaders
- Cara Leong she/her (NYU email: caraleong)
- Jackson Petty he/him (NYU email: petty)
Graders
- Anisha Bhatnagar she/her (NYU email: ab10945)
- Manoj Middepogu he/him (NYU email: mm12799)
- Nori Naka he/him (NYU email: nn1331)
Logistics
Class sessions take place in-person and on Zoom. Recordings will be provided. Office hours may be in-person and/or on Zoom, at the discretion of the person holding the office hours.
Lectures
Tuesdays, 6:45 PM–8:25 PM, with Sophie
Room G08, 12 Waverly Place and on Zoom
Lab
Wednesdays, 7:10 PM–8:00 PM, with Cara or Jackson
Room 102, 19 University Place and on Zoom
Office Hours
Tuesdays, 4:00 PM–5:00 PM, with Jackson
Room 507, Arthur L. Carter Hall (10 Washington Place)
Wednesdays, 4:00 PM–5:00 PM, with Cara
Room 307, Arthur L. Carter Hall (10 Washington Place)
Fridays, 11:00 AM–12:00 PM, with Sophie
Room 700, 60 5th Avenue and on Zoom
Prerequisites
Students are expected to have had some experience with most of the following concepts.
Calculus and Linear Algebra
Partial derivatives, gradients, vectors, matrices, matrix multiplication, vector spaces
Probability and Statistics
Probability distributions, conditional probabilities, Bayes’s theorem, linear regression
Machine Learning and Data Science
Features (discrete vs. continuous), optimization, train/dev/test, dimensionality reduction (e.g., PCA), deep learning
Python Programming
Basic syntax, iterables/comprehension, Jupyter notebooks, package managers (e.g., pip), modules, object-oriented programming, data types
Natural Language Processing
Tokenization, vector semantics, language modeling
Since this is a graduate-level course with students from a diverse array of backgrounds (data science, computer science, and linguistics, as well as undergraduates), many students will be unfamiliar with one or more of the above topics. This is okay, as long as you feel comfortable looking up anything that you don’t understand or asking for help when necessary.