Natural Language Understanding and Computational Semantics

DS-GA 1012, Spring 2024
New York University

Since at least the proposal of the Turing test, building computational systems that can communicate with humans using natural language has been a central goal for AI research. Understanding real, naturally occurring human language is the key to reaching this goal. This course surveys recent successes in language understanding and prepares students to do original research in this area, culminating in a substantial final project.

Course Staff

Instructor

  • Sophie Hao she/her (NYU email: sophie.hao)

Section Leaders

  • Cara Leong she/her (NYU email: caraleong)
  • Jackson Petty he/him (NYU email: petty)

Graders

  • Anisha Bhatnagar she/her (NYU email: ab10945)
  • Manoj Middepogu he/him (NYU email: mm12799)
  • Nori Naka he/him (NYU email: nn1331)

Logistics

Class sessions take place in person and on Zoom. Recordings will be provided. Office hours may be held in person, on Zoom, or both, at the discretion of the person holding them.

Lectures

Tuesdays, 6:45 PM–8:25 PM, with Sophie
Room G08, 12 Waverly Place and on Zoom

Lab

Wednesdays, 7:10 PM–8:00 PM, with Cara or Jackson
Room 102, 19 University Place and on Zoom

Office Hours

Tuesdays, 4:00 PM–5:00 PM, with Jackson
Room 507, Arthur L. Carter Hall (10 Washington Place)

Wednesdays, 4:00 PM–5:00 PM, with Cara
Room 307, Arthur L. Carter Hall (10 Washington Place)

Fridays, 11:00 AM–12:00 PM, with Sophie
Room 700, 60 5th Avenue and on Zoom

Prerequisites

Students are expected to have had some experience with most of the following concepts.

Calculus and Linear Algebra

Partial derivatives, gradients, vectors, matrices, matrix multiplication, vector spaces

Probability and Statistics

Probability distributions, conditional probabilities, Bayes’s theorem, linear regression

Machine Learning and Data Science

Features (discrete vs. continuous), optimization, train/dev/test, dimensionality reduction (e.g., PCA), deep learning

Python Programming

Basic syntax, iterables/comprehensions, Jupyter notebooks, package managers (e.g., pip), modules, object-oriented programming, data types

Natural Language Processing

Tokenization, vector semantics, language modeling

Since this is a graduate-level course with students from a diverse array of backgrounds (including data science, computer science, and linguistics, as well as some undergraduates), many students will be unfamiliar with one or more of the above topics. This is okay, as long as you feel comfortable looking up anything that you don’t understand or asking for help when necessary.