LING 406: Introduction to Computational Linguistics

Richard Sproat

Spring 2008

MW 4-5:20, FLB G32

Office Hours: Wednesdays 8-9:50, FLB 4016D


Overview

The goal of this course is to give as broad a survey as possible of the fields of computational linguistics, natural language processing and (to some extent) speech processing. Since we will be covering a wide range of material, it will not be possible to cover everything in depth (though in some cases I will attempt to "drill down" to give a more in-depth view of particular problems). So you should not expect to come away from this course with everything you need to be a practicing computational linguist. On the other hand, you should expect to come away with an understanding of what the issues in computational linguistics are, and you should know enough to start looking in more depth at particular problems that (hopefully) will interest you in the future.

Roughly speaking, the first half of the course will be devoted to categorical methods, and the second half mostly to statistical methods, which have become very popular over the past fifteen years. Note that many of the statistical methods can be thought of as algorithmic extensions of the more traditional non-statistical approaches: for example, a probabilistic context-free grammar might be implemented using a chart-parsing algorithm, extended to include computations of the production probabilities. So in many cases, learning the more traditional approaches is really a prerequisite for understanding the statistical approaches.
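To make the chart-parsing example concrete, here is a minimal sketch in Python of a CKY-style chart parser extended with production probabilities. It is an illustration only, not course material: the toy grammar, its probabilities, and the name viterbi_cky are all assumptions made up for this example, and the grammar is assumed to be in Chomsky normal form. Dropping the probabilities and recording only which categories span each substring gives back the ordinary categorical chart parser.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
# Binary rules: (left child, right child) -> [(parent, rule probability)]
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
    ("V", "NP"): [("VP", 1.0)],
}
# Lexical rules: word -> [(category, rule probability)]
LEXICAL = {
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "dogs": [("NP", 0.4)],
    "chased": [("V", 1.0)],
}


def viterbi_cky(words):
    """Return the probability of the best S parse of words (0.0 if none)."""
    n = len(words)
    # chart[(i, j)] maps each category to the best probability
    # found so far for deriving words[i:j] from that category.
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for cat, prob in LEXICAL.get(word, []):
            chart[(i, i + 1)][cat] = prob
    for span in range(2, n + 1):              # width of the substring
        for i in range(n - span + 1):         # start of the substring
            j = i + span
            for k in range(i + 1, j):         # split point
                for left, p_left in chart[(i, k)].items():
                    for right, p_right in chart[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            # The only change from the categorical parser:
                            # multiply the rule probability by the children's
                            # probabilities and keep the best-scoring analysis.
                            p = p_rule * p_left * p_right
                            if p > chart[(i, j)].get(parent, 0.0):
                                chart[(i, j)][parent] = p
    return chart[(0, n)].get("S", 0.0)


if __name__ == "__main__":
    for sentence in ["the dog chased the cat", "dogs chased the cat", "chased the"]:
        print(sentence, "->", viterbi_cky(sentence.split()))
```

With this toy grammar, "the dog chased the cat" should score about 0.09 (0.3 for each noun phrase), "dogs chased the cat" about 0.12, and the fragment "chased the" gets 0.0 because no S spans it.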

Syllabus

Week                Topic Reading Homework
1: 1/14, 1/16
  • J&M: Ch. 1.
  • Bob Coyne and Richard Sproat, "WordsEye: An Automatic Text-to-Scene Conversion System", SIGGRAPH 2001, Los Angeles, CA, 2001.
  • J&M: Chs. 2-3
  • Review of finite-state automata and transducers from Sproat, 2000. A Computational Theory of Writing Systems.
  • (You can also look over the introductory chapter of R&S, though that deals with both unweighted and weighted automata, the latter of which we'll be dealing with later on.)
HW 1
2: 1/23  
3: 1/28, 1/30
  • Chapter 2 from my 1992 book Morphology and Computation.
  • Review chapter from Dale, Moisl, and Somers (eds.), 2000, Handbook of Natural Language Processing.
  • J&M: Ch. 3 (if you haven't already ...)
  • R&S: Ch. 2
HW 2
4: 2/4, 2/6
  • R&S: Ch. 3
  • R&S: Ch. 4 (if you haven't already read it)
  • J&M: 8.1
 
5: 2/11, 2/13
  • R&S: Ch. 7
  • J&M: Chs. 9-10
HW 3
6: 2/18, 2/20  
7: 2/25, 2/27
  • No additional reading this week: Midterm will be made available WEDNESDAY.
 
8: 3/3, 3/5 MIDTERM
9: 3/10, 3/12
  • TBD
  • Maybe no class on 3/12: finish your midterms
   
  Spring Recess    
10: 3/24, 3/26
  • R&S: Ch. 6
  • M&S: Ch. 9
HW 4
11: 3/31, 4/2  
12: 4/7, 4/9
  • The Monday class is cancelled: you should attend Mark Johnson's lecture, 1404 Siebel Center, 4-5.
  • Finish word sense disambiguation
   
13: 4/14, 4/16 HW 5
14: 4/21, 4/23
  • J&M: Ch. 8
  • J&M: Ch. 9
HW 6
15: 4/28, 4/30
  • 4/28 -- Project Presentations: Wei, Laehoon, Matt
  • 4/30 -- Project Presentations: Juiting, Ben, Mahmoud
   

Course Prerequisites and Requirements

The first prerequisite of this course is that you know how to program. Not all the homeworks will involve programming, but I expect to be able to ask you to write programs that implement algorithms discussed in class or in the readings. Students who do not have any programming experience will find some of the exercises quite difficult. If you are in that situation, you should take either an intro-level computer science programming course or LING 402 (offered in the Fall) before taking LING 406.

It is assumed that you will have access to a computer with an internet connection, on which you can do some of the simple programming exercises that will be assigned. Students for whom this will be a problem should let me know right away. (There is a Linux server in FLB where we can set up an account for you if you need it.)

Texts

There is one required book for this course:

The following book is recommended. We won't use it a lot, but it is a good book to have if you are serious about this area:

Also, we will be making heavy use of Jurafsky and Martin's Speech and Language Processing. There is a print edition, but it will be obsolete once the new edition is out, so I don't recommend buying it. For this semester, we will make use of the online manuscript, which until recently was found here. Fortunately I have a backup copy: here. (NB: This is password protected: I'll give out the password in class.)

A lot of the material we will cover will be in these books, but there will be frequent supplemental readings mostly from papers and articles available on the Web.

Homeworks, Exams, Grading

There will be roughly weekly or biweekly homeworks.

Unless I say otherwise, you should assume that all problems are equally weighted with respect to grading.

For assignments involving programming: I do not care what programming language or programming environment you use. But you must turn in evidence that you have actually written the code that is required, and show that it works by running examples. Note that in some cases, particularly when we get to corpus-based methods, there will be a right answer, one which it would be difficult for you to compute without having written a program.

There will be a take-home midterm exam due on March 12.

There will also be a final, format to be determined on or before the last class, April 30.

70% of your grade will depend on the homeworks and 15% each on the midterm and final.

Final Project Option

In lieu of the final, you may elect instead to do a project relating to any general topic area that is covered in the course. The project should be non-trivial in that it should deal with a substantive issue; it will likely involve some programming, though this is not strictly required (i.e., it can address some theoretical issue in computational linguistics). I am completely open as to what topic you pick. I will also be willing to propose topics if you want to do a project but don't have any ideas about what to do. But if you want to do a project you must be prepared to:

The writeup of the project is due by 5:20PM on 5/2. The writeup should include a few pages of description of the problem and your solution. If there is code, you should include a listing of the code and some demonstration that it does what you claim it does.