Homework 3

Due Monday, February 25

See below on how you will be graded on this homework.

Problems

Data for this problem set can be found at http://catarina.ai.uiuc.edu/L406_08/Homeworks/hw3.tgz. Untar it and go into the subdirectory "hw3".
  1. Write a grammar converter that takes an arbitrary CF grammar, and converts it to CNF. There is an example of a small grammar in "grammar.txt". You should make sure that your output matches "grammar.cnf.txt", modulo differences in the numbering of the generated nodes: but please make your newly generated nodes at least follow the same format as mine, namely something of the form "X[0-9][0-9][0-9]".
  2. Implement the non-probabilistic "CKY" (or "CYK") recognizer. (Note that both R&S's description and J&M's description, as I pointed out in class, are for the probabilistic version.) The recognizer should take a grammar in CNF format, and output analyses of the sentence in the format shown in "sentences.out". This is the result of running the CNF grammar you produced in the first problem, on the sentences in "sentences". Again, you should make sure that what you produce matches what I have in "sentences.out". Note in particular how ambiguity is represented for the words "dog" and "red".
  3. Fix the grammar to allow subject-verb agreement. I assume you will use purely CF devices to do this, even though as we went over in class this is not a practical method for handling agreement phenomena in general.

    Add the following words to the grammar from Problem 1:

    NS	bananas
    NS	cats
    NS	dogs
    NS	walls
    NS	shins
    
    VS	eats
    VS	chases
    VS	kicks
    V	eat
    V	chase
    V	kick
    
    Show the resulting grammar. Show that your parser does the right thing with the grammar for the following:

    the dogs eat the bananas
    the dog eats the bananas
    *the dog eat the bananas
    *the dogs eats the bananas

How you will be graded

This homework will be partly self-evaluated, along the lines of the various (D)ARPA speech recognition bakeoffs and other such competitions, in that you will be responsible for running your scripts and producing output, which I will then compare with what I get from my own scripts.

Since you will be running your own code, I do not care what programming language you use to write your programs.

Here's what will happen. Please follow these instructions carefully as I will deduct points if I have to do extra work to check your results :

  1. At 8:00 PM on the evening of Sunday, February 24, there will appear a file at http://catarina.ai.uiuc.edu/L406_08/Homeworks/hw3-eval.tgz
  2. Once you have unpacked that tarfile, you will change directories to hw3-eval and you will find:
  3. You will run your CNF expander on "grammar.txt" and put the result in "grammar.cnf.txt".
  4. You will then run your CKY recognizer on "sentences", and put the result in "sentences.out"
  5. Finally, you will run the CKY recognizer with the grammar from Problem 3 on the sentences in "sentences2" and put them in "sentences2.out". The file "additional_words" will contain some additional words that you will need to add to your Problem 3 grammar in order to parse the sentences in "sentences2".
Once you have run these and created the output files, I want you to create a directory called "hw3_youremailhandle" (where "youremailhandle" should be substituted with your email handle). In that directory put the "grammar.cnf.txt", "sentences.out" and "sentences2.out" that you created just above. You should also put in that directory a copy of your CNF expander, a copy of your CKY parser, and the expanded grammar you created for problem 3. (I take it that it's obvious, but if you use a compiled language like C or C++, I want the source code, not the executable; in that case also provide a Makefile that shows how to compile your code.) Then tar up the result as follows:
 tar -cvf hw3_youremailhandle.tar hw3_youremailhandle 
and post it somewhere where I can find it.

Note that the grammars and sentence files I will provide will be quite a bit bigger than the small examples that you worked with before. So you won't want to be doing this by hand.