Homework 5

Due Friday, May 2


Note that this is the first of two homeworks in lieu of a final exam, to be completed by everyone who is not doing a final project. The second homework will be distributed on April 23.

Implement a version of Yarowsky's log-likelihood-based approach to sense disambiguation for the two homographs bass (FISH versus MUSICAL-RANGE) and sake (CAUSE versus RICE-BEER). The following files contain training and test data for each sense:

Each line in the file contains the target word (bass or sake), followed by a colon, followed by five words of left context, the target word, and five words of right context.

Lines beginning with a star are to be interpreted as follows:

The senses were tagged by hand, by me, so there are likely to be some errors.
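
As a rough illustration of the line format described above, here is a minimal parsing sketch in Python. The function name parse_line and the returned dictionary keys are hypothetical. The sketch assumes that tokens are separated by whitespace, that the target word appears in the context as a bare token, and that a line may have fewer than five context words on either side. It only records whether a line starts with a star; it does not interpret the star.

```python
def parse_line(line):
    """Split one data line into target word, star flag, and context words.

    Assumed (not confirmed by the handout): whitespace tokenization, and
    the target word occurring as a bare token inside the context.
    """
    head, sep, rest = line.strip().partition(":")
    if not sep:
        raise ValueError("no colon found in line: %r" % line)

    # Record the leading star, but leave its interpretation to the caller.
    starred = head.lstrip().startswith("*")
    target = head.strip().lstrip("*").strip().lower()

    tokens = rest.split()
    hits = [i for i, tok in enumerate(tokens) if tok.lower() == target]
    if not hits:
        raise ValueError("target %r not found in context" % target)

    # If the target occurs more than once, take the occurrence nearest
    # the middle of the context window.
    middle = len(tokens) // 2
    pos = min(hits, key=lambda i: abs(i - middle))

    return {
        "target": target,
        "starred": starred,
        "left": tokens[:pos],
        "right": tokens[pos + 1:],
    }
```

The example line in the test below is made up for illustration and is not taken from the data files.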

You are to implement Yarowsky's basic log-likelihood-based decision list model and report on the results. Some suggestions:
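
For orientation, a log-likelihood decision list can be sketched as follows: collect features from the context of each training example, score each feature by the absolute log of the ratio of its counts under the two senses, sort the features by that score, and classify a test example with the highest-ranked feature it contains. The Python below is one possible sketch, not a reference solution. The feature set (word to the left, word to the right, any word in the window), the smoothing constant alpha, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def features(left, right):
    """Return a set of (feature type, word) pairs for one context."""
    feats = set()
    if left:
        feats.add(("w-1", left[-1].lower()))   # word immediately to the left
    if right:
        feats.add(("w+1", right[0].lower()))   # word immediately to the right
    for w in left + right:
        feats.add(("win", w.lower()))          # any word in the window
    return feats


def train_decision_list(examples, senses, alpha=0.1):
    """examples: list of (left, right, sense); senses: pair of sense labels.

    alpha is an assumed additive smoothing constant that keeps the
    ratio finite when a feature occurs with only one sense.
    """
    a, b = senses
    counts = defaultdict(Counter)
    for left, right, sense in examples:
        for f in features(left, right):
            counts[f][sense] += 1

    rules = []
    for f, c in counts.items():
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        if llr != 0.0:                         # uninformative features are dropped
            rules.append((abs(llr), f, a if llr > 0 else b))
    rules.sort(key=lambda r: r[0], reverse=True)

    # Fall back to the most frequent training sense when no rule fires.
    default = Counter(s for _, _, s in examples).most_common(1)[0][0]
    return rules, default


def classify(rules, default, left, right):
    """Apply the single highest-ranked rule whose feature is present."""
    feats = features(left, right)
    for _score, f, sense in rules:
        if f in feats:
            return sense
    return default
```

Only the top matching rule decides the sense; the evidence from lower-ranked features is not combined. The smoothing value has a large effect on the ranking of rare features, so it is worth trying more than one setting.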

You should report on the following for each of the two test sets:

You should also check your test results by hand (there are only 100 in each case): it is possible that some of your apparent errors are actually correct, and reflect mistaggings on my part. Report any such cases that you find.
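
To make that manual check easier, one option is to collect every test item where the predicted sense differs from the hand tag and read through those lines. The helper below is a small sketch under assumed inputs: the name review_errors and the (text, gold sense) pair layout are hypothetical.

```python
def review_errors(items, predictions):
    """Return (accuracy, disagreements) for a tagged test set.

    items: list of (text, gold_sense) pairs; predictions: parallel list
    of predicted senses. Each disagreement is (index, text, gold, predicted).
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions differ in length")

    disagreements = [
        (i, text, gold, pred)
        for i, ((text, gold), pred) in enumerate(zip(items, predictions))
        if gold != pred
    ]
    accuracy = 1 - len(disagreements) / len(items) if items else 0.0
    return accuracy, disagreements
```

Each disagreement is either a genuine classifier error or a possible mistagging; the code cannot tell which, so the final judgment has to be made by reading the context.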