CLASS N1: "baflopub"
        Singular       Plural
  Nom:  baflopub       baflopupte
  Gen:  baflopubler    baflopupser
  Dat:  baflopupte     baflopubne
  Acc:  baflopubzo     baflopubve

CLASS N2: "bait"
        Singular       Plural
  Nom:  baita          baitta
  Gen:  baitlar        baitsar
  Dat:  baitta         baitna
  Acc:  baitsu         baitva

CLASS N3: "beeb"
        Singular       Plural
  Nom:  beebata        beebata
  Gen:  beebatalar     beebasar
  Dat:  beebata        beebanna
  Acc:  beebatsu       beebatva
There are also two aspects marked on the verb: perfect and imperfect.
Here are some sample verbs for each of the classes:
Class v1, baisun:
  imperfect:  baisunnime
  perfect:    baisunme

Class v2, baetog:
  imperfect:  levaetognotime
  perfect:    levaetokteme

Class v3, mynfox:
  imperfect:  televynfoxnalime
  perfect:    televynfoxlame
Voiced:    b d g z v
Unvoiced:  p t k s f

Note also that /z/ becomes /s/ after a voiceless consonant (as listed above).
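The /z/ to /s/ rule just stated could be sketched as a small string rewrite. This is only an illustration, assuming one letter per phoneme and taking "voiceless" to mean exactly the five consonants listed above; the function name devoice_z and the input "xatzo" are hypothetical and not part of the assignment data.

```python
# Voiceless consonants exactly as listed in the assignment (assumption:
# one orthographic letter corresponds to one phoneme).
VOICELESS = set("ptksf")

def devoice_z(form: str) -> str:
    """Rewrite z as s whenever the preceding segment is voiceless."""
    out = []
    for ch in form:
        # Look at the already rewritten output so the rule can feed itself.
        if ch == "z" and out and out[-1] in VOICELESS:
            ch = "s"
        out.append(ch)
    return "".join(out)

# "baflopubzo" (from the N1 paradigm) is left alone because b is voiced;
# the hypothetical input "xatzo" would come out as "xatso".
```

In a finite-state setup the same rule would normally be written as a context-dependent rewrite and composed with the rest of the morphology rather than applied as a separate pass.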
Unlenited   Lenited
m           v
b           v
d           z
g           x
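The lenition table can be expressed as a simple lookup. The sketch below assumes that lenition targets the first consonant of the stem, which is how the sample verbs appear to pattern (baetog surfaces as -vaetog- and mynfox as -vynfox- after their prefixes). When lenition is triggered is not decided here, and the function name lenite is hypothetical.

```python
# Unlenited -> lenited, copied from the table above.
LENITION = {"m": "v", "b": "v", "d": "z", "g": "x"}

def lenite(stem: str) -> str:
    """Lenite the stem-initial consonant if it has a lenited counterpart."""
    if stem and stem[0] in LENITION:
        return LENITION[stem[0]] + stem[1:]
    return stem
```

Note that m and b both map to v, so the mapping is not invertible: an analyzer going from surface forms back to stems has to propose both candidates.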
An example parsed structure is as follows:
[S zapwoata/NP-nom [VP3 bepsuessogata/NP-dat [VPX [NP-acc tribxwibblaazo/NP-acc xlyatlozuler/NP-gen] telesukplognalime/V3]]]

Note that if the non-terminal node dominates just a single word I have it marked as "word/cat".
In what you turn in, include your grammar, and a demonstration that this does indeed match what I gave you. You will probably want to use lextools for this, but you don't have to: if you want to do this in some programming language where you program in the morphology, that is also fine. But see the important caveat below.
Important caveat: you will only be able to do the problems below if you do this in a way that allows you to handle previously unseen cases. So simply compiling the list I give you here is not going to work. You are strongly advised to set it up so that if I give you a word and its category, you can compose it with your morphological analyzer and produce all legal surface forms and their features.
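As a rough illustration of what "give it a word and its category, get all surface forms and features" might look like in a programming language, here is a sketch for class N1 only. The suffixes and the devoicing of a stem-final voiced consonant before a voiceless suffix are read off the single "baflopub" paradigm, so they are assumptions that may not hold for unseen stems; the names N1_SUFFIXES, attach and n1_forms are hypothetical.

```python
# Voiced -> unvoiced pairs, taken from the voicing table in the assignment.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}
VOICELESS = set(DEVOICE.values())

# Assumption: N1 suffixes as they appear in the "baflopub" paradigm.
N1_SUFFIXES = {
    ("nom", "sg"): "",    ("nom", "pl"): "te",
    ("gen", "sg"): "ler", ("gen", "pl"): "ser",
    ("dat", "sg"): "te",  ("dat", "pl"): "ne",
    ("acc", "sg"): "zo",  ("acc", "pl"): "ve",
}

def attach(stem: str, suffix: str) -> str:
    """Concatenate, devoicing the stem-final consonant before a voiceless suffix."""
    if suffix and suffix[0] in VOICELESS and stem and stem[-1] in DEVOICE:
        stem = stem[:-1] + DEVOICE[stem[-1]]
    return stem + suffix

def n1_forms(stem: str) -> dict:
    """Map each (case, number) feature pair to a surface form for an N1 stem."""
    return {feats: attach(stem, sfx) for feats, sfx in N1_SUFFIXES.items()}
```

Calling n1_forms("baflopub") reproduces the eight N1 forms given in the paradigm; the other noun and verb classes would need their own suffix tables and rules.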
You will augment the CYK program you wrote for Homework 4 to include the backpointers, so that you can actually recover the structure.
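One possible shape for CYK with backpointers is sketched below. It assumes a grammar whose non-terminal rules are all binary, stores only the first derivation found for each category and span, and prints single words as "word/cat". The rules and lexicon in the example are guessed from the one parsed structure shown in this assignment and are not the actual grammar; every name here (cyk_parse, bracket, EXAMPLE_RULES, EXAMPLE_LEXICON) is hypothetical.

```python
from collections import defaultdict

def cyk_parse(words, lexical, binary, start="S"):
    """lexical: word -> categories; binary: (B, C) -> list of A for rules A -> B C."""
    n = len(words)
    # back[(i, j)][A] is the word itself (span of length 1) or a split (k, B, C).
    back = defaultdict(dict)
    for i, word in enumerate(words):
        for cat in lexical.get(word, ()):
            back[(i, i + 1)][cat] = word
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b in list(back[(i, k)]):
                    for c in list(back[(k, j)]):
                        for a in binary.get((b, c), ()):
                            # Keep the first backpointer; later ones are ambiguities.
                            back[(i, j)].setdefault(a, (k, b, c))
    if start not in back[(0, n)]:
        return None
    return bracket(back, 0, n, start)

def bracket(back, i, j, cat):
    """Follow backpointers and render the tree in the bracketed format."""
    bp = back[(i, j)][cat]
    if isinstance(bp, str):
        return f"{bp}/{cat}"
    k, b, c = bp
    return f"[{cat} {bracket(back, i, k, b)} {bracket(back, k, j, c)}]"

# Assumption: rules and categories inferred from the example structure only.
EXAMPLE_RULES = {
    ("NP-nom", "VP3"): ["S"],
    ("NP-dat", "VPX"): ["VP3"],
    ("NP-acc", "V3"): ["VPX"],
    ("NP-acc", "NP-gen"): ["NP-acc"],
}
EXAMPLE_LEXICON = {
    "zapwoata": ["NP-nom"],
    "bepsuessogata": ["NP-dat"],
    "tribxwibblaazo": ["NP-acc"],
    "xlyatlozuler": ["NP-gen"],
    "telesukplognalime": ["V3"],
}
```

To recover every parse of an ambiguous sentence instead of just one, each cell could store a list of backpointers per category rather than a single entry.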
Then:
word_category -> actual_surface_form

for each word category associated with each actual surface form. There will be ambiguities, but your parser should in any case be able to handle that. Then you can combine your non-terminal expansion rules and the terminal rules that you just created into one big grammar file. (Of course you could also have your program read multiple grammar files. It's up to you.)
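Turning morphological analyses into terminal rules of the form word_category -> actual_surface_form could be done along these lines. The sketch assumes the analyses have already been reduced to a mapping from each surface form to its word categories, with category names in the NP-nom style of the parsed example; the function name terminal_rules is hypothetical.

```python
def terminal_rules(analyses):
    """analyses: surface form -> iterable of word categories.

    Returns one 'category -> surface_form' line per distinct pair, sorted so
    the grammar file is stable from run to run.
    """
    rules = set()
    for surface, categories in analyses.items():
        for cat in categories:
            rules.add(f"{cat} -> {surface}")
    return sorted(rules)

# "beebata" is both a nominative and a dative form in the N3 paradigm,
# so it yields two terminal rules.
example = terminal_rules({"beebata": ["NP-nom", "NP-dat", "NP-nom"]})
```

The resulting lines can be appended to the non-terminal rules to form a single grammar file, or written to a separate file if the parser reads several.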
So to clarify, here are the names of the files you will be getting and what they are:
For the set of words, produce all possible analyses using your morphological analyzers. Arrange the analyses in the form:
word1   analysis1
word1   analysis2
. . .   . . .
word2   analysis1
word2   analysis2
. . .   . . .

Note that this is exactly the same format as you had here.
For the sentences, produce the legal parse for each sentence. Note that all sentences will be accepted by the grammar (assuming you did the grammar correctly). Your output should be in the form:
sentence1 output1 sentence2 output2

as in the example in problem 2.
Once you have done the problems, I want you to create a directory called "midterm_youremailhandle" (where "youremailhandle" should be substituted with your email handle). Put all files in subdirectories corresponding to each of the problems above. Then tar it up as before:
tar -cvf midterm_youremailhandle.tar midterm_youremailhandle

and post it somewhere where I can find it.