Homework 1

Due Monday, January 28

  1. Given the dictionary here, write regular expressions in your favorite language -- python, perl ... -- to find:

    1. All words ending in the suffix "ness"
    2. All words beginning with the prefix "un"
    3. All words ending in "able", but not words like "able", "table", "cable" with at most one letter before the "able"
    4. All words containing at least two adjacent "e"s
    5. All words containing at least two non-adjacent "e"s
    6. All words containing exactly two non-adjacent "e"s
    7. A reasonable set of Irish and Scottish Gaelic names (e.g. ones that begin with Mac or Mc...)

    Show your code for each case, and show some output (to show that it works).

  2. Assume the alphabet {a,i,u,p,t,k,w}. Draw an FST that implements the following rule:

    w -> epsilon / V __ V

    (delete a "w" between two vowels) where "epsilon" is the empty string, and "V" is {a,i,u}. To simplify the diagram, it is fine to use a class label ("V") on an arc to represent multiple arcs each labeled with an individual letter (e.g. "a", "i", or "u").

  3. For the following transducer:

    Show what it outputs for the following strings:

    1. aaa
    2. dcefa
    3. bcad
    4. cadcef
    5. abbbacad
    6. accaddda

    Note that "<epsilon>" denotes the empty string.

    Is the transducer deterministic? Why or why not?

  4. The transducer in the previous example was compiled from two rewrite rules. On the basis of the behavior in the above strings, and by looking at the transducer, see if you can figure out what those two rules were. Hint: you should pay attention to the function of the two non-final states 4 and 5. They are non-final for a reason: if you land in one of those states, you are forced to take another arc to get out.

  5. Lextools Installation and Testing.

    You do not need to hand in anything for this problem, but please do this anyway since you will need this for next week's homework.

    Please see the instructions here for how to download and test the fsm tools and lextools.