-
Given the dictionary
here,
write regular expressions in your favorite language -- python, perl
... -- to find:
-
All words ending in the suffix "ness"
- All words beginning with the prefix "un"
- All words ending in "able", but not words like "able", "table",
"cable" with at most one letter before the "able"
- All words containing at least two adjacent "e"s
- All words containing at least two non-adjacent "e"s
- All words containing exactly two non-adjacent "e"s
- A reasonable set of Irish and Scottish Gaelic names (e.g. ones
that begin with Mac or Mc...)
Show your code for each case, and show some output (to show that it
works).
-
Assume the alphabet {a,i,u,p,t,k,w}. Draw an FST that implements the
following rule:
w -> epsilon / V __ V
(delete a "w" between two vowels) where "epsilon" is the empty string,
and "V" is {a,i,u}. To simplify the diagram, it is fine to use a class
label ("V") on an arc to represent multiple arcs each labeled with an
individual letter (e.g. "a", "i", or "u").
-
For the following transducer:
Show what it outputs for the following strings:
- aaa
- dcefa
- bcad
- cadcef
- abbbacad
- accaddda
Note that "<epsilon>" denotes the empty string.
Is the transducer deterministic? Why or why not?
-
The transducer in the previous example was compiled from two rewrite
rules. On the basis of the behavior in the above strings, and by
looking at the transducer, see if you can figure out what those two
rules were. Hint: you should pay attention to the function of the two
non-final states 4 and 5. They are non-final for a reason: if you land
in one of those states, you are forced to take another arc to get
out.
-
Lextools Installation and Testing.
You do not need to hand in anything for this problem, but
please do this anyway since you will need this for next week's
homework.
Please see the instructions here for
how to download and test the fsm tools and lextools.