BloggerTank.com http://www.bloggertank.com
Text search and closure properties Vikash Shankar http://www.bloggertank.com/search/label/Notes
Text search
The program grep grep -E regexp file Searches for the occurrence of patterns matching a regular expression cat|12
{cat, 12}
union
[abc]
{a, b, c}
shorthand for a|b|c
[ab][12] (ab)*
{a1, a2, b1, b2} {e, ab, abab, ...}
concatenation star
[ab]? (cat)+
{e, a, b} {cat, catcat, ...}
zero or one one or more
[ab]{2}
{aa, ab, ba, bb}
{n} copies
Searching with grep Words containing savor or savour
Words with 5 consecutive a or b
grep –E `savou?r` words
grep –E `[ab]{5}` words
outsavor savor savored savorer savorily savoriness savoringly savorless savorous
savorsome savory savour unsavored unsavoredly unsavoredness unsavorily unsavoriness unsavory
grabbable
grep –E `zo+zo+` words zoozoo
More grep commands . any symbol [a-z] anything in a range \<
beginning of line
$
end of line
5
n 2 s u 1
3
r e g
1 If a DFA is too hard, I do an ... 2 CSCI 3130 homework makes me ... 3 If a DFA likes it, it is ... 4 $10000000 = $10? 5 I study 3130 hard because it will make me ... grep –E `\<.ff.u..t` words
4
s
a f f l u e n t
a f e r l a r
a r
how do you look for... Words that start in cat and have another cat grep –E `\
Words with at least ten vowels? grep –E `([aeiouy].*){10}` words
Words without any vowels?
[^R]
does not contain R
grep –E `\<[^AEIOUYaeiouy]*$` words
Words with exactly ten vowels? grep –E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words
How grep (could) work regular expression
NFA
NFA without e
DFA
text file
in class
in grep
[ab]?, a+, (cat){3}
not allowed
allowed
input handling
matches whole
looks for pattern
output
accept/reject
finds pattern
differences
Implementation of grep • How do you handle expressions like: [ab]? → ()|[ab]
zero or one R? → e|R
(cat)+ → (cat)(cat)*
one or more R+ → RR*
a{3}
{n} copies R{n} → RR...R
→ aaa
n times
[^aeiouy]
?
not containing any
Closure properties
Example • The language L of strings that end in 101 is regular (0+1)*101 • How about the language L of strings that do not end in 101?
Example • Hint: w does not end in 101 if and only if it ends in: 000, 001, 010, 011, 100, 110 or 111 or it has length 0, 1, or 2
• So L can be described by the regular expression (0+1)*(000+001+010+010+100+110+111) + e + (0 + 1) + (0 + 1)(0 + 1)
Complement • The complement L of a language L is the set of all strings that are not in L • Examples (S = {0, 1}) – L1 = all strings that end in 101 – L1 = all strings that do not end in 101 = all strings end in 000, …, 111 or have length 0, 1, or 2 – L2 = 1* = {e, 1, 11, 111, …} – L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*
Example • The language L of strings that contain 101 is regular (0+1)*101(0+1)* • How about the language L of strings that do not contain 101? You can write a regular expression, but it is a lot of work!
Closure under complement If L is a regular language, so is L. • To argue this, we can use any of the equivalent definitions for regular languages: regular expression
NFA
DFA
• The DFA definition will be most convenient – We assume L has a DFA, and show L also has a DFA
Arguing closure under complement • Suppose L is regular, then it has a DFA M accepts L
• Now consider the DFA M’ with the accepting and rejecting states of M reversed accepts strings not in L this is exactly L
Food for thought • Can we do the same thing with an NFA?
NO!
0, 1 q0
1
q1
0
q2
(0+1)*10
1
q1
0
q2
(0+1)*
0, 1 q0
Intersection • The intersection L L’ is the set of strings that are in both L and L’ • Examples: L = (0 + 1)*11
L’ = 1*
L L’ = 1*11
L = (0 + 1)*10
L’ = 1*
L L’ = ∅
• If L, L’ are regular, is L L’ also regular?
Closure under intersection If L and L’ are regular languages, so is L L’. • To argue this, we can use any of the equivalent definitions for regular languages: regular expression
NFA
DFA
Suppose L and L’ have DFAs, call them M and M’ Goal: Construct a DFA (or NFA) for L L’
An example 1
M r0
1 0
0
M’
r1
s0
0
L = even number of 0s r0, s0
0 1
s1
1
L’ = odd number of 1s 1
r0, s1
1 0
0
r1, s0
0 1
0
r1, s1
1
L L’ = even number of 0s and odd number of 1s
Closure under intersection
states
M and M’
DFA for L L’
Q = {r1, ..., rn} Q’ = {s1, ..., sn’}
Q × Q’ = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn’)}
start state ri for M sj for M’
(ri, sj)
accepting states
F × F’ = {(ri, sj): ri F, sj F’}
F for M F’ for M’
Whenever M is in state ri and M’ is in state sj, the DFA for L L’ will be in state (ri, sj)
Closure under intersection DFA for L L’
M and M’ transitions
ri si’
a a
rj
in M
sj’
in M’
ri, si’
a
rj, sj’
Reversal • The reversal wR of a string w is w written backwards w = cave
wR = evac
• The reversal LR of a language L is the language obtained by reversing all its strings L = {cat, dog}
LR = {tac, god}
Reversal of regular languages • L = all strings that end in 101 is regular (0+1)*101 • How about LR? • This is the language of all strings beginning in 101 • Yes, because it is represented by 101(0+1)*
Closure under reversal If L is a regular language, so is LR.
• How do we argue? regular expression
NFA
DFA
Arguing closure under reversal • Take a regular expression E for L • We will show how to reverse E • A regular expression can be of the following types: – The special symbols and e – Alphabet symbols like a, b – The union, concatenation, or star of simpler expressions
Proof of closure under reversal regular expression E
reversal ER
e
e
a (alphabet symbol)
a
E1 + E2
E1R + E2R
E1E2
E2RE1R
E1*
(E1R)*
A question LDUP = {ww: w L}
Ex.
L = {cat, dog} LDUP = {catcat, dogdog}
If L is regular, is LDUP also regular?
regular expression
?
NFA
DFA
A question • Let’s try with regular expression: L = {a, b} LDUP = {aa, bb} LL = {aa, ab, ba, bb}
LDUP = LL • Let’s try with NFA: q0
e
NFA for L
e
NFA for L
e
q1
An example L = 0*1 is regular LDUP = {1, 01, 001, 0001, ...} LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}
• Let’s try to design an NFA for LDUP
An example 0
0
0
0
1
1
1
1
1
01
001
0001
LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}