Wojciech Wieczorek

Grammatical Inference
Algorithms, Routines and Applications

Studies in Computational Intelligence, Volume 673
Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]

Wojciech Wieczorek
Institute of Computer Science, Faculty of Computer Science and Materials Science, University of Silesia, Sosnowiec, Poland
Preface

Grammatical inference, the main topic of this book, is a scientific area that lies at the intersection of multiple fields. Researchers from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory, and many other areas have made their own contributions. It is therefore not surprising that the topic also goes by a few other names, such as grammar learning, automata inference, grammar identification, or grammar induction. To situate the present contribution, we can divide all books relevant to grammatical inference into three groups: theoretical, practical, and application-oriented. For the most part this book is practical, though one can also find elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus some references to real-life problems.
The purpose of this book is to present old and modern methods of grammatical inference from the perspective of practitioners. To this end, the Python programming language has been chosen as the vehicle for presenting all the methods. The included listings can be copied and pasted directly into other programs, so students, academic researchers, and programmers should find this book a valuable source of ready recipes and an inspiration for their further development.
A few issues should be mentioned regarding this book: the inspiration to write it, the key for the selection of the described methods, the arguments for selecting Python as an implementation language, typographical conventions, and where the reader can send any critical remarks about the content of the book (subject matter, listings, etc.).
There is a treasured book entitled “Numerical Recipes in C”, in which, along with the description of selected numerical methods, listings in the C language are provided. The reader can copy and paste fragments of the electronic version of the book in order to produce executable programs. Such an approach is very useful: we can grasp the idea that lies behind a method and immediately put it into practice. This is the guiding principle that accompanied the writing of the present book.
For the selection of methods, we tried to keep a balance between importance and complexity. This means that we introduced concepts and algorithms which are essential to GI practice and theory, but omitted those that are too complicated or too long to present as ready-to-use code. Thanks to that, the longest program included in the book is no more than a few pages long.
As far as the implementation language is concerned, the following requirements had to be taken into account: simplicity, availability, the property of being firmly established, and access to a wide range of libraries. The Python and F# programming languages were good candidates. We decided to choose IronPython (an implementation of Python) mainly due to its integration with the optimization modeling language. We use a monospaced (fixed-pitch) font for the listings of programs, while the main text is written using a proportional font. In listings, Python keywords are in bold.
The following persons have helped the author in preparing the final version of
this book by giving valuable advice. I would like to thank (in alphabetical order):
Prof. Z.J. Czech (Silesian University of Technology), Dr. P. Juszczuk, Ph.D. student A. Nowakowski, Dr. R. Skinderowicz, and Ph.D. student L. Strak (University
of Silesia).
Contents

1 Introduction
  1.1 The Problem and Its Various Formulations
    1.1.1 Mathematical Versus Computer Science Perspectives
    1.1.2 Different Kinds of Output
    1.1.3 Representing Languages
    1.1.4 Complexity Issues
    1.1.5 Summary
  1.2 Assessing Algorithms’ Performance
    1.2.1 Measuring Classifier Performance
    1.2.2 McNemar’s Test
    1.2.3 5 × 2 Cross-Validated Paired t Test
  1.3 Exemplary Applications
    1.3.1 Peg Solitaire
    1.3.2 Classification of Proteins
  1.4 Bibliographical Background
  References
2 State Merging Algorithms
  2.1 Preliminaries
  2.2 Evidence Driven State Merging
  2.3 Gold’s Idea
  2.4 Grammatical Inference with MDL Principle
    2.4.1 The Motivation and Appropriate Measures
    2.4.2 The Proposed Algorithm
  2.5 Bibliographical Background
  References
3 Partition-Based Algorithms
  3.1 Preliminaries
  3.2 The k-tails Method
  3.3 Grammatical Inference by Genetic Search
    3.3.1 What Are Genetic Algorithms?
Chapter 1
Introduction
Let us start with a presentation of how many variants of the grammatical inference problem we may be faced with. Informally, we are given a sequence of words and the task is to find a rule that lies behind it. Different models and goals arise in response to the following questions. Is the sequence finite or infinite? Does the sequence contain only examples (positive words) or also counter-examples (negative words)? Is the sequence of the form: all positive and negative words up to a certain length n? What is meant by the rule: are we satisfied with a regular acceptor, a context-free grammar, a context-sensitive grammar, or another tool? Among all the rules that match the input, should the obtained one be of minimum size?
The main division of GI models comes from the size of the sequence. When it is infinite, we deal with mathematical identification in the limit. The setting of this model is that of on-line, incremental learning. After each new example, the learner (the algorithm) must return some hypothesis (an automaton or a CFG). Identification is achieved when the learner returns a correct answer and does not change its decision afterwards. With respect to this model the following results have been achieved: (a) if we are given examples and counter-examples of the language to be identified (learning from informant), and each individual word is guaranteed to appear eventually, then at some point the inductive machine will return the correct hypothesis; (b) if we are given only examples of the target (learning from text), then identification is impossible for any super-finite class of languages, i.e., a class containing all finite languages and at least one infinite language. In this book, however, we only consider the situation when the input is finite, which can be called a computer science perspective. We are going to describe algorithms, some of which are based on examples only, while the others are based on both examples and counter-examples.
The next point that should be made is how important it is to pinpoint the kind of target. Consider the set of examples {abc, aabbcc, aaabbbccc}. If a solution is being sought in the class of regular languages, then one possible guess is presented in Fig. 1.1. This automaton matches every word starting with one or more a’s, followed by one or more b’s, and followed by one or more c’s. If a solution is being sought in the class of context-free languages, then one possible answer is the following grammar:
S → ABC    A → a | aAB
B → b      C → c | CC

It is clearly seen that the language generated by this CFG is {a^m b^m c^n : m, n ≥ 1}.
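For example, the second sample word has the derivation S ⇒ ABC ⇒ aABBC ⇒ aaBBC ⇒ aabbC ⇒ aabbCC ⇒ aabbcc.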
Finally, if a solution is being sought in the class of context-sensitive languages, then an even more precise conjecture can be made:

S → aBc    aA → aa    bA → Ab
B → AbBc   Bc → bc
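This grammar generates {a^n b^n c^n : n ≥ 1}; for example, aabbcc is obtained by S ⇒ aBc ⇒ aAbBcc ⇒ aAbbcc ⇒ aabbcc.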
Two solutions to this problem have been proposed. First, a collection of different hypotheses is suggested, and it is up to the user to select the most promising one. Second, the minimum description length (MDL) principle is applied, in which not only the size of an output matters but also the amount of information that is needed to encode every example.
Many definitions specific to particular methods are put in the relevant sections. Herein, we give definitions that will help to understand the precise formulation of our GI problem and its complexity. Naturally, we skip conventional definitions and notation from set theory, mathematical logic, and discrete structures, in areas that are covered in undergraduate courses.
Definition 1.1 Σ will be a finite nonempty set, the alphabet. A word (or sometimes string) is a finite sequence of symbols chosen from an alphabet. For a word w, we denote by |w| the length of w. The empty word λ is the word with zero occurrences of symbols. Sometimes, to be coherent with external notation, we write epsilon instead of the empty word. Let x and y be words. Then xy denotes the catenation of x and y, that is, the word formed by making a copy of x and following it by a copy of y. We denote, as usual, by Σ* the set of all words over Σ and by Σ+ the set Σ* − {λ}. A word w is called a prefix (resp. a suffix) of a word u if there is a word x such that u = wx (resp. u = xw). The prefix or suffix is proper if x ≠ λ. Let X, Y ⊂ Σ*. The catenation (or product) of X and Y is the set XY = {xy | x ∈ X, y ∈ Y}. In
particular, we define

\[ X^0 = \{\lambda\}, \qquad X^{n+1} = X^n X \ (n \ge 0), \qquad X^{\le n} = \bigcup_{i=0}^{n} X^i. \tag{1.1} \]
To simplify the representations of languages (i.e., sets of words), we define the notions of regular expressions, finite-state automata, and context-free grammars over an alphabet Σ as follows.
Definition 1.3 The set of regular expressions (regexes) over Σ will be the set of
words R such that
1. ∅ ∈ R which represents the empty set.
2. Σ ⊆ R; each element a of the alphabet represents the language {a}.
Definition 1.5 If in an NFA A the transition function satisfies |δ(q, a)| ≤ 1 for every pair (q, a) ∈ Q × Σ, then A is called a deterministic finite automaton (DFA).
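As a small illustration of these notions, using the FAdo package on which the book’s listings rely, the sketch below builds the regular guess of Fig. 1.1 from a regular expression, converts it to a DFA, and tests a few words. The regular expression string and the chosen test words are assumptions made for this example only.

from FAdo.reex import str2regexp

# the regular hypothesis of Fig. 1.1: one or more a's, then b's, then c's
rexp = str2regexp("aa*bb*cc*")
aut = rexp.toNFA().toDFA()       # a DFA in the sense of Definition 1.5
for w in ["abc", "aabbcc", "aaabbbccc", "acb"]:
    print w, aut.evalWordP(w)    # True for the first three words, False for "acb"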
1.1.5 Summary

1.2 Assessing Algorithms’ Performance
Suppose that we are given a sample S from a certain domain. How would we evaluate a GI method, or compare several such methods? First, a proper measure should be chosen. Naturally, it depends on the domain’s characteristics. Sometimes precision alone would be sufficient, but in other cases relying only on a single specific measure, without calculating any general measure of the quality of binary classifications (like the Matthews correlation coefficient or others), would be misleading.
After selecting the measure of error or quality, we have to choose between three basic scenarios. (1) The target language is known; in the case of regular languages we simply check the equivalence between minimal DFAs, while for context-free languages we are forced to generate the first n words in the quasi-lexicographic order from the hypothesis and from the target, and then check the equality of the two sets. Random sampling is also an option for verifying whether two grammars describe the same language. When the target is unknown we may: (2) randomly split S into two subsets, the training and the test set (T&T), or (3) apply K-fold cross-validation (CV), and then use a selected statistical test. Statisticians encourage us to choose T&T if |S| > 1000 and CV otherwise. In this section, McNemar’s test for scenario (2) and the 5 × 2 CV t test for scenario (3) are proposed.

1 This technique can also be used for algorithms that work on S = (S+, S−), provided that their
1.2.1 Measuring Classifier Performance

Selecting one measure that will describe some phenomenon in a population is crucial. In the context of an average salary, take for example the arithmetic mean and the median of the sample: $55,000, $59,000, $68,000, $88,000, $89,000, and $3,120,000. Our mean is $579,833.33, but it seems that the median,² $78,000, is better suited for estimating our data. Such examples can be multiplied. When we are going to undertake an examination of binary classification efficiency for selected real biological or medical data, the mistake of classifying a positive object as a negative one may cause more damage than incorrectly putting an object into the group of positives; take the ill as examples and the healthy as counter-examples for a good illustration. Then recall (also called sensitivity, or the true positive rate) would be a better suited measure than accuracy, as recall quantifies the avoidance of false negatives.
By binary classification we mean mapping a word to one of two classes by means of inferred context-free grammars and finite-state automata. The acceptance of a word by a grammar (resp. an automaton) means that the word is thought to belong to the positive class. If a word, in turn, is not accepted by a grammar (resp. an automaton), then it is thought to belong to the negative class. For binary classification a variety of measures has been proposed. Most of them are based on the rates that are shown in Table 1.1.
2 The median is the number separating the higher half of a data sample, a population, or a probability
distribution, from the lower half. If there is an even number of observations, then there is no single
middle value; the median is then usually defined to be the mean of the two middle values.
There are four possible cases. For a positive object (an example), if the prediction is also positive, this is a true positive; if the prediction is negative for a positive object, this is a false negative. For a negative object, if the prediction is also negative, we have a true negative, and we have a false positive if we predict a negative object as positive. Different measures, appropriate in particular settings, can be derived from these counts.
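As a compact illustration in the spirit of the book’s listings, the following sketch computes accuracy, recall, precision, and the balanced accuracy rate (BAR) directly from the four counts of Table 1.1; the function name and the dictionary keys are choices made for this example only.

def measures(tp, fp, tn, fn):
    """Computes selected binary classification measures
    Input: the four counts of Table 1.1
    Output: a dictionary with accuracy, recall, precision, and BAR"""
    tp, fp, tn, fn = float(tp), float(fp), float(tn), float(fn)
    recall = tp / (tp + fn)              # true positive rate (sensitivity)
    specificity = tn / (tn + fp)         # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    bar = (recall + specificity) / 2.0   # balanced accuracy
    return {'ACC': accuracy, 'recall': recall,
            'precision': precision, 'BAR': bar}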
If the class distribution is not uniform among the classes in a sample (e.g. the set of examples forms a minority class), then, so as to avoid inflated measurement, the balanced accuracy rate (BAR) should be applied. It has been shown that BAR is equivalent to the AUC (the area under the ROC curve) score in the case of a binary (with 0/1 classes) classification task. So BAR is considered the primary choice if a sample is highly imbalanced.
1.2.2 McNemar’s Test

Given a training set and a test set, we use two algorithms to infer two acceptors (classifiers) on the training set, test them on the test set, and compute their errors. The following natural numbers have to be determined:

• e1: the number of words misclassified by algorithm 1 but not by algorithm 2,
• e2: the number of words misclassified by algorithm 2 but not by algorithm 1.

Under the null hypothesis that the algorithms have the same error rate, we expect e1 = e2. We have the chi-square statistic with one degree of freedom
\[ \frac{(|e_1 - e_2| - 1)^2}{e_1 + e_2} \sim \chi^2_1 \tag{1.2} \]

and McNemar’s test rejects the hypothesis that the two algorithms have the same error rate at significance level α if this value is greater than χ²_{α,1}. For α = 0.05, χ²_{0.05,1} = 3.84.
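The test is easy to carry out in a few lines of Python. The routine below is only a small sketch in the spirit of the book’s listings: the function name and the hard-coded critical value for α = 0.05 are choices made for this example.

def mcnemar(e1, e2, critical=3.84):
    """McNemar's test, Eq. (1.2)
    Input: e1, e2 -- numbers of words misclassified by exactly one algorithm
    Output: (statistic, True iff the equal-error-rate hypothesis is rejected)"""
    stat = (abs(e1 - e2) - 1)**2 / float(e1 + e2)
    return stat, stat > critical

For instance, mcnemar(10, 3) gives a statistic of about 2.77, so the null hypothesis is not rejected at the 0.05 level.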
1.2.3 5 × 2 Cross-Validated Paired t Test

This test uses training and test sets of equal size. We divide the dataset S randomly into two parts, x_1^{(1)} and x_1^{(2)}, which gives our first pair of training and test sets. Then we swap the roles of the two halves and get the second pair: x_1^{(2)} for training and x_1^{(1)} for testing. This is the first fold; x_i^{(j)} denotes the j-th half of the i-th fold. To get the second fold, we shuffle S randomly and divide it anew into two halves, x_2^{(1)} and x_2^{(2)}. We then swap these two halves to get another pair. We do this for three more folds.
Let p_i^{(j)} be the difference between the error rates (on the test set) of the two classifiers (obtained from the training set) on fold j = 1, 2 of replication i = 1, ..., 5. The average on replication i is p̄_i = (p_i^{(1)} + p_i^{(2)})/2, and the estimated variance is s_i^2 = (p_i^{(1)} − p̄_i)^2 + (p_i^{(2)} − p̄_i)^2. The null hypothesis states that the two algorithms have the same error rate. We have a t statistic with five degrees of freedom

\[ \frac{p_1^{(1)}}{\sqrt{\sum_{i=1}^{5} s_i^2 / 5}} \sim t_5. \tag{1.3} \]

The 5 × 2 CV paired t test rejects the hypothesis that the two algorithms have the same error rate at significance level α if this value is outside the interval (−t_{α/2,5}, t_{α/2,5}). If the significance level equals 0.05, then t_{0.025,5} = 2.57.
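The computation behind Eq. (1.3) is equally short. In the sketch below, which follows the book’s listing style only loosely, p is assumed to be a list of the five pairs (p_i^(1), p_i^(2)) of error-rate differences collected from the five replications.

from math import sqrt

def cv52_t(p, critical=2.57):
    """5 x 2 cross-validated paired t test, Eq. (1.3)
    Input: p -- a list of five pairs of error-rate differences
    Output: (t statistic, True iff the equal-error-rate hypothesis is rejected)"""
    s2 = []
    for p1, p2 in p:
        m = (p1 + p2) / 2.0
        s2.append((p1 - m)**2 + (p2 - m)**2)
    t = p[0][0] / sqrt(sum(s2) / 5.0)
    return t, abs(t) > critical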
number of 1s in ascending order. We can easily find out that |I| = 1104 for n = 12 and |I| = 6872 for n = 15. This leads us to the following algorithm.
from FAdo.fa import *
from FAdo.reex import *

def moves(w):
    """Finds the boards reachable from w by a single jump
    Input: a board encoded as a word over {0, 1} (1 = peg, 0 = hole)
    Output: the set of boards after one move (empty end holes trimmed)"""
    result = set()
    n = len(w)
    if w[:3] == '110':
        result.add('1' + w[3:])
    for i in xrange(2, n-2):
        if w[i-2:i+1] == '011':
            result.add(w[:i-2] + '100' + w[i+1:])
        if w[i:i+3] == '110':
            result.add(w[:i] + '001' + w[i+3:])
    if w[-3:] == '011':
        result.add(w[:-3] + '1')
    return result

def pegnum(w):
    """Counts the pegs (symbols 1) on the board w"""
    c = 0
    for i in xrange(len(w)):
        if w[i] == '1':
            c += 1
    return c

def generateExamples(n):
    """Generates all peg words of length <= n
    Input: n in {12, 15}
    Output: the set of examples"""
    rexp = str2regexp("1(1 + 01 + 001)*")
    raut = rexp.toNFA()
    g = EnumNFA(raut)
    numWords = {12: 1104, 15: 6872}[n]
    g.enum(numWords)
    # process the boards in ascending order of the number of pegs, so that
    # every board reachable by one move has already been classified
    words = sorted(g.Words, \
        cmp = lambda x, y: cmp(pegnum(x), pegnum(y)))
    S_plus = {'1', '11'}
    for i in xrange(4, numWords):
        if moves(words[i]) & S_plus:
            S_plus.add(words[i])
    return S_plus
Step 2
The second step is just the invocation of the k-tails algorithm. Because the algorithm outputs a sequence of hypotheses, its invocation has been put into a for-loop structure, as we can see in the listing from step 3.
Step 3
The target language is unknown; that is why we have to propose some test for the probable correctness of the obtained automata. To this end, we generate two sets of words, namely a positive test set (Test_pos) and a negative test set (Test_neg). The former contains all words from the set H ∩ {0, 1}^{≤15}, the latter contains the remaining words over {0, 1} up to length 15. An automaton is supposed to be correct if it accepts all words from the positive test set and accepts no word from the negative test set.
def allWords(n):
    """Generates all words over {0, 1} up to length n
    Input: an integer n
    Output: all w in (0 + 1)* such that 1 <= |w| <= n"""
    rexp = str2regexp("(0 + 1)(0 + 1)*")
    raut = rexp.toNFA()
    g = EnumNFA(raut)
    g.enum(2**(n+1) - 2)
    return set(g.Words)

Train_pos = generateExamples(12)
Test_pos = generateExamples(15)
Test_neg = allWords(15) - Test_pos
# synthesize is the k-tails procedure invoked in step 2; it yields a sequence
# of hypotheses, and the loop stops at the first one consistent with both
# test sets
for A in synthesize(Train_pos):
    if all(A.evalWordP(w) for w in Test_pos) \
            and not any(A.evalWordP(w) for w in Test_neg):
        Amin = A.toDFA().minimalHopcroft()
        print Amin.Initial, Amin.Final, Amin.delta
        break
The biologist’s question is which method, the one based on decision trees or the one based on grammatical inference, will achieve better accuracy.
As regards the decision tree approach, classification and regression trees (CART), a non-parametric decision tree learning technique, has been chosen. For this purpose we took advantage of scikit-learn’s³ optimized version of the CART algorithm:
from functools import partial
from sklearn import tree    # provides the DecisionTreeClassifier used below

# Train_pos, Train_neg, Test_pos, Test_neg and the helper acceptsBy are
# assumed to be defined by earlier parts of this example
Sigma = set(list("NMLKIHWVTSRQYGFEDCAP"))
idx = dict(zip(list(Sigma), range(len(Sigma))))

def findACC(f):
    """Computes the accuracy of the classifier f on Test_pos and Test_neg"""
    score = 0
    for w in Test_pos:
        if f(w):
            score += 1
    for w in Test_neg:
        if not f(w):
            score += 1
    if score == 0:
        return 0.0
    else:
        return float(score)/float(len(Test_pos) + len(Test_neg))

X = []
Y = []
for x in Train_pos:
    X.append(map(lambda c: idx[c], list(x)))
    Y.append(1)
for y in Train_neg:
    X.append(map(lambda c: idx[c], list(y)))
    Y.append(0)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
print findACC(partial(acceptsBy, clf))
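The helper acceptsBy used in the last line does not appear in the listing above. Assuming that all words in the protein sample have the same length, so that each of them maps to a feature vector of a fixed size, it could be written along the following lines (a hypothetical reconstruction, not taken from the book):

def acceptsBy(clf, w):
    # encode the word with the idx mapping and ask the trained tree
    # whether it predicts the positive class (label 1)
    return clf.predict([map(lambda c: idx[c], list(w))])[0] == 1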
As for the GI approach, the induction of a minimal NFA was chosen, since the sample is not very large.
from functools import partial
from FAdo.common import DFAsymbolUnknown
The resulting automaton is depicted in Fig. 1.3. The ACC scores for the obtained decision tree and NFA are equal to, respectively, 0.616 and 0.677. It is worth emphasizing,
(Fig. 1.3 shows the inferred NFA, with states q0, q1, q2, and q3 and transitions labelled by amino-acid symbols.)
though, that McNemar’s test does not reject the hypothesis that the two algorithms
have the same error rate at significance level 0.05 (the computed statistic was equal
to 0.543, so the null hypothesis might be rejected if α ≥ 0.461).
are Book and Otto (1993) and Rozenberg and Salomaa (1997). Some positive GI
results in this context can be found in Eyraud et al. (2007).
Parsing with context-free grammars is an easy task and is analyzed in many books, for example in Grune and Jacobs (2008). However, the context-sensitive membership problem is PSPACE-complete, which was shown by Kuroda (1964). In fact, the problem remains hard even for deterministic context-sensitive grammars.
A semi-incremental method described at the end of Sect. 1.1 was introduced by Dupont (1994). Imada and Nakamura (2009) also applied a similar approach, carrying out the learning process with a SAT solver.
Statistical tests given in this chapter were compiled from Alpaydin (2010). Another
valuable book, especially when we need to compare two (or more) classifiers on
multiple domains, was written by Japkowicz and Shah (2011).
A list of practical applications of grammatical inference can be found in many works; the reader can refer to Bunke and Sanfelieu (1990), de la Higuera (2005), de la Higuera (2010), and Heinz et al. (2015) as good starting points on this topic. The first exemplary application (peg solitaire) is stated as the 48th unsolved problem in combinatorial games in Nowakowski (1996). This problem was solved by Moore and Eppstein (2003), and we have verified that our automaton is equivalent to the regular expression given by them. The second exemplary application is a hypothetical problem; the data, though, are real and come from Maurer-Stroh et al. (2010).
References
Alpaydin E (2010) Introduction to machine learning, 2nd edn. The MIT Press
Angluin D (1976) An application of the theory of computational complexity to the study of inductive
inference. PhD thesis, University of California
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Book RV, Otto F (1993) String-rewriting systems. Springer, Text and Monographs in Computer
Science
Bunke H, Sanfelieu A (eds) (1990) Grammatical inference. World Scientific, pp 237–290
Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran M, Sahai A, Shelat A (2005) The smallest
grammar problem. IEEE Trans Inf Theory 51(7):2554–2576
de la Higuera C (2005) A bibliographical study of grammatical inference. Pattern Recogn
38(9):1332–1348
de la Higuera C (2010) Grammatical inference: learning automata and grammars. Cambridge University Press, New York, NY, USA
Domaratzki M, Kisman D, Shallit J (2002) On the number of distinct languages accepted by finite
automata with n states. J Autom Lang Comb 7:469–486
Dupont P (1994) Regular grammatical inference from positive and negative samples by genetic
search: the GIG method. In: Proceedings of 2nd international colloquium on grammatical inference, ICGI ’94, Lecture notes in artificial intelligence, vol 862. Springer, pp 236–245
Eyraud R, de la Higuera C, Janodet J (2007) Lars: a learning algorithm for rewriting systems. Mach
Learn 66(1):7–31
Gold EM (1967) Language identification in the limit. Inf Control 10:447–474
Gold EM (1978) Complexity of automaton identification from given data. Inf Control 37:302–320
Grune D, Jacobs CJ (2008) Parsing techniques: a practical guide, 2nd edn. Springer
Heinz J, de la Higuera C, van Zaanen M (2015) Grammatical inference for computational linguistics.
Synthesis lectures on human language technologies. Morgan & Claypool Publishers
Hopcroft JE, Motwani R, Ullman JD (2001) Introduction to automata theory, languages, and computation, 2nd edn. Addison-Wesley
Hunt HB III, Rosenkrantz DJ, Szymanski TG (1976) On the equivalence, containment, and covering
problems for the regular and context-free languages. J Comput Syst Sci 12:222–268
Imada K, Nakamura K (2009) Learning context free grammars by using SAT solvers. In: Proceedings of the 2009 international conference on machine learning and applications. IEEE Computer Society, pp 267–272
Japkowicz N, Shah M (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press
Jiang T, Ravikumar B (1993) Minimal NFA problems are hard. SIAM J Comput 22:1117–1141
Kuroda S (1964) Classes of languages and linear-bounded automata. Inf Control 7(2):207–223
Maurer-Stroh S, Debulpaep M, Kuemmerer N, Lopez de la Paz M, Martins IC, Reumers J, Morris
KL, Copland A, Serpell L, Serrano L et al (2010) Exploring the sequence determinants of amyloid
structure using position-specific scoring matrices. Nat Methods 7(3):237–242
Meyer AR, Stockmeyer LJ (1972) The equivalence problem for regular expressions with squaring
requires exponential space. In: Proceedings of the 13th annual symposium on switching and
automata theory, pp 125–129
Moore C, Eppstein D (2003) One-dimensional peg solitaire, and duotaire. In: More games of no
chance. Cambridge University Press, pp 341–350
Nowakowski RJ (ed) (1996) Games of no chance. Cambridge University Press
Rozenberg G, Salomaa A (eds) (1997) Handbook of formal languages, vol 3. Beyond words.
Springer
Trakhtenbrot B, Barzdin Y (1973) Finite automata: behavior and synthesis. North-Holland Publishing Company
Valiant LG (1984) A theory of the learnable. Commun ACM 27:1134–1142
Chapter 2
State Merging Algorithms
2.1 Preliminaries
Before we start analyzing how state merging algorithms work, some basic functions on automata as well as functions on sets of words have to be defined. We assume that the routines given below are available throughout the whole book. Please refer to Appendixes A, B, and C in order to familiarize yourself with the Python programming language, its packages relevant to automata, grammars, and regexes, and some combinatorial optimization tools. Please notice also that we follow the docstring convention: a docstring is a string literal that occurs as the first statement in a function (module, class, or method definition). Such string literals act as documentation.
from FAdo.fa import *

def alphabet(S):
    """Finds all letters in S
    Input: a set of strings: S
    Output: the alphabet of S"""
    result = set()
    for s in S:
        for a in s:
            result.add(a)
    return result

def prefixes(S):
    """Finds all prefixes in S
    Input: a set of strings: S
    Output: the set of all prefixes of S"""
    result = set()
    for s in S:
        for i in xrange(len(s) + 1):
            result.add(s[:i])
    return result

def suffixes(S):
    """Finds all suffixes in S
    Input: a set of strings: S
    Output: the set of all suffixes of S"""
    # the body mirrors prefixes: collect every suffix s[i:] of each word
    result = set()
    for s in S:
        for i in xrange(len(s) + 1):
            result.add(s[i:])
    return result
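A short interactive check of these helpers on a toy sample (the sample itself is only an illustration) might look as follows; note that the empty word λ is represented by the empty Python string.

print alphabet({'ab', 'abc'})    # set(['a', 'b', 'c'])
print prefixes({'ab', 'abc'})    # set(['', 'a', 'ab', 'abc'])
print suffixes({'ab', 'abc'})    # set(['', 'b', 'ab', 'c', 'bc', 'abc'])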