Decision trees for regular factorial languages
Item Type Article
Authors Moshkov, Mikhail
Citation Moshkov, M. (2022). Decision trees for regular factorial languages.
Array, 15, 100203. https://doi.org/10.1016/j.array.2022.100203 Eprint version Publisher's Version/PDF
DOI 10.1016/j.array.2022.100203
Publisher Elsevier BV
Journal Array
Rights © 2022. The Author(s). Published by Elsevier Inc. This is an open access article under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Download date 2023-12-24 21:12:58
Item License https://creativecommons.org/licenses/by-nc-nd/4.0/
Link to Item http://hdl.handle.net/10754/674913
Array 15 (2022) 100203
Available online 8 June 2022
2590-0056/© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Decision trees for regular factorial languages
Mikhail Moshkov
Computer, Electrical and Mathematical Sciences and Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
ARTICLE INFO
Keywords: Regular factorial language; Recognition problem; Membership problem; Deterministic decision tree; Nondeterministic decision tree
ABSTRACT
In this paper, we study arbitrary regular factorial languages over a finite alphabet $A$. For the set of words $L(n)$ of the length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i \in \{1, \ldots, n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $A$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. For a given problem and type of trees, instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n) = \max\{h(m) : m \le n\}$. With the growth of $n$, the smoothed minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. For other cases (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant or grows linearly. As corollaries of the obtained results, we study the joint behavior of smoothed minimum depths of decision trees for the considered four cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $\{0, 1\}$ each of which is given by one forbidden word.
1. Introduction
In this paper, we study arbitrary regular factorial languages over a finite alphabet $A$. A factorial language satisfies the following condition: if a word $w_1 u w_2$ belongs to the language, then the word $u$ also belongs to it. For the set of words $L(n)$ of the length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i \in \{1, \ldots, n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $A$ of the length $n$, we should recognize if it belongs to $L(n)$ using the same queries.
For a given problem (problem of recognition or membership problem) and type of trees (solving the problem deterministically or nondeterministically), instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n) = \max\{h(m) : m \le n\}$.
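The smoothing step can be illustrated with a minimal Python sketch. The function name `smoothed` and the sample values of $h$ are hypothetical and chosen only for illustration; the point is that $H$ is the running maximum of $h$, so it is nondecreasing even when $h$ is not.

```python
from itertools import accumulate

def smoothed(h_values):
    """Given [h(1), ..., h(n)], return [H(1), ..., H(n)] with H(k) = max{h(m) : m <= k}."""
    return list(accumulate(h_values, max))

# Hypothetical, non-monotone minimum depths h(1), ..., h(6).
h = [0, 2, 1, 3, 0, 0]
print(smoothed(h))  # [0, 2, 2, 3, 3, 3]
```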
For an arbitrary regular factorial language, with the growth of $n$, the smoothed minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. These results follow immediately from more general results obtained in [1] for arbitrary regular languages.
E-mail address: [email protected].
For other cases (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant, or grows linearly. In the conference paper [2], a classification of arbitrary regular languages depending on the smoothed minimum depth of decision trees solving the problem of recognition nondeterministically was announced without proofs. In the present paper, we consider a simpler classification for regular factorial languages with full proof. Results related to the decision trees solving the membership problem are new.
As corollaries of the obtained results, we study the joint behavior of smoothed minimum depths of decision trees for the considered four cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $E = \{0, 1\}$ each of which is given by one forbidden word.
A well-known approach to evaluate the complexity of an infinite language $L$ over a finite alphabet $A$ is to study its so-called combinatorial complexity (known also as counting function) $p_L(n)$, that is, the number of words of the length $n$ in $L$ [3,4]. The present paper proposes additional ways to evaluate the complexity of the language $L$, based on the study of how the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically depends on the length of words. These ways are more complicated, but can give a more detailed classification of languages. To show this, we compare the languages generated by the diagrams $I_3$ and $I_4$ depicted in Figs. 5 and 6. For both languages, the counting function grows linearly. For the first language, the minimum depth of decision trees solving the problem of recognition deterministically grows as a logarithm, but for the second language, the minimum depth of decision trees solving the problem of recognition deterministically grows linearly.
https://doi.org/10.1016/j.array.2022.100203
Received 7 January 2022; Received in revised form 2 June 2022; Accepted 3 June 2022
We should mention a recent paper [5] in which similar results were obtained for languages over the alphabet $E$ that are subword-closed: if a word $w_1 u_1 w_2 \cdots w_m u_m w_{m+1}$ belongs to the language, then the word $u_1 \cdots u_m$ also belongs to it.
It is clear that each subword-closed language is a factorial language. Moreover, each subword-closed language over a finite alphabet is a regular language [6]. One can show that the language $L(00)$ over the alphabet $E$ given by one forbidden word $00$ is a regular factorial language, which is not subword-closed. Therefore the class of subword-closed languages over the alphabet $E$ is a proper subclass of the class of regular factorial languages over the alphabet $E$.
The main difference between the present paper and [5] is that, in the latter paper, we do not assume that the subword-closed languages are given by deterministic finite automata. Instead, we describe simple criteria (based on the presence in the language of words of special types) for the behavior of the minimum depths of decision trees solving the problem of recognition deterministically and nondeterministically. Differently formulated criteria for the behavior of the minimum depth of decision trees solving the recognition problem require very different proofs. One more difference is that in [5] we directly consider the minimum depth of decision trees.
The rest of the paper is organized as follows. In Section 2, we consider main notions, in Section 3, main results, and in Section 4, two corollaries of these results.
2. Main notions
In this section, we discuss the notions related to regular factorial languages and decision trees solving problems of recognition and membership for these languages.
2.1. Regular factorial languages
Let $\omega = \{0, 1, 2, \ldots\}$ be the set of nonnegative integers and $A$ be a finite alphabet with at least two letters. By $A^*$, we denote the set of all finite words over the alphabet $A$, including the empty word $\lambda$. A word $w \in A^*$ is called a factor of a word $u \in A^*$ if $u = v_1 w v_2$ and $v_1, v_2 \in A^*$. A language $L \subseteq A^*$ is called factorial if it contains all factors of its words. A word $w \in A^*$ is called a minimal forbidden word for $L$ if $w \notin L$ and all proper factors of $w$ belong to $L$. We denote by $MF(L)$ the language of minimal forbidden words for $L$. It is known [7] that a factorial language $L$ is regular if and only if the language $MF(L)$ is regular. In particular, a factorial language $L$ with a finite set of minimal forbidden words $MF(L)$ is regular. In this paper, we study arbitrary nonempty regular factorial languages.
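The definitions of a factor and of a language given by forbidden words can be checked by brute force on short words. The following Python sketch is only an illustration under stated assumptions: the helper names (`words_avoiding`, `factors`, `language_up_to`) are hypothetical, and the language is truncated to words of length at most 5. It checks that the words over $\{0, 1\}$ avoiding the factor 00 form a factor-closed set, and that this set is not subword-closed.

```python
from itertools import product

def words_avoiding(alphabet, forbidden, n):
    """All words of length n over `alphabet` with no word of `forbidden` as a factor."""
    words = ("".join(t) for t in product(alphabet, repeat=n))
    return [w for w in words if not any(f in w for f in forbidden)]

def factors(word):
    """Set of all factors (contiguous subwords) of `word`, including the empty word."""
    return {word[i:j] for i in range(len(word) + 1) for j in range(i, len(word) + 1)}

def language_up_to(alphabet, forbidden, max_len):
    """Finite truncation of the language: all allowed words of length <= max_len."""
    result = set()
    for n in range(max_len + 1):
        result.update(words_avoiding(alphabet, forbidden, n))
    return result

L = language_up_to("01", ["00"], 5)
# Factorial: every factor of every word in the truncation is again in it.
print(all(factors(w) <= L for w in L))   # True
# Not subword-closed: 010 is allowed, but deleting the middle letter gives 00.
print("010" in L, "00" in L)             # True False
```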
It is well known that each regular language can be represented by a deterministic finite automaton (DFA) [8]. As in [8], we will consider not only complete DFA with total transition function but also partial DFA with partial transition function. Such DFA can be represented by its transition diagram (diagram for short) [9].
A diagram over the alphabet $A$ is a triple $I = (G, q_0, Q)$, where $G$ is a finite directed graph, possibly with multiple edges and loops, in which each edge is labeled with a letter from $A$ and edges leaving each node are labeled with pairwise different letters, $q_0$ is a node of $G$ called starting, and $Q$ is a nonempty set of nodes of the graph $G$ called final.
A path of the diagram $I$ is an arbitrary sequence $\xi = v_1, d_1, \ldots, v_m, d_m, v_{m+1}$ of nodes and edges of $G$ such that the edge $d_i$ leaves the node $v_i$ and enters the node $v_{i+1}$ for $i = 1, \ldots, m$. We now define a word $w(\xi)$ from $A^*$ in the following way: if $m = 0$, then $w(\xi) = \lambda$. Let $m > 0$ and let $\delta_j$ be the letter attached to the edge $d_j$, $j = 1, \ldots, m$. Then $w(\xi) = \delta_1 \cdots \delta_m$. We say that the path $\xi$ generates the word $w(\xi)$. Note that different paths which start in the same node generate different words.
We denote by $\Xi(I)$ the set of all paths of the diagram $I$ each of which starts in the node $q_0$ and finishes in a node from $Q$. Let
$L_I = \{w(\xi) : \xi \in \Xi(I)\}$.
We say that the diagram $I$ generates the language $L_I$. It is well known that $L_I$ is a regular language.
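As an illustration of how a diagram generates words, the sketch below encodes a diagram as a nested dictionary (a hypothetical encoding, not taken from the paper) and enumerates the words of a given length generated by paths from the starting node to a final node. The example diagram is an assumption chosen to generate the words of the form $0^i 1^j$; the node names `q0` and `q1` are illustrative.

```python
def words_of_length(edges, start, final, n):
    """Words of length n generated by paths from `start` to a node in `final`.

    `edges[node][letter]` is the target node; missing entries model a partial
    transition function.
    """
    frontier = {("", start)}
    for _ in range(n):
        frontier = {
            (word + letter, target)
            for word, node in frontier
            for letter, target in edges.get(node, {}).items()
        }
    return sorted(word for word, node in frontier if node in final)

# Assumed example: a loop labeled 0 at q0, an edge labeled 1 from q0 to q1,
# and a loop labeled 1 at q1. All nodes are final.
edges = {"q0": {"0": "q0", "1": "q1"}, "q1": {"1": "q1"}}
print(words_of_length(edges, "q0", {"q0", "q1"}, 3))  # ['000', '001', '011', '111']
```

For this diagram, the number of generated words of length $n$ is $n + 1$, so the counting function grows linearly.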
The diagram $I$ is called complete over the alphabet $A$ if exactly $|A|$ edges leave each node of $G$. Note that these edges are labeled with pairwise different letters from $A$. Such a diagram corresponds to a complete DFA [8]. The diagram $I$ is called reduced if, for each node of $G$, there exists a path from $\Xi(I)$ which contains this node. Such a diagram corresponds to a reduced DFA [8]. It is known [8] that, for each regular language over the alphabet $A$, there exists a diagram complete over the alphabet $A$ which generates this language. Therefore, for each nonempty regular language, there exists a reduced diagram which generates this language.
Let $L$ be a regular factorial language and $I = (G, q_0, Q)$ be a reduced diagram that generates the language $L$. Since the language $L$ is factorial, we can assume additionally that each node of the graph $G$ is final: this does not change the language generated by $I$, since with each word the language $L$ contains each prefix of this word. The diagram $I$ will be called f-reduced if it is reduced and each node of the graph $G$ is final. In what follows, we assume that a considered regular factorial language $L$ is nonempty and is given by an f-reduced diagram which generates this language.
We will not consider nondeterministic finite automata (NFA) to represent regular factorial languages since the study of NFA is an essentially more complicated task.
2.2. Decision trees for recognition and membership problems
Let $L$ be a regular factorial language over the alphabet $A$. For any natural $n$, denote $L(n) = L \cap A^n$, where $A^n$ is the set of words over the alphabet $A$ whose length is equal to $n$. We consider two problems related to the set $L(n)$. The problem of recognition: for a given word from $L(n)$, we should recognize it using attributes (queries) $l_1^n, \ldots, l_n^n$, where $l_i^n$, $i \in \{1, \ldots, n\}$, is a function from $A^n$ to $A$ such that $l_i^n(a_1 \cdots a_n) = a_i$ for any word $a_1 \cdots a_n \in A^n$. The problem of membership: for a given word from $A^n$, we should recognize if this word belongs to the set $L(n)$ using the same attributes. To solve these problems, we use decision trees over $L(n)$.
A decision tree over $L(n)$ is a marked finite directed tree with root, which has the following properties:
• The root and the edges leaving the root are not labeled.
• Each node which is neither the root nor a terminal node is labeled with an attribute from the set $\{l_1^n, \ldots, l_n^n\}$.
• Each edge leaving a node which is not the root is labeled with a letter from the alphabet $A$.
A decision tree over $L(n)$ is called deterministic if it satisfies the following conditions:
• Exactly one edge leaves the root.
• For any node which is neither the root nor a terminal node, the edges leaving this node are labeled with pairwise different letters.
Fig. 1. Decision trees that solve the problem of recognition for the set of words $\{100, 010, 001\}$ deterministically and nondeterministically.
Let $\Gamma$ be a decision tree over $L(n)$. A complete path in $\Gamma$ is any sequence $\xi = v_0, d_0, \ldots, v_m, d_m, v_{m+1}$ of nodes and edges of $\Gamma$ such that $v_0$ is the root, $v_{m+1}$ is a terminal node, and $v_i$ is the initial and $v_{i+1}$ is the terminal node of the edge $d_i$ for $i = 0, \ldots, m$. We define a subset $A(n, \xi)$ of the set $A^n$ in the following way: if $m = 0$, then $A(n, \xi) = A^n$. Let $m > 0$, the attribute $l_{i_j}^n$ be attached to the node $v_j$, and $b_j$ be the letter attached to the edge $d_j$, $j = 1, \ldots, m$. Then
$A(n, \xi) = \{a_1 \cdots a_n \in A^n : a_{i_1} = b_1, \ldots, a_{i_m} = b_m\}$.
Let $L(n) \ne \emptyset$. We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of recognition for $L(n)$ nondeterministically if $\Gamma$ satisfies the following conditions:
• Each terminal node of $\Gamma$ is labeled with a word from $L(n)$.
• For any word $w \in L(n)$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w \in A(n, \xi)$.
• For any word $w \in L(n)$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w \in A(n, \xi)$, the terminal node of the path $\xi$ is labeled with the word $w$.
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of recognition for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree which solves the problem of recognition for $L(n)$ nondeterministically.
Examples of decision trees illustrating the considered notions are presented in Fig. 1.
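To make the deterministic case concrete, here is a minimal Python sketch of one possible deterministic decision tree recognizing the words of $\{100, 010, 001\}$. It is an assumed construction and not necessarily the tree drawn in Fig. 1; the tuple encoding and the name `run` are hypothetical, and the unlabeled root is omitted so that the tree starts at its first query node.

```python
# A node is either ("leaf", word) or ("query", i, children), where the query
# returns the i-th letter (1-based) and `children` maps letters to subtrees.
TREE = ("query", 1, {
    "1": ("leaf", "100"),
    "0": ("query", 2, {
        "1": ("leaf", "010"),
        "0": ("leaf", "001"),
    }),
})

def run(tree, word):
    """Follow the path determined by `word`; return (label, number of queries used)."""
    queries = 0
    while tree[0] == "query":
        _, i, children = tree
        tree = children[word[i - 1]]
        queries += 1
    return tree[1], queries

for w in ["100", "010", "001"]:
    print(w, run(TREE, w))
# 100 ('100', 1)
# 010 ('010', 2)
# 001 ('001', 2)
```

The longest path uses two queries, so this particular tree has depth 2 in the sense defined later in this section.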
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ nondeterministically if $\Gamma$ satisfies the following conditions:
• Each terminal node of $\Gamma$ is labeled with a number from the set $\{0, 1\}$.
• For any word $w \in A^n$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w \in A(n, \xi)$.
• For any word $w \in A^n$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w \in A(n, \xi)$, the terminal node of the path $\xi$ is labeled with the number 1 if $w \in L(n)$ and with the number 0 otherwise.
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree which solves the problem of membership for $L(n)$ nondeterministically.
Let $\Gamma$ be a decision tree over $L(n)$. We denote by $h(\Gamma)$ the maximum number of nodes in a complete path in $\Gamma$ that are neither the root nor a terminal node. The value $h(\Gamma)$ is called the depth of the decision tree $\Gamma$.
We denote by $h^{ra}_L(n)$ ($h^{rd}_L(n)$) the minimum depth of a decision tree over $L(n)$ which solves the problem of recognition for $L(n)$ nondeterministically (deterministically). If $L(n) = \emptyset$, then $h^{ra}_L(n) = h^{rd}_L(n) = 0$.
We denote by $h^{ma}_L(n)$ ($h^{md}_L(n)$) the minimum depth of a decision tree over $L(n)$ which solves the problem of membership for $L(n)$ nondeterministically (deterministically). If $L(n) = \emptyset$, then $h^{ma}_L(n) = h^{md}_L(n) = 0$.
3. Bounds on decision tree depth
Let $L$ be a nonempty factorial regular language. In this section, we consider the behavior of four functions $H^{rd}_L$, $H^{ra}_L$, $H^{md}_L$, and $H^{ma}_L$ defined on the set $\omega \setminus \{0\}$ and with values from $\omega$. For any natural $n$,
$H^{rd}_L(n) = \max\{h^{rd}_L(m) : 1 \le m \le n\}$,
$H^{ra}_L(n) = \max\{h^{ra}_L(m) : 1 \le m \le n\}$,
$H^{md}_L(n) = \max\{h^{md}_L(m) : 1 \le m \le n\}$,
$H^{ma}_L(n) = \max\{h^{ma}_L(m) : 1 \le m \le n\}$.
For any pair $bc \in \{rd, ra, md, ma\}$, the function $H^{bc}_L(n)$ is a smoothed analog of the function $h^{bc}_L(n)$.
3.1. Decision trees solving recognition problem deterministically
Let $I = (G, q_0, Q)$ be an f-reduced diagram over the alphabet $A$. A path of the diagram $I$ is called a cycle of the diagram $I$ if there is at least one edge in this path, and the first node of this path is equal to the last node of this path. A cycle of the diagram $I$ is called elementary if nodes of this cycle, with the exception of the last node, are pairwise different.
The diagram $I$ is called simple if every two different elementary cycles of the diagram $I$ do not have common nodes. Let $I$ be a simple diagram and $\xi$ be a path of the diagram $I$. The number of different elementary cycles of the diagram $I$ which have common nodes with $\xi$ is denoted by $cl(\xi)$ and is called the cyclic length of the path $\xi$. The value
$cl(I) = \max\{cl(\xi) : \xi \in \Xi(I)\}$
is called the cyclic length of the diagram $I$.
Let $I$ be a simple diagram, $C$ be an elementary cycle of the diagram $I$, and $v$ be a node of the cycle $C$. Beginning with the node $v$, the cycle $C$ generates an infinite periodic word over the alphabet $A$. This word will be denoted by $W(I, C, v)$. We denote by $r(I, C, v)$ the minimum period of the word $W(I, C, v)$. The diagram $I$ is called dependent if there exist two different elementary cycles $C_1$ and $C_2$ of the diagram $I$, nodes $v_1$ and $v_2$ of the cycles $C_1$ and $C_2$, respectively, and a path $\pi$ of the diagram $I$ from $v_1$ to $v_2$, which satisfy the following conditions: $W(I, C_1, v_1) = W(I, C_2, v_2)$ and the length of the path $\pi$ is a number divisible by $r(I, C_1, v_1)$. If the diagram $I$ is not dependent, then it is called independent. The next theorem follows immediately from Theorem 2.1 [1], which is a similar statement that holds for all regular languages.
Theorem 1. Let $L$ be a nonempty regular factorial language over the alphabet $A$ and $I$ be an f-reduced diagram which generates the language $L$. Then the following statements hold:
(a) If $I$ is an independent simple diagram and $cl(I) \le 1$, then $H^{rd}_L(n) = O(1)$.
(b) If $I$ is an independent simple diagram and $cl(I) \ge 2$, then $H^{rd}_L(n) = \Theta(\log n)$.
(c) If $I$ is not an independent simple diagram, then $H^{rd}_L(n) = \Theta(n)$.
3.2. Decision trees solving recognition problem nondeterministically
Let $L$ be a nonempty regular factorial language over the alphabet $A$. For any natural $n$, we define a parameter $T_L(n)$ of the language $L$. If $L(n) = \emptyset$, then $T_L(n) = 0$. Let $L(n) \ne \emptyset$, $w = a_1 \cdots a_n \in L(n)$, and $J \subseteq \{1, \ldots, n\}$. Denote $L(w, J) = \{b_1 \cdots b_n \in L(n) : b_j = a_j, j \in J\}$ (if $J = \emptyset$, then $L(w, J) = L(n)$) and $M_L(n, w) = \min\{|J| : J \subseteq \{1, \ldots, n\}, |L(w, J)| = 1\}$. Then
$T_L(n) = \max\{M_L(n, w) : w \in L(n)\}$.
Note that, for any word $w \in L(n)$, $M_L(n, w)$ is the minimum number of letters of the word $w$ which allow us to distinguish it from all other words belonging to $L(n)$.
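For small sets of words, the parameter defined above (the maximum over words of the minimum number of positions that single out a word) can be computed by exhaustive search. The following Python sketch is a brute-force illustration only: the names `M` and `T` mirror the notation used here for the per-word and per-length parameters, positions are 0-based in the code, and the running time is exponential in the word length.

```python
from itertools import combinations

def M(words, w):
    """Minimum number of positions of w whose letters distinguish w from all other words."""
    n = len(w)
    for size in range(n + 1):
        for J in combinations(range(n), size):
            # Count the words agreeing with w on every position in J.
            matching = sum(all(u[j] == w[j] for j in J) for u in words)
            if matching == 1:
                return size

def T(words):
    """Maximum of M(words, w) over all words w; 0 for the empty set."""
    return max((M(words, w) for w in words), default=0)

print(T(["100", "010", "001"]))          # 1: the position of the letter 1 suffices
print(T(["000", "001", "011", "111"]))   # 2: the word 001 needs two positions
```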
Lemma 2. Let $L$ be a nonempty regular factorial language over the alphabet $A$. Then $h^{ra}_L(n) = T_L(n)$ for any natural $n$.
Proof. First, we prove that $h^{ra}_L(n) \ge T_L(n)$. Let $\Gamma$ be a decision tree over $L(n)$ which solves the problem of recognition for $L(n)$ nondeterministically and for which $h(\Gamma) = h^{ra}_L(n)$. Let $w$ be a word from $L(n)$ for which $T_L(n) = M_L(n, w)$. Then the decision tree $\Gamma$ contains a complete path $\xi$ such that $w \in A(n, \xi)$ and the terminal node of the path $\xi$ is labeled with the word $w$. It is clear that $A(n, \xi) \cap L(n) = \{w\}$. Let $\xi$ contain $m$ nodes that are neither the root nor a terminal node and let $l^n_{i_1}, \ldots, l^n_{i_m}$ be the attributes attached to these nodes. Denote $J = \{i_1, \ldots, i_m\}$. Then $L(w, J) = \{w\}$. Therefore $m \ge M_L(n, w) = T_L(n)$. It is clear that $h(\Gamma) \ge m$. Thus, $h^{ra}_L(n) = h(\Gamma) \ge m \ge M_L(n, w) = T_L(n)$.
We now prove that $h^{ra}_L(n) \le T_L(n)$. One can show that, for each $w \in L(n)$, we can construct a complete path $\xi_w$ which satisfies the following conditions: the number of nodes in $\xi_w$ that are neither the root nor a terminal node is equal to $M_L(n, w)$, $A(n, \xi_w) \cap L(n) = \{w\}$, and the terminal node of $\xi_w$ is labeled with the word $w$. If we merge the roots of all paths $\xi_w$, $w \in L(n)$, we obtain a decision tree which solves the problem of recognition for $L(n)$ nondeterministically and whose depth is equal to $T_L(n)$. Thus, $h^{ra}_L(n) \le T_L(n)$ and $h^{ra}_L(n) = T_L(n)$. □
Theorem 3. Let $L$ be a nonempty regular factorial language over the alphabet $A$ and $I = (G, q_0, Q)$ be an f-reduced diagram which generates the language $L$. Then the following statements hold:
(a) If $I$ is an independent simple diagram, then $H^{ra}_L(n) = O(1)$.
(b) If $I$ is not an independent simple diagram, then $H^{ra}_L(n) = \Theta(n)$.
Proof. (a) Let $I$ be an independent simple diagram and $cl(I) \le 1$. By Theorem 1, $H^{rd}_L(n) = O(1)$. It is clear that $H^{ra}_L(n) \le H^{rd}_L(n)$. Therefore $H^{ra}_L(n) = O(1)$.
Let $I$ be an independent simple diagram and $cl(I) \ge 2$. Let $n$ be a natural number. If $L(n) = \emptyset$, then $T_L(n) = 0$. Let $L(n) \ne \emptyset$. Denote by $m$ the number of nodes in the graph $G$. In the proof of Lemma 4.5 [1], it was proved that $M_L(n, w) \le m(4m + 1)$ for any word $w \in L(n)$. Therefore $T_L(n) \le m(4m + 1)$. Thus, by Lemma 2, $h^{ra}_L(n) \le m(4m + 1)$ for any natural $n$ and $H^{ra}_L(n) = O(1)$.
(b) Let $I$ be a diagram that is not simple and let $C_1, C_2$ be different elementary cycles of the diagram $I$ which have a common node $v$. Since $I$ is an f-reduced diagram, it contains a path $\xi$ from the node $q_0$ to the node $v$, and $v$ is a final node. Let the length of the path $\xi$ be equal to $a$, the length of the cycle $C_1$ be equal to $b$, and the length of the cycle $C_2$ be equal to $c$. Let $\alpha$ be the word generated by the path $\xi$, $\beta$ be the word generated by a path from $v$ to $v$ obtained by the passage $c$ times along the cycle $C_1$, and $\gamma$ be the word generated by a path from $v$ to $v$ obtained by the passage $b$ times along the cycle $C_2$. The words $\beta$ and $\gamma$ are different and they have the same length $bc$.
Consider the sequence of numbers $n_i = a + ibc$, $i = 1, 2, \ldots$. Let $i \in \omega \setminus \{0\}$. The set $L(n_i)$ contains the word $\alpha\gamma^i$ and the words $\alpha\gamma^j\beta\gamma^{i-j-1}$ for $j = 0, \ldots, i - 1$. It is easy to show that $M_L(n_i, \alpha\gamma^i) \ge i$: to distinguish the word $\alpha\gamma^i$ from the words $\alpha\gamma^j\beta\gamma^{i-j-1}$, $j = 0, \ldots, i - 1$, we need to use at least one letter from each of the $i$ words $\gamma$ appearing in $\alpha\gamma^i$. Therefore $T_L(n_i) \ge i$ and, by Lemma 2, $h^{ra}_L(n_i) \ge i = (n_i - a)/(bc)$. Let $n \ge n_1$ and let $i$ be the maximum natural number such that $n \ge n_i$. Evidently, $n - n_i \le bc$. Hence $H^{ra}_L(n) \ge h^{ra}_L(n_i) \ge (n - bc - a)/(bc)$. Therefore $H^{ra}_L(n) \ge n/(2bc)$ for large enough $n$. The inequality $H^{ra}_L(n) \le n$ is obvious. Thus, $H^{ra}_L(n) = \Theta(n)$.
Let $I$ be a dependent simple diagram. Then there exist two different elementary cycles $C_1$ and $C_2$ of the diagram $I$, nodes $v_1$ and $v_2$ of the cycles $C_1$ and $C_2$, respectively, and a path $\pi$ of the diagram $I$ from $v_1$ to $v_2$, which satisfy the following conditions: $W(I, C_1, v_1) = W(I, C_2, v_2)$ and the length of the path $\pi$ is a number divisible by $r(I, C_1, v_1)$. Recall that, for $i = 1, 2$, $W(I, C_i, v_i)$ is the infinite periodic word over the alphabet $A$ generated by the cycle $C_i$ beginning with the node $v_i$, and $r(I, C_1, v_1)$ is the minimum period of the word $W(I, C_1, v_1)$. Since $I$ is an f-reduced diagram, it contains a path $\xi$ from the node $q_0$ to the node $v_1$, and all nodes of the graph $G$ are final. Let the path $\xi$ generate the word $\alpha$ of the length $a$. Denote $r = r(I, C_1, v_1)$. Let the length of the cycle $C_1$ be equal to $br$, the length of the path $\pi$ be equal to $cr$, and the path $\pi$ generate the word $\beta$. Denote by $\gamma$ the prefix of the length $r$ of the word $W(I, C_1, v_1)$. We now define two words of the length $bcr$: $u = \gamma^{bc}$ and $w = \beta\gamma^{c(b-1)}$. It is clear that $u \ne w$.
Consider the sequence of numbers $n_i = a + ibcr$, $i = 1, 2, \ldots$. Let $i \in \omega \setminus \{0\}$. The set $L(n_i)$ contains the word $\alpha u^i$ and the words $\alpha u^j w u^{i-j-1}$ for $j = 0, \ldots, i - 1$. It is easy to show that $M_L(n_i, \alpha u^i) \ge i$: to distinguish the word $\alpha u^i$ from the words $\alpha u^j w u^{i-j-1}$, $j = 0, \ldots, i - 1$, we need to use at least one letter from each of the $i$ words $u$ appearing in $\alpha u^i$. Therefore $T_L(n_i) \ge i$ and, by Lemma 2, $h^{ra}_L(n_i) \ge i = (n_i - a)/(bcr)$. Let $n \ge n_1$ and let $i$ be the maximum natural number such that $n \ge n_i$. Evidently, $n - n_i \le bcr$. Hence $H^{ra}_L(n) \ge h^{ra}_L(n_i) \ge (n - bcr - a)/(bcr)$. Therefore $H^{ra}_L(n) \ge n/(2bcr)$ for large enough $n$. The inequality $H^{ra}_L(n) \le n$ is obvious. Thus, $H^{ra}_L(n) = \Theta(n)$. □
Fig. 2. Diagram $I_0$.
Note that in the general case (when we consider not only factorial languages) the classification of reduced diagrams depending on the minimum depth of decision trees solving the problem of recognition nondeterministically is more complicated [2]. In particular, there exists a dependent simple reduced diagram $I_0$ (see Fig. 2), with the starting node labeled with the symbol $+$ and the unique final node labeled with the symbol $*$, that generates the regular language $L_0 = \{0^i 1 0^j : i, j \in \omega\}$ over the alphabet $\{0, 1\}$, which is not factorial and for which $H^{ra}_{L_0}(n) = O(1)$.
3.3. Decision trees solving membership problem
For a regular factorial language $L$, the notation $|L| = \infty$ means that $L$ is an infinite language, and the notation $|L| < \infty$ means that $L$ is a finite language.
Theorem 4. Let $L$ be a regular factorial language over the alphabet $A$.
(a) If $|L| = \infty$ and $L \ne A^*$, then $H^{md}_L(n) = \Theta(n)$ and $H^{ma}_L(n) = \Theta(n)$.
(b) If $|L| < \infty$ or $L = A^*$, then $H^{md}_L(n) = O(1)$ and $H^{ma}_L(n) = O(1)$.
Proof. It is clear that $h^{ma}_L(n) \le h^{md}_L(n)$ for any natural $n$.
(a) Let $|L| = \infty$, $L \ne A^*$, and $w_0$ be a word with the minimum length from $A^* \setminus L$. Denote by $t$ the length of $w_0$. Since $|L| = \infty$, $L(n) \ne \emptyset$ for any natural $n$. Let $n$ be a natural number such that $n > t$ and $\Gamma$ be a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ nondeterministically and has the minimum depth. Let $w \in L(n)$ and $\xi$ be a complete path in $\Gamma$ such that $w \in A(n, \xi)$. Then the terminal node of $\xi$ is labeled with the number 1. Beginning with the first letter, we divide the word $w$ into $\lfloor n/t \rfloor$ blocks with $t$ letters in each and the suffix of the length $n - t\lfloor n/t \rfloor$. Let us assume that the number of nodes labeled with attributes in $\xi$ is less than $\lfloor n/t \rfloor$. Then there is a block such that the queries (attributes) attached to nodes of $\xi$ do not ask about letters from the block. We replace this block in the word $w$ with the word $w_0$ and denote by $w'$ the obtained word. It is clear that $w' \notin L$ and $w' \in A(n, \xi)$, but this is impossible since the terminal node of the path $\xi$ is labeled with the number 1. Therefore the depth of $\Gamma$ is greater than or equal to $\lfloor n/t \rfloor$. Thus, $h^{ma}_L(n) \ge \lfloor n/t \rfloor$. It is easy to construct a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ deterministically and has the depth equal to $n$. Therefore $h^{md}_L(n) \le n$. Thus, $H^{md}_L(n) = \Theta(n)$ and $H^{ma}_L(n) = \Theta(n)$.
Table 1
Complexity classes $\mathcal{C}_1, \ldots, \mathcal{C}_5$.
Class            I is an independent simple diagram   cl(I)    L_I       H^{rd}_{L_I}     H^{ra}_{L_I}   H^{md}_{L_I}   H^{ma}_{L_I}
$\mathcal{C}_1$  Yes                                  = 0      -         O(1)             O(1)           O(1)           O(1)
$\mathcal{C}_2$  Yes                                  = 1      -         O(1)             O(1)           Θ(n)           Θ(n)
$\mathcal{C}_3$  Yes                                  ≥ 2      -         Θ(log n)         O(1)           Θ(n)           Θ(n)
$\mathcal{C}_4$  No                                   -        ≠ A^*     Θ(n)             Θ(n)           Θ(n)           Θ(n)
$\mathcal{C}_5$  No                                   -        = A^*     Θ(n)             Θ(n)           O(1)           O(1)
(b) Let $|L| < \infty$. Then there exists a natural $m$ such that $L(n) = \emptyset$ for any natural $n \ge m$. Therefore, for each natural $n \ge m$, $h^{md}_L(n) = 0$ and $h^{ma}_L(n) = 0$. Thus, $H^{md}_L(n) = O(1)$ and $H^{ma}_L(n) = O(1)$.
Let $L = A^*$, $n$ be a natural number, and $\Gamma$ be a decision tree over $L(n)$ which consists of the root, a terminal node labeled with 1, and an edge that leaves the root and enters the terminal node. One can show that $\Gamma$ solves the problem of membership for $L(n)$ deterministically and has the depth equal to 0. Therefore $h^{md}_L(n) = 0$ and $h^{ma}_L(n) = 0$. Thus, $H^{md}_L(n) = O(1)$ and $H^{ma}_L(n) = O(1)$. □
4. Corollaries
In this section, we consider two corollaries of Theorems 1, 3, and 4.
4.1. Joint behavior of functions $H^{rd}_L$, $H^{ra}_L$, $H^{md}_L$, and $H^{ma}_L$
In this section, we assume that each regular factorial language over the alphabet $A$ is given by an f-reduced diagram $I$ which generates the considered language, denoted by $L_I$. To study all possible types of joint behavior of the functions $H^{rd}_{L_I}$, $H^{ra}_{L_I}$, $H^{md}_{L_I}$, and $H^{ma}_{L_I}$, we consider five classes of regular factorial languages $\mathcal{C}_1, \ldots, \mathcal{C}_5$ described in columns 2–4 of Table 1. In particular, $\mathcal{C}_1$ consists of all regular factorial languages $L_I$ for which the diagram $I$ is an independent simple diagram and $cl(I) = 0$. It is easy to show that the complexity classes $\mathcal{C}_1, \ldots, \mathcal{C}_5$ are pairwise disjoint, and each regular factorial language $L_I$ belongs to one of these classes. The behavior of the functions $H^{rd}_{L_I}$, $H^{ra}_{L_I}$, $H^{md}_{L_I}$, and $H^{ma}_{L_I}$ for languages from these classes is described in the last four columns of Table 1. For each class, the results given in Table 1 for the functions $H^{rd}_{L_I}$ and $H^{ra}_{L_I}$ follow directly from Theorems 1 and 3.
We now consider the behavior of the functions $H^{md}_{L_I}$ and $H^{ma}_{L_I}$ for each of the classes $\mathcal{C}_1, \ldots, \mathcal{C}_5$. Let $I = (G, q_0, Q)$ be an f-reduced diagram over the alphabet $A$ which generates a regular factorial language.
Let $L_I \in \mathcal{C}_1$. Since $cl(I) = 0$, $G$ is a directed acyclic graph, and the language $L_I$ is finite. Using Theorem 4 we obtain $H^{md}_{L_I}(n) = O(1)$ and $H^{ma}_{L_I}(n) = O(1)$.
Let $L_I \in \mathcal{C}_2$. Since $cl(I) = 1$, $G$ is a graph containing a cycle, and the language $L_I$ is infinite. By Lemma 4.2 [1], $|L_I(n)| = O(1)$. Therefore $L_I \ne A^*$. Using Theorem 4 we obtain $H^{md}_{L_I}(n) = \Theta(n)$ and $H^{ma}_{L_I}(n) = \Theta(n)$.
Let $L_I \in \mathcal{C}_3$. Since $cl(I) \ge 2$, $G$ is a graph containing a cycle, and the language $L_I$ is infinite. By Lemma 4.2 [1], $|L_I(n)| = O(n^{cl(I)})$. Therefore $L_I \ne A^*$. Using Theorem 4 we obtain $H^{md}_{L_I}(n) = \Theta(n)$ and $H^{ma}_{L_I}(n) = \Theta(n)$.
Let $L_I \in \mathcal{C}_4$. Since $I$ is not an independent simple diagram, $G$ is a graph containing a cycle, and the language $L_I$ is infinite. We know that $L_I \ne A^*$. Using Theorem 4 we obtain $H^{md}_{L_I}(n) = \Theta(n)$ and $H^{ma}_{L_I}(n) = \Theta(n)$.
Let $L_I \in \mathcal{C}_5$. Then $L_I = A^*$. Using Theorem 4 we obtain $H^{md}_{L_I}(n) = O(1)$ and $H^{ma}_{L_I}(n) = O(1)$.
We now show that the classes $\mathcal{C}_1, \ldots, \mathcal{C}_5$ are nonempty. For simplicity, we assume that $A = E$, where $E = \{0, 1\}$. It is easy to generalize the considered examples to the case of an arbitrary finite alphabet $A$ with at least two letters. In the examples of diagrams, the starting node is labeled with the symbol $+$, and all nodes are final.
Denote by $I_1$ the diagram over the alphabet $E$ depicted in Fig. 3. One can show that $I_1$ is an independent simple f-reduced diagram and $cl(I_1) = 0$. This diagram generates the language $L_{I_1} = \{\lambda, 0\}$, which is factorial. Therefore $L_{I_1} \in \mathcal{C}_1$.
Fig. 3. Diagram $I_1$.
Fig. 4. Diagram $I_2$.
Fig. 5. Diagram $I_3$.
Fig. 6. Diagram $I_4$.
Fig. 7. Diagram $I_5$.
Denote by $I_2$ the diagram over the alphabet $E$ depicted in Fig. 4. One can show that $I_2$ is an independent simple f-reduced diagram and $cl(I_2) = 1$. This diagram generates the language $L_{I_2} = \{0^i : i \in \omega\}$, which is factorial. Therefore $L_{I_2} \in \mathcal{C}_2$.
Denote by $I_3$ the diagram over the alphabet $E$ depicted in Fig. 5. One can show that $I_3$ is an independent simple f-reduced diagram and $cl(I_3) = 2$. This diagram generates the language $L_{I_3} = \{0^i 1^j : i, j \in \omega\}$, which is factorial. Therefore $L_{I_3} \in \mathcal{C}_3$.
Denote by $I_4$ the diagram over the alphabet $E$ depicted in Fig. 6. One can show that $I_4$ is a dependent simple f-reduced diagram generating the language $L_{I_4} = \{0^i 1^j 0^k : i, k \in \omega, j \in \{0, 1\}\}$, which is factorial. It is clear that $L_{I_4} \ne E^*$. Therefore $L_{I_4} \in \mathcal{C}_4$.
Denote by $I_5$ the diagram over the alphabet $E$ depicted in Fig. 7. One can show that $I_5$ is an f-reduced diagram that is not simple. This diagram generates the language $L_{I_5} = E^*$, which is factorial. Therefore $L_{I_5} \in \mathcal{C}_5$.
A regular factorial language $L$ can have different f-reduced diagrams which generate it. However, for each such diagram $I$, the language $L_I = L$ will belong to the same complexity class. Let us assume the contrary: there exist a regular factorial language $L$ and two f-reduced diagrams $I_1$ and $I_2$ which generate it and for which the languages $L_{I_1}$ and $L_{I_2}$ belong to different complexity classes. Then, for some pair $bc \in \{rd, ra, md, ma\}$, the functions $H^{bc}_{L_{I_1}}$ and $H^{bc}_{L_{I_2}}$ have different behavior, but this is impossible since $H^{bc}_{L_{I_1}}(n) = H^{bc}_{L_{I_2}}(n)$ for any natural $n$.
Fig. 8. Diagram $I(0)$.
4.2. Languages over alphabet $\{0,1\}$ given by one forbidden word
Let $E = \{0,1\}$, $\alpha \in E^*$, and $\alpha \ne \lambda$. We denote by $L(\alpha)$ the language over the alphabet $E$ that consists of all words from $E^*$ that do not contain $\alpha$ as a factor. This is a regular factorial language with $MF(L(\alpha)) = \{\alpha\}$. The following theorem indicates, for each nonempty word $\alpha \in E^*$, the complexity class to which the language $L(\alpha)$ belongs.
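The definition of $L(\alpha)$ can be made concrete by listing the words of a fixed length that avoid $\alpha$. The following is a minimal sketch for illustration; the function name `words_avoiding` is ours, and words are represented as Python strings over the characters `0` and `1`.

```python
from itertools import product

def words_avoiding(alpha, n):
    """All binary words of length n that do not contain alpha as a factor."""
    return [
        "".join(w)
        for w in product("01", repeat=n)
        if alpha not in "".join(w)  # substring test = "alpha is a factor"
    ]

for alpha in ("0", "01", "00"):
    print(alpha, [len(words_avoiding(alpha, n)) for n in range(1, 7)])
# 0  [1, 1, 1, 1, 1, 1]
# 01 [2, 3, 4, 5, 6, 7]
# 00 [2, 3, 5, 8, 13, 21]
```

The three forbidden words already show different growth of the number of words of length $n$: a single word, $n + 1$ words, and exponentially many words.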
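Theorem 5. Let $\alpha \in E^*$ and $\alpha \ne \lambda$.
(a) If $\alpha \in \{0, 1\}$, then $L(\alpha)$ belongs to complexity class 2.
(b) If $\alpha \in \{01, 10\}$, then $L(\alpha)$ belongs to complexity class 3.
(c) If $\alpha \notin \{0, 1, 01, 10\}$, then $L(\alpha)$ belongs to complexity class 4.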
We now describe an f-reduced diagram $I(\alpha)$ that generates the language $L(\alpha)$ for a nonempty word $\alpha \in E^*$. Let $\alpha = a_1 \cdots a_n$, $\alpha_0 = \lambda$, and $\alpha_i = a_1 \cdots a_i$ for $i = 1, \ldots, n-1$. The set $P(\alpha) = \{\alpha_0, \alpha_1, \ldots, \alpha_{n-1}\}$ is the set of all proper prefixes of the word $\alpha$. Then $I(\alpha) = (G, q_0, Q)$, where the set of nodes of the graph $G$ is equal to $P(\alpha)$, $q_0 = \alpha_0$, and $Q = P(\alpha)$.
For $i = 0, \ldots, n-2$, an edge leaves the node $\alpha_i$ and enters the node $\alpha_{i+1}$. This edge is labeled with the letter $a_{i+1}$. For $i = 0, \ldots, n-1$, an edge leaves the node $\alpha_i$ and enters the node $\alpha_j \in P(\alpha)$ such that $\alpha_j$ is the longest suffix of the word $\alpha_i \bar{a}_{i+1}$ that belongs to $P(\alpha)$, where $\bar{a}_{i+1} = 0$ if $a_{i+1} = 1$ and $\bar{a}_{i+1} = 1$ if $a_{i+1} = 0$. This edge is labeled with the letter $\bar{a}_{i+1}$. It is easy to show that $I(\alpha)$ is an f-reduced diagram over the alphabet $E$. From Theorem 10 [7] it follows that the diagram $I(\alpha)$ generates the language $L(\alpha)$.
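The construction of the diagram can be sketched in code. The following Python sketch is an illustration under our own conventions, not the author's implementation: nodes are represented by the proper prefixes themselves (the empty string stands for the empty word), the diagram is stored as a transition map, and the names `build_diagram` and `accepts` are hypothetical. Since all nodes are final, a word is accepted exactly when it labels a path from the starting node.

```python
from itertools import product

def build_diagram(alpha):
    """Transition map {(node, letter): node} of the diagram for a nonempty binary word."""
    n = len(alpha)
    prefixes = {alpha[:i] for i in range(n)}  # all proper prefixes, including ""
    flip = {"0": "1", "1": "0"}
    delta = {}
    for i in range(n):
        node, a = alpha[:i], alpha[i]
        if i <= n - 2:
            # forward edge labeled with the next letter of alpha
            delta[(node, a)] = alpha[: i + 1]
        # edge labeled with the opposite letter: go to the longest suffix
        # of node + flipped letter that is a proper prefix of alpha
        w = node + flip[a]
        delta[(node, flip[a])] = next(
            w[k:] for k in range(len(w) + 1) if w[k:] in prefixes
        )
    return delta

def accepts(delta, word):
    """Follow the word from the starting node ""; fail if an edge is missing."""
    node = ""
    for letter in word:
        if (node, letter) not in delta:
            return False
        node = delta[(node, letter)]
    return True

# Brute-force comparison with the definition of the language on short words.
ok = all(
    accepts(build_diagram("".join(a)), "".join(w)) == ("".join(a) not in "".join(w))
    for m in range(1, 4)
    for a in product("01", repeat=m)
    for n in range(7)
    for w in product("01", repeat=n)
)
print(ok)
```

The only missing edge is the one that would complete the forbidden word, which is why reading that letter at the last prefix rejects the input.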
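Let $\alpha \in E^* \setminus \{\lambda\}$ and $\alpha = a_1 \cdots a_n$. We denote by $\bar{\alpha}$ the word $\bar{a}_1 \cdots \bar{a}_n$. It is easy to prove the following statement.
Lemma 6. Let $\alpha \in E^*$ and $\alpha \ne \lambda$. Then $H^{bc}_{L(\bar{\alpha})}(n) = H^{bc}_{L(\alpha)}(n)$ for any pair $bc \in \{rd, ra, md, ma\}$ and any natural $n$.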
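One way to see why the lemma is plausible is that exchanging the letters 0 and 1 in every word maps the words of length $n$ avoiding $\alpha$ one-to-one onto the words of length $n$ avoiding the complemented word, so a decision tree for one language turns into a tree of the same depth for the other by swapping the two answers to each query. The sketch below only checks the set-level bijection on short words; the names `complement`, `lang`, and `check_symmetry` are ours.

```python
from itertools import product

FLIP = str.maketrans("01", "10")

def complement(word):
    """Exchange 0 and 1 in every position of the word."""
    return word.translate(FLIP)

def lang(alpha, n):
    """Set of binary words of length n that avoid alpha as a factor."""
    return {"".join(w) for w in product("01", repeat=n) if alpha not in "".join(w)}

def check_symmetry(alpha, max_len=8):
    """Complementing letters maps words avoiding alpha onto words avoiding complement(alpha)."""
    return all(
        {complement(w) for w in lang(alpha, n)} == lang(complement(alpha), n)
        for n in range(max_len + 1)
    )

print(all(check_symmetry(a) for a in ("0", "01", "00", "010", "011")))
```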
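Lemma 7. Let $\alpha \in E^* \setminus \{\lambda\}$, $\beta \in E^*$, and let $L(\alpha)$ belong to complexity class 4. Then $L(\alpha\beta)$ belongs to complexity class 4.
Proof. Since $L(\alpha)$ belongs to complexity class 4, $H^{rd}_{L(\alpha)}(n) = \Theta(n)$ and $H^{ra}_{L(\alpha)}(n) = \Theta(n)$. One can show that $L(\alpha) \subseteq L(\alpha\beta)$: a word that does not contain $\alpha$ as a factor does not contain $\alpha\beta$ either. Using this fact, it is not difficult to prove that $H^{rd}_{L(\alpha)}(n) \le H^{rd}_{L(\alpha\beta)}(n)$ and $H^{ra}_{L(\alpha)}(n) \le H^{ra}_{L(\alpha\beta)}(n)$ for any natural $n$. From here and from Theorems 1 and 3 it follows that $H^{rd}_{L(\alpha\beta)}(n) = \Theta(n)$ and $H^{ra}_{L(\alpha\beta)}(n) = \Theta(n)$.
Since $\alpha\beta \notin L(\alpha\beta)$, $L(\alpha\beta) \ne E^*$. The diagram $I(\alpha\beta)$ contains at least one cycle formed by the edge that leaves and enters the node $\lambda$ and is labeled with the letter $\bar{a}_1$, where $a_1$ is the first letter of the word $\alpha$. Therefore the language $L(\alpha\beta)$ is infinite. By Theorem 4, $H^{md}_{L(\alpha\beta)}(n) = \Theta(n)$ and $H^{ma}_{L(\alpha\beta)}(n) = \Theta(n)$. Thus, $L(\alpha\beta)$ belongs to complexity class 4. □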
Proof of Theorem 5. In each figure depicting a diagram $I(\alpha)$, $\alpha \in E^* \setminus \{\lambda\}$, we label each node with the corresponding prefix of the word $\alpha$.
(a) The diagram $I(0)$ is depicted in Fig. 8. This is an independent simple f-reduced diagram with $cl(I(0)) = 1$. Therefore $L(0)$ belongs to complexity class 2. By Lemma 6, $L(1)$ belongs to complexity class 2.
(b) The diagram $I(01)$ is depicted in Fig. 9. This is an independent simple f-reduced diagram with $cl(I(01)) = 2$. Therefore $L(01)$ belongs to complexity class 3. By Lemma 6, $L(10)$ belongs to complexity class 3.
Fig. 9. Diagram $I(01)$.
Fig. 10. Diagram $I(00)$.
Fig. 11. Diagram $I(010)$.
Fig. 12. Diagram $I(011)$.
(c) The diagram $I(00)$ is depicted in Fig. 10. This is not a simple diagram. It is clear that $L(00) \ne E^*$. Therefore $L(00)$ belongs to complexity class 4. By Lemma 6, $L(11)$ belongs to complexity class 4. Using Lemma 7, we obtain that $L(000)$, $L(001)$, $L(110)$, and $L(111)$ belong to complexity class 4.
The diagram $I(010)$ is depicted in Fig. 11. This is not a simple diagram. It is clear that $L(010) \ne E^*$. Therefore $L(010)$ belongs to complexity class 4. By Lemma 6, $L(101)$ belongs to complexity class 4.
The diagram $I(011)$ is depicted in Fig. 12. This is not a simple diagram. It is clear that $L(011) \ne E^*$. Therefore $L(011)$ belongs to complexity class 4. By Lemma 6, $L(100)$ belongs to complexity class 4.
We proved that, for any word $\alpha \in E^*$ of length three, $L(\alpha)$ belongs to complexity class 4. Using Lemma 7, we obtain that, for any word $\alpha \in E^*$ of length greater than or equal to four, $L(\alpha)$ belongs to complexity class 4. □
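For very small $n$, the minimum depth of a deterministic decision tree solving the recognition problem can be computed by exhaustive search, which gives a numerical feel for the different behaviors of $L(0)$, $L(01)$, and $L(00)$. The following is a sketch under our own assumptions and not a procedure from the paper: the names `lang`, `min_depth`, and `smoothed` are hypothetical, positions are 0-based, the search is exponential and intended only for $n$ up to about 8, and `smoothed` follows the definition $H(n) = \max\{h(m) : m \le n\}$.

```python
from functools import lru_cache
from itertools import product

def lang(alpha, n):
    """Frozen set of binary words of length n that avoid alpha as a factor."""
    return frozenset(
        "".join(w) for w in product("01", repeat=n) if alpha not in "".join(w)
    )

@lru_cache(maxsize=None)
def min_depth(words):
    """Minimum depth of a deterministic tree that identifies a word of the set
    using queries of the form 'what is the letter at position i?'."""
    if len(words) <= 1:
        return 0  # nothing left to distinguish
    n = len(next(iter(words)))
    best = n  # querying every position always suffices
    for i in range(n):
        zeros = frozenset(w for w in words if w[i] == "0")
        ones = words - zeros
        if not zeros or not ones:
            continue  # this query does not split the set
        best = min(best, 1 + max(min_depth(zeros), min_depth(ones)))
    return best

def smoothed(alpha, n):
    """Smoothed minimum depth: maximum of min_depth over lengths 1..n."""
    return max(min_depth(lang(alpha, m)) for m in range(1, n + 1))

for alpha in ("0", "01", "00"):
    print(alpha, [smoothed(alpha, n) for n in range(1, 9)])
```

For $L(0)$ the set of words of each length is a singleton, so the depth is 0. For $L(01)$ the words of length $n$ have the form $1^i 0^{n-i}$, and binary search over the boundary position gives depth $\lceil \log_2 (n+1) \rceil$. The values printed for $L(00)$ can be compared directly with $n$.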
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST), Saudi Arabia. The