On the Multiset of Factors of a String

De Bruijn graphs of order 2 and 3

De Bruijn graph G_3 and an Eulerian walk E_16

Two equivalence classes of Eulerian cycles with respect to rotation

Rauzy multigraph G_u^2

An example of a string v appearing before w in u

DFA that separates 0010 from 0100

Organization of the Thesis

What is the maximum cardinality of a set of equivalent words of length n when k = ⌊n/2⌋ and when k = 2? What is the structure and minimum length of pairs of strings having the same k-precedence data?

Figure 2.1: The 3-tuple notation of string u represented by T_3(u).

Effects Due to k-transformations

Let u ≡_k u′, where u′ is obtained from u by applying any one of the k-tr1, k-tr2, or k-rt2 transformations. Assume that the strings u and u′ are of the form given in either Definition 2.2 or 2.3.

Background

De Bruijn Graph and Rauzy Multigraph

We also recover the De Bruijn graph G_k from any binary De Bruijn sequence of order (k + 1) (without removing the k-length suffix): if u is such a sequence, then G_u^{k+1} is the De Bruijn graph of order k.

Words with Given Adjacency Patterns

Now if u is an arbitrary string in Σ∗, then any Eulerian walk in G_u^{k+1} gives a string v that has the same length as u. This is a way to reconstruct u and is the idealized version of DNA Sequencing By Hybridization (SBH). This idea is discussed in [39]. The formula for counting the Eulerian circuits in a directed graph is known as the BEST theorem. A multidigraph D has an Eulerian path but not an Eulerian cycle if and only if D is weakly connected up to isolated vertices and, for all vertices except two, the in-degree is equal to the out-degree, while for those two vertices (in-degree) − (out-degree) is exactly 1 and −1, respectively.
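The reconstruction idea can be illustrated with a minimal sketch. It assumes that the multigraph has the k-length factors of u as nodes and one arc per occurrence of a (k+1)-length factor, and it uses Hierholzer's algorithm started at the k-length prefix of u. The function names (factors, rauzy_multigraph, eulerian_walk, reconstruct) are illustrative and do not come from the thesis.

```python
from collections import Counter, defaultdict


def factors(u, n):
    """Multiset of the n-length factors of u."""
    return Counter(u[i:i + n] for i in range(len(u) - n + 1))


def rauzy_multigraph(u, k):
    """One arc per occurrence of a (k+1)-length factor of u,
    going from its k-length prefix to its k-length suffix."""
    arcs = defaultdict(list)
    for i in range(len(u) - k):
        f = u[i:i + k + 1]
        arcs[f[:-1]].append(f[1:])
    return arcs


def eulerian_walk(arcs, start):
    """Hierholzer's algorithm on a copy of the arc lists."""
    out = {v: list(ws) for v, ws in arcs.items()}
    stack, walk = [start], []
    while stack:
        v = stack[-1]
        if out.get(v):
            stack.append(out[v].pop())   # follow an unused arc
        else:
            walk.append(stack.pop())     # dead end: emit the node
    return walk[::-1]


def reconstruct(u, k):
    """Spell out the string read along an Eulerian walk of the multigraph of u."""
    walk = eulerian_walk(rauzy_multigraph(u, k), u[:k])
    return walk[0] + "".join(node[-1] for node in walk[1:])


u = "0010110"
v = reconstruct(u, 2)
print(v, factors(v, 3) == factors(u, 3))
```

The string v need not equal u; it only has the same length and the same multiset of (k+1)-length factors, which is exactly the ambiguity the text describes.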

We are interested in finding the number of elements in the equivalence class generated by the relation ≡_k (introduced in 2.1). Hutchinson and Wilf used the BEST theorem to find the number of strings that have the same multiset of symbols and the same adjacency information. In the next section, we will show that their result is useful in finding |M_[1,k](u)| directly.
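As a small illustration of the BEST theorem, the sketch below counts the Eulerian circuits of a connected Eulerian multidigraph as (number of arborescences rooted at a fixed vertex) × ∏ (deg(v) − 1)!. The number of arborescences is obtained from a cofactor of the Laplacian. The adjacency-matrix input format and the function names are assumptions made for this example, and every vertex is assumed to have positive out-degree.

```python
from fractions import Fraction
from math import factorial


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


def count_eulerian_circuits(adj):
    """adj[i][j] is the number of arcs i -> j in a connected Eulerian
    multidigraph (in-degree equals out-degree at every vertex)."""
    n = len(adj)
    outdeg = [sum(row) for row in adj]
    laplacian = [[(outdeg[i] if i == j else 0) - adj[i][j] for j in range(n)]
                 for i in range(n)]
    # Cofactor obtained by deleting the row and column of vertex 0.
    minor = [row[1:] for row in laplacian[1:]]
    count = determinant(minor)
    for d in outdeg:
        count *= factorial(d - 1)
    return int(count)


# Binary De Bruijn graph with nodes 00, 01, 10, 11 (arcs are 3-length words).
de_bruijn = [[1, 1, 0, 0],
             [0, 0, 1, 1],
             [1, 1, 0, 0],
             [0, 0, 1, 1]]
print(count_eulerian_circuits(de_bruijn))  # 2
```

The value 2 matches the two binary De Bruijn sequences of order 3 up to rotation.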

Euler Path

The above two theorems guarantee that all Euler paths in a multigraph start at the same node and end at the same node. The condition suff_{k−1}(u) ≠ pref_{k−1}(u) forces the Euler walk in G_u^k to be an Euler path and not an Euler cycle. This also constrains each Euler path to start at the node induced by pref_{k−1}(u) and end at the node induced by suff_{k−1}(u).

A set of Euler cycles that are rotations of each other contains only one Euler cycle that corresponds to a reconstruction from m_[1,k](u). Similarly, m_k(u) expresses Q in Theorem 3.2, since the information about the adjacency of the (k−1)-length factors of u, with multiplicity, is embedded in m_k(u).
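For very short strings, the reconstructions from m_[1,k](u) can be listed by brute force. The sketch below assumes that m_[1,k](u) is the multiset of all factors of u of lengths 1 to k and that M_[1,k](u) is the set of strings over the alphabet of u with the same multiset. The function names are illustrative, and the enumeration is exponential in |u|.

```python
from collections import Counter
from itertools import product


def factor_multiset(u, k):
    """Multiset of all factors of u of length 1..k."""
    return Counter(u[i:i + n]
                   for n in range(1, k + 1)
                   for i in range(len(u) - n + 1))


def equivalence_class(u, k):
    """All strings over the symbols of u with the same factor multiset."""
    alphabet = sorted(set(u))
    target = factor_multiset(u, k)
    return [v for v in map("".join, product(alphabet, repeat=len(u)))
            if factor_multiset(v, k) == target]


print(equivalence_class("00101", 2))  # ['00101', '01001']
```

For k = 1 the class is simply the set of all rearrangements of u, so the interesting cases start at k = 2.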

Figure 3.3: Two equivalence classes of Eulerian cycles with respect to rotation.

Asymptotic value of max

Thus, over binary strings u of length n, the largest number of reconstructions from m_[1,2](u) is in Θ(2^n / n). So only 5 permutations of the symbols of the string u are possible candidates for membership in M_[1,p](u). Therefore, the only option for the existence of u, v and x under the given conditions is x = π_2(u), with all three strings binary.

The latter part has already been taken care of and we have consisted of different ones. So, we found that whenever we choose three different strings u, v and x of length 2p + 1 such that m_[1,p](u) = m_[1,p](v) = m_[1,p](x), only two types of permutations of the symbols of u are allowed to be v and x. All three strings also have to be over an alphabet of size 2 to be of those forms.

Lower bound of max

From Table 3.6 we have the multisets of factors. The (m−1)-length factor multiset of u can be considered as P in Theorem 3.2, and the adjacency information described by Q in Theorem 3.2 can be found in the multiset of m-length factors of u. A generalization of k-abelian equivalence of words was recently given in [44], where the composition of a word is given by the set of all scattered subwords of length ≤ m.

We consider yet another extension of the idea of k-abelian equivalence, where factors of length k not only switch between themselves to give a different equivalent word, but also keep their relative positions intact as far as the multiplicities of their occurrences in ordered pairs are concerned. Rubinov and Gelfand [49] investigated a version of this problem where the goal is to reconstruct a single string. The problem can also be visualized as a generalization of topological ordering where instead of a DAG (Directed Acyclic Graph) we are provided with one.

Table 3.1: Multiset of m and (m − 1)-length factors of (0^{2m} 1)^m 0^{2m}

Definitions and Notations

Properties of Factor Precedence Multiset

The ordered pair (p, q) can occur in the string v in the form pa and bq as k-length factors, where pa appears before bq, for all possible a, b ∈ Σ. The ordered pair (p, q) can occur in the string v in the form ap and bq as k-length factors, where ap appears before bq, for all possible a, b ∈ Σ. In that case, the number of occurrences of (p, q) missed is the same as the number of times q appears in v without being a prefix, which is.

The ordered pair (p, q) can occur in the string v in the form pa and qb as k-length factors, where pa appears before qb, for all possible a, b ∈ Σ. In that case, the number of occurrences of (p, q) missed is the same as the number of times p appears in v without being a suffix, which is. The ordered pair (p, q) can occur in the string v in the form ap and qb as k-length factors, where ap appears before qb, for all possible a, b ∈ Σ.

Figure 4.2: Examples of finding OP_{k−1}^v(p, q) from OP_k^v.

Reconstruction of u from OP_k(u)

An interesting question is whether there is a polynomial-time procedure for verifying that a given matrix represents the k-precedence data of a string. To deal with this problem, we need to look at the more fundamental problem of reconstructing a word from its 1-precedence data. Answering this will immediately lead us to a method for reconstructing a string s from OP_k(s) in polynomial time.

In the case of an invalid matrix M with respect to a given k and S, the matrix M′, if constructed in the same way that OP_{k−1}^s was constructed, will be invalid with respect to k−1 and S′, but can still satisfy equation 4.1, where S′ is the lexicographically ordered set of strings of length (k−1) constructed from S using Lemma 4.4. To summarize this section, if precedence data is used as an additional check in the process of reconstructing a string from its multiset of factors of lengths up to k, then we get better performance. But if we reconstruct a string from the k-precedence data alone, exponential running time may arise, because checking the validity of a matrix as k-precedence data is not known to be possible in polynomial time.

Permutation in Multiset with the Same Precedence Data

If we assume that a given matrix M is valid and there is a string s corresponding to M, then the matrix OP_{k−1}^s is formed from M using the result given by Lemma 4.4. The smallest pair of strings that can be obtained from one another by applying a transposition is of the form ab and ba. It can be easily verified that the smallest pair of distinct strings with the same precedence data is of the form abba and baab.
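The abba and baab example can be checked directly with a small sketch. It assumes that an occurrence of a factor "appears before" another when its starting index is strictly smaller (overlaps allowed). This reading, and the name precedence_data, are assumptions for illustration rather than the thesis's formal definition.

```python
from collections import Counter


def precedence_data(u, k):
    """For each ordered pair (p, q) of k-length factors of u, count the
    pairs of occurrences in which p starts strictly before q."""
    facs = [u[i:i + k] for i in range(len(u) - k + 1)]
    data = Counter()
    for i in range(len(facs)):
        for j in range(i + 1, len(facs)):
            data[(facs[i], facs[j])] += 1
    return data


print(precedence_data("abba", 1) == precedence_data("baab", 1))  # True
print(precedence_data("ab", 1) == precedence_data("ba", 1))      # False
```

In both abba and baab the pairs (a, b) and (b, a) each occur twice, and (a, a) and (b, b) each occur once, so the two strings cannot be told apart by 1-precedence data.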

They allow bits of the string to differ while keeping the same precedence data.

Smallest Pair of Strings with the Same k-Precedence Data

Now that we have the definition of the k-mirror transposition, we need to establish that such transpositions preserve the k-precedence data of a string. In the case of the first and third forms in Lemma 4.14, we have y and y′ are in the form. From Lemma 4.16 and Lemma 4.17, the string must have length at least 3k + 1 for an application of a k-mirror transposition to lead to a different string.

Although it turns out that for a given k the minimal length of string pairs with the same k-precedence data is 3k + 1, we can find more than two such pairs. In this chapter, we generalize the idea of separating words by modular factor composition and introduce precedence data on the modular factor composition of a word. We show that the precedence data on l-modular factor composition can be the same for two strings of length O(l^4 log n).

Table 4.1: Examples of string pairs (u, v) where OP_10(u) = OP_10(v) and |u| = |v| = 31.

Separating Words Problem and its Motivation

The well-known separating words problem asks for the smallest DFA that distinguishes between two words of length ≤ n. It is the inverse of the classical problem of finding the shortest string that distinguishes two DFAs, say with m and n states, which can be done with a string of length m + n − 1 [51]. 8] have also studied the separation of words by non-deterministic finite automata and context-free grammars, respectively.
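For tiny inputs, the smallest separating DFA can be found by exhaustive search. The sketch below is a brute-force illustration only, not an algorithm from the thesis. It enumerates all transition tables with q states and start state 0, and accepts the first table on which the two words end in different states (one of the two end states can then be declared accepting). The names run and smallest_separating_dfa and the max_states cutoff are assumptions.

```python
from itertools import product


def run(delta, word, alphabet):
    """Final state reached from state 0; delta[state][symbol index]."""
    state = 0
    for ch in word:
        state = delta[state][alphabet.index(ch)]
    return state


def smallest_separating_dfa(x, y, max_states=4):
    """Return (q, delta) for the fewest states q <= max_states such that
    x and y end in different states, or None if no such table exists."""
    alphabet = sorted(set(x + y))
    for q in range(1, max_states + 1):
        rows = list(product(range(q), repeat=len(alphabet)))
        for delta in product(rows, repeat=q):
            if run(delta, x, alphabet) != run(delta, y, alphabet):
                return q, delta
    return None


q, delta = smallest_separating_dfa("0010", "0100")
print(q)  # 2
```

For 0010 and 0100, two states suffice: let the symbol 0 swap the two states and let the symbol 1 reset to state 0; the words then end in states 1 and 0 respectively. The search space grows as q^(q·|Σ|), so this is only usable for very small q.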

In [51], improving the upper bound O(n^{2/5} log^{3/5} n) for separating words by a deterministic finite automaton is mentioned as an open problem. Vyalyi and Gimadeev [57] generalized the idea and used the modular factor composition of words; their scheme gives a lower bound of Ω(n^{1/3} log^{−1/3} n) on the l necessary for every pair of distinct n-length strings to have different l-modular factor compositions. In the next section we discuss the modular factor composition of a word in a broader perspective.

Modular Factor Composition of a Word

1 is at a position with index congruent to (a + q)p^{−1} mod d in w, the solution for x of the linear congruence px ≡ (a + q) (mod d). The following DFA accepts only those strings that have u as a factor at positions congruent to a mod d with frequency congruent to c mod b. When the DFA encounters an occurrence of u, the transition function takes the execution of the DFA to the next level modulo b.

The execution reaches the states s(i, k) and ensures that u appears in the string at a position congruent to a mod d. The next occurrence of u at a position congruent to a mod d may not need the transition function to start from the q part of the next level, due to self-overlap and a small period of u. Next, in Figure 5.2 we see that the DFA accepts the strings in which u occurs λ times as a factor at positions congruent to a mod d, where λ is congruent to c mod b.
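The language accepted by such a DFA can be described without building the automaton. The sketch below tests the acceptance condition directly: it counts the occurrences of u in w that start at positions congruent to a mod d and compares the count with c modulo b. Positions are taken to be 0-indexed starting positions, which is an assumption (the thesis may index differently), and the function names are illustrative.

```python
def occurrences_mod(w, u, a, d):
    """Number of occurrences of u in w starting at a position i with
    i ≡ a (mod d); positions are 0-indexed and overlaps are counted."""
    last_start = len(w) - len(u)
    return sum(1 for i in range(a % d, last_start + 1, d)
               if w.startswith(u, i))


def accepts(w, u, a, d, c, b):
    """True when the count above is congruent to c modulo b."""
    return occurrences_mod(w, u, a, d) % b == c % b


w = "0100101"                            # "01" starts at positions 0, 3 and 5
print(occurrences_mod(w, "01", 1, 2))    # 2 (positions 3 and 5)
print(accepts(w, "01", 1, 2, 0, 2))      # True
```

A DFA for the same condition has to track the position modulo d, the progress of matching u, and the count modulo b, which is what the levels in the construction above encode.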

Precedence Data in Modular Factor Composition

Remarks

We have not found an upper bound on l in terms of n such that for any two distinct strings u and v of length n we have premod_l(u) ≠ premod_l(v). Any improvement in the upper bound can give us an improvement for the separating words problem with DFAs. Apart from the relation to the separating words problem, the mod_l(u) and premod_l(u) structures we introduced have interesting properties related to the k-abelian equivalence of words.

Discrepancy of Hypergraph for Partial Coloring

Definitions and Notations
Relation with Modular Composition
Generalization to Arbitrary Alphabet
Remark

So there is no known lower bound on the number of hyperedges in a hypergraph of non-zero discrepancy for partial 2-coloring. We have also shown the relationship between the number of different fuzzy reconstructions using these two methods. For example, McKay and Robinson [35] found the asymptotic number of Eulerian cycles in the complete graph.

We can view m_[1,k](u) as the distinct factors of length up to k of the string u, together with the number of ways each of them occurs in u as a factor. By the factor precedence data OP_k(u) of a string u we mean the distinct ordered pairs of factors of length k of u and, for each of them, the number of different ways that the first component appears as a factor before the second component in u. Is there an upper bound better than O(n^{2/5} log^{3/5}(n)) on the smallest number of states in a DFA that separates any pair of strings of length n?