
Chapter VII: Minimum Datalength for Integer Period Estimation

7.1 Introduction

Let us consider again the integer periodicity model that we started with in Chapter 1:

A signal $x(n)$ is said to be periodic if

$$x(n + P) = x(n) \tag{7.1}$$

for all $n$, for some integer $P$. The smallest such nonzero integer $P$ is said to be the period of $x(n)$.
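To make the definition concrete on finite data, the following is a minimal sketch that checks which candidate periods a finite window of samples does not contradict. The function names `consistent_with_period` and `candidate_periods` are hypothetical, and the check is a naive brute-force test, not a method from this thesis.

```python
def consistent_with_period(x, P):
    """Return True if the finite window x does not contradict period P,
    i.e. x[n + P] == x[n] for every n where both samples are observed."""
    return all(x[n + P] == x[n] for n in range(len(x) - P))


def candidate_periods(x, plausible):
    """Periods from the plausible set that the window x cannot rule out."""
    return [P for P in plausible if consistent_with_period(x, P)]


# A signal with period 3 (illustrative values only).
signal = [1, 2, 1] * 4
short_window = signal[:3]    # 3 samples: [1, 2, 1]
longer_window = signal[:4]   # 4 samples: [1, 2, 1, 1]

print(candidate_periods(short_window, [2, 3]))   # both 2 and 3 remain possible
print(candidate_periods(longer_window, [2, 3]))  # only 3 remains
```

With three samples the window cannot distinguish period 2 from period 3, whereas a fourth sample rules out period 2. This is the kind of data-length question the chapter studies.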

While this model has been very popular in many applications such as speech, ECG, EEG, protein and DNA repeats, there is one fundamental question which has surprisingly never been addressed before. Namely, given a sequence $x(n)$ with integer period, what is the absolute lower bound on the data-length required to be able to identify its period? More generally, given a mixture of periodic signals, what is the absolute minimum data-length required to identify the periods of the hidden components? Notice that the bounds we seek are generic, i.e., independent of any particular technique we may choose to estimate the periods. We will also see that the definition of "hidden" integer periods is rather tricky if we are to get unique and meaningful answers (Sec. 7.3). Some effort is therefore spent in this chapter to develop a formal definition, and also to study some interesting properties of hidden integer periods. It is rather surprising that none of these questions has been raised in the signal processing literature in the past.

Even though the bounds we seek are generic for the most part, we also address the special case of the dictionary based techniques reported in Chapter 3. For these specific methods the minimum number of samples is, not surprisingly, larger than the theoretical minimum. These method-dependent bounds are also given in this chapter, in view of the practical usefulness of the dictionary based methods.

At the outset we would like to make it clear that development of faster and better methods for estimation of periods (integer or otherwise) is not the main goal of this chapter. Rather, the purpose is to develop algorithm-independent theoretical

constructive, and therefore place in evidence some procedures for estimation of integer periods with minimum number of samples. But these should be regarded as proof-of-concept algorithms. They are not computationally efficient, nor robust to noise.

Chapter Outline

A general rule of thumb in prior works has been the following: To estimate the period, we need a data-length that is at least twice the largest expected period (e.g., [95], [110]). While this was observed to be true for some particular methods such as [95], it is not a fundamental bound in the sense that it need not apply to all possible period estimation techniques. A natural question is whether there is a fundamental lower bound on the data-length that applies regardless of which technique we choose.

In Sec. 7.2 we show that such a lower bound does in fact exist: To estimate the period from the plausible set $\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ of integers, the absolute minimum necessary and sufficient data-length is

$$L_{\min} = \max_{P_i, P_j \in \mathcal{P}} \; P_i + P_j - (P_i, P_j) \tag{7.2}$$

where $(P_i, P_j)$ denotes the greatest common divisor of $P_i$ and $P_j$. For example, to estimate the period from the set $\{7, 24, 100\}$, classical theory tells us that we need $N_{\min} = 200$ samples, whereas the actual bound is just 120 samples. In the special case when the set of plausible periods is

$$\mathcal{P} = \{1, 2, \ldots, P_{\max}\} \tag{7.3}$$

$L_{\min}$ turns out to be $2P_{\max} - 2$, close to the previously used rule of thumb of $2P_{\max}$ samples [95].
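The bound in (7.2) is straightforward to evaluate numerically. The sketch below is illustrative only: the function name `min_datalength` is hypothetical, and it assumes the plausible set contains at least two distinct positive integers, so that the maximum can be taken over distinct pairs (a pair with equal entries contributes only the period itself, which never exceeds the value from a distinct pair).

```python
from itertools import combinations
from math import gcd


def min_datalength(plausible_periods):
    """Evaluate L_min = max over pairs of (Pi + Pj - gcd(Pi, Pj)), as in (7.2).

    Assumption of this sketch: at least two distinct positive integers.
    """
    periods = sorted(set(plausible_periods))
    if len(periods) < 2:
        raise ValueError("need at least two distinct candidate periods")
    return max(p + q - gcd(p, q) for p, q in combinations(periods, 2))


# Example from the text: the pair (24, 100) dominates, 24 + 100 - 4 = 120.
print(min_datalength([7, 24, 100]))      # 120

# Special case (7.3) with P_max = 50: 50 + 49 - 1 = 98 = 2 * P_max - 2.
print(min_datalength(range(1, 51)))      # 98
```

For the set $\{7, 24, 100\}$ the maximum is attained by the pair $(24, 100)$, whose greatest common divisor is 4. For consecutive candidates $\{1, \ldots, P_{\max}\}$ it is attained by the coprime pair $(P_{\max} - 1, P_{\max})$.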

In Sec. 7.3 and 7.4, we derive similar results for the case of mixtures of periodic signals, where each component signal satisfies (7.1). Defining the component periods (or hidden integer periods) in a mixture in a unique and meaningful manner is rather subtle. The existing literature on period estimation has been rather informal in this regard, lacking a careful mathematical analysis on the uniqueness and identifiability of such component periods. One of our contributions in this work is to fill this gap in the theory. This is done in Sec. 7.3. Following this, in Sec. 7.4 we derive precise bounds on the minimum data-length required to estimate each of the component periods (Theorems 7.4.2 and 7.4.3). In this case, the fundamental lower bound significantly differs from the $2P_{\max}$ rule of thumb.

[Figure 7.1, frequency axis $\omega$: (a) spectral lines at $\omega_0, \omega_1, \omega_2, \omega_3$; (b) spectral lines at $\omega_0, 2\omega_0, 3\omega_0, 4\omega_0$.]

Figure 7.1: Part (a) - An arbitrary line spectrum; Part (b) - The harmonic line spectrum of a periodic signal. Can we use this additional structure in the spectrum of a periodic signal to reduce the data length required for period estimation?

While the primary goal of this chapter is to derive algorithm-independent bounds on data lengths for period estimation, we also extend these results to one family of algorithms in Sec. 7.5. This is for the recently proposed dictionary based integer period estimation techniques (Chapter 3, [23]). Such dictionaries were shown to offer useful advantages compared to traditional methods, especially for mixtures of periodic signals. However, the minimum data-length required for the dictionary based algorithms in [23] was not reported earlier. It should be clear that any algorithm-specific bound on the data length that might have been reported earlier in the literature is necessarily at least as large as the generic bound derived here.

In Sec. 7.6, we briefly explore the datalength requirements when the period of $x(n)$ is not exactly an integer. Such signals typically arise as sampled versions of continuous time signals. Even though they might not be strictly periodic according to (7.1), their spectrum still has a harmonic structure as shown in Fig. 7.1 (b). An analysis of the minimum required data-length for estimating the period of such signals yields some rather interesting results. For instance, it will be shown (Theorem 7.6.1) that the minimum datalength depends only on the number of harmonics expected in the signal, with more datalength being required as the number of harmonics increases.

Connections to the classical result by Carathéodory and Fejér [111] on the datalength

All the above mentioned sections deal with contiguous datalengths. Sec. 7.8 addresses the following question: If we are allowed to pick the samples in a non-contiguous fashion, what is the least number of samples needed to estimate the period? And how should we choose those samples? This question is quite difficult to answer in general, but the smaller case of resolving between two periods is analyzed in this section.

Special Notations

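1. The divisor set of a set of integers $\mathcal{P}$ is defined as:

$$\mathrm{D.S.}(\mathcal{P}) = \{d : \exists P \in \mathcal{P},\ d \mid P\} \tag{7.4}$$

That is, $\mathrm{D.S.}(\mathcal{P})$ is the union of the divisors of each $P \in \mathcal{P}$. For example, if $\mathcal{P} = \{6, 8\}$, then $\mathrm{D.S.}(\mathcal{P}) = \{1, 2, 3, 4, 6, 8\}$.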
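As a quick check of the definition in (7.4), the divisor set can be computed by brute force. This is a minimal sketch; the function name `divisor_set` is hypothetical and the trial-division loop is chosen for clarity rather than efficiency.

```python
def divisor_set(periods):
    """Return D.S.(P): the sorted union of the positive divisors of each P."""
    return sorted({d for P in periods for d in range(1, P + 1) if P % d == 0})


# Example from the text: divisors of 6 are {1, 2, 3, 6}, of 8 are {1, 2, 4, 8}.
print(divisor_set([6, 8]))   # [1, 2, 3, 4, 6, 8]
```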