HIGH PERFORMANCE ARCHITECTURES FOR ADAPTIVE EQUALIZERS USING DISTRIBUTED ARITHMETIC

A thesis submitted for the award of the degree of Doctor of Philosophy by Surya Prakash Matcha, under the supervision of Dr

Their subjects and suggestions undoubtedly contributed to the completion of the thesis. These people have also helped me in different ways for the completion of this work.

Introduction

The impulse response may contain causal and anticausal parts that contribute to the precursor and postcursor parts of the overall ISI. Such a model of the transmitter, channel and equalizer system is shown in Fig. 1.2, where $\eta_n$ represents the additive noise in the channel and $\tilde{r}_n$ denotes the pre-decision output.

Performance issues and related research in ADFE

Here, parallelization is adopted to increase the throughput rate of the system and it is shown that the throughput rate can be proportional to the parallelization factor. The computational complexity here is lower than that of the multiplexer-based look-ahead, provided the tap number of the feedback filter is large.

Distributed Arithmetic and its variants

To increase the speed of the system, multiple bits of each $x_i$ can be used in parallel as address lines to the LUT, at the cost of a larger LUT. If P bits of each input are used at a time, the size of the memory becomes $2^{PN}$ words, the speed increases by a factor of P, and the inner product is therefore calculated in B/P clock cycles.
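The trade-off between LUT size and cycle count can be illustrated with a minimal Python sketch. It assumes integer weights and B-bit two's-complement inputs; the function names and the final sign correction are choices made for this illustration, not the architecture of the thesis.

```python
def build_lut(weights, p):
    """LUT addressed by P bits from each of the N inputs (2**(P*N) words)."""
    n = len(weights)
    digit_mask = (1 << p) - 1
    return [
        sum(w * ((addr >> (k * p)) & digit_mask) for k, w in enumerate(weights))
        for addr in range(1 << (p * n))
    ]


def da_inner_product(weights, samples, bits, p=1):
    """Inner product by distributed arithmetic in bits // p LUT reads."""
    assert bits % p == 0, "word length must be a multiple of P"
    lut = build_lut(weights, p)
    word_mask = (1 << bits) - 1
    digit_mask = (1 << p) - 1
    acc = 0
    for g in range(bits // p):  # one pass per group of P bits: B/P cycles
        addr = 0
        for k, x in enumerate(samples):
            digit = ((x & word_mask) >> (g * p)) & digit_mask
            addr |= digit << (k * p)
        acc += lut[addr] << (g * p)  # shift-accumulate
    # The loop treated every word as unsigned. A negative two's-complement
    # word reads 2**bits too high, so remove that excess (sketch-specific).
    excess = sum(w for w, x in zip(weights, samples) if x < 0)
    return acc - (excess << bits)
```

With four taps and 4-bit inputs, `p=1` needs a 16-word LUT and four reads, while `p=2` needs a 256-word LUT and two reads; both return the same inner product.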

DA based FIR filter design

DA based FIR filter

DA based FIR filter with offset binary coding scheme

Literature on Distributed Arithmetic based implementations

Later, DA was successfully used in the implementation of other DSP algorithms, such as cyclic convolution [11], discrete cosine transform (DCT) [56], and Fourier transform [12, 40]. As explained earlier, a fixed-coefficient filter can be easily realized using a DA by storing the partial products of the filter coefficients in the LUT.

Problem Formulation

Organization of the thesis

Pipelining the ADFE is not straightforward because of the feedback loop and the non-linear device (quantizer) in the ADFE. A fixed-coefficient filter can be easily realized using DA by storing the partial products of the filter coefficients in a LUT, as described in Chapter 1.

High Performance LMS adaptive filter architecture using DA

Averaging an S-LUT entry with the next consecutive entry generates a term that is independent of the oldest input sample. The S-LUT update schedule from time n to n+1 for a 4-tap filter is shown in Figure 2.7, which marks the positive and negative terms of the input samples. If '1' and '0' denote the positive and negative terms in the OBC combinations of the input samples, the contents of the S-LUT look like the binary sequence of the input samples in time, as shown in Figure 2.8.

At time n+2 it would be the circularly right-shifted version of the sequence at time n+1, and so on. The algorithm that explains the overall operation of the DA-based LMS adaptive filter is shown in Fig. 2.9.
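The averaging step of the S-LUT update can be checked with a small Python sketch. It assumes that address bit 0 selects the sign of the oldest sample and the top address bit that of the newest; this ordering and the function names are assumptions of the illustration and may differ from the schedule in Figure 2.7.

```python
def build_sample_lut(window):
    """OBC table of a sample window; window[i] is tied to address bit i
    (bit 0 = oldest sample). Bit value 1 means '+', 0 means '-'."""
    n = len(window)
    return [
        sum(x if (addr >> i) & 1 else -x for i, x in enumerate(window))
        for addr in range(1 << n)
    ]


def update_sample_lut(lut, new_sample):
    """Slide the window by one sample using only the stored table."""
    half = len(lut) // 2
    # Entries 2k and 2k+1 differ only in the sign of the oldest sample,
    # so their average no longer depends on it (the sum is always even).
    averaged = [(lut[2 * k] + lut[2 * k + 1]) // 2 for k in range(half)]
    # The new sample takes the top address bit: lower half '-', upper '+'.
    return [v - new_sample for v in averaged] + [v + new_sample for v in averaged]
```

For a window `[3, -1, 4, 2]` and a new sample `7`, the updated table equals the table built directly from `[-1, 4, 2, 7]`, so the oldest sample never has to be read back explicitly.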

Performance Analysis

Area

Throughput

Conclusion

The FFF works directly on the received data and cancels out the pre-cursor part of the intersymbol interference (ISI). The FBF, whose coefficients are chosen to cancel the post-cursor ISI, operates on the decisions made on the previous symbols. The DFE basically works on the assumption that there is no error at the output of the decision device.
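The interplay of the two filters and the decision device can be sketched in Python as follows. The sketch assumes BPSK symbols, fixed (non-adaptive) taps and a noise-free channel with one post-cursor tap, so the FFF reduces to a single pass-through tap; all names and values are hypothetical and serve only as an illustration.

```python
def channel(symbols, h):
    """Noise-free FIR channel: r[n] = sum_i h[i] * s[n - i]."""
    return [
        sum(c * symbols[n - i] for i, c in enumerate(h) if n - i >= 0)
        for n in range(len(symbols))
    ]


def dfe_detect(received, fff, fbf):
    """Symbol-by-symbol DFE for BPSK with fixed feed-forward/feedback taps."""
    decisions = []
    for n in range(len(received)):
        # Feed-forward filter works on the received samples r[n], r[n-1], ...
        ff = sum(c * received[n - i] for i, c in enumerate(fff) if n - i >= 0)
        # Feedback filter works on past decisions d[n-1], d[n-2], ...
        fb = sum(
            c * decisions[n - 1 - i] for i, c in enumerate(fbf) if n - 1 - i >= 0
        )
        pre_decision = ff - fb  # ISI estimate from past decisions removed
        decisions.append(1 if pre_decision >= 0 else -1)
    return decisions


symbols = [1, -1, -1, 1, 1, -1, 1]
received = channel(symbols, [1.0, 0.5])
detected = dfe_detect(received, fff=[1.0], fbf=[0.5])
```

Because every past decision is correct in this noise-free case, the feedback term removes the post-cursor ISI exactly and `detected` reproduces `symbols`; a wrong decision would instead be fed back and could propagate.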

The problem with DFEs is that the FFF and FBF sizes increase as the data rate increases. As the data rate increases, more and more symbols overlap, so a large number of taps is required in the FFF and FBF.

The DFE architecture based on digit-serial DA

Direct-memory architecture

The internal structure of the direct memory architecture consists of a bank of registers, a group of memory blocks, a bank of accumulators and a shift accumulator block. Input bits enter the register bank serially and form the memory address bits. The accumulator bank consists of a set of accumulators in which the results from the memories are summed.

From Figure 3.2, it can be seen that the contents of the top half of the memories resemble the mirror image of the bottom half and therefore a memory that stores only half of the contents can be used to perform DA filtering.
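One common way such a symmetry arises is offset binary coding, where the entry at a complemented address is the negative of the original entry. The Python sketch below assumes that form of symmetry, integer weights and B-bit two's-complement inputs; the function names are hypothetical and the sketch is not the memory organization of Figure 3.2.

```python
def build_half_lut(weights):
    """Store only the entries whose top address bit is 1.
    Each entry is sum_k (+w_k if bit k is 1 else -w_k)."""
    n = len(weights)
    top = 1 << (n - 1)
    return [
        sum(w if ((r | top) >> k) & 1 else -w for k, w in enumerate(weights))
        for r in range(top)
    ]


def read_obc(half_lut, addr, n):
    """Recover a full-table entry from the half table via T[~a] = -T[a]."""
    top = 1 << (n - 1)
    if addr & top:
        return half_lut[addr & (top - 1)]
    return -half_lut[~addr & (top - 1)]  # complement the address, negate


def obc_inner_product(weights, samples, bits):
    """Bit-serial inner product that uses only half of the OBC table."""
    n = len(weights)
    half_lut = build_half_lut(weights)
    mask = (1 << bits) - 1
    acc = 0
    for j in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= (((x & mask) >> j) & 1) << k
        term = read_obc(half_lut, addr, n) << j
        acc += -term if j == bits - 1 else term  # sign-bit plane is negative
    # OBC offset term, then undo the factor of two of the +/-1 coding.
    return (acc - sum(weights)) // 2
```

For N taps the table holds `2**(N-1)` words instead of `2**N`, at the cost of an address complement and a conditional negation on each read.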

Reduced-memory architecture

OBC based-memory architecture

Performance Analysis

Conclusion

The speed of the DA-based implementation can be further increased by choosing a smaller number of addition units, resulting in a reduction in the critical path. The distortion can be of two types, as the frequency components of the transmitted signal are not equally attenuated (amplitude distortion) and not equally delayed (delay distortion). Furthermore, the distortion may not be the same at all times due to the time-varying nature of the channels.

The feed forward filter (FFF) filters out the precursor part of the ISI that occurs due to the anti-causal part of the channel transfer function. Distributed arithmetic can be used to reduce the computational complexity arising due to the requirement of a large number of taps in FFF and FBF.

Mathematical Formulation of DA based ADFE

It can be observed that the terms in the square brackets of (4.9) can take one of $2^{N_f}$ possible combinations depending on the bits $b_{i,B_f-1-j}$; these are the partial products of the FFF coefficients. These partial products can be stored in a memory, which we call FFF-LUT1. These partial products (OBC combinations) can be stored in a memory, which we call FFF-LUT2.

The partial products of FFF-LUT2 follow the same analogy as the partial products of the FFF filter weights. Equations (4.9) and (4.16) correspond to the filtering operations of the FFF and FBF, respectively. With the partial products following the same analogy in both lookup tables (memories) of the FFF and FBF, the weight update operation of the FFF can be performed as follows.

OBC scheme for the DA based architecture

Extension to Sign-LMS and Signed-Regressor LMS ADFE

If only the upper half of the partial products is stored in memory, then the entry of the memory at address location can be given by. In (4.29), the summation related to the term x(n−i) is nothing but the partial products of the input samples, and it may therefore be convenient to store the partial products of the input samples in a memory so that weight update operation can be performed using partial products of filter weights and input samples. This can be achieved by averaging pairs of consecutive memory locations (eliminating the oldest input sample) and then adding and subtracting the expression R(n) with the result and storing them back in the same corresponding memory locations.

The LSBs of each of the buffers are used as address bits for MEMw(f), and filtering takes B clock cycles (B is the bit length of the buffers) according to (4.23) to compute one sample of the output. A similar algorithm can be applied to the ADFE feedback filter (FBF), and the same algorithm can be extended to the signed-regressor LMS algorithm, in which the sign(.) function in (4.28) is applied to the input sample instead of the error.
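The difference between the LMS variants lies only in where the sign function is applied in the weight update. A minimal Python sketch, using the textbook update rules with hypothetical function names rather than the notation of (4.28), is:

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lms_update(w, x, e, mu):
    """Standard LMS: w_i <- w_i + mu * e * x_i."""
    return [wi + mu * e * xi for wi, xi in zip(w, x)]


def sign_lms_update(w, x, e, mu):
    """Sign-LMS: the sign function is applied to the error."""
    return [wi + mu * sign(e) * xi for wi, xi in zip(w, x)]


def signed_regressor_lms_update(w, x, e, mu):
    """Signed-regressor LMS: the sign function is applied to each input sample."""
    return [wi + mu * e * sign(xi) for wi, xi in zip(w, x)]
```

In both sign variants one factor of the update is restricted to -1, 0 or +1, so the corresponding multiplication reduces to a conditional negation.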

Computational Complexity

Conclusion

As discussed in Chapter 1, a common problem faced by the ADFE is that as the data rate increases, the ISI components become more numerous, so the order of both the FFF and the FBF increases. However, block processing of the ADFE presents a problem: the FFF input, i.e. the received data, is known in advance and can be processed in blocks, but the FBF input, i.e. the decisions still to be made by the ADFE, is not. The technique developed in [8] takes advantage of frequency-domain schemes, i.e. ease of implementation and low complexity, but it imposes a few restrictions on the choice of block length relative to the lengths of the FFF and FBF.

In [18], a hybrid scheme was implemented, in which only the feedforward part is implemented in the frequency domain, and the feedback part is implemented in the time domain. In this chapter, the subscripts/superscripts f and b represent terms related to the FFF and FBF, respectively.

The Block LMS (BLMS) algorithm

The causality problem in block ADFE

In other words, the decisions corresponding to the terms of the upper triangular matrix ∇V1(k) are unknowns for the current block of inputs to the FBF. In IS1, the initial trial decisions are set to the null vector to compute the current decision block. To avoid this complexity, we propose an approach that uses a cost function that is minimized based on one of two criteria, namely mean absolute difference (MAD) and mean squared difference (MSD).

The approach is based on the fact that each of the unknown decisions can take only one of the symbol values of the modulation scheme used at the transmitter. The algorithm for calculating the unknown decisions based on this approach is shown in 5.3.

Formulation of the DA based block ADFE

The matrix $R_{o,L}$ selects the last L valid samples of the filter output vector (as obtained from the overlap-save method). The matrices $F$ and $F^{-1}$ are the M×M Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT) matrices, respectively, where $M = N_f + L - 1$ in the case of the FFF and $M = N_b + L - 1$ in the case of the FBF. Using the procedure described above, each of the FFT and IFFT operations can be realized using the distributed arithmetic technique for the efficient realization of the block ADFE, and this can be obtained as follows.
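The role of the transform size M and of the selection of the last L output samples can be illustrated with a small overlap-save sketch in Python. It uses a naive DFT from the standard library in place of an FFT and does not model the DA realization of the transforms; the function names are hypothetical.

```python
import cmath


def dft(v, inverse=False):
    """Naive M-point DFT (or inverse DFT) standing in for the FFT/IFFT."""
    m = len(v)
    s = 1 if inverse else -1
    out = [
        sum(v[t] * cmath.exp(s * 2j * cmath.pi * f * t / m) for t in range(m))
        for f in range(m)
    ]
    return [c / m for c in out] if inverse else out


def overlap_save_block(weights, prev_tail, block):
    """Filter one block of L inputs with an N-tap filter, M = N + L - 1.
    prev_tail holds the last N - 1 inputs of the previous block."""
    n, l = len(weights), len(block)
    m = n + l - 1
    x = list(prev_tail) + list(block)    # length-M input vector
    w = list(weights) + [0.0] * (m - n)  # weights zero-padded to length M
    product = [a * b for a, b in zip(dft(x), dft(w))]
    y = dft(product, inverse=True)       # circular convolution of x and w
    return [c.real for c in y[-l:]]      # only the last L samples are valid
```

The first N-1 samples of the circular convolution are corrupted by wrap-around, which is why only the last L samples are kept.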

Consequently, (5.47) can be calculated by right-shift (due to the term $2^{-j}$) and accumulate (due to the summation) operations. The detailed block diagram of the block ADFE implemented in the frequency domain is shown in Fig.

Conclusion

Here the OBC scheme was used to reduce the size of the lookup tables. Based on these equations, a direct memory architecture was developed in which the content of the lower half of the LUT is a mirror image of the upper half. Later, a new architecture was developed for reduced complexity, namely the reduced-memory architecture, in which half the content of the LUTs is generated on the fly using combinational logic.

In Chapter 4, DA processing for the LMS-based adaptive decision feedback equalizer was implemented. Finally, in Chapter 5, the realization of the block ADFE using DA was discussed.

Scope of the future work

Low power and less complex implementation of fast block LMS adaptive filter using distributed arithmetic. A high-performance, energy-efficient architecture for FIR adaptive filtering based on the novel distributed arithmetic formulation of the Block LMS algorithm. FPGA realization of high and medium speed FIR filters using modified distributed arithmetic architectures.

Sagar, "An Efficient Distributed Arithmetic based realization of the Decision Feedback Equalizer," Circuits, Systems and Signal Processing, Springer, vol.35, issue.2, pp.603-618, Feb.2016. Shaik, “A distributed arithmetic-based approach for implementing the token LMS adaptive filter,” in IEEE International Conference on Signal Processing and Communication Engineering (SPACES2015), India, pp.