To alleviate this problem, we used a single DA-based LMS adaptive filter with a new coefficient transfer scheme. This is based on the fact that when radix size becomes equal to the word length of decisions, the implementation of DA-based LMS adaptive filter can be made SA-less.
Introduction
Applications of ADF
- System Identification
- Channel Equalization
- Noise Cancellation
If the ADF output y(n) approaches y_u(n), then the ADF can model part of the unknown system. The desired primary input signal consists of the original speech d̂(n) and the interfering noise component η(n).
LMS Algorithm
CLMS Algorithm
It is clear from the above discussion that the LMS algorithm either provides fast convergence rate or low steady-state error based on the choice of step size. The coefficients of ADF0 and ADF1 are updated based on the LMS criterion, according to
BLMS Algorithm
Complexity Issues and Related Research in LMS based Systems
Thus, there is a need to reduce the critical path delay to meet the desired data rate specifications. It can be noted that the location of matching delays (mD) can be changed to achieve the desired critical path.
Distributed Arithmetic
Note that the least significant bit (LSB) of the filter coefficients always forms the address lines to the LUT. To increase system speed, groups of coefficient bits can be used as address lines for LUTs.
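The LUT addressing described above can be sketched in Python as follows. This is a minimal illustration, not the thesis architecture: it assumes integer two's-complement coefficients, an LUT that stores all partial sums of the input samples, and a bit-serial shift-accumulate loop. The names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(x):
    """LUT[a] holds the sum of x[i] for every bit i that is set in address a."""
    n = len(x)
    return [sum(x[i] for i in range(n) if (a >> i) & 1) for a in range(1 << n)]


def da_inner_product(w, x, wordlen):
    """Bit-serial DA (assumed form): one LUT read and one shift-accumulate
    per coefficient bit, starting from the LSB."""
    lut = build_lut(x)
    mask = (1 << wordlen) - 1              # view coefficients as wordlen-bit words
    acc = 0
    for b in range(wordlen):
        addr = 0
        for i, wi in enumerate(w):         # bit b of every coefficient forms the address
            addr |= (((wi & mask) >> b) & 1) << i
        term = lut[addr] * (1 << b)
        # In two's complement the MSB carries a negative weight.
        acc += -term if b == wordlen - 1 else term
    return acc


w = [3, -2, 5, -7]
x = [10, -4, 6, 1]
print(da_inner_product(w, x, 8))           # same value as sum(wi * xi)
```

In hardware the inner loop is just wiring: the bit slices arrive in parallel and only the LUT read and the accumulation cost a clock cycle.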
DA based FIR Filter
TC-DA based FIR Filter
According to (1.31), it is clear that the number of clock cycles for the product calculation is reduced by a factor γ. These are nothing more than the partial filter products of the input samples, which are shifted and accumulated over W clock cycles.
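The cycle-count reduction can be illustrated with a digit-serial variant of the shift-accumulate loop. The sketch below assumes that γ denotes the number of coefficient bits processed per clock cycle and that W is a multiple of γ. Both are assumptions made for illustration, and `da_radix` is a hypothetical name.

```python
def build_lut(x):
    """LUT[a] holds the sum of x[i] for every bit i that is set in address a."""
    n = len(x)
    return [sum(x[i] for i in range(n) if (a >> i) & 1) for a in range(1 << n)]


def da_radix(w, x, wordlen, gamma):
    """Digit-serial DA (assumed form): gamma coefficient bits per clock cycle.
    Returns (inner product, number of clock cycles)."""
    if wordlen % gamma:
        raise ValueError("this sketch needs wordlen to be a multiple of gamma")
    lut = build_lut(x)
    mask = (1 << wordlen) - 1
    acc, cycles = 0, 0
    for d in range(wordlen // gamma):          # one iteration models one clock cycle
        digit_sum = 0
        for j in range(gamma):                 # gamma LUT reads, parallel in hardware
            b = d * gamma + j
            addr = 0
            for i, wi in enumerate(w):
                addr |= (((wi & mask) >> b) & 1) << i
            term = lut[addr] * (1 << j)
            digit_sum += -term if b == wordlen - 1 else term   # MSB is negative
        acc += digit_sum * (1 << (d * gamma))
        cycles += 1
    return acc, cycles


print(da_radix([3, -2, 5, -7], [10, -4, 6, 1], 8, 1))   # bit-serial: W cycles
print(da_radix([3, -2, 5, -7], [10, -4, 6, 1], 8, 2))   # radix-4: W/2 cycles
```

The price of the shorter schedule is γ LUT reads (or γ LUT copies) per cycle, which is the area-speed trade-off that high-radix DA designs make.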
OBC-DA based FIR Filter
For the given set of input samples x(n−i), the term d_k(n) can take 2^N combinations of the input samples, of which only one is selected at a time.
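A short Python sketch may help show why only half of these 2^N combinations need to be stored under offset binary coding. It assumes integer two's-complement coefficients whose bits are recoded to ±1, so that complementing every address bit simply negates the LUT word. The function names and the choice of the last tap as the sign-select bit are illustrative assumptions, not the thesis design.

```python
def build_obc_lut(x):
    """Half-size OBC LUT: 2**(N-1) words, stored for the case where the last
    tap contributes +x[N-1]; each other tap contributes +x[i] or -x[i]."""
    n = len(x)
    lut = []
    for a in range(1 << (n - 1)):
        s = x[n - 1]
        for i in range(n - 1):
            s += x[i] if (a >> i) & 1 else -x[i]
        lut.append(s)
    return lut


def obc_da_inner_product(w, x, wordlen):
    """OBC-DA inner product (assumed form) using the half-size LUT."""
    n = len(x)
    lut = build_obc_lut(x)
    mask = (1 << wordlen) - 1
    acc = 0
    for k in range(wordlen):
        bits = [((wi & mask) >> k) & 1 for wi in w]
        top = bits[n - 1]
        addr = 0
        for i in range(n - 1):
            # XOR with the inverted select bit complements the address when top == 0.
            addr |= (bits[i] ^ (1 - top)) << i
        d = lut[addr] if top else -lut[addr]   # mirrored half: negate the stored word
        if k == wordlen - 1:
            d = -d                             # sign bit of two's complement
        acc += d * (1 << k)
    acc -= sum(x)                              # constant OBC offset term
    return acc // 2                            # acc equals exactly twice the result


x = [10, -4, 6, 1]
print(len(build_obc_lut(x)))                   # 8 words instead of 16
print(obc_da_inner_product([3, -2, 5, -7], x, 8))
```

The address XOR and the conditional negation correspond to the external gates that replace the missing half of the table.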
Literature on DA based Implementations
Problem Formulation
Due to the limitation of standalone LMS ADF in various applications, a combination of LMS ADFs in parallel or in series or in block is usually used to achieve better performance. This motivates us to exploit possible advantages of DA in the realization of LMS ADFs with low complexity.
Organization of the Thesis
In Chapter 2, the complexity of LMS ADF is optimized using OBC, but it cannot be used to improve the convergence performance due to the presence of non-OBC terms. In Chapter 4, a low-complexity architecture of ADFE is presented for channel equalization problem in 5G communication system.
Introduction
Mathematical Formulation
It is clear from (2.3) that the filter coefficients require a single clock cycle for the update. It is worth noting that the number of clock cycles to update the filter coefficients is m + 1.
Proposed Scheme
Optimal LUT and SA Architectures
- Design and Analysis of OBC-LUT Architectures
- High Radix TC and OBC based LUTs
- Architecture of SA Unit
The corresponding proposed radix-4 partial product generator (PPG) of TC and OBC is shown in Fig. The pipeline structure of the proposed MMP-CSFA based SA unit (PSAR) is shown in Fig.
Architecture for Small Order Filter
- Complexity Considerations of Coefficient Update Unit
The convergence performance of the proposed designs after removing non-OBC terms is shown in Fig. Second, the bit slices of the coefficient increment terms are obtained before the filter coefficients are updated, as in (2.12).
Architecture for Large Order Filter
It consists of four 4th-order DA base units (as shown in Fig. 2.8) whose outputs are added by two separate binary adder trees, corresponding to four (B + 1)-bit sum and carry words. For each kem+1, initial clock cycles of DA base unit γj(l) are calculated by activating the corresponding registers in CURA using Ej = 1 and fed to the ISR unit.
Performance Comparison
Computational Complexities
- Hardware Complexities
- Time Complexities
Like DA3-ADF and DA4-ADF, the CP0 of the proposed designs includes a computational delay of OBC-LUT. Interestingly, the proposed designs require a single clock cycle to produce the output like DA3-ADF and DA4-ADF, except for the proposed Structure-I.
Implementation Results
- ASIC Synthesis
- FPGA Synthesis
It is quite clear that the throughput of the DA0-ADF, DA1-ADF and DA2-ADF designs is significantly lower than that of the other designs. The physical LUTs of the DA0-ADF, DA1-ADF and DA2-ADF designs are mapped onto the FPGA using the HDL primitive [80].
Conclusion
The RLS ADF offers a fast initial rate of convergence and a low steady-state error compared to the LMS ADF. LMS ADF with variable step size is a potential solution that achieves both fast initial convergence and low steady-state error [13].
Mathematical Formulation
Thus, the coefficient update equation of pipeline CLMS ADF with two adaptation delays can be expressed as . Therefore, it can be concluded that the separate coefficient adjustment of ADF1 is not required.
Proposed Scheme
Architecture for Small Order Filter
The second adjustment delay is placed after ec(n−1) calculation to obtain ec(n−2) from ec(n−1), as shown in Fig. The internal scheme of the proposed CCU includes an equality check, a counter with enable (E) and clear (CLR) inputs and a comparator, as shown in Fig.
Architecture for Large Order Filter
Simulation study indicates that the duration of time windows of undesired correlations is always less compared to the duration of time window of desired correlation (in steady state). It should be noted that the counter must be cleared for each new count using CLR signal.
Determination of Pre-defined Time Window
To derive the predefined time window ζ and additional mean steady-state error performance ∆ξd2, three points are considered: 'A', 'B' and 'C', as shown in Fig. This is because the number of iterations would be increased to achieve lower steady-state errors.
Performance Comparison
Computational Complexities
Convergence Performance
For example, the proposed filter uses step size µ0 = 1/N in the initial adjustment period and step size µ1 = 2^(-p)/N in the steady state. According to the simulation results, the improvement in steady-state error performance for the proposed filter can be as high as 6 dB compared to DA3-ADF and DA4-ADF.
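The two-step-size idea can be sketched with a plain LMS system-identification loop in Python. This is only an illustration under stated assumptions: the switch from µ0 to µ1 happens at a fixed, hypothetical iteration `switch_at` (the proposed design decides this with its CCU instead), the input is uniform in [-1, 1], and the unknown system `h` and the noise level are made up for the demo.

```python
import random


def lms_two_step(x, d, n_taps, p, switch_at):
    """LMS with mu0 = 1/N before `switch_at` and mu1 = 2**(-p)/N afterwards."""
    mu0 = 1.0 / n_taps
    mu1 = 2.0 ** (-p) / n_taps
    w = [0.0] * n_taps
    buf = [0.0] * n_taps                      # tapped delay line, newest sample first
    errors = []
    for n, (xn, dn) in enumerate(zip(x, d)):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        mu = mu0 if n < switch_at else mu1    # assumed fixed switching instant
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


def demo():
    """Identify a hypothetical 4-tap system from noisy observations."""
    rng = random.Random(0)
    h = [0.5, -0.3, 0.2, 0.1]                 # made-up unknown system
    x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    d = []
    for n in range(len(x)):
        clean = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        d.append(clean + rng.gauss(0.0, 0.01))
    w, _ = lms_two_step(x, d, n_taps=4, p=4, switch_at=200)
    return w, h


if __name__ == "__main__":
    w, h = demo()
    print([round(wi, 3) for wi in w], h)
```

The large initial step drives the weights close to the target quickly, and the small step afterwards limits how strongly the observation noise perturbs them.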
ASIC Synthesis
However, using a smaller step size µ1 = 2^(-p)/N reduces both the misadjustment and the steady-state error at the cost of slower convergence. In contrast, although the proposed architecture is pipelined, its convergence speed and minimum steady-state error are better because it uses two different step sizes.
Conclusion
Table 3.1: Comparison of computational complexities of DA-based designs for an Nth-order filter, Lth-order base unit and W-bit wordlength of filter coefficients
Design | Type | Adders | Registers | Shifters | LUT/MUX | Throughput
DA0-ADF [67] Non-Pip/Par† +3M-2TM+(1K+2M+2M+TM+1M − 1)TM+ TA )]
DA1(a)-ADF [68] Non-Pip/Par† 2M+N+1 2N+3L+1 LM 2LM-LUT 1/[k1(TACC+TA)
DA1(b)-ADF [68] Non-Pip/Par ?† 3M+N+3 2M+3N+1 LM+NM0-XOR 2L−1M-LUT 1/[k1(TACC+TM+TA)
DA2-ADF [69] Non-Pip/Par ?‡ 4M+3M+3N+2M + 2NM0-XOR 2LM-LUT 1/[k2(TACC+(L+1)TM+2TA)]
DA3-ADF [70] Pip∗/Par† (3,2L−1+1)M 3M(1+2L−1)+N + 2∼2LM 2LM-MUX 1/[W(LTM+TFA+TD)]
DA4-ADF [71] Pip∗/Par† M(2+2L−1)+N (3+2L)M+2N+4 LM 2LM-MUX 1/[W(LTM+TFA+TX+TD)]
Proposed Pip∗/Par†¶ M(3+2L−2)+N0 (4+2L−1)M+N LM (2L−1)M 1/[W(LTM) +TA+TFA+TX +2N0+4(1+L)M-MUX+Td)]
Pip∗: pipelined architecture with two adaptation delays; Par: parallel LUT; †: concurrent SA and CUU operations; ‡: concurrent SA, CUU and LUT operations; ?: OBC-based product sharing. The designs in [67–69] have a controller, while the proposed design involves a CCU consisting of an equality check, a counter and a comparator; N = LM, L: base unit order, M: number of base units; B: wordlength of both input samples and coefficients; L−W=N0=N0=N; k0 = 2L+max(W,2L−1)+log2M, k1 = 2L−1+W+log2M+1, k2 = 2L/2+max(W,2L/2−1)+log2M+1; TACC, TM, TA, TFA and TD are the delays of a look-up table access, a 2-to-1 multiplexer, an adder, a full adder and a register, respectively.
However, the direct realization of the FB filter in a DFE using DA would increase the critical path, since the SA unit contributes significant delay in the feedback loop.
Mathematical Formulation and Background
Therefore, it is important to design a trial-decision FB filter using OBC-DA as it offers low implementation complexity. This was addressed by the same authors in [33] by precomputing and storing the remaining FB filter coefficients in a second LUT as shown in the figure.
Proposed Scheme
- Transformation of LUT Contents
- Proposed Architecture
- Design of High-Throughput Architecture
- Use of Inverted Multiplexers
- Use of Buffers/Inverters
- Use of Retiming Technique
- Coefficient Update Unit
By similar arguments, the iteration bound of the proposed unfolded architecture for 16-QAM would be 11TM/3. The update schedule of the DLUT from time n to n+1 for an 8th-order FB filter is shown in Fig.
Performance Comparison
Computational Complexities
- Hardware Complexity
- Time Complexity
Similarly, the number of clock cycles required for the update of DLUT in adaptive TSP-DFE and R-DFE designs would be 2Nb-P and 2Nb, respectively. In contrast, the number of clock cycles required for the design in [39] depends on the buffer size and the delay factor.
Error Performance
- Convergence Performance
- BER Performance
In contrast, the design in [39] is associated with a slow-down factor whose value greatly affects the BER result. In addition, this design also needs extra clock cycles for system initialization, which further degrades BER performance.
Implementation Results
- ASIC Synthesis
- FPGA Synthesis
For example, if the BER value is fixed at 10^(-3) and P varies from 3 to 4, then almost 2 dB more SNR is required for the Rayleigh fading channel, while it is only 1.2 dB for the AWGN channel. Compared with R-DFE and TSP-DFE, the proposed design provides a better BER value by eliminating the effect of the first P coefficients of the FB filter in parallel due to E-MUX at stage-I, as shown in Figure 1.
Conclusion
Table 4.1: Comparison of computational complexities of DFE designs for an Nf-th order FF filter and Nb-th order FB filter with constellation size M
Design | Type | Adders | Multiplexers | Registers | Critical Path/Throughput | Latency | LUT/MULT
R-DFE† [29] non-A (M)Nb (M)Nb−1 Nblog2M TMlog2M 0 0
R-DFE† [29] A 2(M)Nb+1 2(M)Nb−2 +(M −1)Nb Nblog2M 1/[k0TMlog2M] 0 2Nb+M
PP-DFE†? [33] non-A 2P+M +(Nb−P+1) 2P+M−1 (Nb+P)log2M TFA∼TMUL P (Nb−P+ 1)log2M
TSP-DFE† [33] non-A 2(M)Nb/2 2(M)Nb/2−2 3Nblog2M/2 TA+TS Nb/2+1+TMlog2M Nb/2 2(Nb+M)/2+1
TSP-DFE† [33] A 3 (M)Nb/2+2 3(M)Nb/2 +(M−1)Nb/2 (3Nblog2M)/2 1/[k1(2TA+TS N/2+1+log2MTM)] Nb/2 3(2) (Nb+M)/2
DFE† [34] A (M)Nb (M)Nb (M)Nb+Nb Jlog2(Nb+1)TM β+γ 0
DFE† [35] non-A (M)Nb~MNb NbNb2 + (M)Nb log2(Nb+1)TM/(P+log2N−1) R−1 0
DFE? [38] non-A ∼[(Nb+1)log2M]2−Nblog2M J0[(Nb+1)TA+TS ] γ ∼(Nblog2M)2
DFE? [39] A ∼2[(Nblog2M)2+1]−Nblog2M 1/[k2((N+1)TA+TS)] β+γ ∼2(Nblog2M)2
DFE† [40] non-A (Nblog2M)2/2 (M−1)Nb2 /2 (Nblog2M)3/6 NbTA+TS+TMlog2M Q−1 −
Proposed† non-A 2(M)Nb/2−1+1 2(M )Nb/2−1 (3Nb/2+2)log2M TA+0.5TS Nb/4+1+TMlog2M Nb/2+1 2(Nb+M)/2
Proposed† A 3(M)Nb/2−1+3 3(M)Nb /2−1−1 +(M−1)Nb/2 (3Nb/2+2)log2M 1/[k3(1.5Ta+0.5Ts Nb/4+1+TMlog2M)] Nb/2+1 3(2)( Nb+M)/2−1
†: multiplier-less architecture; ?: multiplier-based architecture; A: adaptive; P: speed-up factor; J: unfolding (or parallelization) factor; R: increment factor; Q: iteration factor; β: slow-down factor; γ: system initialization clock cycles, for simplicity γ = 0; KM: input buffer size; KM = 2MK with K = Nf+Nb+1; Throughput = Clock rate/Processing time per sample; Clock rate = 1/Critical path; k0 = Nf+α02Nb+1; k1 = Nf+P+α12Nb−P+1; k3 = Nf+P+α22Nb−P−1+2, where α0 = 2M, α1 = 2M/2 and α2 = 2M/2−1; k2 = J0KM, where J0 = pJ and p denotes the extra parallelization included in DFE? [38] and DFE? [39]. TM, TA, TMUL and TFA are the computational delays of a 2-to-1 multiplexer, an adder, a multiplier and a carry-save full adder, respectively.
In addition to the hardware complexities listed above, the proposed non-adaptive DFE and ADFE require 2((M−1)Nb/2−1) and 3((M−1)Nb/2−1) 1-bit XOR gates, respectively, while the proposed ADFE also involves a conditional scaler and a barrel shifter for updating the DLUT contents and the FB filter coefficients, respectively. Note that the designs in [34,39] are based on the same principle and involve large buffers. These figures are estimated for the I-channel of an M-QAM system and must be multiplied by a factor of two to determine the overall complexity.
As explained in Chapter 1, the ADF is the basic building block of a noise canceller, where it estimates the noise to be cancelled.
Mathematical Formulation
As a result, the LUT size can be halved, with the remaining half of the combinations generated by external XOR gates. It can be noted from (5.20) that the content at the lower-half even address location for p = 0, q = 0 is the two's complement of the content at the upper-half odd address location for p = 1, q = 1.
Proposed Scheme
- Filter Block Update Strategy
- Proposed Architecture
- LUT Update Scheme
- Architecture of Sub-Filter Unit
- Architectures of Error Computation Unit and Coefficient Update Unit
- Computational Complexities
- Noise Reduction Performance
- ASIC Synthesis
In contrast, the proposed design takes only 41 clock cycles (8 clock cycles for the LUT0 and LUT1 update, 32 clock cycles in the SA unit and 1 clock cycle for the block error calculation and the update of the external registers). From the results, it is found that the proposed filter works well at certain frequencies.
Conclusion
Although the complexity of pipelined LMS adaptive filter was optimized using OBC scheme in Chapter 2, non-OBC terms were produced at the output. In each iteration, the correlation between the adjacent errors was compared with the predefined time window.
Suggestions for Future Research
Bishop, “A variable step (VS) adaptive filter algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. Parker, “Block implementation of adaptive digital filters,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol.
Block diagram of an ADF
Block diagram of system identification configuration of an ADF
Block diagram of channel equalization configuration of an ADF
Block diagram of conventional ADFE
Block diagram of adaptive noise cancellation configuration of an ADF
Circuit schematic of an ADF based on conventional LMS algorithm
Block diagram of ADF based on CLMS algorithm
Block diagram of ADF based on BLMS algorithm
Block diagram of ADF based on DLMS algorithm
Block diagram of pipelined LMS ADF with two adaptation delays
Architecture of pipelined LMS ADF based on OBC-DA
Design, analysis and comparison of 4th-order OBC-LUT I, II, III architectures
Neuvo, “The maximum sampling rate of digital filters under hardware speed limitations,” IEEE Transactions on Circuits and Systems, vol. Färber et al., “The 5G candidate waveform race: a comparison of complexity and performance,” EURASIP Journal on Wireless Communications and Networking, vol.
Design, analysis and comparison of radix-4 TC and OBC partial product generators
Radix-2 pipelined MMP-CSFA based SA unit for OBC-DA
Radix-4 pipelined MMP-CSFA based SA unit for OBC-DA
Circuit schematic of 4th-order OBC-DA based LMS ADF
MSE learning curves of adaptive equalization problem for the presented and existing
Circuit schematic of 16th-order OBC-DA based LMS ADF with an ISR unit and a
Comparison of adders complexity for the presented and existing DA based designs
Comparison of registers complexity for the presented and existing DA based designs
Comparison of multiplexers, XOR gates and LUT words complexities for the presented
Throughput curves for the presented and existing DA based designs
Thus, it is clear that the number of clock cycles to update the contents of DLUT is significantly reduced for the proposed design. The corresponding noise reduction performance results for the proposed design are illustrated in Fig.
Block diagram of pipelined CLMS ADF with two adaptation delays
Pipelined CSFA based SA unit for TC-DA
Circuit schematic of 4th-order S-PLUT
Circuit schematic of 4th-order LMS ADF based on TC-DA
The block diagram of the proposed BLMS ADF for filter order = 16 and block length = 4 is shown in Fig. Also, the proposed filter maintains an estimate of the input noise from time to time as shown in the figure.
Circuit schematic of 16th-order LMS ADF based on TC-DA
Assumed MSE curves for the presented design to determine ζ
Variation of ζ with respect to initial error for 32nd-order filter and 8, 16-bit wordlengths
Block diagram of OFDM-QAM with TEQ
Architecture of conventional ADFE for Nb-th order FB filter
Architecture of R-DFE for 3rd-order FB filter