Conclusion - PDF Low Complexity Distributed Arithmetic Based Pipelined Vlsi ...

Table 2.4: Comparison of Hardware Complexities between DA Based Designs for Nth Order Filter, Lth Order Base Unit and W-bit Wordlength of Filter Coefficients

Columns: Design; Adders (BiCom∗, ByCom); Registers (BiCom∗, ByCom); Multiplexers (BiCom∗, ByCom); XOR gates (BiCom∗, ByCom); BS; LUT

DA0-ADF‡ [67] −3M+1−(1+L)M+2L(L−1)M−−MM2L+1M
DA1-ADF‡ [68] −(L+2)M+1−(2+3L)M+1LM−−MLM2LM
DA2-ADF‡ [69] −4M+2−(3+L)M+2(L−1)(L−2)M2M2(L−1)MMM2LM
DA3-ADF† [70] α0M(3.2L−1+1)Mα0M(3+3.2L−1+L)M+22(2L−1)M(2L−1)M(L+2)MM(2L−1)M−
DA4-ADF† [71] α0M(2+2L−1+L)Mα0M(3+2L+2L)M+42(2L−1)M(2L−1)M(L+2)MMLM−
Structure-I† (α1+1)M+α3(2L+3)M+1(α1+L+2)M+α4(2L+4)M+72M−2M+1(L(L−3)/2)M+2M−2(L+1)MLM−
Structure-II† (α1+1)M+α3(L+3)M+12(L+1)M+α4(2L+4)M+72M−2M+1(L+1)M(L+1)MLM−
Structure-III† (α2+1)M+α3(L+3)M+12(L+1)M+α4(2L+4)M+72M−2M+1(L+1)M(L+1)MLM−

‡: non-pipelined structure, †: pipelined structure, BS: barrel shifter, BiCom: bit-complexity, ByCom: byte-complexity, BiCom∗ = ByCom/B, B: wordlength of input samples, W: wordlength of filter coefficients, assumed B = W; N: filter order, M (= N/L): number of DA base-units, α0 = 17 for L = 4 of DA3-ADF [70] and DA4-ADF [71], α1 = L(L−1)/2, α2 = L(log2 L + 1)/4, α3 = log2(LM) − 1, α4 = 2α3 − 2. The DA0-ADF, DA1-ADF and DA2-ADF designs require a controller that is relatively more complex than that of the proposed designs (a ⌈log2(N−m+1)⌉-bit counter and two modulo circuits).

TH-2070_136102022

2.5 Conclusion

Fig. 2.11: Comparison of adders complexity for the presented and existing DA based designs. [Plot: number of adders (BiCom and ByCom) versus filter order (N) for Structure-I, Structure-II, Structure-III, DA0-ADF [67], DA1-ADF [68], DA2-ADF [69], DA3-ADF [70] and DA4-ADF [71].]

Fig. 2.12: Comparison of registers complexity for the presented and existing DA based designs. [Plot: number of registers (BiCom and ByCom) versus filter order (N) for Structure-I, Structure-II, Structure-III, DA0-ADF [67], DA1-ADF [68], DA2-ADF [69], DA3-ADF [70] and DA4-ADF [71].]

Fig. 2.13: Comparison of multiplexers, XOR gates and LUT words complexities for the presented and existing DA based designs. [Plot: number of multiplexers, XOR gates and LUT words (BiCom and ByCom) versus filter order (N) for Structure-I, Structure-II, Structure-III, DA0-ADF [67], DA1-ADF [68], DA2-ADF [69], DA3-ADF [70] and DA4-ADF [71].]

Table 2.5: Comparison of Time Complexities between DA Based Designs for Nth Order Filter, Lth Order Base Unit and W-bit Wordlength of Filter Coefficients

| Design | NOC | NCC | AD | LAD | CP0 | CP1 | CP2 | Throughput |
|---|---|---|---|---|---|---|---|---|
| DA0-ADF‡ [67] | 1 | k1 | 0 | − | TACC + 3TM + TA | − | − | 1/(k1·CP0) |
| DA1-ADF‡ [68] | 1 | k2 | 0 | − | TACC + TA | − | − | 1/(k2·CP0) |
| DA2-ADF‡ [69] | 1 | k3 | 0 | − | TACC + 5TM + 2TA | − | − | 1/(k3·CP0) |
| DA3-ADF† [70] | 2 | 1 | 2 | a-PSAR, a-ECU | 4TM + TFA + TD | (2 + log2 M)TA | − | 1/(W·max(CP0, CP1)) |
| DA4-ADF† [71] | 2 | 1 | 2 | a-PSAR, a-ECU | 4TM + TFA + TX + TD | (2 + log2 M)TA | − | 1/(W·max(CP0, CP1)) |
| Structure-I† | 1 | k4 | 2 | b-PSAR, b-ECU | TFA0 + 2TX + 2TD + TA | TFA0 + TX + TD + TM + TA log2 M | 3TA + (1 + log2 M)TM | 1/(k4·max(CP0, CP1, CP2)) |
| Structure-II† | 2 | 1 | 2 | b-PSAR, b-ECU | TFA0 + 2TX + 2TD + 3TA | TFA0 + TX + TD + TM + TA log2 M | 3TA + (1 + log2 M)TM | 1/(W·max(CP0, CP1, CP2)) |
| Structure-III† | 2 | 1 | 2 | b-PSAR, b-ECU | TFA0 + 2TX + 2TD + 2TA | TFA0 + TX + TD + TM + TA log2 M | 3TA + (1 + log2 M)TM | 1/(W·max(CP0, CP1, CP2)) |

NCC: number of clock cycles, NOC: number of clocks used, AD: adaptation delays (m), LAD: location of adaptation delays, a-PSAR: after PSAR unit, a-ECU: after ECU unit, b-PSAR: before PSAR unit, b-ECU: before ECU unit, clockB = W clockA, Throughput = 1/[NCC × critical-path], k1 = 2^L + max(W, 2^(L−1)) + log2 M, k2 = 2^(L−1) + log2 M + W + 1, k3 = 2^(L/2) + max(W, 2^((L/2)−1)) + log2 M + 1, k4 = W(L−1) log2 M; TACC, TM, TA, TFA, TFA0 and TD are computational delays due to LUT, multiplexer, adder, binary CSA, MMP-CSFA and D flip-flop (FF) respectively.

Fig. 2.14: Throughput curves for the presented and existing DA based designs. [Plot: throughput (NSPPS) versus filter order (N) for Structure-I, Structure-II, Structure-III, DA0-ADF [67], DA1-ADF [68], DA2-ADF [69], DA3-ADF [70] and DA4-ADF [71].]

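Table 2.6: Performance Comparison of Different DA Based Designs with ASIC Synthesis using TSMC 90 nm CMOS Library and FPGA Implementation on Xilinx ZYNQ-XC7Z020-1CLG84C for 8-bit Wordlength of Filter Coefficients and 4th Order Base Unit

ASIC synthesis (each quantity is given for filter order N = 16 and N = 32):

| Design | Area (µm²), N=16 | Area (µm²), N=32 | Power (mW), N=16 | Power (mW), N=32 | MSP (ns), N=16 | MSP (ns), N=32 | Throughput (per µs), N=16 | Throughput (per µs), N=32 | ADP (µm²×ns), N=16 | ADP (µm²×ns), N=32 | PDP (mW×ns), N=16 | PDP (mW×ns), N=32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DA0-ADF‡ [67] | 47938 | 94188 | 31.55 | 62.08 | 24.28 | 26.91 | 41.18 | 37.16 | 116334 | 2534599 | 766 | 1670 |
| DA1-ADF‡ [68] | 39071 | 77168 | 26.88 | 50.89 | 16.21 | 18.07 | 61.69 | 55.34 | 633341 | 1394425 | 435 | 919 |
| DA2-ADF‡ [69] | 29397 | 57934 | 15.25 | 28.11 | 30.45 | 33.07 | 32.84 | 30.23 | 895138 | 1915877 | 464 | 929 |
| DA3-ADF† [70] | 56982 | 113796 | 19.23 | 39.83 | 4.56 | 4.56 | 219.3 | 219.3 | 259837 | 518909 | 87 | 181 |
| DA4-ADF† [71] | 35177 | 69562 | 16.34 | 31.12 | 5.28 | 5.45 | 189.4 | 183.5 | 185734 | 379112 | 86 | 169 |
| Structure-I† | 10176 | 20084 | 5.19 | 9.81 | 8.37 | 9.11 | 119.5 | 109.8 | 85173 | 182965 | 43 | 89 |
| Structure-II† | 9813 | 19592 | 4.98 | 9.33 | 11.24 | 12.03 | 88.9 | 83.1 | 110298 | 235691 | 56 | 112 |
| Structure-III† | 9341 | 18722 | 4.76 | 8.66 | 10.19 | 11.06 | 98.1 | 90.4 | 95184 | 207065 | 49 | 96 |

FPGA synthesis:

| Design | SLUT (×1000), N=16 | SLUT (×1000), N=32 | FF (×1000), N=16 | FF (×1000), N=32 |
|---|---|---|---|---|
| DA0-ADF‡ [67] | 6.49 | 11.83 | 1.65 | 3.41 |
| DA1-ADF‡ [68] | 4.14 | 7.46 | 1.53 | 2.93 |
| DA2-ADF‡ [69] | 1.62 | 2.78 | 0.72 | 1.31 |
| DA3-ADF† [70] | 0.65 | 1.24 | 2.67 | 4.63 |
| DA4-ADF† [71] | 0.92 | 1.76 | 1.45 | 2.83 |
| Structure-I† | 0.43 | 0.79 | 0.95 | 1.83 |
| Structure-II† | 0.39 | 0.71 | 0.89 | 1.69 |
| Structure-III† | 0.34 | 0.63 | 0.84 | 1.56 |

MSP: minimum sampling period, ADP: area-delay product, EPS: energy-per-sample.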
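The derived columns of Table 2.6 appear to follow from the area, power and MSP columns. The sketch below is a minimal Python check under three assumed relations: throughput = 1/MSP (converted to samples per µs), ADP = area × MSP and PDP = power × MSP. These relations are inferred from the column units rather than stated in the table, and the function name `derived_metrics` is illustrative. The inputs are the N = 16 entries for Structure-III.

```python
def derived_metrics(area_um2, power_mw, msp_ns):
    """Return (throughput per µs, ADP in µm²·ns, PDP in mW·ns).

    Assumed relations (inferred from the column units of Table 2.6,
    not stated explicitly there):
      throughput = 1 / MSP, ADP = area * MSP, PDP = power * MSP.
    """
    throughput_per_us = 1000.0 / msp_ns   # 1 µs = 1000 ns
    adp = area_um2 * msp_ns               # area-delay product
    pdp = power_mw * msp_ns               # power-delay product
    return throughput_per_us, adp, pdp


# Structure-III, N = 16: area 9341 µm², power 4.76 mW, MSP 10.19 ns
thr, adp, pdp = derived_metrics(9341, 4.76, 10.19)
print(f"throughput = {thr:.1f} per µs, ADP = {adp:.0f}, PDP = {pdp:.1f}")
```

Under these assumptions the computed values agree with the tabulated 98.1, 95184 and 49 to within rounding.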

3 Low Complexity Pipelined Convex Combination LMS Adaptive Filter

Contents

3.1 Introduction
3.2 Mathematical Formulation
3.3 Proposed Scheme
3.4 Performance Comparison
3.5 Conclusion

3.1 Introduction

ADFs are widely employed in DSP and communication systems for tasks such as system identification, channel equalization and noise cancellation [5, 6]. In the system identification problem, the filter coefficients of the ADF should reach the optimum solution for the unknown system in reasonable time and with low steady-state error. LMS and RLS are two popular algorithms for adapting the filter coefficients. The RLS ADF offers a faster initial convergence rate and lower steady-state error than the LMS ADF, but its computational complexity is significantly higher, which makes real-time operation difficult. The LMS ADF, on the other hand, provides either a fast convergence rate or a low steady-state error, depending on the choice of step-size. Resolving this well-known trade-off between the initial convergence rate and the steady-state mean-square error of the LMS ADF has received much attention. The variable step-size LMS ADF is a potential solution that achieves both fast initial convergence and low steady-state error [13]. However, because the step-size is adapted in every iteration, the hardware realization of such a filter is difficult. Subsequently, many researchers suggested different approaches to address this problem [14–22]. For instance, Kwong et al. [16] utilized the squared instantaneous error, while Aboulnasr et al. [18] exploited the auto-correlation of time-averaged error signals at adjacent time instants. The idea of combining two LMS ADFs with different step-sizes in parallel has also gained significant interest, as it can optimize the trade-off between convergence speed and steady-state error. This is referred to as the convex combination of LMS (CLMS) ADF, where the filter with the large step-size offers fast convergence and the filter with the small step-size provides low steady-state error [11, 26].

However, this requires the transfer of filter coefficients from one LMS ADF to the other, which is difficult from an implementation point of view. Further, because two LMS ADFs are involved, its complexity is twice that of the conventional LMS algorithm. In addition, the presence of several MAC units in both LMS ADFs leads to high computational requirements. A few attempts have been made in the past to reduce the complexity of the coefficient transfer scheme [23–26]. Garcia et al. [23] proposed a linear transfer of coefficients in which the slow filter interacts with the fast filter during the transition stage; however, it degrades the SNR performance. Later, Ruiz et al. [24] suggested a new update rule that improves the SNR at the cost of increased computational requirements. Nascimento and de Lamare [25] presented a low-complexity instantaneous transfer scheme based on a sliding window approach for better convergence performance. Lu et al. [26] proposed a new coefficient transfer scheme based on sign-adaptation approximations for low-complexity realization of the CLMS ADF.
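The basic CLMS structure discussed above, without any coefficient-transfer step, can be sketched for a system identification setup as follows. This is a minimal pure-Python illustration, not the scheme proposed in this chapter. The sigmoid-based mixing parameter and its clipped gradient update are the form commonly used in the CLMS literature and are an assumption here. The function name `clms_identify`, the step-sizes, the noise level and the clipping range are likewise illustrative choices.

```python
import math
import random


def clms_identify(h, n_samples=4000, mu_fast=0.05, mu_slow=0.005,
                  mu_a=50.0, noise_std=0.01, seed=0):
    """Identify FIR system `h` with a convex combination of two LMS filters.

    Returns the combined weight vector and the final mixing factor.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_taps = len(h)
    w_fast = [0.0] * n_taps   # large step-size: fast convergence
    w_slow = [0.0] * n_taps   # small step-size: low steady-state error
    a = 0.0                   # lam = sigmoid(a) keeps lam inside (0, 1)
    lam = 0.5
    x = [0.0] * n_taps        # tapped delay line, newest sample first

    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        # Desired signal: unknown system output plus measurement noise.
        d = sum(hi * xi for hi, xi in zip(h, x)) + rng.gauss(0.0, noise_std)

        y_fast = sum(wi * xi for wi, xi in zip(w_fast, x))
        y_slow = sum(wi * xi for wi, xi in zip(w_slow, x))
        lam = 1.0 / (1.0 + math.exp(-a))
        y = lam * y_fast + (1.0 - lam) * y_slow   # convex combination

        # Each component filter adapts on its own error (standard LMS).
        e_fast, e_slow, e = d - y_fast, d - y_slow, d - y
        w_fast = [wi + mu_fast * e_fast * xi for wi, xi in zip(w_fast, x)]
        w_slow = [wi + mu_slow * e_slow * xi for wi, xi in zip(w_slow, x)]

        # Gradient step on the mixing parameter, clipped to avoid saturation.
        a += mu_a * e * (y_fast - y_slow) * lam * (1.0 - lam)
        a = max(-4.0, min(4.0, a))

    # Because the output is linear in the weights, the overall filter is the
    # same convex combination of the two weight vectors.
    w = [lam * wf + (1.0 - lam) * ws for wf, ws in zip(w_fast, w_slow)]
    return w, lam


w, lam = clms_identify([0.5, -0.3, 0.2, 0.1])
print([round(wi, 3) for wi in w], round(lam, 3))
```

In this sketch the two component filters never exchange coefficients. It therefore shows the doubled filtering and update cost that the text attributes to CLMS, while leaving out the transfer schemes of [23–26].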
