in Partial Fulfillment of the Requirements for the Degree

Finally, the effectiveness of the proposed approach in real-life applications is investigated via an image processing application. Note that the numbers indicated on the logic gates and on the input and output nodes of the logic gates have already been discussed in Figure 2.

Approximate Computing

Fault resilience refers to an application's ability to accept output despite some of its underlying computations being performed using approximate software and/or hardware. 1.3, the fault tolerance of such applications is attributed to several factors, including noisy and redundant input data, absence of a unique golden output, limited perception of human senses, algorithmic functions, etc.

Figure 1.2: Researchers/designers are working to harvest new gains during the fallow period to advance the computing systems without significant technology progress [7].

Approximate Adders

1.6, an N-bit ESA has two primary design parameters: (i) Segment size (k), which represents the maximum length of carrier spread; and (ii) Overlapping bits (l), which represent the minimum number of bits used in carry prediction, where 1 ≤ k < N and 0 ≤ l < k. In addition to the primary design parameters andl, the anN-bit ESA also has three secondary design parameters: (i) Output bits per sub-adder (except the least significant sub-adder) contributing to the final sum(s); (ii) The size of the least significant subadder (h); and (iii) Total number of subadders (n), where r = k−l, h = N− (k −l) ⌈Nk−−lk⌉ and n = ⌈Nk−−kl⌉+ 1.

Figure 1.4: In nanometer regime, design related issues begin to exert a more pronounced influence on chip yield as compared to the process related issues [9].

Quality Metrics

For an input vector (or set of inputs), ER is the probability that the output will be incorrect as:. iii) MED is a composite quality metric that captures ED and ER information. MSE is one such quality measure where ED is more important compared to ER.

Thesis Motivations

Therefore, there is a need for analytical modeling of approximate adders. ii) Among all existing conventional adders, RCA is the simplest. In most of the work, researchers/designers study the feasibility of approximate adders in multimedia.

Figure 1.7: Simulation time versus bit-width for exhaustive simulation of a single configuration of an N - -bit ESA for different values of N

Thesis Contributions

Contribution in AFAs

Because different applications have different levels of error resilience, the effectiveness of approximate adders varies from application to application. In addition to the delay and power benefits, approximate adders can also be used to address other technology/design related issues such as yield loss [1].

Contribution in ESAs

Analytical Modeling
Optimization
Accuracy Enhancement

In addition to designing ApproxADDs, we also provide detailed analysis and analytical modeling to evaluate the accuracy, latency, power and area of ApproxADDs. Therefore, for a given accuracy, the a priori selection of optimal configurations that provide minimum delay, power, and/or area is a challenging decision for designers.

Application of Approximate Adders

Cryptography Applications
Yield Enhancement
MAC Unit

We propose analytical models to estimate the functional yield and parametric yield of approximate arithmetic circuits. To examine the effectiveness of approximate adders for designing other approximate arithmetic operations, we design an approximate multiplier using approximate adders.

Thesis Organization

We evaluate the performance of the proposed ApproxADDs by comparing them with existing AFA-based approximate adders. In this chapter, we first discuss the design philosophy and design approach of approximate adders.

Design Approach

State-of-the-Art AFAs

Gate Level AFAs

Simplified Full Adder [1]
Lower-part-OR Adder [2]

Assuming that the inputs are independent and uniformly distributed, Table 2.2 also summarizes the ER of the simplified functions. Assuming that inputs are independent and uniformly distributed, Table 2.3 also summarizes the ER of SFA and LOA.

Table 2.2: Examples of gate level design simplifications.

Transistor Level AFAs

Approximate Mirror Adders [3]
Approximate XOR/XNOR-based Adders [4]

As shown in Table 2.4, AMA#2 requires 16 transistors and has 2 errors in S, while Couti is correct for all input combinations. As shown in Table 2.5, AXA#2 has 4 errors in S while Cout is correct for all input combinations.

Table 2.3: Truth table and design metrics of gate level AFAs.

Error-Tolerant Adder 1 [5]

On the other hand, for an approximate sub-adder, a special mechanism is used where no carry is generated and propagated at any bit position. This special mechanism is described as follows. i) Check each input bit (Ai and Bi) from left to right. ii).

Table 2.5: Truth table and design metrics of transistor level AFAs proposed in [4].

State-of-the-Art ESAs

Accuracy configurable adders: Depending on the requirements of the applications, ESAs are designed for a targeted accuracy. 2.8, Error Detection Logic (EDL) detects errors and Error Correction Logic (ECL) corrects them based on the requirements of the applications.

Figure 2.7: Generalized architecture of an N -bit ACA designed by dynamic tuning of k and l [12].

Research Directions

Therefore, the next research direction is to investigate the applicability/feasibility of approximate adders for state-of-the-art applications to maximize their benefits. f). Consequently, the other research direction is to use approximate adders to efficiently design other approximate arithmetic operations.

Summary

Delay Estimation

The effort delay further consists of two components: (i) Logical effort (g), which captures the effect of logical complexity; and (ii) Electrical input (h), which captures the effect of external load driven by the logic gate. 3.3), and the electrical effort is given by the ratio of the load capacitance (CL) driven by the logic gate and the input capacitance of the logic gate (Cingate) as:.

Power Estimation

Since dynamic power consumption is a strong function of the logic gate area (a), Eq. 3.9) is only valid for template ports. When determining the normalized dynamic energy consumption using Eqn. 3.11), it is necessary to calculate the switching activity factor at each node of the circuit.

Area Estimation

Assuming that the inputs are uniformly distributed (ie, the P1 of the input nodes is 0.5), we first calculate the P1 of the nodes using Table 3.1.

Proposed AFAs

Delay, Power and Area Estimation

We first calculate the design parameters such as α,g, h, pd, pp, a, dandpnm inv for all the logic gates used in conventional FA and the proposed AFAs. Using Table 3.4, we then calculate the delay (D), normalized dynamic power consumption (Pnm inv) and area (A) of conventional FA and the proposed AFAs by adding the delay (d), normalized dynamic power consumption (pnm inv) and area (a) of individual logic gates in that circuit.

Figure 3.3: Gate level implementations of the proposed AFAs: (a) AFA#1; (b) AFA#2; (c) AFA#3; and (d) AFA#4

ApproxADD

Delay, Power and Area Estimation
ED and ER Estimation
Simulation Setup
Results and Discussion

Hence, ED of an ApproxADD can be given by the weighted sum of input combinations Ai.Bi.Ai−1 and Ai.Bi.Ai−1as:. The analytical results and simulation results of ApproxADD as a function of N for normalized delay, power, area and PDAP are shown in Figs.

Figure 3.4: Generalized architecture of an N -bit ApproxADD designed by cascading N AFA#1 in series.

Improving ED and ER

ApproxADDv 1

Here, it should be noted that the MRED of ApproxADDv1 decreases withk, while its normalized delay, power, area, and PDAP increase with mek. It means that ApproxADDv1 offers improvement in MRED at the cost of delay, power, area and PDAP.

Table 3.5: Probability of carry-lifetimes in N -bit conventional adder for different values of N .

ApproxADDv 2

The proposed EDC logic can be adapted to handle such cases, but it imposes hardware complexity. However, the proposed EDC logic complements output at (i+ 2)thbit position, regardless of whether the output at (i+2)thbit position is correct.

Figure 3.8: Gate level logic implementation of the proposed EDC logic. Note that the numbers indicated on the logic gates and on the input and output nodes of the logic gates have already been discussed in Fig

Performance Analysis

Effect of Adder Architectures

Therefore, the adder architecture used for higher order k-bits shows a significant difference in delay, power and area. Furthermore, the rate at which the normalized delay, power, and area change also depends on the higher-order bit adder architecture.

Figure 3.15: Normalized delay, power and area of 16-bit ApproxBK, ApproxKS and ApproxSk w.r.t

Comparison

Table 3.7 shows that the proposed approach yields the greatest improvement in delay for RCA architecture and in power, area and PDAP for KS architecture.

Multimedia Applications

It can be seen from Table 3.9 that for k ≤ 8, values of IQA metrics are not acceptable (especially the value of SSIM). Our simulation results of IQA metrics after performing DCT and IDCT using 24-bit ApproxADDv2 for the above images are tabulated in Table 3.10.

Figure 3.16: Image processing results: (a) Original image and images processed using 24-bit ApproxADDv2 with: (b) k = 8; (c) k = 10; (d) k = 12; (e) k = 14; (f) k = 16; (g) k = 18; and (h) k = 20.

Summary

Accuracy Estimation

ER Estimation
ED Estimation
MED and MSE Estimation

Furthermore, accuracies of the proposed and existing analytical models of ER for different values of N are tabulated in table 4.3. Furthermore, the analytical models of ER discussed in are for specific configurations, while the proposed analytical model is generalized.

Figure 4.1: Analytical results (for three different cases) and simulation results of ER of a 16-bit ESA for different configurations, i.e., different combinations of k and l.

Delay, Power and Area Estimation

Estimation Method
Delay Estimation
Area Estimation
Power Estimation
Model Validation

Similarly, we derive analytical models to estimate the delay of an ESA implemented using CIA and KSA architectures. Similarly, we derive analytical models to estimate the surface area of an ESA implemented using the CIA and KSA architectures.

Figure 4.7: Generalized adder architecture of conventional adders [6].

Important Observations

We then calculate the accuracy of the proposed analytical models of deceleration, force and area according to Eqn. Table 4.8 shows that the analytical results of the delay, power and area agree well with the simulation results.

Optimization

Objective Functions and Constraints

Based on the combinations of objective functions (delay, force and/or area) and constraints (ED, ER, MED and/or MSE), various cases of optimization are now possible. However, the proposed approach can also be adapted/extended for the other optimization cases.

Optimization Framework

Results and Discussion

Here, the left y-axis shows the minimum delay, power, and area, and the right y-axis shows the corresponding optimal configurations. Therefore, the minimum delay search results tabulated in Table 4.11 are moderately valid for ESA implemented using SA and KA architectures, and the minimum area search results shown in Table 4.10 are moderately valid for ESA implemented using BKA architecture. a) ESA designed using ASK architecture provides minimum delay; (b) ESA designed using CIA architecture provides minimum power; and (c) ESA designed using the RCA architecture provides minimal footprint.

Figure 4.12: Search results for minimal delay, power and area subjected to PR ≥ 0.90 of a 32-bit ESA imple- imple-mented using RCA architecture for different iterations: (a) Delay; (b) Power; and (c) Area

Accuracy Enhancement

Proposed Modifications
Delay and Power Improvements
Multimedia Applications

Accordingly, compared to the original ESAs: (i) modified ESAs provide a higher PR for a given delay and power; or (ii) For a given PR, custom ESAs provide lower latency and power. 4.16(h) shows the absolute difference images generated using original 8-bit ESAs and modified ESAs as a function of kforl =k/2.

Figure 4.13: Accuracy-effort curves of 32-bit ESA (implemented using RCA architecture) as a function of k and l: (a) PR versus delay; and (b) PR versus power

Summary

Conventional SHA-1
Approximate SHA-1
Maximum Approximation

Therefore, compared to the original ESAs: (i) For a given accuracy, the modified ESAs offer lower latency, power, and area; or (ii) For a given delay, power, and area, modified ESAs provide higher accuracy. To evaluate the effectiveness of the proposed approach in real-life applications, we processed the Lena image using the original ESA and the modified ESA.

Figure 5.1: Block diagram of processing block of SHA-1.

Yield Enhancement

Yield Models

Considering f(D) as a normalized distribution function of chips in defect densities, Murphy [120] predicts the functional yield as:. Further, since approximate adders are designed for a targeted application-specified threshold, γ = 1. Consequently, the functional yield in the approximate computation paradigm can be given by: . e−nAD+γnADe−nAD)f(D)dD (5.5).

Figure 5.6: Defect size distribution [17].

Results and Discussion

Our results of the effective yield improvement for the proposed 16-bit ApproxADDv1 and ApproxADDv2 (discussed in Chapter 3) for different values of ork are shown in Fig. 5.9 shown, the functional and parametric yield improvements due to decrease in silicon area and critical path delay are extracted from Fig.

MAC Unit

Hybrid Redundant Adder
Proposed Approach
ApproxMAC Unit

5.11, the ApproxMAC unit utilizes a dedicated approximate hybrid redundant matrix multiplier to perform the multiplication operation. In addition, the total capacity, i.e. PDAP of ApproxMAC units, much better than binary and hybrid redundant MAC units.

Table 5.2: CEVA-XC family of DSP processors [18].

Summary

We evaluated the effectiveness of the proposed approach by comparing ApproxADDv1 and ApproxADDv2 with existing AFA-based approximate adders. We evaluated the effectiveness of the proposed approach in real applications by processing Lena image using original ESAs and modified ESAs.

Future Aspects

Lombardi, “Design of Majority Logic (ML) Based Approximate Full Adders,” in IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, p. ” in IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, p.

Image processing results: (a) Original Lena image; (b) Noisy Lena image; and abso-

Block diagram of processing block of SHA-1

Block diagram of elementary block of processing block

Testing approximate SHA-1 and finding N app max

Threshold testing that identifies acceptable (but faulty) chips [16]

Application-specified threshold: (a) Fail small or fail rare; and (b) Fail small and fail

Defect size distribution [17]

Distribution of CPDs under process parameter variations

Yield improvements for a 16-bit RCA: (a) Functional yield improvement due to de-

Yield improvements for 16-bit ApproxADDv1 and ApproxADDv2

Block diagram of the proposed hybrid redundant ApproxMAC unit

Primary and secondary design parameters of an N -bit ESA

ED max and ER of a 32-bit ESA as a function of k and l

Probability (in %) of the length of carry propagation to be less than (or at least equal

Examples of gate level design simplifications

Truth table and design metrics of gate level AFAs

Truth table and design metrics of transistor level AFAs proposed in [3]

Truth table and design metrics of transistor level AFAs proposed in [4]

Estimated D, P nm inv , A and P DAP of conventional FA (Fig. 3.2) and the proposed

Design parameters of the logic gates used in FA (Fig. 3.2) and the proposed AFAs

Probability of carry-lifetimes in N -bit conventional adder for different values of N

Design parameters of the logic gates used in EDC logic (Fig. 3.8)