APPROXIMATE COMPUTING BASED PROCESSING OF MEA SIGNALS ON FPGA

Mohammad Hassan received his PhD in Electrical Engineering from the Department of Electrical and Communication Engineering, College of Engineering at UAE University, UAE. He also received his Master's degree in Electrical Engineering from the College of Engineering, UAE University, UAE.

Introduction

Overview
Statement of the Problem
Research Objectives
Relevant Literature

MEA Signal Processing Systems

Muller et al. (2013) implemented a closed-loop system for spike detection and stimulus generation on an FPGA. Doliw et al. (2022) presented an analog interface for brain-computer interfaces.

Table 1 benchmarks this work against state-of-the-art MEA processing systems implemented in recent years, together with their main contributions.

Methods

Research Design

The choice of approximate adders was based mainly on the parallelism, flexibility, and area of each approximate adder circuit, so as to take advantage of the FPGA device used. All parts of the approximate system are built in the same way as those of the exact system, except that approximate adder circuits are used in their calculations.

System Overview

The approximate adders selected in the second step are used to build different versions of the approximate processing system, each version based on one of these adders. Processing begins in the filtering module, which filters the amplified and digitized raw neural signal received from the MEA device; the filtered signal is then passed to the spike detection module.

Proposed Approximate Processing System

Using FPGA for Implementation
Enhancing the Filtering Process
Applying Approximate Computing Algorithms

Choosing the filter specifications is crucial to the performance of the processing system. The cutoff frequency can be configured by setting the FIR filter coefficients to support different signal types. The output of an order-N FIR filter is the convolution of the input signal x[n] with the coefficients h[n].

The output is therefore a sequence of values, each of which is a weighted sum of the N + 1 most recent input values, as shown in equation 2.1. Filter order is another critical parameter in filter design. The mathematical model of the FIR filter in equation 2.1 shows that addition is performed more often than multiplication.
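The operation count can be made concrete with a short sketch. In direct form, y[n] = sum_{k=0}^{N} h[k] x[n-k] needs N + 1 multiplications and N additions per output sample; when the coefficients are symmetric (h[k] = h[N-k]) and each pair of opposite samples is pre-added, as the 12-bit adders do in this design, the multiplications are roughly halved while the additions stay the same. The function name is illustrative, and the 41-tap length is taken from the number of coefficient values used in the implementation.

```python
from typing import Tuple


def fir_op_counts(taps: int, symmetric: bool) -> Tuple[int, int]:
    """Return (multiplications, additions) per output sample of a FIR filter.

    Direct form: y[n] = sum_{k=0}^{N} h[k] * x[n - k], with N = taps - 1.
    Symmetric (folded) form: h[k] == h[N - k], so opposite samples are
    pre-added and each distinct coefficient is multiplied only once.
    """
    if not symmetric:
        # One multiply per tap and taps - 1 adds to sum the products.
        return taps, taps - 1
    pairs = taps // 2            # pre-additions of opposite samples
    centre = taps % 2            # an odd-length filter has an unpaired middle tap
    mults = pairs + centre       # one multiply per distinct coefficient
    adds = pairs + (mults - 1)   # pre-adds plus the adds that sum the products
    return mults, adds


for sym in (False, True):
    print("symmetric" if sym else "direct", fir_op_counts(41, sym))
```

For 41 taps, the direct form needs 41 multiplications and 40 additions, whereas the folded form needs 21 multiplications and 40 additions. Once symmetry is exploited, additions clearly dominate, which is consistent with focusing the approximation on the adders.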

CPredA was chosen as an example of adders that approximate the output carry in their algorithms, whereas AA2 was chosen as an example of adders that approximate the output sum.

Figure 3: Conceptual Structure of FPGA (Andina et al., 2020)

Implementation

Implementation of Approximate Adders

Carry Prediction Full Adder (CPredA)
Approximate Adder (AA2)
Generic Accuracy Configurable Adder (GeAr)

Multiple instances of the basic 1-bit cell are cascaded, according to the required width and level of approximation, to build wider adders such as 32-bit adders. CPredA cells thus predict the lower bits of the result, while exact full adders compute the upper bits. For example, in the half-prediction configuration of an 8-bit adder, four AA2 cells predict the lower half of the result, while four full adders compute the upper half.
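The cascading scheme can be modeled at bit level. The sketch below builds an adder from 1-bit cells, using an approximate cell for the lowest `approx_bits` positions and exact full adders above them; `approx_bits = 4` on an 8-bit adder corresponds to the half-prediction configuration and `approx_bits = 8` to full prediction. The Sum and Carry equations of CPredA and AA2 are not reproduced here, so `approx_cell` is a hypothetical placeholder that ignores the incoming carry when forming its carry-out. It illustrates the structure only, not either published adder.

```python
from typing import Tuple


def exact_cell(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def approx_cell(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Hypothetical approximate cell (not CPredA or AA2): carry_out ignores cin."""
    s = a ^ b ^ cin
    cout = a & b
    return s, cout


def cascade_add(a: int, b: int, width: int, approx_bits: int, cell=approx_cell) -> int:
    """Ripple-cascade `width` 1-bit cells; the lowest `approx_bits` use `cell`.

    Returns a (width + 1)-bit value that includes the final carry-out.
    """
    carry = 0
    result = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        f = cell if i < approx_bits else exact_cell
        s, carry = f(ai, bi, carry)
        result |= s << i
    return result | (carry << width)


print(cascade_add(3, 1, width=8, approx_bits=0))  # 4 (exact)
print(cascade_add(3, 1, width=8, approx_bits=4))  # 0 (carry out of bit 1 is dropped)
```

Because the exact cells start at bit `approx_bits`, any error is confined to carries that originate in the predicted lower part, which is why half prediction is expected to be more accurate than full prediction.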

The GeAr adder breaks the carry chain of the full adder by using K sub-adders, each L bits wide, to perform the approximate addition of two N-bit operands. Each sub-adder except the first contributes 2 bits to the final result, and the 2 bits below them are used only to predict the incoming carry. Carry propagation is therefore limited to the length L of each segment.
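A behavioral model shows why carry propagation is bounded. In the sketch below, each sub-adder is L = R + P bits wide: the first sub-adder contributes all L of its bits, and every later sub-adder contributes only its upper R bits, using its lower P bits purely to predict the carry. The defaults R = 2 and P = 2 follow the description above (2 result bits per sub-adder, 2 prediction bits, 4-bit sub-adders). The function name and the requirement that N - L be a multiple of R are assumptions of this sketch.

```python
def gear_add(a: int, b: int, n: int = 12, r: int = 2, p: int = 2) -> int:
    """Approximate n-bit addition with GeAr-style overlapping sub-adders.

    Returns an (n + 1)-bit value: n result bits plus the carry-out of the
    last sub-adder. Each sub-adder adds exactly, with a carry-in of 0.
    """
    l = r + p                          # sub-adder width L
    if (n - l) % r != 0:
        raise ValueError("this sketch assumes (n - l) is a multiple of r")
    k = (n - l) // r + 1               # number of sub-adders K
    seg_mask = (1 << l) - 1
    result = 0
    carry_out = 0
    for i in range(k):
        lo = i * r                     # lowest bit covered by sub-adder i
        s = ((a >> lo) & seg_mask) + ((b >> lo) & seg_mask)
        if i == 0:
            result |= (s & seg_mask) << lo        # first sub-adder: keep all L bits
        else:
            upper = (s >> p) & ((1 << r) - 1)     # drop the P prediction bits
            result |= upper << (lo + p)
        carry_out = s >> l             # after the loop: carry of the last sub-adder
    return result | (carry_out << n)


print(gear_add(0x555, 0xAAA))  # no carries: same as the exact sum
print(gear_add(15, 1))         # carry must travel from bit 0 to bit 4: exact 16, GeAr 0
```

Only carry chains longer than L bits are lost, so the worst-case delay is that of one L-bit sub-adder rather than the full N-bit chain.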

The accuracy of GeAr can be controlled by combining its sub-adders with a longer segment of exact adder. For example, if the upper six bits of a 12-bit GeAr adder must be computed exactly, two 4-bit sub-adders can be combined with a 6-bit full adder, as illustrated in Figure 22.

Figure 8 shows the circuit diagram of CPredA. Its Sum (S) and Carry (Cout) outputs are given by Equations 3.1 and 3.2.

Implementation of Processing Systems

Implementation of FIR Module
Implementation of Spike Detection Module

The following sections describe the implementation of the FIR and spike detection modules of the processing system. The filter design produces 41 coefficient values, which are used in the Verilog code of the filter implementation on the FPGA. We implemented the FIR filter in Verilog as four blocks that operate in parallel on each positive edge of the clock, as illustrated in Figure 26.

The second FIR filter block uses 12-bit adders to add each pair of opposite samples and stores the results in first-level adder registers on each positive edge of the clock. When this block is converted into an approximate system, only the first four lines of its code, which specify the adders, change.
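The four-block structure can be checked functionally in software before it is written in Verilog. The sketch below treats block 1 as a delay line, block 2 as the pre-addition of opposite samples, block 3 as the coefficient multiplications, and block 4 as the accumulation of the products, with the block-2 and block-4 adders passed in as functions so that an approximate adder can replace the exact one, mirroring the adder swap described above. The roles assigned to blocks 1 and 3, the unsigned integer samples, and the 7-tap coefficient set are assumptions for illustration (the actual design uses 41 coefficients), and the model ignores pipeline latency.

```python
from typing import Callable, List, Sequence

Adder = Callable[[int, int], int]


def exact_add(a: int, b: int) -> int:
    return a + b


def fir_direct(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Reference direct form: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_folded(x: Sequence[int], half_h: Sequence[int],
               add12: Adder = exact_add, add24: Adder = exact_add) -> List[int]:
    """Symmetric FIR whose impulse response is half_h + reversed(half_h[:-1])."""
    m = len(half_h) - 1
    taps = 2 * m + 1
    window = [0] * taps                # block 1: window[k] holds x[n - k]
    y = []
    for sample in x:
        window = [sample] + window[:-1]
        # block 2: add each pair of opposite samples with the "12-bit" adder
        pairs = [add12(window[k], window[taps - 1 - k]) for k in range(m)]
        # block 3: multiply the pair sums and the centre sample by their coefficients
        products = [half_h[k] * pairs[k] for k in range(m)] + [half_h[m] * window[m]]
        # block 4: accumulate the products with the "24-bit" adder
        acc = products[0]
        for prod in products[1:]:
            acc = add24(acc, prod)
        y.append(acc)
    return y


half_h = [1, 2, 3, 4]                           # hypothetical coefficients (7 taps)
full_h = half_h + half_h[-2::-1]                # [1, 2, 3, 4, 3, 2, 1]
x = [(37 * i + 11) % 4096 for i in range(50)]   # synthetic 12-bit samples
print(fir_folded(x, half_h) == fir_direct(x, full_h))
```

With exact adders, the folded structure reproduces the direct-form convolution; passing a bit-level model of an approximate adder as `add12` or `add24` gives the corresponding approximate filter.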

In the fourth block, the first four lines of the code specify 24-bit full adders as the target adders.

As shown in Figure 33, the spike detection module receives a 12-bit filtered sample generated by the FIR filter on each positive edge of the system clock.

Figure 24: Detailed real-time processing system

System Design Summary

The outer wrapper module manages the filtering and spike detection modules, synchronizing their inputs and outputs, clocking, and data flow. It receives the raw 12-bit neural sample generated by the MEA device at each clock cycle and forwards it to the FIR filter module for filtering. It also triggers the spike detection module to retrieve the filtered sample from the FIR filter and compare it with the threshold value.

At each clock cycle, the wrapper module outputs a filtered sample on its output port, together with the spike information and the counter value. The FIR module receives three signals: the system clock signal, the system reset signal, and the digitized 12-bit raw neural data. Its output is a 12-bit filtered neural signal, which is sent to the spike detection module.

The peak detector also receives three input signals: the system clock signal used for synchronization, the system reset signal, and the filtered neural data generated by the FIR filter. On each positive edge of the clock, it compares the received sample with the negative threshold and outputs a peak-detect signal if the sample value is below the threshold.
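The threshold comparison can be sketched behaviorally. The class below processes one filtered sample per call, as the module does on each positive clock edge, and flags a spike when the sample is below the negative threshold. The class name, the threshold value, the reset behavior, and the interpretation of the wrapper's counter as a running count of detected spikes are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PeakDetector:
    """Behavioral model of the negative-threshold comparison (hypothetical names)."""
    threshold: int          # negative threshold in ADC units, e.g. -200 (assumed)
    spike_count: int = 0    # assumed meaning of the wrapper's counter

    def reset(self) -> None:
        """Models the system reset signal."""
        self.spike_count = 0

    def clock(self, filtered_sample: int) -> bool:
        """Process one filtered sample; return True when a spike is flagged."""
        spike = filtered_sample < self.threshold
        if spike:
            self.spike_count += 1
        return spike


det = PeakDetector(threshold=-200)
flags = [det.clock(s) for s in [10, -50, -250, -300, 40, -210]]
print(flags, det.spike_count)
```

This sketch flags every sample below the threshold, so consecutive samples of one spike are counted separately; the text does not say whether the hardware applies a refractory period.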

Figure 35: Common schematic diagram for all systems

Results and Discussion

Evaluation of Approximate adders

Delay Time
Area
Power Estimation
Accuracy

The tables show the latency of each approximate adder at different widths, along with the latency of the full adder for comparison. They also show the reductions in delay, area, and power, normalized to the full adder. Figures …-37 show the delay time for all adders, including the full adder for reference.

In both configurations, CPredA and GeAr significantly reduce latency owing to the parallelism in their architectures. The latency of AA2 is higher than that of CPredA and GeAr but still lower than that of the full adder, because AA2 simplifies output generation without breaking the carry chain. CPredA and AA2 occupy smaller areas because their logic circuits are simpler than regular full-adder circuitry.

As expected, GeAr shows the values closest to those of the full adder, since it is composed of multiple exact sub-adders. In the full prediction configuration, NMED is constant across all bit widths, with GeAr having the lowest value.
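For small widths, NMED can be computed exhaustively. The sketch below uses a common definition: the mean error distance |approximate - exact| over all input pairs, normalized by the largest exact sum, 2(2^n - 1). Other normalizations are in use, and this one may differ from the one behind the reported results. The approximate adder here is a hypothetical lower-part adder (lowest k bits combined with a bitwise OR, no carry into the upper part), chosen only because its error is easy to verify by hand; it is not one of the three adders evaluated.

```python
from itertools import product
from typing import Callable


def nmed(adder: Callable[[int, int], int], n: int) -> float:
    """Normalized mean error distance of an n-bit adder over all 4**n input pairs."""
    max_sum = 2 * ((1 << n) - 1)       # largest exact result
    total = 0
    for a, b in product(range(1 << n), repeat=2):
        total += abs(adder(a, b) - (a + b))
    med = total / (1 << (2 * n))       # mean error distance
    return med / max_sum


def or_lower_add(a: int, b: int, k: int = 3) -> int:
    """Hypothetical adder: lowest k bits ORed, upper bits added exactly, no carry between."""
    low = (a | b) & ((1 << k) - 1)
    high = ((a >> k) + (b >> k)) << k
    return high | low


print(nmed(lambda a, b: a + b, 6))  # 0.0 for the exact adder
print(nmed(or_lower_add, 6))
```

For this toy adder, the error on each input pair equals the bitwise AND of the two low parts, so the mean error distance is (1 + 2 + 4)/4 = 1.75 and NMED is 1.75/126 for n = 6.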

Table 6 summarizes the characteristics of the three approximate adders in the Half Prediction and Full Prediction configurations.

Evaluation of Filtering Errors

Each pair of opposite signal samples in the FIR filter is summed using 12-bit adders. The results of these additions are multiplied by the appropriate coefficients and then summed using 24-bit adders. In the half prediction configuration, the 12-bit adders predict the lower six bits of the sum, while in the full prediction configuration they predict all twelve bits.

Similarly, the 24-bit adders predict the lower twelve bits of the sum in the half prediction configuration and all twenty-four bits in the full prediction configuration. The approximation level and adder configuration are listed for the 12-bit adders used in block 2 and the 24-bit adders used in block 4 of the FIR filter. The previous three figures show that increasing the approximation level of the 12-bit or 24-bit adders increases the accumulated error in the filter output signal.
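The trend can be reproduced qualitatively with a small simulation. The sketch below models the folded filter described above (block 2 pre-adds opposite samples with 12-bit adders, block 4 accumulates products with 24-bit adders) and varies how many low-order bits of the 12-bit adders are approximated: 0 (exact), 6 (half prediction), and 12 (full prediction). It reports the mean absolute deviation of the output from the exact filter. The carry-ignoring approximate cell, the 7-tap coefficients, and the synthetic input are hypothetical stand-ins, so the numbers will not match the reported figures; only the method carries over.

```python
from typing import Callable, List, Sequence


def make_adder(width: int, approx_bits: int) -> Callable[[int, int], int]:
    """Bit-level adder; the lowest approx_bits use a hypothetical carry-ignoring cell."""
    def add(a: int, b: int) -> int:
        carry, result = 0, 0
        for i in range(width):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            s = ai ^ bi ^ carry
            if i < approx_bits:
                carry = ai & bi                            # approximate: carry-in ignored
            else:
                carry = (ai & bi) | (carry & (ai ^ bi))    # exact full adder
            result |= s << i
        return result | (carry << width)
    return add


def fir_folded(x: Sequence[int], half_h: Sequence[int], add12, add24) -> List[int]:
    m = len(half_h) - 1
    taps = 2 * m + 1
    window = [0] * taps                                    # window[k] holds x[n - k]
    y = []
    for sample in x:
        window = [sample] + window[:-1]
        pairs = [add12(window[k], window[taps - 1 - k]) for k in range(m)]
        products = [half_h[k] * pairs[k] for k in range(m)] + [half_h[m] * window[m]]
        acc = products[0]
        for prod in products[1:]:
            acc = add24(acc, prod)
        y.append(acc)
    return y


def mean_abs_error(x, half_h, approx12: int, approx24: int) -> float:
    exact = fir_folded(x, half_h, make_adder(12, 0), make_adder(24, 0))
    approx = fir_folded(x, half_h, make_adder(12, approx12), make_adder(24, approx24))
    return sum(abs(e - a) for e, a in zip(exact, approx)) / len(x)


half_h = [1, 2, 3, 4]                            # hypothetical 7-tap coefficients
x = [(37 * i + 11) % 4096 for i in range(200)]   # synthetic unsigned 12-bit samples
for bits12 in (0, 6, 12):                        # exact, half, full prediction
    print(bits12, mean_abs_error(x, half_h, bits12, 0))
```

The second argument of `mean_abs_error` can be varied in the same way to approximate the 24-bit adders in block 4.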

At level 3, the 12-bit adders are exact and the 24-bit adders are in the half prediction configuration. Processing system accuracy is expected to be high at level 3, and lower at level 4 than at level 2.

Figure 44 illustrates that the FIR filter in the proposed processing system uses two adder widths: 12-bit and 24-bit.

Evaluation of the Approximate Processing System

Testing System Accuracy
Testing System Performance

The first accuracy test of the approximate systems is performed with the 12-bit adders in the half prediction configuration and exact 24-bit adders. In the first additional level, eight bits of the 12-bit adder results are predicted, and in the second additional level, ten bits are predicted. The results of the three segments are then combined to form the full 12-bit result.

In the next mode, the level of approximation of the 12-bit adders is increased: all three versions of the approximate system were built by fully predicting the outputs of the 12-bit adders while computing the outputs of the 24-bit adders exactly. Accuracy drops in this mode because of the high degree of approximation in the 12-bit adders.

In another mode, 8 bits of the 12-bit adder results are predicted, while the 24-bit adder results are half predicted.

In a further mode, 10 bits of the 12-bit adder results are predicted, while the 24-bit adder results are half predicted. System accuracy at high approximation levels could be improved by integrating error correction units into the approximate systems.

Table 9: System Accuracy Results at Different Approximation Levels (columns: Level; Adder Configuration: 12-bit Adder, 24-bit Adder; System Accuracy: CPredA, AA2, GeAr)

Conclusion

… circuit area and carry calculation, provides a reduction of up to 29.6% in processing time and 14.3% in circuit area without compromising system accuracy or increasing power consumption, even in the half-approximation mode.

References

Development and validation of a spike detection and classification algorithm aimed at implementation on hardware devices.
In vitro multifunctional microelectrode array with 59760 electrodes, 2048 electrophysiology channels, stimulation, impedance measurement and neurotransmitter detection channels.
A novel fixed-array multi-microelectrode system designed for long-term monitoring of extracellular single unit neuronal activity in vitro.
In vitro studies of neuronal networks and synaptic plasticity in invertebrates and mammals using multielectrode arrays.
25th IET Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014).
Exploiting all-programmable SoCs in neural signal analysis: A closed-loop control for large-scale CMOS multielectrode arrays.
A brief overview of in vitro models of injury and regeneration in the peripheral nervous system.