Design of an Application Specific Instruction Set Processor Using LISA

This is to certify that the thesis entitled "Design of an application-specific instruction set processor using LISA", submitted by Mr. Umakanta Nanda in partial fulfillment of the requirements for the award of the degree of Master of Technology in Electronics and Communication Engineering with specialization in VLSI Design and Embedded Systems at the National Institute of Technology, Rourkela, is an authentic work carried out by him under my supervision and guidance.

I would like to express my heartfelt thanks to my guide, Prof. Kamalakanta Mahapatra, for his guidance, support and encouragement during the course of my master's studies at the National Institute of Technology, Rourkela. I am also grateful to … Patra, Head of the Department, Electronics and Communication Engineering, National Institute of Technology, Rourkela, for his support during my work.

I thank all the members of the Department of Electronics and Communication Engineering and of the Institute who assisted me, by providing the necessary resources and in various other ways, in completing my work.

INTRODUCTION

Introduction

Motivation
Related Work
Organization of this Thesis

The reviewed ASIP design flows are oriented towards performance constraints and do not consider the energy consumption of the implementation. Various ASIP design tools cover the entire ASIP design flow, from application to implementation, while a few tools presented in the literature focus on a subset of the flow.

A framework for assembler-ASIP co-design with feedback from an optimizing assembler to the ASIP design is described in [9].

DESIGN METHODOLOGY OF ASIP

Implementation of DSP Application

Implementation on General Purpose Processor (GPP)
Implementation on General Purpose DSP Processor
Implementation on Application Specific Integrated Circuit (ASIC)
Implementation on Application Specific Instruction Set Processor (ASIP)

Many DSP applications, with or without real-time requirements, can be implemented on a general-purpose processor (GPP). A general-purpose DSP processor, on the other hand, is a DSP available from a semiconductor supplier that is not intended for a specific class of DSP applications. Such a DSP has a generic assembly instruction set that provides good flexibility for many applications.

A general-purpose DSP processor is a good choice for launching a product quickly, because the system design time is short.

ASIP Design Flow

Architecture Exploration
Architecture Implementation
Software Application Design
System Integration and Verification

The ASIP design flow shown in Figure 2.3 consists of four design phases [13, 16]. In this flow, the application is partitioned into parts that run on dedicated hardware circuits and parts that are implemented in software. The processor is specified either at a low abstraction level, in a hardware description language (HDL), or at a higher abstraction level, as a processor simulator.

An HDL description captures the complete micro-architecture of the model, whereas the simulator captures only the architectural aspects: the processor resources, the instruction encoding and the temporal behavior of operations. Register transfer level (RTL) is an HDL coding style that describes the processor in terms of registers and the logic interconnecting them. The components of the generated HDL model can then be compared with those of the LISA model [4, 17], as shown in Figure 2.4.
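To make the RTL view concrete, the following C sketch mimics how an RTL description separates state from logic: the struct fields play the role of registers, `next_state` plays the role of the combinational logic, and `clock_edge` updates every register at once, as on a clock edge. The 8-bit accumulator datapath and all names are hypothetical and unrelated to the processors designed in this thesis.

```c
#include <stdint.h>

/* Hypothetical 8-bit accumulator datapath, used only to illustrate the RTL view. */
typedef struct {
    uint8_t acc;   /* register: accumulator */
    uint8_t pc;    /* register: program counter */
} RtlState;

/* Combinational logic: a pure function of the current register values and inputs. */
static RtlState next_state(RtlState cur, uint8_t operand, int add_enable)
{
    RtlState nxt = cur;
    if (add_enable)
        nxt.acc = (uint8_t)(cur.acc + operand); /* wraps modulo 256, like an 8-bit adder */
    nxt.pc = (uint8_t)(cur.pc + 1);
    return nxt;
}

/* Clock edge: all registers take their next values simultaneously. */
static void clock_edge(RtlState *regs, uint8_t operand, int add_enable)
{
    *regs = next_state(*regs, operand, add_enable);
}
```

Because `next_state` reads only the current register values, the order in which individual registers are updated does not matter, which is what distinguishes register-transfer semantics from sequential software assignments.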

The resource model describes the structure of the architecture, such as the pipeline stages and pipeline registers. The designer has full control over the generated HDL model and all its components. A synthesis tool then automatically generates a gate-level netlist that specifies all logic gates and connections in the processor model.

In the automatic place-and-route step, the locations of the gates and the routing paths between them are determined. No further changes to the architecture of the programmer's model are allowed at this stage.

Field of Application

OVERVIEW OF LISA

Building a LISA model

Modeling Instructions
Operation Hierarchy of a Processor in a LISA Model

ISS Design vs Processor Design
Instruction Accurate vs Cycle Accurate Modeling
The Instruction Set Designer
Processor Debugger
Major Benefits

Processor resources include the processor's internal storage elements as well as dedicated input/output pins and global variables. The internal storage elements are the processor's registers and internal memories. As shown in the example above, a resource declaration typically consists of an identifier, a data type specifier and an optional keyword that defines the semantic type of the resource.
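LISA resource declarations are not C, but their content can be pictured as a C struct: each field has an identifier and a data type, and the semantic role (program counter, register, memory, pin) is noted in a comment. The sizes and names below are hypothetical placeholders, not the values used in this thesis' models.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical resource sizes, chosen only for illustration. */
#define NUM_GPR    16
#define PMEM_WORDS 256
#define DMEM_WORDS 256

typedef struct {
    uint32_t pc;                /* semantic role: program counter */
    int32_t  gpr[NUM_GPR];      /* semantic role: general purpose registers */
    uint32_t pmem[PMEM_WORDS];  /* semantic role: internal program memory */
    int32_t  dmem[DMEM_WORDS];  /* semantic role: internal data memory */
    uint8_t  out_pin;           /* semantic role: dedicated output pin */
} ProcessorResources;

/* Put every resource into its reset state (all zeros in this sketch). */
static void reset_resources(ProcessorResources *r)
{
    memset(r, 0, sizeof *r);
}
```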

Here, the processor memory has been declared using the MEMORY_MAP keyword and its child keywords. The BEHAVIOR section of an operation describes the behavior of the instruction as a block of C code. From the LISA 2.0 processor description, Processor Designer generates the software development tools.
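Since BEHAVIOR sections contain plain C code operating on the declared resources, the behavior of simple instructions can be sketched as C functions. The `ADD` and `MAC` instructions, the accumulator, the register count and the function names below are hypothetical, chosen because multiply-accumulate is the core operation of an FIR filter; in an actual LISA model this code would sit inside each operation's BEHAVIOR section rather than in stand-alone functions.

```c
#include <stdint.h>

#define NUM_GPR 16  /* hypothetical register count */

typedef struct {
    int32_t gpr[NUM_GPR];  /* general purpose registers */
    int64_t acc;           /* hypothetical wide accumulator for MAC */
} Regs;

/* Behavior of a hypothetical ADD Rd, Rs1, Rs2 (signed overflow is not modeled). */
static void behavior_add(Regs *r, unsigned rd, unsigned rs1, unsigned rs2)
{
    r->gpr[rd] = r->gpr[rs1] + r->gpr[rs2];
}

/* Behavior of a hypothetical MAC Rs1, Rs2: acc += Rs1 * Rs2, using a 64-bit product. */
static void behavior_mac(Regs *r, unsigned rs1, unsigned rs2)
{
    r->acc += (int64_t)r->gpr[rs1] * r->gpr[rs2];
}
```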

Figure 3.3 shows the instruction hierarchy [18] for an assembly program that computes the convolution of two sequences using an FIR filter. Depending on the design intent, different usage models of the LISA language and the Processor Designer product family are distinguished. For processor design, a cycle-accurate description of the processor architecture [20] is needed to capture pipeline effects.
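The idea of an operation hierarchy can be illustrated with a two-level decoder in C: the root selects an operation class from the top bits of the instruction word, and each class then selects a concrete operation. The 16-bit encoding, the classes and the operations are hypothetical and do not reproduce the hierarchy in Figure 3.3.

```c
#include <stdint.h>

/* Hypothetical 16-bit encoding:
 *   bits 15..14: operation class (first level of the hierarchy)
 *   bits 13..12: operation within the class (second level)
 *   bits 11..0 : operands (not decoded in this sketch) */
typedef enum { OP_ADD, OP_SUB, OP_MAC, OP_LOAD, OP_STORE, OP_BRANCH, OP_INVALID } Op;

/* Second level: arithmetic class. */
static Op decode_arith(unsigned sub)
{
    switch (sub) {
    case 0: return OP_ADD;
    case 1: return OP_SUB;
    case 2: return OP_MAC;
    default: return OP_INVALID;
    }
}

/* Second level: memory class. */
static Op decode_mem(unsigned sub)
{
    return sub == 0 ? OP_LOAD : sub == 1 ? OP_STORE : OP_INVALID;
}

/* Root of the hierarchy: select a class, then delegate to that class's decoder. */
static Op decode(uint16_t insn)
{
    unsigned cls = (insn >> 14) & 0x3u;
    unsigned sub = (insn >> 12) & 0x3u;
    switch (cls) {
    case 0: return decode_arith(sub);
    case 1: return decode_mem(sub);
    case 2: return OP_BRANCH;
    default: return OP_INVALID;
    }
}
```

In LISA, this structure is described in the operations themselves (including their instruction coding) and the tools derive the decoder from it; the hand-written `switch` statements here only show the resulting decision tree.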

With its extensive profiling capabilities [14], the debugger enables rapid analysis and exploration of the application-specific processor's instruction set architecture, so that the optimal instruction set for the target application domain can be determined. While the LISA operation hierarchy and the instruction encoding are designed most efficiently with the GUI, the processor's resources and the hardware behavior are still written by hand as LISA code.
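Operation profiling amounts to counting how often each operation executes while the target application runs; operations that never execute are candidates for removal from the instruction set, which is the kind of profiling-driven reduction applied later in this thesis. The following C sketch assumes a hypothetical execution trace given as an array of operation identifiers; the debugger's actual profiling output format is not modeled.

```c
#include <stddef.h>

/* Hypothetical operation identifiers for a small instruction set. */
enum { OP_ADD, OP_SUB, OP_MAC, OP_LOAD, OP_STORE, OP_BRANCH, NUM_OPS };

/* Count how often each operation appears in an execution trace. */
static void profile_ops(const int *trace, size_t n, unsigned long counts[NUM_OPS])
{
    for (int op = 0; op < NUM_OPS; op++)
        counts[op] = 0;
    for (size_t i = 0; i < n; i++)
        if (trace[i] >= 0 && trace[i] < NUM_OPS)
            counts[trace[i]]++;
}

/* Number of operations that never executed: candidates to drop from the ASIP. */
static int count_unused_ops(const unsigned long counts[NUM_OPS])
{
    int unused = 0;
    for (int op = 0; op < NUM_OPS; op++)
        if (counts[op] == 0)
            unused++;
    return unused;
}
```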

TEST CASE: DESIGN COMPARISON OF TWO PROCESSORS

The Instruction Set Designer

Through this graphical user interface [18] we can view, edit and create any processor model. Instruction sets can be designed and maintained in an intuitive way without having to deal with all the details of the syntax of the LISA language. Changes to the model in the GUI result in only minimal changes to the LISA code.

The debugger window above shows that our processor can execute the assembly code written for the 2-coefficient FIR filter.
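A short C reference model of the 2-coefficient FIR filter gives the expected output against which the register and memory contents shown in the debugger can be checked. The sketch assumes integer samples, coefficients passed as parameters and a zero initial state (x[-1] = 0); the data format and coefficient values used in the thesis' assembly code are not stated in this section, and the function name `fir2` is hypothetical.

```c
#include <stddef.h>

/* Reference 2-coefficient FIR filter: y[n] = h0*x[n] + h1*x[n-1],
 * assuming x[-1] = 0 (the filter starts from a zero state). */
static void fir2(const int *x, int *y, size_t n, int h0, int h1)
{
    int prev = 0;  /* x[n-1]: the filter's single delay element */
    for (size_t i = 0; i < n; i++) {
        y[i] = h0 * x[i] + h1 * prev;
        prev = x[i];
    }
}
```

For example, with h0 = 2, h1 = 3 and input {1, 2, 3, 4}, the filter produces {2, 7, 12, 17}.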

Implementation of General Purpose Processor

Operation profiling

Resource profiling

Reads/Total: the ratio of the reads of the specific resource to the total number of reads of all named resources.
Read/Max: the ratio of the reads of the specific register resource to the maximum number of register resource reads in this column.
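Both ratios can be computed directly from per-resource read counts, as in the following C sketch. The read counts used in the check are hypothetical, and the function names are illustrative rather than part of the profiler.

```c
#include <stddef.h>

/* Reads/Total: reads of resource i divided by the sum of reads of all resources. */
static double reads_over_total(const unsigned long *reads, size_t n, size_t i)
{
    unsigned long total = 0;
    for (size_t k = 0; k < n; k++)
        total += reads[k];
    return total ? (double)reads[i] / (double)total : 0.0;
}

/* Read/Max: reads of resource i divided by the largest read count in the column. */
static double reads_over_max(const unsigned long *reads, size_t n, size_t i)
{
    unsigned long max = 0;
    for (size_t k = 0; k < n; k++)
        if (reads[k] > max)
            max = reads[k];
    return max ? (double)reads[i] / (double)max : 0.0;
}
```

A register whose Read/Max value is close to zero is rarely used by the application, which is the kind of evidence that motivates reducing the number of general purpose registers.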

Figure 4.5: General Purpose Register Window
The following information was gathered from the model above:

Memory profiling

Optimized implementation result

The generated HDL model structure
Comparison of the HDL codes generated
Synthesis Report collected from Cadence DC

To further reduce resource usage, 16 general purpose registers (GPRs) are used instead of 32, which reduces the area of the model, as shown in Figure 4.6. The Processor Generator tool in Processor Designer generated the synthesizable RTL for both processors. The basic structure contains separate entities for the register resources, the memory resources and the pipeline.
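The effect of halving the register file can be checked with simple arithmetic: register storage grows linearly with the number of registers N, and each register operand field in the instruction word needs ceil(log2 N) bits. The C sketch below assumes a 32-bit register width purely as an example, since the register width is not given in this section; the function names are hypothetical.

```c
/* Bits needed to address one of n registers: ceil(log2(n)). */
static unsigned reg_index_bits(unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Storage bits in a register file of n registers, each w bits wide. */
static unsigned long regfile_bits(unsigned n, unsigned w)
{
    return (unsigned long)n * w;
}
```

Under that assumption, going from 32 to 16 registers halves the register storage (1024 to 512 bits) and shortens every register operand field from 5 to 4 bits.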

The pipeline decoder, located in the pipeline-stage entities, is driven by the pipeline controller. The RTL schematic and flow chart of our optimized model are shown in Figures 4.8, 4.9 and 4.10. The next step in this project is to compare the HDL code generated for the two processors.

This comparison shows the number of lines of code in each HDL model. The HDL code of our optimized model has far fewer lines than that of the previous (unoptimized) processor. Both processors were then compared with regard to parameters such as area, power, memory used and number of lines of HDL code. The RTL was synthesized using Cadence Encounter [25], and the results are listed in Table 4.1.

The schematic of the design objects shows the internal parts of each block of the architecture. Finally, the technology schematic combines all the blocks in one window, as shown in Figure 4.10.

Figure 4.6: Optimized Implementation Result

Layout using MAGMA

Blast Create provides fast, early prediction of results before the design is handed off to a back-end tool. It streamlines chip planning and design by eliminating the numerous, cumbersome and error-prone data transfers between point tools in traditional flows. Blast Create outputs a placed, timing-correct physical design with DFT structures inserted, ready for routing.

Floorplanning, floorplan analysis and refinement, power routing, synthesis and physical implementation can all be performed in the Blast Fusion environment, shown in Figure 4.12.

Conclusion

Main Contributions
Conclusion
Future Work

In this thesis, a processor model was implemented using LISA and the CoWare Processor Designer platform. The same model was then optimized into an ASIP for a specific application, an FIR filter in our case. Guided by the profiling results, the optimization targeted resources such as data memory, program memory, the instruction set and the number of general purpose registers.

Comparison of the synthesis results showed that the ASIP is much better than the general purpose processor in terms of power, area, memory used and number of generated lines of HDL code. Using profiling, other ASIPs can be implemented and optimized with our general purpose processor as a reference. This thesis has presented an optimized design of an Application Specific Instruction Set Processor.

The experimental results reported in this thesis show that the proposed ASIP design is better than the general purpose processor in terms of area, power and memory size. Further, the HDL code of the ASIP generated by the CoWare Processor Designer tool has far fewer lines than that of the general purpose processor. Future work includes designing a more complex five-stage pipelined FIR filter and comparing it with a handwritten HDL design of the same.

A new methodology for designing application-specific instruction set processors (ASIPs) using a machine description language. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(11), November 2001.
Specific Instructions - Array Processors Using the LISA Machine Description Language.