1.1 Background Information

It is certified that Teo Sei Hau (ID No: 18ACB03719) has completed this final year project titled "Performance Evaluation of RISC32-E Cryptography" under the supervision of Mr. I understand that the University will upload the soft copy of my final year project in pdf format to the UTAR Institutional Repository, which can be made accessible to the UTAR community and public. I declare that this report entitled "Performance Evaluation of RISC32-E Cryptography" is my own work, except as cited in the references.

Finally, I must thank my family, who have supported me and overcome hardships with me throughout the process. This project evaluates the performance of RISC32, RISC32-E-NQ and RISC32-E-Q for comparison purposes.

INTRODUCTION

Background Information

MIPS
UART
CoreMark Benchmark

Motivation
Problem Statement
Project Scope and Direction
Project Objectives
Impact, Significance and Contribution
Project Organization

In terms of motivation, this project will develop and implement a standard benchmarking system suitable for evaluating the processors. It will primarily focus on configuring reliable benchmarking code and implementing it for the three RISC32 architectures mentioned. The benchmark chosen for this project is CoreMark from EEMBC, because of its simplicity and standardization.

After completion, this project will provide complete performance evaluation results for both RISC32-E microprocessors. In terms of contribution, it supports architecture selection by providing more reliable comparison results, using the hardware models from the research work on the later RISC32-E cryptographic processor.

Figure 1.1.1 F1: Conventional pipeline execution representation [1]

LITERATURE REVIEW

Overview of RISC32-E architecture
RISC32-E-Q
RISC32 Compilation Toolchain
RISC32-E Instruction Set
RISC32 Memory Map
CoreMark

CoreMark List Processing
CoreMark Matrix Processing

Partly because of the two new hardware queues, a new instruction, swc2 (store word from coprocessor 2), was created to drive the new hardware. Then the llc module, LLVM's static compiler, transforms the LLVM IR into output forms such as assembly files and binary object files that map to the instruction set of the RISC32-E processor. The processor can decode and execute a subset of the standard MIPS instruction set, as shown on the green card in Appendix A.

In the embedded world, CoreMark is one of the most common benchmarks that developers choose to test microcontroller and CPU (central processing unit) performance. List processing consists of algorithms such as finding, reversing, and sorting list items according to various parameters that depend on the contents of the list data objects. To make sure the correct operation is performed, CoreMark also computes a 16-bit Cyclic Redundancy Check (CRC) over the calculated data contained in the list objects.
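As a rough illustration of the CRC check described above, the following Python sketch folds bytes or 16-bit list values into a running CRC-16. The reflected polynomial 0xA001 (the CRC-16/ARC variant), the zero initial value, and the helper names crc16_update, crc16_bytes and crc16_words are assumptions made for this example; the exact routine and seeding used by CoreMark should be taken from its source files.

```python
def crc16_update(crc: int, byte: int) -> int:
    """Fold one byte into a 16-bit CRC, least significant bit first."""
    crc ^= byte & 0xFF
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ 0xA001  # assumed reflected polynomial
        else:
            crc >>= 1
    return crc & 0xFFFF


def crc16_bytes(data: bytes, crc: int = 0) -> int:
    """CRC over a byte string, starting from the given CRC value."""
    for b in data:
        crc = crc16_update(crc, b)
    return crc


def crc16_words(words, crc: int = 0) -> int:
    """CRC over 16-bit values, feeding the low byte of each value first."""
    for word in words:
        crc = crc16_update(crc, word & 0xFF)
        crc = crc16_update(crc, (word >> 8) & 0xFF)
    return crc
```

Because the CRC depends on every value that was processed, comparing the final CRC against a known-good value shows whether all list operations produced the expected data.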

Each of the data16 elements consists of two 8-bit parts, with the upper 8 bits containing the original value of the lower 8 bits. CoreMark performs operations on the data16 element during each iteration of the benchmark code, depending on the processes involved. The value of the list head objects also changes after each iteration of the benchmark. For example, the next pointer changes its value according to the change made when the list is reversed or sorted.

On each iteration of the benchmark code, the algorithm sorts the list based on the details stored in the data16 member. Matrix processing is one of the most important algorithms in CoreMark, since many algorithms use matrices and arrays to perform calculations. As in the list processing, CoreMark also performs a cyclic redundancy check (CRC) at the end of the iterations to make sure that all required tested functions have been executed on the data items [3].
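The two-part layout of a data16 element described above can be sketched in Python as follows. This is only an illustration of the bit packing; the function names are hypothetical and do not come from the CoreMark sources.

```python
def pack_data16(value8: int) -> int:
    """Build a 16-bit element whose upper byte copies the lower byte."""
    value8 &= 0xFF
    return (value8 << 8) | value8


def original_value(data16: int) -> int:
    """Upper 8 bits: the original value of the lower 8 bits."""
    return (data16 >> 8) & 0xFF


def working_value(data16: int) -> int:
    """Lower 8 bits: the value the benchmark operates on."""
    return data16 & 0xFF


def update_working(data16: int, new_low: int) -> int:
    """Change only the lower byte; the saved original stays intact."""
    return (data16 & 0xFF00) | (new_low & 0xFF)
```

Keeping the original value in the upper byte means the lower byte can be modified during an iteration and later compared with, or restored from, the saved copy.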

Figure 2.2 F1: Data processing pattern of RISC32-E (without queue system) [4]

PROPOSED METHOD/APPROACH

Proposed solution
Research Methodology for the project

Analyze the RISC32, RISC32-E and RISC32-E-Q processors
Setup the LLVM compilation toolchain
Setup the CoreMark benchmarking

RTL Modeling and Verification
Convert the CoreMark Test Program into RISC32 compatible codes
Calculate the CoreMark/MHz Score
Technologies Involved

LLVM
Xilinx Vivado

Timeline

Gantt chart for Project II

It should be noted that the source files will be compiled with different optimization levels, from the lowest optimization level, -O0, to the highest optimization level, -O3, to see the performance of the RISC32 processors under different workloads. The -Os optimization level, which optimizes for code size, will also be one of the independent test suites. The main goal of this project is to run the CoreMark benchmark code on the processors to measure the performance of each processor.

The CoreMark code will therefore be studied to understand how the benchmarking system works, so that accurate results can be obtained. After that, minimal changes will be made to some C source files provided by EEMBC, such as core_portme.c and ee_printf.c, to enable the desired functions, such as sending the results via UART or reading accurate CPU clock cycles to calculate timing for the RISC32 processors. Each block (the smallest unit) is verified before being aggregated to unit level.
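The set of builds implied by the optimization levels above can be enumerated with a short Python sketch. It only constructs the clang command lines and does not run them. The target triple mips-unknown-elf, the object file naming scheme, and the two-file source list are placeholders assumed for illustration; the actual RISC32 toolchain settings may differ.

```python
from pathlib import Path

# Optimization levels discussed in the text, including size optimization.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]

# Example subset of source files (the two port files named in this report).
SOURCES = ["core_portme.c", "ee_printf.c"]


def build_commands(sources, opt_levels, target="mips-unknown-elf"):
    """Return one clang compile command per (optimization level, source)."""
    commands = []
    for opt in opt_levels:
        for src in sources:
            # Hypothetical naming: core_portme.c at -O2 -> core_portme_O2.o
            obj = f"{Path(src).stem}_{opt.lstrip('-')}.o"
            commands.append(
                ["clang", f"--target={target}", opt, "-c", src, "-o", obj]
            )
    return commands


if __name__ == "__main__":
    for cmd in build_commands(SOURCES, OPT_LEVELS):
        print(" ".join(cmd))
```

Keeping one object file per optimization level makes it possible to link and benchmark each level as a separate test suite.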

Provided that all hardware components meet the necessary specifications, they will be synthesized and deployed on the Xilinx Artix-7 XC7A100T FPGA on the Digilent Nexys 4 DDR board using the Xilinx Vivado HLx 2020.2 IDE. In addition, the RISC32 compilation toolchain will be installed on a host computer running the Ubuntu 16.04 LTS operating system to compile the provided CoreMark code into compatible code that can be run on the RISC32 processor. The intermediate LLVM code can take 3 forms: an in-memory compiler IR, which is used to generate object files; a bitcode representation on disk; and a human-readable assembly language representation.

These properties make LLVM a capable compiler framework whose intermediate representation supports compiler transformation and analysis. LLVM is used in this project to translate the CoreMark benchmark code, which is written in C, into machine code that the RISC32 processors can execute. Vivado High-Level Synthesis is a compiler that allows high-level programs written in C, C++, and SystemC to be targeted directly to Xilinx devices without manual RTL construction.

Figure 3.7.1 F1: Gantt chart for week 1 until week 5

SYSTEM DESIGN

CoreMark Architecture Analysis

CoreMark Codes Analysis and Modification

core_portme.h file
core_portme.c file
ee_printf.c

Note that the configurations done here affect how the entire program handles the data. For example, if HAS_FLOAT is set to 1, the time_in_sec variable is set to the float type. The original source code implements the timing function using the clock() function, which requires the C standard library header time.h.

The C file is therefore changed by replacing the clock() function with the clock cycles counted by the hardware timer in CP0 to achieve the same goal. In addition, the iterations are set to 3, as shown in Figure 4.2.2 F4, which corresponds to approximately 10 seconds. To make the benchmark reliable, the program reads five seed values that cannot be determined at compile time.

Seed values are meant to direct the initialization of values in data structures in a way that is opaque to the user. In this project, seed values are chosen based on the recommendations of the CoreMark distributors. The validation run mode is selected, as this is the most basic execution mode to test in the program.

The validation run mode requires only approximately 10 seconds, as recommended by the CoreMark distributors, which matches the iterations set to 3, as discussed above.
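The timing arithmetic described above can be sketched in Python. The sketch assumes that the hardware counter increments once per clock cycle and is 32 bits wide; both are assumptions that should be checked against the RISC32 timer. The function names and the numbers in the usage note are hypothetical and are not measured results.

```python
def elapsed_seconds(start_count: int, stop_count: int, clock_hz: float,
                    counter_bits: int = 32) -> float:
    """Elapsed time from two counter reads, tolerating one wraparound."""
    cycles = (stop_count - start_count) % (1 << counter_bits)
    return cycles / clock_hz


def coremark_score(iterations: int, seconds: float) -> float:
    """CoreMark score: benchmark iterations completed per second."""
    return iterations / seconds


def coremark_per_mhz(iterations: int, seconds: float, clock_hz: float) -> float:
    """CoreMark/MHz: the score normalized by the clock frequency in MHz."""
    return coremark_score(iterations, seconds) / (clock_hz / 1e6)
```

For example, with a made-up 10 MHz clock, 3 iterations taking 100,000,000 cycles last 10 seconds, giving a score of 0.3 iterations per second and 0.03 CoreMark/MHz. Normalizing by frequency allows processors running at different clock rates to be compared.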

Figure 4.2.1 F2: HAS_FLOAT, HAS_TIME_H, USE_CLOCK and HAS_STDIO configuration

LLVM Compilation Toolchain Setup

COREMARK IMPLEMENTATION

LLVM Installation and Compilation

Testing the LLVM Compilation via UART Communication
CoreMark Assembly Codes

All C files must be compiled independently using clang; the lld linker then links them together using labels, where the labels are the names of the functions contained in the files. Therefore, from the perspective of instruction memory capacity, the compiled CoreMark code, that is, the assembly instruction program, can be successfully placed in the i-cache. This also means that no instruction cache misses are expected while the program is running.

Figure 5.1.1 F2: Assembly code of try_uart after conversion using LLVM with optimization level of -O0

Simulation on RISC32 Processor

RISC32 Testbench
try_uart.c
CoreMark Code Simulation and Debugging

Fragments of the testbench listing: .uiorisc_spi_mosi (tb_u_spi_mosi), . and always@(posedge tb_r32_pipeline.eng_c_risc.urisc_clk)begin. It should be noted that after the C programs are converted to hexadecimal code, the hexadecimal code is stored in program.txt, which is loaded into the instruction memory of the RISC32 processor before the simulation starts.

The try_uart.c machine code generated by LLVM is loaded into the RISC32 instruction memory as described in Section 5.2.1. So, the change is made based on this function to send the CoreMark result via the RISC32 UART. So when the CoreMark codes are transferred to the FPGA board, the CoreMark score and.

The CoreMark code can be simulated in the RISC32 Verilog model, but due to some logic errors, the output is not reliable. The CoreMark program outputs the result through UART using a function called "ee_printf". The problem started when the program entered an infinite loop inside the ee_vsprintf() function.

ee_printf() uses ee_vsprintf() to perform format conversion based on the format specifier. Due to the infinite loop, the CoreMark program is unable to continue calculating the CoreMark value and sending it over the UART. Upon analysis, the infinite loop was caused by an instruction in the ee_vsprintf function, which is stored at address 0x8000_20dc, and corresponds to the instruction “beq at, v.
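The step of storing the converted program as hexadecimal text can be sketched in Python as follows. The sketch assumes one 32-bit word per line, a layout that Verilog's $readmemh can read, and makes the byte order a parameter because the ordering used by the RISC32 flow is not stated here. The function names are hypothetical.

```python
def binary_to_hex_lines(image: bytes, byteorder: str = "little"):
    """Split a raw binary image into 32-bit words as 8-digit hex strings."""
    # Pad with zero bytes so the image length is a multiple of 4.
    padded = image + b"\x00" * (-len(image) % 4)
    return [
        f"{int.from_bytes(padded[i:i + 4], byteorder):08x}"
        for i in range(0, len(padded), 4)
    ]


def hex_file_text(image: bytes, byteorder: str = "little") -> str:
    """Text content for a memory file: one hexadecimal word per line."""
    return "\n".join(binary_to_hex_lines(image, byteorder)) + "\n"
```

The returned text can then be written to a file such as program.txt before the simulation starts.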
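The general shape of a formatted-output path like the one discussed above can be sketched in Python: a formatting routine scans the format string and expands each specifier, and a print routine sends the resulting characters one at a time. This is a simplified illustration under assumed behavior, not the CoreMark implementation; tiny_vsprintf, tiny_printf and the send_char callback are hypothetical names, and only %d, %x, %s, %c and %% are handled.

```python
def tiny_vsprintf(fmt: str, args) -> str:
    """Expand a small subset of printf-style specifiers."""
    out = []
    values = iter(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            out.append(ch)
            i += 1  # every path advances i, so the scan always terminates
            continue
        if i + 1 >= len(fmt):
            out.append("%")  # lone trailing '%'
            break
        spec = fmt[i + 1]
        i += 2
        if spec == "d":
            out.append(str(int(next(values))))
        elif spec == "x":
            out.append(format(int(next(values)), "x"))
        elif spec == "s":
            out.append(str(next(values)))
        elif spec == "c":
            value = next(values)
            out.append(value if isinstance(value, str) else chr(value))
        elif spec == "%":
            out.append("%")
        else:
            out.append("%" + spec)  # unknown specifier: copy it through
    return "".join(out)


def tiny_printf(send_char, fmt: str, *args) -> int:
    """Format the text, then pass each character to the UART send routine."""
    text = tiny_vsprintf(fmt, args)
    for ch in text:
        send_char(ch)
    return len(text)
```

In a simulation, send_char could simply append to a list so that the transmitted text can be compared with the expected output.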

Figure 5.2.2 F1: Final result of try_uart program

CONCLUSION AND RECOMMENDATION

Conclusion

Future Work and Recommendation

Kiat, "The Design of an FPGA-Based Processor with Reconfigurable Processor Execution Structure for Internet of Things (IoT) Applications," Kiat Wei Pau, Master of Science (Computer Science), Faculty of Information and Communication Technology, Universiti Tunku Abd.

Select and modify the CoreMark code to allow the entire program to be ported across the RISC32 processors. A way is needed to create functions that calculate timing for the CoreMark program.

Configure the LLVM compiler toolchain settings to map the generated hex code to the RISC32-E instruction set. Modify the timing functions of CoreMark to calculate timing based on the RISC32 timer. Identify the signals used to observe the CoreMark results.

Bachelor of Information Technology (Honours) Computer Engineering, Faculty of Information and Communication Technology (Kampar Campus), UTAR. Note: The supervisor/candidate(s) must provide the Faculty/Institute with an electronic copy of the complete originality report set.

Based on the above results, I declare that I am satisfied with the authenticity of the Final Year Project Report submitted by my student(s) as mentioned above. Form Title: Supervisor's Comments on Originality Report Generated by Turnitin for Submission of Final Year Project Report (for Undergraduate Programs) Form Number: FM-IAD-005 Rev No.: 0 Effective Date Page No. : 1 of 1.