Hardware implementation of the coding algorithm based on FPGA


To cite this article: M K Ibraimov et al 2021 IOP Conf. Ser.: Mater. Sci. Eng. 1047 012137


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Published under licence by IOP Publishing Ltd


M K Ibraimov¹, S T Tynymbayev², Jongtae Park³, D M Zhexebay¹ and M A Alimova¹,⁴

1 Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan

2 Information and Computational Technologies Institute, Almaty, 050010, Kazakhstan

3 Kyungpook National University, Daegu, 41566, Korea

4 E-mail: [email protected]

Abstract. In this article, an efficient FPGA implementation of a multiplier of polynomials modulo an irreducible polynomial, intended for cryptographic encryption and decryption, is presented. For this, the Nexys 4 board based on the Artix-7 Field Programmable Gate Array (FPGA) from Xilinx was chosen. Verilog HDL is used to describe the modular reduction circuit.

The results of a timing simulation of the device are presented as timing diagrams for a given 8-bit number, confirming the correct operation of the device. The developed encryption algorithm, based on non-positional polynomial notations, is intended for software, hardware, and combined software-hardware implementation. The main hardware-implemented device in the non-positional cryptographic transformation algorithm is a multiplier of polynomials modulo an irreducible polynomial, which performs the routine calculations of data encryption. These computationally intensive operations are fundamental arithmetic operations that are widely used in fields such as cryptography, number theory, and finite field arithmetic.

Current trends in the development of systems and computer equipment require high-performance computing devices, including those for information security. As information and communication networks and integrated devices develop, the need for efficient hardware solutions for cryptographic transformations will grow [1-6]. A hardware cryptographic system is preferred because it provides better security and integrity and is resistant to power analysis attacks. Non-standard hardware solutions for implementing cryptographic algorithms can be obtained using Field Programmable Gate Arrays (FPGAs) [7, 9], which allow digital devices to be designed using high-level hardware description languages.

In particular, most cryptosystems use modular arithmetic operations: modular multiplication and exponentiation. These two operations are also commonly used in other areas. Consequently, their efficient calculation is very important for embedded security systems. There is a large body of research on efficient implementations of modular arithmetic operations, including multiplication and exponentiation, that uses various methods to reduce the amount of computation, especially in public key cryptography. The Montgomery [8] and Karatsuba [4] multiplication algorithms are the most efficient and popular. The approach of the multiplication algorithms proposed in [4, 8] has a drawback: as the radix increases, the complexity of the design and the duration of the clock cycle also increase sharply due to the need to use multipliers with large

The register-transfer-level (RTL) schematic of the main program block (figure 1) consists of the input data block (data), the encoder (coder), the decoder (decoder), the frequency divider (div_clk) for the graphical interface (VGA), and the block that displays the encoded and decoded character data (display).

Figure 1. RTL diagram of the main program block on FPGA.

In this paper, an approach to multiplying the polynomials A(x) and B(x) modulo an irreducible polynomial P(x) is considered, that is, [A(x)·B(x)] mod P(x), where deg A(x) < deg P(x) and deg B(x) < deg P(x).
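As an illustration of the operation [A(x)·B(x)] mod P(x), the following minimal Python sketch evaluates it directly from the definition: first a carry-less product over GF(2), then a reduction by polynomial division. It is not the authors' Verilog design. The encoding of polynomials as integer bit masks (bit i holds the coefficient of xⁱ) and the function names `clmul`, `polymod` and `mulmod` are assumptions made for this example. The operands are the ones given in the caption of figure 4.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # adding polynomials over GF(2) is a bitwise XOR
        a <<= 1           # multiply a(x) by x
        b >>= 1           # move on to the next coefficient of b(x)
    return result


def polymod(c: int, p: int) -> int:
    """Remainder of c(x) divided by p(x) over GF(2)."""
    if p == 0:
        raise ValueError("the modulus polynomial must be non-zero")
    deg_p = p.bit_length() - 1
    while c.bit_length() - 1 >= deg_p:
        # Cancel the leading term of c(x) with a shifted copy of p(x).
        c ^= p << (c.bit_length() - 1 - deg_p)
    return c


def mulmod(a: int, b: int, p: int) -> int:
    """[a(x) * b(x)] mod p(x), computed in two stages."""
    return polymod(clmul(a, b), p)


P = 0b101001                          # x^5 + x^3 + 1
print(bin(mulmod(0b10101, 0b10011, P)))   # figure 4(a) operands -> 0b1110
print(bin(mulmod(0b10011, 0b10111, P)))   # figure 4(b) operands -> 0b1
```

For the operands of figure 4(a) the sketch yields x³+x²+x, and for those of figure 4(b) it yields 1.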

Consider the operation of the multiplier in accordance with the scheme shown in figure 2. On the «START» signal, the binary coefficients of the polynomials A(x), B(x) and P(x) are loaded through the blocks of I1, I2 and I3 circuits into the registers RgA(x), RgB(x) and RgP(x). In addition, on the «START» signal, the binary code k of the number of multiplier digits is loaded into the TP counter. Before it reaches the set input of trigger T, the «START» signal is delayed by the delay line DL.1; the delay is determined by the total delay of RgA(x), I6, AD1, AD2 and MS plus the time needed to write the remainder into RgR.

When the «START» signal reaches the set input of trigger T, it switches the trigger to the one state, which allows the first timing pulse TP1 to pass from the output of the I4 circuit. At this point the RgR register holds the partial remainder R₀ = C₀, with Rₙ₋₁ = 1, since deg A(x) < deg P(x).

On the first timing pulse, RgB(x) is shifted left by one digit, so that the high-order position of RgB(x) holds the next coefficient of the polynomial B(x), bₙ₋₂. This coefficient is applied to the control inputs of the I6 circuits, whose information inputs receive the coefficients of the polynomial A(x). If bₙ₋₂ = 1, the coefficients of A(x) are passed to the right-hand inputs of the AD21 adder.

TS1, delayed by the delay line DL.2 for the time of the RgB(x) shift, is applied to the control inputs of the I7 circuits, whose information inputs receive the remainder from the outputs of RgP, shifted by one digit towards the high order.

From the I7 outputs, the doubled remainder is supplied to the left inputs of the AD21 adder. When bₙ₋₂ = 1, the output of this adder is Cᵢ = 2·Rᵢ₋₁ ⊕ A(x).

Figure 2. Functional diagram of the multiplier of polynomials modulo an irreducible polynomial.

If bₙ₋₂ = 0, then C₁ = 2·R₀. Next, the value C₁ is supplied to the left inputs of the modulo-2 adder AD22.

If C₁ < P(x), the multiplexer (MS) outputs the value C₁, which is written to the RgR register, forming the value R₁. The condition C₁ < P(x) is indicated by the high bit Cₕ = 0 of the sum Cᵢ = 2·Rᵢ₋₁ ⊕ A(x).

If C₁ ≥ P(x), then Cₕ = 1 and the MS multiplexer outputs the result of the operation C₁ ⊕ P(x), which likewise forms the value R₁. Next, the remainder R₁ is shifted one digit to the left at the outputs of the I7 circuits. At this point, the timing pulse TP2 arrives from the output of the I4 circuit and shifts the contents of the RgB(x) register. Depending on the value of bₙ₋₃, the contents of RgA(x) are supplied to the first inputs of AD21, and the bits of the remainder R₁ multiplied by two are supplied to the second inputs. The value C₂ is formed at the AD21 output and, with the help of the adder AD22 and the multiplexer MS, is reduced modulo P(x), forming the remainder R₂.

Figure 3. Division of 32-bit data into 15-, 8-, 5- and 4-bit blocks.

on the delay line DL.3. After that, the result is passed to the outputs through the I8 circuit.
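The step-by-step operation described above (double the remainder, conditionally add A(x), test the high bit Cₕ, conditionally add P(x)) can be modelled in software. The Python sketch below is a behavioural model under stated assumptions, not the Verilog source of the device: polynomials are stored as integer bit masks, the function name `mulmod_serial` is hypothetical, and the delay lines, gating circuits and counter are abstracted into a plain loop. The remainder starts at zero, so the first pass produces R₀ = C₀ without reduction because deg A(x) < deg P(x).

```python
def mulmod_serial(a: int, b: int, p: int):
    """Bit-serial [a(x) * b(x)] mod p(x) over GF(2), scanning b(x) from its
    high-order coefficient. Returns the result and a list of (C_i, C_h, R_i)."""
    n = p.bit_length() - 1                 # deg P(x)
    if n < 1:
        raise ValueError("P(x) must have degree of at least 1")
    if a >> n or b >> n:
        raise ValueError("deg A(x) and deg B(x) must be below deg P(x)")
    r = 0                                  # models the remainder register RgR
    steps = []
    for i in range(n - 1, -1, -1):         # coefficients b_(n-1) ... b_0
        c = r << 1                         # doubled remainder 2*R_(i-1)
        if (b >> i) & 1:
            c ^= a                         # first modulo-2 adder: add A(x)
        high = (c >> n) & 1                # high bit C_h selects the branch
        r = c ^ p if high else c           # second adder plus multiplexer
        steps.append((c, high, r))
    return r, steps


result, steps = mulmod_serial(0b10101, 0b10011, 0b101001)  # figure 4(a) operands
for c, high, r in steps:
    print(f"C={c:06b}  Ch={high}  R={r:05b}")
print(bin(result))                         # 0b1110, i.e. x^3 + x^2 + x
```

In this model the intermediate value C never needs more than n+1 bits, which is why a single high-bit test is enough to decide whether P(x) has to be added.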

This cryptosystem is implemented for the example of a 32-bit binary code; however, the algorithm written in Verilog HDL allows parallel processing of n-bit data. For example, 32-bit incoming data can be processed in parallel by four blocks: 15-, 8-, 5- and 4-bit encoders (figure 3). This increases not only the performance of the cryptosystem but also the cryptographic strength of the encryption device.
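The partitioning of a 32-bit word into 15-, 8-, 5- and 4-bit blocks can be sketched as follows. Only the block widths come from the text; the order of the blocks within the word (widest block in the high-order bits) and the helper names `split_word` and `join_fields` are assumptions for illustration.

```python
WIDTHS = (15, 8, 5, 4)   # block widths from the text; high-to-low order is assumed


def split_word(word: int, widths=WIDTHS) -> list:
    """Split a word into consecutive bit fields, most significant field first."""
    total = sum(widths)
    if not 0 <= word < (1 << total):
        raise ValueError(f"word does not fit in {total} bits")
    fields = []
    shift = total
    for w in widths:
        shift -= w
        fields.append((word >> shift) & ((1 << w) - 1))
    return fields


def join_fields(fields, widths=WIDTHS) -> int:
    """Inverse of split_word: reassemble the word from its bit fields."""
    word = 0
    for value, w in zip(fields, widths):
        if not 0 <= value < (1 << w):
            raise ValueError(f"field value {value} does not fit in {w} bits")
        word = (word << w) | value
    return word


blocks = split_word(0xDEADBEEF)
print(blocks)
print(hex(join_fields(blocks)))
```

Each block can then be handed to an encoder of the matching width, and the four results are reassembled with the inverse operation.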

Figure 4. Timing diagram of the algorithm: (a) A(x) = x⁴+x²+1, B(x) = x⁴+x+1, P(x) = x⁵+x³+1; (b) A(x) = x⁴+x+1, B(x) = x⁴+x²+x+1, P(x) = x⁵+x³+1.

Processing the 8-bit code (figure 4) used 0.1% of the registers and 0.33% of the logic cells of the Artix-7 FPGA. Accordingly, the resources this algorithm consumes on the Artix-7 FPGA are minimal, and a large amount of data can be processed at one time.

Accordingly, this paper has shown that a fully autonomous encryption device for an n-bit digital code can be built. A classical approach to constructing the algorithm of a multiplier of polynomials modulo an irreducible polynomial was considered, in which the product of the two polynomials is calculated at the first stage and the resulting product is reduced modulo the irreducible polynomial at the second stage.

The main advantage of this method over other existing methods is that the algorithm is compact and achieves its efficiency by using the embedded hardware modules of the specific FPGA device.

This research provides a better understanding of modular arithmetic operations in terms of computation for FPGA-based applications and includes an empirical analysis of algorithm performance from a hardware point of view. Accelerating the computation of cryptographic algorithms, particularly those based on a multiplier of polynomials modulo an irreducible polynomial, in order to protect sensitive data under given constraints is becoming significant in modern processor technology.

References

[1] Vibhor G and Arunachalam V 2011 Architectural Analysis of RSA Cryptosystem on FPGA International Journal of Computer Applications 26(8) 30-4

[2] Poschmann A Y 2009 Lightweight Cryptography - Cryptographic Engineering for a Pervasive World IACR Cryptol. ePrint Arch. (Germany: Bochum Ruhr University) p 516


[3] Kumar S S 2006 Elliptic Curve Cryptography for Constrained Devices (Germany: Bochum Ruhr University) p 160

[4] Karatsuba A and Ofman Yu 1962 Multiplication of Many-Digital Numbers by Automatic Computers Dokl. Akad. Nauk SSSR 145(2) 293-4

[5] Lenstra H W and Tijdeman R J 1984 Computational Methods in Number Theory (Amsterdam: Math. Cent.) p 403

[6] Gura N, Patel A, Wander A, Eberle H and Shantz S Ch 2004 Comparing Elliptic Curve Cryptography and RSA on 8-bit CPUs Proc. 6th Int’l Workshop Cryptographic Hardware and Embedded Systems (CHES 04) LNCS 3156 Lecture Notes in Computer Science 119-32

[7] Ghayoula R, Hajlaoui E, Korkobi T, Traii M and Trabelsi H 2008 FPGA Implementation of RSA Cryptosystem International Journal of Electrical and Computer Engineering 2(8) 1667-71

[8] Ismail S and Nuray A 2014 Improving the computational efficiency of modular operations for embedded systems Computer Science J. Syst. Archit. 60(5) 440-51

[9] Nakano K, Kawakami K and Shigemoto K 2009 RSA encryption and decryption using the redundant number system on the FPGA Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing 1 1-8

[10] Kalimoldayev M, Tynymbayev S, Gnatyuk S, Ibraimov M and Magzom M 2019 The device for multiplying polynomials modulo an irreducible polynomial News of the National Academy of Sciences of the Republic of Kazakhstan Series of Geology and Technical Sciences 2(434) 199-205

[11] Kalimoldayev M, Tynymbayev S, Magzom M, Ibraimov M, Khokhlov S, Abisheva A and Sydorenko V 2019 Polynomials multiplier under irreducible polynomial module for high-performance cryptographic hardware tools CEUR Workshop Proceedings 2393 729-37

[12] Kalimoldayev M, Tynymbayev S, Gnatyuk S, Magzom M, Khokhlov S and Kozhagulov Y 2019 Matrix multiplier of polynomials modulo analysis starting with the lower order digits of the multiplier NEWS of the Academy of Sciences of the Republic of Kazakhstan Series of Geology and Technical Sciences 4(436) 181-7

[13] Kalimoldaev M, Tynimbaev S, Ibraimov M, Mazom M and Khokhlov S Patent for invention No. 33811 A device for multiplying by two polynomials modulo irreducible polynomials No. 2018/0834.1, declared 11.13.2018, publ. 08.02.2019