NEON NTRU

NEON NTRU

NTRU is a submission to the second round of [NIST’s Post-Quantum Cryptography project](https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-2-Submissions). It is a merger of the first round NTRUEncrypt and NTRU-HRSS-KEM submissions. See [ntru.org](https://ntru.org) for more information.

NEON NTRU SIMD instruction acceleration

Cortex-A72 Benchmark

Benchmark on Raspberry Pi 4:

  • CPU max freq: 1500 Mhz (no overclock)

  • ARMv8 ASIMD instruction

  • RAM: 8 GB

  • Speculator: Yes

  • Out-of-order: Yes

  • Branch prediction: Yes

Compiler flags
CFLAGS = -O3 -fomit-frame-pointer -mtune=native -fPIC -fPIE -pie -g3
CFLAGS += -Wall -Wextra -Wpedantic
LDFLAGS = -lpapi

Compiler version: - clang version 10.0.1, Target: aarch64-unknown-linux-gnu - gcc version 10.2.0 (GCC) aarch64-unknown-linux-gnu

Timing benchmark in microsecond (us)

Table 1. A72-HPS2048509

Parameter

Keygen

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

15967.9

809.2

2122.1

696.4

699.9

NEON GCC

7156.8

135.2

89.8

-

-

NEON Clang

6164.5

139.9

91.2

25

26

Ref/NEON Clang

2.59

5.78

23.26

27.84

26.88

Table 2. A72-HRSS701

Parameter

Keygen

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

30103.0

1368.9

4011.1

1320

1377

NEON GCC

13352.5

92.1

174.2

-

-

NEON Clang

11994.0

91.3

174.0

48

48

*Ref/NEON Clang *

2.50

14.99

23.05

27.50

28.68

Table 3. A72-HPS2048677

Parameter

Keygen

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

28207.5

1397.3

3761.2

1229

1238

NEON GCC

12681.5

195.5

144.8

-

-

NEON Clang

11026.6

197.6

138.9

37

38

Ref/NEON Clang

2.55

7.07

27.07

33.21

32.57

Table 4. A72-HPS4096821

Parameter

Keygen

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

41108.4

1998.9

5533.1

1808

1815

NEON GCC

18309.7

242.0

184.8

-

-

NEON Clang

16278.3

248.6

186.8

51

53

REF/NEON Clang

2.52

8.04

29.62

35.45

34.24

Cortex-A53 Benchmark

  • CPU max freq: 1200 Mhz (no overclock)

  • ARMv8 ASIMD instruction

  • RAM: 1 GB

  • Speculative: Yes

  • Out-of-order: No

  • Branch prediction: Unknown

Benchmark on Raspberry Pi 3B 64-bit OS ARMv8

CC = /usr/bin/clang
CFLAGS = -O3 -fomit-frame-pointer -mtune=native -fPIC -fPIE -pie -g3
CFLAGS += -Wall -Wextra -Wpedantic
LDFLAGS = -lpapi

Compiler version: clang version 10.0.1, Target: aarch64-unknown-linux-gnu

Timing benchmark in microsecond (us)

Table 5. A53-HPS2048509

Parameter

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

1839

4710

1547

1556

NEON Clang

344

175

47

48

REF/NEON Clang

5.34

26.91

32.91

32.41

Table 6. A53-HRSS701

Parameter

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

3060

8896

2923

2936

NEON Clang

207

311

83

84

REF/NEON Clang

14.78

28.60

35.21

34.95

Table 7. A53-HPS2048677

Parameter

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

3134

8281

2728

2740

NEON Clang

483

256

70

71

REF/NEON Clang

6.49

32.34

38.97

38.59

Table 8. A53-HPS4096821

Parameter

Encap

Decap

poly_Rq_mul

poly_S3_mul

Ref

4504

12125

4005

4020

NEON Clang

598

327

92

94

REF/NEON Clang

7.53

37.08

43.53

42.76

Questions?

Feel free to create a pull request.

Is NTRU faster than SABER? Want a comparison?

You can find my other repo NEON implementation of SABER here: https://github.com/cothan/SABER

References

Cryptology ePrint Archive: Report 2020/795

Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches
https://eprint.iacr.org/2020/795
Duc Tri Nguyen
Duc Tri Nguyen
Graduate Research Assistant

My research interests include implementation of Post-Quantum Cryptography using High Level Synthesis in FPGA and NEON instruction in ARM platform, beside, sometimes I play CTFs, Crypto and Reverse Engineering are my favorite categories.

Related