NEON NTRU
NEON NTRU
NTRU is a submission to the second round of [NIST’s Post-Quantum Cryptography project](https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-2-Submissions). It is a merger of the first round NTRUEncrypt and NTRU-HRSS-KEM submissions. See [ntru.org](https://ntru.org) for more information.
NEON NTRU SIMD instruction acceleration
Cortex-A72 Benchmark
Benchmark on Raspberry Pi 4:
CPU max freq: 1500 Mhz (no overclock)
ARMv8 ASIMD instruction
RAM: 8 GB
Speculator: Yes
Out-of-order: Yes
Branch prediction: Yes
CFLAGS = -O3 -fomit-frame-pointer -mtune=native -fPIC -fPIE -pie -g3 CFLAGS += -Wall -Wextra -Wpedantic LDFLAGS = -lpapi
Compiler version:
- clang version 10.0.1, Target: aarch64-unknown-linux-gnu
- gcc version 10.2.0 (GCC) aarch64-unknown-linux-gnu
Timing benchmark in microsecond (us)
Parameter | Keygen | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 15967.9 | 809.2 | 2122.1 | 696.4 | 699.9 |
NEON GCC | 7156.8 | 135.2 | 89.8 | - | - |
NEON Clang | 6164.5 | 139.9 | 91.2 | 25 | 26 |
Ref/NEON Clang | 2.59 | 5.78 | 23.26 | 27.84 | 26.88 |
Parameter | Keygen | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 30103.0 | 1368.9 | 4011.1 | 1320 | 1377 |
NEON GCC | 13352.5 | 92.1 | 174.2 | - | - |
NEON Clang | 11994.0 | 91.3 | 174.0 | 48 | 48 |
*Ref/NEON Clang * | 2.50 | 14.99 | 23.05 | 27.50 | 28.68 |
Parameter | Keygen | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 28207.5 | 1397.3 | 3761.2 | 1229 | 1238 |
NEON GCC | 12681.5 | 195.5 | 144.8 | - | - |
NEON Clang | 11026.6 | 197.6 | 138.9 | 37 | 38 |
Ref/NEON Clang | 2.55 | 7.07 | 27.07 | 33.21 | 32.57 |
Parameter | Keygen | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 41108.4 | 1998.9 | 5533.1 | 1808 | 1815 |
NEON GCC | 18309.7 | 242.0 | 184.8 | - | - |
NEON Clang | 16278.3 | 248.6 | 186.8 | 51 | 53 |
REF/NEON Clang | 2.52 | 8.04 | 29.62 | 35.45 | 34.24 |
Cortex-A53 Benchmark
CPU max freq: 1200 Mhz (no overclock)
ARMv8 ASIMD instruction
RAM: 1 GB
Speculative: Yes
Out-of-order: No
Branch prediction: Unknown
Benchmark on Raspberry Pi 3B 64-bit OS ARMv8
CC = /usr/bin/clang CFLAGS = -O3 -fomit-frame-pointer -mtune=native -fPIC -fPIE -pie -g3 CFLAGS += -Wall -Wextra -Wpedantic LDFLAGS = -lpapi
Compiler version: clang version 10.0.1, Target: aarch64-unknown-linux-gnu
Timing benchmark in microsecond (us)
Parameter | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 1839 | 4710 | 1547 | 1556 |
NEON Clang | 344 | 175 | 47 | 48 |
REF/NEON Clang | 5.34 | 26.91 | 32.91 | 32.41 |
Parameter | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 3060 | 8896 | 2923 | 2936 |
NEON Clang | 207 | 311 | 83 | 84 |
REF/NEON Clang | 14.78 | 28.60 | 35.21 | 34.95 |
Parameter | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 3134 | 8281 | 2728 | 2740 |
NEON Clang | 483 | 256 | 70 | 71 |
REF/NEON Clang | 6.49 | 32.34 | 38.97 | 38.59 |
Parameter | Encap | Decap | poly_Rq_mul | poly_S3_mul |
Ref | 4504 | 12125 | 4005 | 4020 |
NEON Clang | 598 | 327 | 92 | 94 |
REF/NEON Clang | 7.53 | 37.08 | 43.53 | 42.76 |
Questions?
Feel free to create a pull request.
Is NTRU faster than SABER? Want a comparison?
You can find my other repo NEON implementation of SABER here: https://github.com/cothan/SABER
References
Cryptology ePrint Archive: Report 2020/795 Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches https://eprint.iacr.org/2020/795