# CWE-208: Constant-time Implementation to Lattice-based Post-Quantum Cryptography ### Abstract

Very soon, when a large scale quantum computer is built, the modern public-key cryptography we know today will soon be broken by Shor’s algorithm. Follow NIST’s standardization call, many post-quantum cryptography software submissions are implemented in a secure way to thwart side-channel attack. In this paper, we discuss the techniques used in Post-Quantum Cryptography implementation to counter CWE-208.

### Summary

In my coursework, I write a paper which analyse implementation of Post-Quantum Cryptography Finalist, and draw some lessons for myself.

The paper is concise and easy to read for everyone, the text in the paper is 50% of the work, many lessons I learn from the godbolt.org link.

### Do this

#### Effect of Compiler Optimization

int select(bool b, int x, int y) {
return b ? x : y;
}

int ifelse(bool b, int x, int y) {
if (b)         return x;
else           return y;
}

int mul(bool b, int x, int y) {
return (x*b) | (1-b)*y;
}

int and_inverse(bool b, int x, int y) {
return (x & (-b)) | (~(-b)) & y;
}


See disassembly

Clang realizes 4 functions have similar logic, while GCC doens’t.

GCC does not optimize the code while clang is able to convert 4 conditional selections to the same assembly instructions.

And conditional move instruction cmove is not always exists.

Look at RISC-V, AVR assembly

Therefore you should not use the code above to do comparison. It’s not constant-time (optimzed by compiler) and my expensive on other platforms where compiler is not mature enough.

#### Constant-time Comparison

// Input: *a, *b, len
uint8_t compare(uint8_t *a, uint8_t *b, int len) {
int i;
uint8_t r = 0;
for(i=0;i<len;i++)
r |= a[i] ^ b[i];
return (-r) >> 7;
}
// Output: 0 or 1


This code outsmarts Clang and GCC

See disassembly

#### Constant-time Conditional Move

// Input: *a, sel in {0, 1}
for (i = 0; i < length; i++)
secret[i] ^= (-sel) & (a[i] ^ secret[i]);
// Output: *secret


The code is not optimized by compiler.

See disassembly

#### Constant-time Rejection Sampling

// Input: *a
uint16_t i, j, sel, secret = {0};
i = 0; j = 0;
while (j < 256) {
d = a[i] | (a[i+1] << 8);
if (d < upperbound)
sel = 1;
else
sel = 0;
secret[j] ^= (-sel) & (d ^ secret[j]);
j += sel;
i += 2;
}
// Output: *secret


Balance if-else therefore not timing leakage

#### Constant-time Conditional Subtraction

// Input: value, bound
int16_t csubq(int16_t value, const int16_t bound) {
value -= bound;
value += (value >> 15) & bound;
return value;
}
// Output: value


I copied this code from PQC finalist, it’s perfect.

See disassembly

### Pitfall; Don’t do this

#### memcmp or strcmp

// Input: *a, *b
for (int i = 0; i < length; i++) {
if (a[i] != b[i]) break;
}
// Output: 0 or 1


Non constant-time comparison.

#### Cache timing leakage

// Input: check, *a, *b
for (int i = 0; i < length; i++) {
if (!check)
secret[i] = a[i];
else
secret[i] = b[i];
}
// Output *secret


Either a or b will be in L1 cache. See flush + reload cache timing attack. This leak which variable is selected.

#### Conditional timing leakage

// Input: value
if (value > bound)
value -= bound
// Output: value


If you try to fix with this code below

// Input: value
if (value > bound)
value -= bound
else
value = value;
// Output: value


Sadly, the compiler will optimize the else branch. 