CWE-208: Constant-time Implementation to Lattice-based Post-Quantum Cryptography

Abstract

Very soon, when a large scale quantum computer is built, the modern public-key cryptography we know today will soon be broken by Shor’s algorithm. Follow NIST’s standardization call, many post-quantum cryptography software submissions are implemented in a secure way to thwart side-channel attack. In this paper, we discuss the techniques used in Post-Quantum Cryptography implementation to counter CWE-208.

Summary

In my coursework, I write a paper which analyse implementation of Post-Quantum Cryptography Finalist, and draw some lessons for myself.

The paper is concise and easy to read for everyone, the text in the paper is 50% of the work, many lessons I learn from the godbolt.org link.

Do this

Effect of Compiler Optimization

int select(bool b, int x, int y) {
    return b ? x : y;
}

int ifelse(bool b, int x, int y) {
    if (b)         return x;
    else           return y;
}

int mul(bool b, int x, int y) {
    return (x*b) | (1-b)*y;
}

int and_inverse(bool b, int x, int y) {
    return (x & (-b)) | (~(-b)) & y;
}

See disassembly

Clang realizes 4 functions have similar logic, while GCC doens’t.

GCC does not optimize the code while clang is able to convert 4 conditional selections to the same assembly instructions.

And conditional move instruction cmove is not always exists.

Look at RISC-V, AVR assembly

Therefore you should not use the code above to do comparison. It’s not constant-time (optimzed by compiler) and my expensive on other platforms where compiler is not mature enough.

Constant-time Comparison

// Input: *a, *b, len
uint8_t compare(uint8_t *a, uint8_t *b, int len) {
  int i;
  uint8_t r = 0;
  for(i=0;i<len;i++)
    r |= a[i] ^ b[i];
  return (-r) >> 7;
}
// Output: 0 or 1

This code outsmarts Clang and GCC

See disassembly

Constant-time Conditional Move

// Input: *a, sel in {0, 1}
for (i = 0; i < length; i++)
    secret[i] ^= (-sel) & (a[i] ^ secret[i]);
// Output: *secret

The code is not optimized by compiler.

See disassembly

Constant-time Rejection Sampling

// Input: *a
uint16_t i, j, sel, secret[256] = {0};
i = 0; j = 0; 
while (j < 256) {
    d = a[i] | (a[i+1] << 8);
    if (d < upperbound)      
        sel = 1;
    else                        
        sel = 0; 
    secret[j] ^= (-sel) & (d ^ secret[j]);
    j += sel;
    i += 2;
}
// Output: *secret

Balance if-else therefore not timing leakage

Constant-time Conditional Subtraction

// Input: value, bound
int16_t csubq(int16_t value, const int16_t bound) {
  value -= bound;
  value += (value >> 15) & bound;
  return value;
}
// Output: value

I copied this code from PQC finalist, it’s perfect.

See disassembly

Pitfall; Don’t do this

memcmp or strcmp

// Input: *a, *b
for (int i = 0; i < length; i++) {
	if (a[i] != b[i]) break;
}
// Output: 0 or 1

Non constant-time comparison.

Cache timing leakage

// Input: check, *a, *b
for (int i = 0; i < length; i++) {
    if (!check)     
        secret[i] = a[i];
    else            
        secret[i] = b[i];
}
// Output *secret

Either a or b will be in L1 cache. See flush + reload cache timing attack. This leak which variable is selected.

Conditional timing leakage

// Input: value
if (value > bound)
    value -= bound
// Output: value

If you try to fix with this code below

// Input: value
if (value > bound) 
    value -= bound
else
    value = value;
// Output: value

Sadly, the compiler will optimize the else branch.

Duc Tri Nguyen
Duc Tri Nguyen
Graduate Research Assistant

My research interests include implementation of Post-Quantum Cryptography using High Level Synthesis in FPGA and NEON instruction in ARM platform, beside, sometimes I play CTFs, Crypto and Reverse Engineering are my favorite categories.

Related