Very soon, when a large scale quantum computer is built, the modern public-key cryptography we know today will soon be broken by Shor’s algorithm. Follow NIST’s standardization call, many post-quantum cryptography software submissions are implemented in a secure way to thwart side-channel attack. In this paper, we discuss the techniques used in Post-Quantum Cryptography implementation to counter CWE-208.
In my coursework, I write a paper which analyse implementation of Post-Quantum Cryptography Finalist, and draw some lessons for myself.
The paper is concise and easy to read for everyone, the text in the paper is 50% of the work, many lessons I learn from the godbolt.org link.
int select(bool b, int x, int y) {
return b ? x : y;
}
int ifelse(bool b, int x, int y) {
if (b) return x;
else return y;
}
int mul(bool b, int x, int y) {
return (x*b) | (1-b)*y;
}
int and_inverse(bool b, int x, int y) {
return (x & (-b)) | (~(-b)) & y;
}
Clang realizes 4 functions have similar logic, while GCC doens’t.
GCC does not optimize the code while clang is able to convert 4 conditional selections to the same assembly instructions.
And conditional move instruction cmove
is not always exists.
Therefore you should not use the code above to do comparison. It’s not constant-time (optimzed by compiler) and my expensive on other platforms where compiler is not mature enough.
// Input: *a, *b, len
uint8_t compare(uint8_t *a, uint8_t *b, int len) {
int i;
uint8_t r = 0;
for(i=0;i<len;i++)
r |= a[i] ^ b[i];
return (-r) >> 7;
}
// Output: 0 or 1
This code outsmarts Clang and GCC
// Input: *a, sel in {0, 1}
for (i = 0; i < length; i++)
secret[i] ^= (-sel) & (a[i] ^ secret[i]);
// Output: *secret
The code is not optimized by compiler.
// Input: *a
uint16_t i, j, sel, secret[256] = {0};
i = 0; j = 0;
while (j < 256) {
d = a[i] | (a[i+1] << 8);
if (d < upperbound)
sel = 1;
else
sel = 0;
secret[j] ^= (-sel) & (d ^ secret[j]);
j += sel;
i += 2;
}
// Output: *secret
Balance if-else
therefore not timing leakage
// Input: value, bound
int16_t csubq(int16_t value, const int16_t bound) {
value -= bound;
value += (value >> 15) & bound;
return value;
}
// Output: value
I copied this code from PQC finalist, it’s perfect.
// Input: *a, *b
for (int i = 0; i < length; i++) {
if (a[i] != b[i]) break;
}
// Output: 0 or 1
Non constant-time comparison.
// Input: check, *a, *b
for (int i = 0; i < length; i++) {
if (!check)
secret[i] = a[i];
else
secret[i] = b[i];
}
// Output *secret
Either a
or b
will be in L1 cache. See flush + reload
cache timing attack.
This leak which variable is selected.
// Input: value
if (value > bound)
value -= bound
// Output: value
If you try to fix with this code below
// Input: value
if (value > bound)
value -= bound
else
value = value;
// Output: value
Sadly, the compiler will optimize the else
branch.