Plaid CTF 2013 - Crypto 250 Giga - Writeup

By meta, tecknicaltom, Reid


Crypto 250 giga was an RSA encryption service. It generated a new RSA keypair for each connection, encrypted and returned the flag, and then provided ciphertext for any number of inputs. The implementation used 1024-bit RSA keys and the standard Python crypto RSA library. By design, asymmetric crypto algorithms allow anyone to encrypt arbitrary plaintext with the public key - so we didn't see much potential for a known/chosen plaintext attack against the service. Thus we focused our efforts on the very suspicious random number generator.

*** Welcome to GIGA! ***
**the super secure key management service**

We are generating an RSA keypair for you now.
(Please be sure to move your mouse to populate the entropy stream)
Congratulations! Key created!
To prove how secure our service is here is an encrypted flag:
Find the plaintext and we'll give you points

Now enter a message you wish to encrypt: AAAA
Your super unreadable ciphertext is:

Now enter a message you wish to encrypt: BBBB
Your super unreadable ciphertext is:

Now enter a message you wish to encrypt: 

Random Numbers and Key Generation

Here is the sketchy RNG:

def rng(n):
  global rbuf
  rand = rbuf[:n]
  rbuf = rbuf[n:]
  while (len(rbuf) < 4096):
    rbuf += rbuf + hr.hexdigest()
  #print "rng(%s) --> %s" % (n,rand.encode('hex'))
  return rand

Although we skipped this step during the CTF, it's not hard to analyze the entropy of the resulting random numbers. Even the world's simplest sanity check counting unique results is shocking. Analyzing the entropy with Burp gives us 2 bits of entropy with a 1% confidence interval.

	$ python 1000 | sort | uniq | wc
	676     676   54756
	$ python 1000 | sort | uniq | wc
	676     676   54756
The random values are used to seed a deterministic algorithm for finding primes p and q.

If the RNG is broken, to such an extent that values repeat frequently, then there is a good chance that p or q will repeat. The reason for large (>= 1024-bit) keys, is not only to make brute force computational attacks infeasible, but also to ensure (probabilistically) that everyone has unique values for p and q. The algorithm used by the Python (and most) crypto libraries to generate primes from a random seed is described in Robert D. Silverman's paper "The Fast Generation of Random, Strong RSA Primes." In short, generate a sequence of potential primes starting with a random 1024-bit integer. Use a sieve to discard all candidates divisible by small primes and test the remaining numbers for primality using the Miller-Rabin probabilistic test. Strong primes refers to the additional restriction that the number be probabilistically prime with an error rate no greater than $$ 2^{-100} $$.

This is an excellent example of how important random numbers are to crypto. Without a good entropy source, even math cannot save you!

Ok, this is broken, but the question remains, how do we exploit it? First, a review on the RSA algorithm and modular arithmetic:

RSA Math Review

RSA key generation involves randomly selecting two large primes (p and q) which will be used to generate the public and private keys.

$$ p,q = $$ distinct randomly chosen primes
$$ n = p \cdot q $$ = public key
$$ e = 65537 $$ = constant which is relatively prime to N (called the exponent)
$$ φ(n) = φ(p)φ(q) = (p − 1)(q − 1) $$

$$ d = $$ private key = multiplicative inverse of $$ e \mod φ(n) $$
Encryption: $$ c \equiv m^e \mod n $$
Decryption: $$ m = c^d \mod n $$

A quick review on modular arithmatic:

$$ a \equiv b \mod n $$ means that $$ n \cdot k + b = a $$ for some integer $$ k $$


$$ 11 \equiv 3 \mod 4 \Rightarrow 4 \cdot k+3=11 $$ where $$ k=2 $$
$$ 18 \equiv 2 \mod 4 \Rightarrow 4 \cdot k+2=18 $$ where $$ k=4 $$


If the so-called random values repeat there is a good chance p or q will repeat. In such a case multiple public keys (n) would share a factor and the private key could be recovered. Recall that each connection generates a new key pair from our randomly selected primes p and q, and that $$ p \cdot q = n $$. If q repeats then:

$$ n_1 = p_1 \cdot q $$
$$ n_2 = p_2 \cdot q $$

While it's not computationally feasible to factor $$ n_1 $$ or $$ n_2 $$ directly, it is computationally feasible to compute the GCD using Euclid's algorithm. Therefore:

$$ \gcd(n_1, n_2) = q $$
$$ p_1 = n_1/q $$

Therefore, given two public keys with a shared factor we can recover p and q and therefore derive the private key! This is the essence of this security challenge and why RNGs are so important, but we aren't there yet. The crypto service doesn't actually give us the value n, it only gives us ciphertext and the ability to encrypt as many payloads as we want. However, with a little more modular arithmetic (yay) we can devise a method for extracting n by selecting very specific values to encrypt - 0x02 and 0x03.

Recall that: $$ c \equiv m^e \mod n $$
and $$ n \cdot k + b = a $$ for some $$ k $$

Therefore we have the following:

$$ n \cdot k_1 = c_1 - m_1^e $$ for some $$ k_1 $$
$$ n \cdot k_2 = c_2 - m_2^e $$ for some $$ k_2 $$

Selecting 0x02 and 0x03 for $$ m_1 $$ and $$ m_2 $$ respectively:

$$ n \cdot k_1 = c_1 - 2^e $$
$$ n \cdot k_2 = c_2 - 3^e $$

And since we know $$ e=65537 $$ and the ciphertexts $$ c_1 $$ and $$ c_2 $$ are given to us by the encryption service, we can produce n because it's the largest common factor of those two values.

$$ n = \gcd(c_1-2^e, c_2-3^e) $$

Connecting the pieces

Now it's time to write some code. Here is our algorithm:

# Connect many times and derive the public key:
	# Read the flag ciphertext (save this for later)
	# Send "\x02" and read the response ciphertext (call this c2)
	# Send "\x03" and read the response ciphertext (call this c3)
	# calculate n given c2 and c3 as described above
# Iterate over the n-values and look for any that share a common factor. 
# If they do, calculate the private key (d) and decrypt the saved ciphertext flag.
Note that we observed many cases where a common factor was 2 which was rather unexpected and perhaps implies that the RNG was even more broken than it looked. We discarded these values and it didn't take long to find N-values with a believably large common factor.

Flag: Im_sure_mega_is_much_better_though

import socket
from Crypto.PublicKey import RSA
from fractions import gcd
from itertools import product

def gethexdata(data):
    lines = (line for line in data.split('\n') if line)
    for line in lines:
            return line
        except TypeError as e:

def read_until_prompt(sock):
    data = msg = sock.recv(2048)
    while "Now enter a message" not in msg:
        msg = sock.recv(2048)
        if not msg.startswith('.'):
            data += msg
    return data

def harvest():

    sock = socket.socket()

    # get the encrypted flag
    data = read_until_prompt(sock)
    cf = gethexdata(data)

    # encrypt 0x02
    data = read_until_prompt(sock)
    c2 = gethexdata(data)

    # encrypt 0x03
    data = read_until_prompt(sock)
    c3 = gethexdata(data)


    # calculate n from c2 and c3
    cf = RSA.pubkey.bytes_to_long(cf.decode('hex'))
    c2 = RSA.pubkey.bytes_to_long(c2.decode('hex'))
    c3 = RSA.pubkey.bytes_to_long(c3.decode('hex'))
    e = 65537
    n = gcd(pow(2,e)-c2, pow(3,e)-c3)

    # verify the result (optional)
    if c3 != pow(RSA.pubkey.bytes_to_long('\x03'),e,n):
        print "N did not verify!"

    return (cf, c2, c3, n)

def decrypt(c, p, q, n, e=65537):
    d = RSA.inverse(e, (p-1)*(q-1))
    m = pow(c, d, n)

    # test
    if d*e % ((p-1)*(q-1)) != 1:
        print "d did not verify"

    if c == pow(m, e, n):
        print "decrypted successfully"

    print RSA.pubkey.long_to_bytes(m)
    return m

if __name__=='__main__':

    pubkeys = []
    done = False

    with open('harvest.log', 'a') as fh:
        while not done:
            # get n and ciphertexts from server
            cf,c2,c3,n = harvest()

            # compute the gcd of this n with each prior n value (cartesian product)
            for n1,n2 in product([n],pubkeys):
                g = gcd(n1, n2)

                # filter out false positives with small gcd values 
                if g > 2**10:
                    print "GCD: ", g, "\n"
                    print "N1: ", n1, "\n"
                    print "N2: ", n2, "\n"
                    print "Flag: ", cf, "\n"
                    print decrypt(cf, n1/g, g, n1)

                    #done = True