Wednesday, July 26, 2017

Range of Random.nextGaussian, continued

In the first part I described a theoretical range of values returned by nextGaussian; in this part I will describe the way used to find out the actual minimum and maximum values.

The simplest way would require using brute force to check all 248 possible seed values. This would take a few years on my computer, so I had to do better than that.

As explained in the first part, in order to maximize the values of nextGaussian, the values of v1 and v2 need to be as close to zero as possible. For that, the value returned by nextDouble needs to be close to 0.5. So, how do we make the random number generator return the values we want?

NextDouble calls next two times, first to get top 26 significant bits of the result, then again to get next 27 bits. If we can get the top bits to be 1 << 25, the resulting double will be very close to 0.5.

A call to next first updates the seed value used by Random, then returns top N bits of the new seed. So, we know the desired seed value after the call to next. In order to find out the value for seed before the call, we can use the simple reversing algorithm found here:
private static final long multiplier = 0x5DEECE66DL;
private static final long addend = 0xBL;

private static long reverseSeed (long seed) {
    // reverse the addend from the seed
    seed -= addend;
    long result = 0;
    // iterate through the seeds bits
    for (int i = 0; i < 48; i++) {
        long mask = 1L << i;
        // find the next bit
        long bit = seed & mask;
        // add it to the result
        result |= bit;
        if (bit == mask) {
            // if the bit was 1, subtract its effects from the seed
            seed -= multiplier << i;
        }
    }
    return result;
} 
Then, we also need to counter the initial scrambling that happens in setSeed operation:
private static long unscramble(long seed) {
    return seed ^ multiplier;
}
 And we're ready to start hacking:
long seedRange = 0x2000000L;
Random rand = new Random();
double minSoFar = 0, maxSoFar = 0;
for(long seed = -seedRange;seed<seedRange;seed++) {
    rand.setSeed(unscramble(reverseSeed(0x800000000000L + seed)));
    double d = rand.nextGaussian();
    if(minSoFar > d) {
        minSoFar = d;
        System.out.println("Seed: "+seed+ " min: "+d);
    }
    if(maxSoFar < d) {
        maxSoFar = d;
        System.out.println("Seed: "+seed+ " max: "+d);
    }

    d = rand.nextGaussian();
    if(minSoFar > d) {
        minSoFar = d;
        System.out.println("Seed: "+seed+ " second min: "+d);
    }
    if(maxSoFar < d) {
        maxSoFar = d;
        System.out.println("Seed: "+seed+ " second max: "+d);
    }
}
The lowest value found was: -7.844680087923773 (on second call to nextGaussian with seed = 994892)
The highest value found was: 7.995084298635286 (on first call to nextGaussian with seed = 14005843)
These are the real maximum & minimum values returned by Oracle Java 8 implementation of nextGaussian.

Range of Random.nextGaussian()

Recently I had a look at the source code of a language detection library. The library internally uses Random.nextGaussian to determine its behavior, but uses it in such a way that any value lower than -10 would result in incorrect calculations. I was curious if getting such a value is even possible.


The implementation of nextGaussian in Java 8 is as follows:
 private double nextNextGaussian;
 private boolean haveNextNextGaussian = false;

 public double nextGaussian() {
   if (haveNextNextGaussian) {
     haveNextNextGaussian = false;
     return nextNextGaussian;
   } else {
     double v1, v2, s;
     do {
       v1 = 2 * nextDouble() - 1;   // between -1.0 and 1.0
       v2 = 2 * nextDouble() - 1;   // between -1.0 and 1.0
       s = v1 * v1 + v2 * v2;
     } while (s >= 1 || s == 0);
     double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
     nextNextGaussian = v2 * multiplier;
     haveNextNextGaussian = true;
     return v1 * multiplier;
   }
 }
The function used to generate next values has extremes when both v1 and v2 are as close to zero as possible, without being both equal to zero.

Quick check of nextDouble indicates that the function generates only multiples of 1 / (double)(1L << 53). This means that nextGaussian will return extreme values for v1 = +/- 2 / (double)(1L << 53)and v2 = 0.

For these values nextGaussian would return +/- 12.00727336061225.

That does not mean that it is possible to get these values. There's a limited number of values that can be returned by nextDouble. But it does mean that nextGaussian (in its Oracle Java 8 implementation) will never return a value outside of the range between -12.00727336061225 and 12.00727336061225.