Menu Home

Portably(ish) generating random integers

[160309/20:49][rawiri@minty ~]$ rand -h
rand - generate random positive integers
Optional Arguments:
-c [count.  Number of numbers (Default:1)]
-d [debug.  Tells you which processing method is used (Default:off)]
-h [help]
-m [minimum number (Default:1)]
-M [maximum number (Default:9223372036854775807)]

TL;DR: I made a script that should generate a random integer for you on almost anything UNIX/Linux-ish. You can grab it from here:

https://github.com/rawiriblundell/scripts/blob/master/rand

Why is it that I can’t login to any old UNIX or Linux box and enter a command like, say, random and get a random number back?  You’d expect that to be a classic command, as ubiquitous as, say, tar.  I have a need to insert a random delay into a script before an action occurs, and it needs to run on Linux, Solaris, HP-UX and AIX at the very least. But there’s no single command to do what I need.

While laying this complaint over drinks with workmates, a couple of the greybeards within earshot mumbled something about having that on VMS. Then, a colleague who is a bit of a perl guru, sent me through a quick one-liner in his favourite language. But what if this runs on a host without perl?

“A host without perl? Rawiri, you’re mad!”

Funnily, I’ve seen it. And the way I’ve been writing shell scripts for Linux and Solaris over the last few years, I’ve developed a habit of catering for contingencies.

OK, so let’s throw in a python option and run a test for the existence of perl. If it’s found, we use it, otherwise we fail over to python.

… But, ok. Umm, what if python isn’t there?

Rinse and repeat.  So eventually, we wind up with a script that can generate a random integer using GNU shuf, BSD jot, perl, python, awk, the $RANDOM builtin variable found in some shells, and finally /dev/urandom. By this point we’ve added in the ability to set minimum and maximum boundaries as well as defining how many integers we might want e.g.

[160309/20:49][rawiri@minty ~]$ rand -m 9 -M 20 -c 200 | column
17	18	16	12	16	14	15	12	15	20
12	10	19	12	16	13	17	14	15	11
10	13	17	10	14	9	10	16	18	17
14	9	12	12	16	14	19	18	17	11
16	20	17	13	16	9	17	19	16	12
14	17	18	13	20	12	16	19	12	18
20	14	12	12	9	12	20	17	15	11
16	17	12	17	11	17	16	12	15	15
16	19	10	10	14	17	16	16	16	19
14	9	18	15	18	20	15	17	9	15
11	16	16	17	9	20	11	10	15	16
11	13	9	17	19	19	16	18	12	15
10	17	12	19	16	15	13	14	9	12
17	10	13	20	15	16	18	13	14	11
18	18	9	14	16	12	18	11	20	10
17	15	19	14	20	19	10	10	12	14
9	16	19	10	17	11	19	11	19	18
19	17	16	20	14	10	13	20	9	13
15	10	15	17	16	18	16	13	16	9
15	19	14	14	11	16	10	10	12	20

… But, ok. Umm, what if all of those options are unavailable? Yes, by this point you probably have bigger issues and this is now an entirely academic exercise, but this script is intended to generate an integer by any means necessary.

Well, this one took a bit of thought. The problem is that, much like the lack of a ubiquitous random program on all unices, there is likewise a lack of a standard way to get some variable data with which to work with, that also has the granularity necessary to allow bulk number generation.

Here’s an example; We can take the output of date and pass it to md5sum and then use tr to suck out some digits. But the POSIX specification for date locks us into whole second granularity, which means if we try to generate more than one random number this way, we’ll fail until the second ticks over. Same inputs means same outputs:

[160309/21:00][rawiri@minty ~]$ for _ in {1..5}; do date | md5sum | tr -dc 0-9; echo; done
718868671813249754
718868671813249754
718868671813249754
718868671813249754
718868671813249754

If we’re allowed a GNUism, we can use GNU date‘s %N which gives us nanosecond granularity:

[160309/21:00][rawiri@minty ~]$ for _ in {1..5}; do date '+%N' | md5sum | tr -dc 0-9; echo; done
97648538623275916459675
435569615207069488535
91957263745340804
793107682191263575
08498689530557247

Which is great, but realistically if we’re missing everything else on our list, we very likely won’t have GNU anything available to us.

I got to thinking about HAVEGED, which made me think of entropy, and now we’re kinda back to /dev/urandom.  The date method above is ultimately an extremely basic attempt to generate pseudo-entropy.  So I recalled that earlier versions of Linux and UNIX entropy gathering tended to grab inputs from, say, keyboard presses and mouse movements and the like, then they’d simply run the raw entropy through an XOR and maybe hash it with SHA-1 or SHA-2.  Subsequent, more cryptographically secure methods like Yarrow and Fortuna are now in use.

But we’re not exactly in need of high quality entropy, just good enough pseudo-entropy.  We’re not going to be generating PGP keys using this, just making some random integers!

Well that’s it – we just grab a bunch of data that may or may not be variable, XOR it and store it into a file, /tmp/entropy. So I started out testing a few XOR options found at the better end of a Google, and they all seemed to perform really poorly or they weren’t remotely portable and full of bashisms. I settled on one and evolved it to this:

# xor function derived from http://zurlinux.com/?p=712
Fn_xor() {
  while read -r line; do
    # Start by removing whitespace to reduce character bias
    # pipe to 'od' which converts each character to one octal per line
    for i in $(printf "%s" "${line}" | tr -d " " | od -A n -t o1 -w1 -v); do
      # for each octal, run an xor and then print it
      # shellcheck disable=SC2059
      printf \\"$(printf '%03o' "$(( i ^ 90 ))" )"
    done
  done
}

The problem is that it iterates character by character through the data that’s fed to it and manages the transformation through multiple subshells in a double loop. It was slightly more than 3x faster than what I started with, but it was still pretty slow. So I passed it out to some scripting peers and one of them gave me the perspective that I’d lost: we’re transforming characters, the above function uses a tool specifically for that: tr. So we get this gnarly looking thing:

# Function to quickly rotate ASCII characters by 90 places
# emulates Fn_xor in a lazy way but is infinitely faster
Fn_chrotate() {
  # shellcheck disable=SC2020
  tr ' !"#$%&()*+,-./0123456789:;< =>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~' 'z{|}~ !"#$%&()*+,-./0123456789:;< =>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxy'
}

Right, so we run a bunch of commands, remove whitespace and newlines to reduce the character bias, pass it through our quick tr based function and we have ourselves some low-fidelity psuedo-entropy:

{
  ps -aux 2>/dev/null
  printf "%s" "$$"
  top -n 1 2>/dev/null
  date '+%H%M%S'
  df -k 2>/dev/null 
  printf "%s" "$$"
  netstat -a 2>/dev/null
  date '+%H%M%S'
  vmstat -i 2>/dev/null
  printf "%s" "$$"
  vmstat -s 2>/dev/null
  date '+%H%M%S'
  iostat 2>/dev/null
  printf "%s" "$$"
  pstat -afipSsT 2>/dev/null
} | tr -d " " | tr -d "\n" | Fn_chrotate > /tmp/entropy

Great. But our /tmp/entropy file is still static, so every time we try to grab n bytes from it, we’ll be grabbing the same exact sequence of bytes and we’re back to the problem of same input => same output. So we have to stir the file after our n bytes are grabbed.

tail -c +33 /tmp/entropy | tr -d "\n" > /tmp/entropy2

This starts printing the file from the 33rd byte and outputs it to a temporary file. In effect, we cut off the first 32 bytes. We can then seed in some fresh and remixed pseudo-entropy:

{
  # Seed some more data into the temporary entropy file, here we hash the PID
  printf "%s" "$$" | Fn_xor | tr -d " " | tr -d "\n" | "${crypt}"
  # Rotate the entropy, get the first 64 chars, xor it to the bottom of the temp file
  printf "%s" "$(fold -b -w 64 /tmp/entropy | head -1 | Fn_xor)"
  date '+%H%M%S' | Fn_xor | tr -d " " | tr -d "\n"
} >> /tmp/entropy2

Note that this time we use the real xor function. It produces better output and the data we’re feeding it here is not long enough to cause substantial performance issues. We also grab the first 64 characters of the original entropy file, xor it and append it to the temporary file. This way the same data is churned quickly from the start of the file to the end, with some fresh input wrapping it. Note that we also hash one of the commands.

Finally:

mv /tmp/entropy2 /tmp/entropy

And we can now generate our next number.

And that, dear reader, is how you completely overwork something simple. Congrats, by the way, if you read this far.

Categories: BSD Geeking Out Lunix Lunacy Open Source

rawiri

3 replies