As some of you know, I went to a bit of effort to create a script that tries as many ways as possible to generate random numbers. Some would say I went a bit too far. Overboard. So to speak.
So recently I’ve been working on rewriting my passphrase generator function, which has meant rewriting my shuffle function (it checks for ‘shuf’ and steps in if ‘shuf’ isn’t found), and I’ve been stuck on one of the shuffle methods. No problem, I thought, I can just use rand and consolidate this down to a quarter of the size!
Well, it turns out that, no, I couldn’t. rand, by design, will generate random numbers. That means that generated integers can repeat. And that’s correct behaviour: If you roll a dice cube 6 times, there’s absolutely no guarantee you’ll get 1, 2, 3, 4, 5 and 6 regardless of the order. You might get, say, 3 twice. And that’s fine! It’s when you get a lot of 3’s and 4’s that you start to question whether your die are rigged.
But I needed different behaviour – I needed rand to generate a complete randomised set of integers within a range. Let me show you what I mean by way of demonstration:
This was the default behaviour – to allow repeats:
$ rand -N 6 -M 6 -r 2 4 4 5 1 4
That’s a dodgy die!
This is the new default behaviour – a complete set of integers within the supplied boundaries, and in random order:
$ rand -N 6 -M 6 3 6 4 1 5 2
As you can see, both behaviours are now supported and switched using ‘-r’. Unfortunately this took the script beyond the point of being a straight if/elif/else. I’ve had to move everything to functions, I’ve added new functions, tested, retested, adjusted code for POSIX-esque compatibility and so on.
The last struggle I’ve had over the last couple of days has been figuring out some method to fill up any gaps left in a generated number set. For example, let’s generate 99171 random integers but allow repeats:
$ rand -N 99171 -M 99171 -r | sort | uniq | wc -l 62740
Uhh… We’re nearly 36.5k integers short!
After trying a few different algorithms including simple double sorts (which get brutally slow at any scale), knuth-fisher-yates, looped match-and-reduce et al, I went for a walk and remembered that I was really working with three sets of integers and there had to be a simpler way to approach this:
- The set that I’d generated: In the above example, this is the 62740 integers that were already random
- The full unsorted set: 1 to 99171
- The difference of those two sets
I can quickly generate all three sets, that’s no problem, the trick is merging the first and third set. So I figure if the first random set has extracted sufficient integers from the full set, then what’s left should be fairly random. Unfortunately, simply appending the leftovers does leave obvious patterns.
Enter the paste command. We can interlace (or ‘hot-finger’, or ‘comb’) line by line using paste, like so:
$ cat /tmp/a a b c d $ cat /tmp/b 1 2 3 4 $ paste -d '\n' /tmp/a /tmp/b a 1 b 2 c 3 d 4
Great – so we simply interlace the two data sets and we’re done. Sadly this does require using temporary files as arrays aren’t POSIX and it seems that mktemp/mkfifo can’t be expected to exist on older Unices. But apart from that, it’s all coded up, and I’m working through testing each of the different generation methods now – so far it’s working fast.
Oh, I also added zero padding, because why not?
$ rand -M 10 -N 5 -z 08 04 02 07 03
And here’s its help output:
$ rand -h rand - generate random positive integers Optional Arguments: -d [debug. Tells you which processing method is used (Default:off)] -h [help] -m [minimum number (Default:1)] -M [maximum number (Default:9223372036854775807)] -N [count. Number of numbers (Default:1)] -r [repeat. Output lines may be repeated (Default:off)] -z [zero pad single digits. E.g. '9' becomes '09' (Default:off)]
As before, you can get the script from here: https://github.com/rawiriblundell/scripts/blob/master/rand