
Analysing lotto numbers

New Zealand’s first Lotto broadcast, if you’re interested.

So I’m at home fighting a cold. In between cups of lemon and ginger tea with a bit of honey, and whatever seemingly useful drugs I could find, I decided I needed to do something. Anything. Lying in bed all day gets boring, fast.

I recently posted about generating lotto numbers. So, what would be involved in checking the win-rate of any “tickets” generated by my rand script? No, I don’t mean a full-blown analysis of lotto numbers to come up with some half-baked “winning strategy”. I’m not that smart, just smart enough to know better than to waste my time on one.

So I needed the history of lotto results to compare against. I won’t go into too much detail on how or where I got the numbers: they’re publicly available, and I had to save and scrape a few aspx pages. The output I ended up with looks like this:

$ head nzlottoresults 
Draw 01: 04 29 16 40 08 32 30
Draw 02: 03 09 39 13 36 20 38
Draw 03: 11 26 18 39 22 05 38
Draw 04: 35 02 29 10 04 11 14
Draw 05: 23 07 03 08 12 11 15
Draw 06: 12 20 17 39 31 01 04
Draw 07: 14 01 34 23 05 28 10
Draw 08: 13 17 26 38 08 37 19
Draw 09: 15 02 32 27 04 09 37
Draw 10: 19 20 11 04 28 18 02

Interestingly, draw 711 is where the PowerBall joined the fray, adding an eighth number to each line:

$ grep -C 2 "Draw 711" nzlottoresults 
Draw 709: 39 27 07 16 04 14 06
Draw 710: 09 15 02 33 19 25 01
Draw 711: 37 12 13 14 06 11 29 03
Draw 712: 30 23 39 33 25 08 16 06
Draw 713: 33 16 36 37 19 08 11 05

Looking at the counts, the distribution is reasonably uniform (i.e. every number is drawn at nearly the same rate): across 1546 draws, 01 was drawn the most, at 300 times, and 28 the least, at 243 times:

$ cut -d: -f2- nzlottoresults | cut -d " " -f1-8 | tr " " "\n" | grep . | sort | uniq -c | column
    300 01	    274 09	    258 17	    263 25	    268 33
    266 02	    261 10	    289 18	    284 26	    252 34
    258 03	    255 11	    283 19	    280 27	    275 35
    250 04	    284 12	    265 20	    243 28	    273 36
    269 05	    292 13	    282 21	    248 29	    270 37
    259 06	    269 14	    282 22	    269 30	    281 38
    292 07	    277 15	    271 23	    270 31	    276 39
    278 08	    259 16	    270 24	    255 32	    272 40
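To put a rough number on “reasonably uniform”, here’s a sketch of a check you could pipe the uniq -c output into. uniformity is a hypothetical helper (awk wrapped in a bash function), and it models each ball’s count as binomial, with a 7-in-40 chance of turning up in any given draw. On the counts above that gives an expected 270.55 per ball with a standard deviation of about 14.9, so 01 sits roughly 2 standard deviations high, 28 about 1.8 low, and the chi-square statistic works out to roughly 24 on 39 degrees of freedom. None of that is unusual for 40 numbers. Treat it as a rough check only: because each draw’s seven balls are distinct, the textbook chi-square distribution is an approximation here.

```shell
#!/usr/bin/env bash
# uniformity DRAWS PER_DRAW POOL  < "uniq -c" output ("count ball" per line)
# Hypothetical helper. Rough model: each ball's count ~ Binomial(DRAWS, PER_DRAW/POOL).
# Usage: cut -d: -f2- nzlottoresults | cut -d " " -f1-8 | tr " " "\n" | grep . \
#          | sort | uniq -c | uniformity 1546 7 40
uniformity() {
  awk -v draws="$1" -v k="$2" -v pool="$3" '
    BEGIN {
      p = k / pool                     # chance a given ball is in a given draw
      e = draws * p                    # expected count per ball
      sd = sqrt(draws * p * (1 - p))   # binomial standard deviation
    }
    {
      chi += ($1 - e) ^ 2 / e          # Pearson chi-square contribution
      if (NR == 1 || $1 > max) { max = $1; maxball = $2 }
      if (NR == 1 || $1 < min) { min = $1; minball = $2 }
    }
    END {
      # balls that were never drawn do not appear in uniq -c output at all;
      # each contributes (0 - e)^2 / e = e to the statistic
      if (NR < pool) chi += (pool - NR) * e
      printf "expected %.1f per ball, sd %.1f\n", e, sd
      printf "max %s (%d) z=%+.2f, min %s (%d) z=%+.2f\n", maxball, max, (max - e) / sd, minball, min, (min - e) / sd
      printf "chi-square %.1f on %d degrees of freedom\n", chi, pool - 1
    }'
}
```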

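It may seem like a big difference, but across 1546 draws it’s actually not: with 7 of the 40 balls drawn each time, every number should turn up about 1546 × 7 ÷ 40 ≈ 270 times, and both 243 and 300 are within 30 of that. You can’t simply pick out the most-drawn numbers, slap them onto a ticket and expect to win. You can’t do something like this, say: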

$ cut -d: -f2- nzlottoresults | cut -d " " -f1-8 | tr " " "\n" | grep . \
  | sort | uniq -c | sort -n | tail -7 | awk '{print $2}' | sort | tr "\n" " "; echo
01 07 12 13 18 19 26 

Followed by this, to pick the most-drawn PowerBall:

$ cut -d: -f2- nzlottoresults | cut -d " " -f9 | grep . | sort | uniq -c | sort -n | tail -1 | awk '{print $2}'
02

To get the supposedly statistically most-likely-to-be-chosen numbers:

01 07 12 13 18 19 26 02

More on those numbers later. Interestingly, though, the PowerBall numbers are nowhere near as uniformly distributed. Ball 09 is clearly an under-achiever, and 10 isn’t far behind:

$ cut -d: -f2- nzlottoresults | cut -d " " -f9 | grep . | sort | uniq -c
    101 01
    111 02
    101 03
     92 04
     87 05
     94 06
     73 07
    100 08
     32 09
     45 10
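How lopsided is that? A fair one-ball PowerBall drawn from 01 to 10 over the 836 PowerBall draws (711 to 1546) would land on each number about 83.6 times, give or take roughly 8.7, which puts 09’s 32 at about six standard deviations short. A minimal sketch of that per-ball comparison, as a hypothetical pb_zscores helper that assumes a fair draw of one ball from POOL each time:

```shell
#!/usr/bin/env bash
# pb_zscores POOL  < "uniq -c" output ("count ball" per line)
# Hypothetical helper: treats each ball's count as Binomial(total, 1/POOL),
# which is what a fair one-ball-per-draw PowerBall would give.
# Usage: cut -d: -f2- nzlottoresults | cut -d " " -f9 | grep . | sort | uniq -c | pb_zscores 10
pb_zscores() {
  awk -v pool="$1" '
    { count[$2] = $1; total += $1 }
    END {
      e = total / pool                                # expected hits per ball
      sd = sqrt(total * (1 / pool) * (1 - 1 / pool))  # binomial standard deviation
      for (b in count)
        printf "%s %4d  z=%+.1f\n", b, count[b], (count[b] - e) / sd
    }' | sort                                         # "for (b in ...)" order is unspecified
}
```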

So we have 10822 numbers (1546 draws × 7 balls, including the Bonus Ball but excluding the PowerBall). If we use my rand script (here using shuf as its back-end) to generate the same number of random numbers, we get a similar distribution:

$ rand -M 40 -N 10822 | sed 's/\<[0-9]\>/0&/' | sort | uniq -c | column
    278 01	    273 09	    290 17	    280 25	    270 33
    299 02	    268 10	    249 18	    300 26	    268 34
    263 03	    266 11	    281 19	    257 27	    270 35
    261 04	    292 12	    261 20	    270 28	    284 36
    285 05	    248 13	    269 21	    263 29	    293 37
    254 06	    260 14	    278 22	    284 30	    258 38
    261 07	    285 15	    253 23	    285 31	    251 39
    266 08	    261 16	    265 24	    253 32	    270 40

Here the most frequent number is 26, drawn 300 times, and the least frequent is 13, drawn 248 times: a similar spread to the real draws.
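One caveat: generating 10822 numbers in one go isn’t necessarily the same as 1546 draws of seven distinct balls (I’m not assuming either way about how rand handles repeats). If you want a simulation that mirrors the real draw more closely, here’s a sketch using a partial Fisher-Yates shuffle in awk. simulate_draws is a hypothetical helper, and awk’s rand() is fine for this kind of toy but not for anything that needs real randomness.

```shell
#!/usr/bin/env bash
# simulate_draws DRAWS PER_DRAW POOL
# Hypothetical helper: prints DRAWS lines, each holding PER_DRAW distinct, zero-padded
# balls from 1..POOL (the same layout as nzlottoresults after "Draw N:", minus PowerBall).
# Usage: simulate_draws 1546 7 40 | tr " " "\n" | sort | uniq -c | column
simulate_draws() {
  awk -v draws="$1" -v k="$2" -v pool="$3" 'BEGIN {
    srand()                                  # seeded from the clock
    for (d = 1; d <= draws; d++) {
      for (i = 1; i <= pool; i++) ball[i] = i
      line = ""
      # partial Fisher-Yates: slot j gets a ball chosen uniformly from those not yet drawn
      for (j = 1; j <= k; j++) {
        r = j + int(rand() * (pool - j + 1)) # r is in j..pool
        t = ball[j]; ball[j] = ball[r]; ball[r] = t
        line = line (j > 1 ? " " : "") sprintf("%02d", ball[j])
      }
      print line
    }
  }'
}
```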

OK, so here’s the test: use the genlotto script to generate 1546 “tickets”, one per draw, then write another script to check those tickets against the real results automatically. The first part is easy:

$ mkdir lottoresults
$ for i in {1..1546}; do ./genlotto > lottoresults/result-$i; done

The next part is a bit trickier: we have to cater for six numbers, a Bonus Ball and a PowerBall that may or may not be there. For each draw, we read the drawn results, read each line of the ticket generated for that draw, and compare the two. This needs a couple of arrays and a couple of variables, plus a bunch of checks that tally what’s found. Oh, and to keep things simple, we just use the modern set of rules. The resulting script is called lottocheck, and very quickly we see that we would have won… nice!

$ ./lottocheck 
Draw 9: Division 5 winner!
Draw 19: Division 6 winner!
...

Just not that much in the higher divisions:

$ ./lottocheck | cut -d: -f2- | sort | uniq -c
      6  Division 2 winner!
      2  Division 3 winner!
     80  Division 4 winner!
     41  Division 5 winner!
    552  Division 6 winner!
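The lottocheck script itself isn’t shown here, but the core comparison it does (six numbers, a Bonus Ball, and an optional PowerBall) can be sketched for a single ticket line. This is a hypothetical check_line function, not the real script. It assumes the seventh drawn number is the Bonus Ball, that a ticket line is six numbers plus an optional PowerBall (genlotto’s real output format may differ), and division rules as I understand the current ones: 6 numbers for Division 1, 5 plus Bonus for 2, 5 for 3, 4 plus Bonus for 4, 4 for 5, 3 plus Bonus for 6, and 3 plus PowerBall for the PowerBall-only Division 7.

```shell
#!/usr/bin/env bash
# check_line "DRAW" "TICKET"  (hypothetical helper, not the real lottocheck)
#   DRAW:   six winning numbers, the Bonus Ball, then the PowerBall if there was one
#   TICKET: six chosen numbers, then an optional PowerBall (assumed format)
# The division rules below are an assumption; check them against the official rules.
check_line() {
  local -a draw ticket
  read -ra draw <<< "$1"
  read -ra ticket <<< "$2"
  local bonus=${draw[6]} drawn_pb=${draw[7]:-}
  local matches=0 bonus_hit=0 pb_hit=0 div="" n w

  for n in "${ticket[@]:0:6}"; do
    for w in "${draw[@]:0:6}"; do
      # 10# forces base 10, so zero-padded 08 and 09 aren't parsed as (invalid) octal
      if (( 10#$n == 10#$w )); then matches=$((matches + 1)); fi
    done
    if (( 10#$n == 10#$bonus )); then bonus_hit=1; fi
  done

  # The PowerBall only counts if the draw had one (draw 711 onwards) and the ticket has one
  if [[ -n $drawn_pb && -n ${ticket[6]:-} ]] && (( 10#${ticket[6]} == 10#$drawn_pb )); then
    pb_hit=1
  fi

  if   (( matches == 6 )); then div=1
  elif (( matches == 5 && bonus_hit )); then div=2
  elif (( matches == 5 )); then div=3
  elif (( matches == 4 && bonus_hit )); then div=4
  elif (( matches == 4 )); then div=5
  elif (( matches == 3 && bonus_hit )); then div=6
  elif (( matches == 3 && pb_hit )); then div=7     # PowerBall-only division
  fi

  [[ -z $div ]] && return 0                         # no win: print nothing
  if (( pb_hit )); then
    echo "Division $div winner WITH POWERBALL!"
  else
    echo "Division $div winner!"
  fi
}
```

For example, checking a made-up ticket against draw 712 with check_line "30 23 39 33 25 08 16 06" "30 23 39 33 16 02 06" prints Division 4 winner WITH POWERBALL! (four numbers, the Bonus Ball and the PowerBall). A full checker would loop this over every draw in nzlottoresults and every line of the matching ticket file.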

Using the prize amounts from the latest draw, that’s $131,335 won, from an outlay of $18,552 (1546 Powerdip tickets at $12 each, 10 lines per ticket). Notice the lack of PowerBall wins? That means you’d have got a better return by just buying the $6 tickets ($9,276 all up).

But this is gambling (or statistics… your choice), so let’s run it again, dealer!

$ for i in {1..1546}; do ./genlotto > lottoresults/result-$i; done
$ ./lottocheck | cut -d: -f2- | sort | uniq -c
      2  Division 2 winner!
      1  Division 3 winner!
    108  Division 4 winner!
     43  Division 5 winner!
    514  Division 6 winner!

A square $59k returned.
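Those dollar figures come from multiplying each division’s win count by that division’s prize. Here’s a minimal sketch of that tally in awk, as a hypothetical tally_winnings helper that reads lottocheck-style lines on stdin. The prize file is also hypothetical: you’d fill it in yourself from a draw’s published dividends, since the amounts change from draw to draw, so none are included here.

```shell
#!/usr/bin/env bash
# tally_winnings PRIZE_FILE  < lottocheck output ("Draw N: Division D winner!" lines)
# PRIZE_FILE (hypothetical, must be non-empty): one "KEY AMOUNT" pair per line, where KEY
# is the division number, with "PB" appended for the WITH POWERBALL variant, e.g. "6 20", "6PB 50".
# Usage: ./lottocheck | tally_winnings prizes.txt
tally_winnings() {
  awk 'NR == FNR { prize[$1] = $2; next }           # first file: load the prize table
       /Division [0-9]+ winner/ {
         for (i = 1; i < NF; i++) if ($i == "Division") { key = $(i + 1); break }
         if (/WITH POWERBALL/) key = key "PB"
         if (!(key in prize)) { print "no prize listed for " key > "/dev/stderr"; next }
         total += prize[key]
       }
       END { printf "%.2f\n", total }' "$1" -
}
```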


So what happens when we run our statistically winning numbers against the back history of NZ’s lotto draws?

$ ./superlottocheck | cut -d: -f2- | sort | uniq -c
      1  Division 2 winner!
      1  Division 3 winner!
      9  Division 4 winner!
      3  Division 4 winner WITH POWERBALL!
      6  Division 5 winner!
      2  Division 5 winner WITH POWERBALL!
     58  Division 6 winner!
      3  Division 6 winner WITH POWERBALL!
      3  Division 7 winner WITH POWERBALL!

$21,370 returned from $18,552. Not much of a return for 29 years of investment.

OK, so what can we conclude? Gambling is always a loser’s errand unless you play the long game and you’re absolutely disciplined with your winnings. Good luck with that (ironic pun?). Don’t get caught in the trap of problem gambling, and don’t EVER gamble what you can’t afford to lose. Also, throwing money at PowerBall seems to be pretty much a complete waste of time and money, and a quick Google search confirms that.

This post was brought to you by Gees Linctus, a fistful of antihistamines and Panadol, washed down with a hot toddy. With all of that in mind, don’t trust my scripts or my maths. I’m clearly not 100% right now, so corrections will be appreciated… once I recover!

Categories: Geeking Out

rawiri