I’m gonna dive right in with problem 1 from Project Euler. I’ll give you enough time to go make a cup of tea and make yourself comfortable. Go on, I’ll wait.
So, we meet again. Before we start, a quick disclaimer: maybe a third of the people I know, are already familiar with the programming language R. For people who are not, I will explain as I go.
To make the code easy to read, I took screen captures, to keep the formatting and colours. Since you can’t copy & paste that, you can find all my code for this post here.
Problem 1 states:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.
1. Unhand me, you brute!
What’s the first thing you try? Well, you figure there’s only 999 numbers to check, so you might as well make a brute-force approach, and check every one. So here is the code in R, where n1 is 3, n2 is 5, and thresh(old) is 1000:
Pretty simple, right? All you’re doing is going through the numbers 1 to 999 [ for (i in 1:(thresh-1))]. Then checking for divisibility by 3 or 5:
If it is true (thus that ‘if’ above), we add/append that particular ‘i’ to our vector ‘multvect’. After we’ve gone through all possible values of ‘i’, we sum up all the elements in our vector. Voila, our (correct) solution is 233,168. Simples!
The big ‘but’. (I cannot lie). Checking every number for divisibility is fine for small numbers, buuuuuuuut: what about extremely large numbers? Well, let’s have a look. Time taken to run this code: For 1:1000 = 0s ; 1:10,000 = 0.09s ; 1:100,000 =
Well, that’s embarrassing. Integer overflow? Long story short, that ‘append’ function is like adding more and more people to an orgy. An orgy in your computer memory space. (That’s why it’s called RAM). Sooner or later, someone’s gonna fall off the bed/dinner-table/ski-lift. So we’re gonna need a bigger boat. Or failing that, let’s get elegant.
2. Vectorization at Tiffany’s
The problem with ‘append’ is that you don’t know ahead of time how long that vector is, so you keep adding to it. But because of how your computer works, it might hit a limit at some point. What if we knew ahead of time, how long that vector needs to be? But how would we… Wonder Twin Powers Activate! Form of: Mathematical Reasoning!
We’re just looking for multiples of 3 and 5, below 1000. We know that for 5, the sequence starts 5, 10, 15, … but what’s the largest multiple of 5, below 1000? Sure, we know on intuition that it’s 995, but how do we formalize this? All we need to do is find out how many times 5 fits into 999. So 999/5 is 199.8. But we’re interested in whole numbers, so we round down. Sure enough 199 times 5 is 995. Sorted!
So we just need to preallocate space for 199 numbers in the case of multiples of 5. For 3 its the same process. 999/3 is 333, so we only need 333 numbers. We can simplify the whole process by a neat trick that R can do, called ‘vectorization’. Instead of performing an operation (e.g. multiplying it by 3) one at a time, we can set up the whole vector (1:333, highlighted) and multiply the whole vector by 3.
And our solution is 266,333. Which – wait, what the actual fuck? We should get 233,168! Where did that extra 33,165 come from? Isn’t it interesting how that number is divisible by 3 and 5? How many more rhetorical questions can I ask? Let’s take a look at, say, the first 20 values of multn1 and multn2, our vectors of multiples of 3 and 5, and work out what’s going on:
Of course! There’s gonna be numbers which occur in both those lists. Numbers divisible by 3 and 5. Obviously, those numbers are multiples of 3*5 = 15! Sooo stoopid, Herc. Let’s try that again, but only count those values once, by adding this cheeky little line just before the line which returns the answer:
‘Unique’ just finds the unique values, ignoring how many times they appear. So unique(2,3,3,3,3,4,4,5) returns 2,3,4,5. ‘c’ means ‘concatenate’ which is fancytalk for stitch-together. Running this function, we get: 233,168!
Now let’s look at how fast it is. brute.force took 0.09s to search through 1:10,000. Holly.Golightly does 1:10,000 in 0s ; 1:100,000 = 0s ; 1:1,000,000 = 0.1s. Niiice! Kinda curious about when it slows down. Let’s keep going: 1:10,000,000 = 1.45s. Hmm. How about 100 million?
Fuck. The ole’ girl served us well, but maybe vectors are not the way to go for absurdly large numbers. Let’s acquit ourselves by droppin’ some mad science.
3. Two men vector, one man leaves!
Alright. All we’re really doing is adding numbers up. It shouldn’t be that hard. Let’s look at the multiples of 5 (a nice round number) and see if we can find some patterns. Let’s just take the first 10 multiples, and then we can generalize to larger numbers once we’ve found a nice rule.
5+10+15+20+25+30+35+40+45+50 = 275. How do we make this expression easier? Well, every element is a multiple of 5 (duh), and we remember our trick from above, so this becomes 5*(1+2+3+4+5+6+7+8+9+10). That looks simpler. We can always multiply by 5 after. Okay. And 1+2+3…+10 has a clear pattern to it. Take a look visually:
Well, heck! It’s a frikkin’ triangle. With two equal sides. That ain’t so daunting. A triangle’s pretty much half of a diagonally cut square sandwich. (Not sepia-toned. Fuck off). Squares and halves are easy to deal with, so let’s make a sandwich:
Well, fuck me sideways. How delightfully common. All those numbers to add up are just half of an almost square. If you work it out manually, 1+2+3+…+10 = 55. We no longer have to do that shit! 1+2+…+n is now just (n x (n+1))/2. Let’s try it. (10*11)/2 = 110/2 = 55.
Now here comes the cheeky bit. We want the sum of (3,6,9,…,999) + the sum of (5,10,15,…,995) minus the sum of (15,30,45,…,990). Fuck code, let’s get out the pencil and paper, and show these sandwiches a thing or two, old-school stylee.
- 3*(1:333) = 3*(333*334)/2 = 3*(111,222)/2 = 333,666/2 = 166,833
- 5*(1:199) = 5*(199*200)/2 = 5*(39,800)/2 = 199,000/2 = 99,500
- 15*(1:66) = 15*(66*67)/2 = 15*(4,422)/2 = 66,330/2 = 33,165
- 166,833 + 99,500 = 266,333 (oh, hi again! how have you been?)
- 266,333 – 33,165 = 233,168
- BAM! And the code:
How fast is it? Well, it doesn’t need to allocate any memory space to vectors. It doesn’t need to create vectors at any point. It just finds three numbers, and applies the formula:
Yeah, that’s what I thought. Keep kickin’ ass and taking names, math. We salute you.