What’s the best way to speed up your computation-intensive program?

Well, it’s still your brain: just write better code, mainly by vectorizing the loops.
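
To give a rough idea of what vectorizing a loop means, here is a minimal sketch. It assumes Python with NumPy, and the function names are made up for illustration; it is not the code from the series, just the general pattern of replacing an element-by-element loop with a single array operation.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Explicit Python loop: one multiply-add per iteration."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Same computation as a single array operation (a dot product)."""
    return float(np.dot(values, values))


# Hypothetical input data, only for illustration.
data = np.arange(1_000, dtype=np.float64)

# Both versions should agree up to floating-point rounding.
assert np.isclose(sum_of_squares_loop(data), sum_of_squares_vectorized(data))
```

The vectorized version pushes the loop down into compiled library code, which is typically where the gain comes from; how large it is depends on the problem and the machine.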

Here is an example, spread over four blog posts. This exact example probably won’t be useful to you, but the techniques I used are explained clearly, and you can replicate them fairly easily.

How much did I gain? Around 500x in execution speed. That means that if the original program took 8 hours to run, with these tricks you get the results in about a minute.

Yes, that’s a lot, and there is no way you can get a similar improvement just by switching to different hardware, such as GPUs.


Ready to go?

Part 1, problem description

Part 2, vectorizing the loops

Part 3, batches and multithreading

Part 4, just-in-time compilation