Algorithm vs CPU Cycles

Introduction

A few weeks ago I went out for drinks with a few developer friends. We were talking about software at different levels, and one of them described how he had improved a long-running process. The process has two unsorted arrays, and for each item in ArrayA it looks up a value in ArrayB. My friend restructured the algorithm to reduce its time complexity, which led us to a conversation about algorithms, CPU cycles and Big O notation.

Big O

Before I talk about the approaches, I have to mention what this “Notation” is. Big O notation describes how the number of steps an algorithm needs grows as the number of items, N, grows. Algorithms are classified with this notation by counting the steps needed to complete, regardless of the processing power of the machine running them.

This notation can be used to describe the time complexity or the space complexity of an algorithm. Long story short, Big O represents how many items have to be visited (or how many items have to be kept in memory) while the algorithm is running.

There is a lot of information out there on how Big O notation works, so I won’t go into details here. The important thing is that as N grows, the gap between the slower and the faster algorithms gets wider, as seen in the table below:

N        | O(1) | O(Log N) | O(N)    | O(N Log N) | O(N²)
3        | 1.00 | 1.58     | 3       | 1.43       | 9
20       | 1.00 | 4.32     | 20      | 26.02      | 400
300      | 1.00 | 8.23     | 300     | 743.14     | 90000
5,000    | 1.00 | 12.29    | 5,000   | 18494.85   | 2.50E+07
100,000  | 1.00 | 16.61    | 100,000 | 5.00E+05   | 1.00E+10

I also won’t go into detail on how the Big O notation of an algorithm is calculated, but it comes down to loops, nested loops and divide-and-conquer steps. A quick example of an algorithm in each category (sketched in code after the list):

  • O(1): Look up a value by index or key
  • O(Log N): Binary search in a sorted array, i.e., keep splitting the array in half until the value is found
  • O(N): Loop through an array
  • O(N Log N): A linear pass repeated at each level of a divide-and-conquer split, e.g., Quicksort
  • O(N²): A loop within another loop, e.g., bubble sort
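
Here is a rough sketch of one example per category. The Python code and function names are purely illustrative; none of this is from the original process.

    def constant_lookup(items, index):
        # O(1): a direct index access costs the same no matter how big the list is
        return items[index]

    def binary_search(sorted_items, target):
        # O(Log N): keep splitting the sorted array in half until the value is found
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    def linear_scan(items, target):
        # O(N): loop through the array once
        for position, value in enumerate(items):
            if value == target:
                return position
        return -1

    def quicksort(items):
        # O(N Log N) on average: divide and conquer, doing O(N) work per level
        if len(items) <= 1:
            return items
        pivot, rest = items[0], items[1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return quicksort(smaller) + [pivot] + quicksort(larger)

    def bubble_sort(items):
        # O(N^2): a loop within another loop
        items = list(items)
        for i in range(len(items)):
            for j in range(len(items) - 1 - i):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
        return items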

Going into code

Now onto the code. First, the shape of the code in its original state, before any improvements were made. Each element in Array1 does a full search of Array2, so each of the N items visits another N items: this is O(N²).
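
The actual code isn’t reproduced here, so the snippet below is just an illustrative Python sketch of that shape (find_matches is a made-up name; the real process obviously does more than collect matches):

    # For each item in array1, do a full scan of array2.
    # N items each scanning N items gives O(N^2).
    def find_matches(array1, array2):
        matches = []
        for a in array1:
            for b in array2:          # full search of array2 for every element of array1
                if a == b:
                    matches.append(a)
                    break
        return matches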

Approach 1: O(N Log N)

Now we can go into my friend’s solution. Because there are two unsorted arrays, the first step is to sort both of them (by default, low to high). Each sort takes O(N Log N); then a single loop walks through the elements once, which takes O(N). The rules of the notation say we keep only the dominant term as the complexity, so we can say that the code below runs in O(N Log N).
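
A rough sketch of that idea (illustrative Python, not my friend’s actual code):

    # Approach 1: sort both arrays, then walk them together with two indices.
    # The sorts dominate, so the whole thing is O(N Log N).
    def find_matches_sorted(array1, array2):
        a = sorted(array1)            # O(N Log N)
        b = sorted(array2)            # O(N Log N)
        matches = []
        i = j = 0
        while i < len(a) and j < len(b):   # single O(N) pass over both arrays
            if a[i] == b[j]:
                matches.append(a[i])
                i += 1
                j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return matches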

Approach 2: O(N)

This is the solution that came up in conversation. Instead of sorting both arrays, we can load Array2 into a Dictionary/HashMap, which visits each of its elements once, so O(N), and then loop over Array1 doing constant-time lookups, another O(N) operation.
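
A sketch of the same idea (using a plain Python dict to stand in for the Dictionary/HashMap):

    # Approach 2: build a hash table from array2 in one O(N) pass,
    # then loop over array1 doing O(1) lookups, for O(N) overall.
    def find_matches_hashed(array1, array2):
        lookup = {value: True for value in array2}    # O(N) to build
        return [a for a in array1 if a in lookup]     # O(N) loop, O(1) per lookup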

The theory

From what we know about what Big O represents, we can tell which algorithm is faster just by looking at the code and the loops. Obviously O(N) is faster: we loop over the items fewer times. But when we put it into practice we see a different result. I put together a quick GitHub repo to test the algorithms and benchmarked all three approaches.

So, what happened here? Why is O(N Log N) faster?

The practice

In practice we see a different result. After running the benchmarks in that repo, we can see that Approach 1 is faster in actual wall-clock time. How can that be? Well, Big O notation only deals with algorithmic time complexity; it ignores how long the CPU takes to look up a value in a Dictionary/HashMap.

We’re talking about CPU cycles now. While Big O counts each action as a single step, in reality the CPU does more work to carry out that step: a hash lookup has to hash the key, jump to a bucket and compare values, often missing the cache along the way. This means that pulling an item out of a Dictionary/HashMap burns more CPU cycles than reading an array by index, and those extra cycles add up.
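
As a rough illustration (this is not the benchmark from the repo), here is a tiny Python timing sketch comparing sequential index reads with hash lookups over the same data; the exact numbers depend entirely on the hardware and the language:

    # Illustrative micro-benchmark: N index reads vs N hash lookups on the same data.
    import random
    import time

    N = 1_000_000
    data = list(range(N))
    table = {value: value for value in data}
    keys = data[:]
    random.shuffle(keys)                     # random order, like real lookups

    start = time.perf_counter()
    total = 0
    for i in range(N):
        total += data[i]                     # index read: a handful of cycles
    index_seconds = time.perf_counter() - start

    start = time.perf_counter()
    total = 0
    for key in keys:
        total += table[key]                  # hash lookup: hash, probe, compare
    hash_seconds = time.perf_counter() - start

    print(f"index reads: {index_seconds:.3f}s   hash lookups: {hash_seconds:.3f}s")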

The Truth

The fact of the matter is that while Approach 1 runs faster in actual time here, with a large enough data set Approach 2 will eventually win. This can be generalized:

With a large enough dataset*, a faster algorithm will run better on a slow CPU than a slower algorithm on a fast CPU.

* that dataset might be in the trillions.
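
One way to get a feel for that footnote: if a single hash lookup costs roughly k times as many cycles as a single index access (k being a made-up constant purely for illustration), the O(N) approach only starts winning once Log N outgrows k.

    # Back-of-the-envelope sketch with invented constants:
    # Approach 1 ~ c_index * N * log2(N) cycles, Approach 2 ~ c_hash * N cycles,
    # so Approach 2 wins once log2(N) > c_hash / c_index, i.e. N > 2 ** (c_hash / c_index).
    k = 40                                   # assume a hash lookup ~40x an index access
    crossover = 2 ** k
    print(f"O(N) starts winning around N = {crossover:,}")   # ~1.1e12, i.e. trillions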

Conclusion

In the case of my friend’s work, his approach works best because it takes fewer CPU cycles to pull up an item by index than by key from a Dictionary/HashMap, and it will keep working well into the millions of items.

But at some point, once we reach a high enough number of items, the O(N) algorithm will be faster.

