DSA — chapter 1 — Introduction
Problem: given two algorithms that solve the same problem, how do we determine which one is better?
Solution 1: Implement both algorithms, run them against some predefined test cases, and see which one uses fewer resources (for example, time).
Problems with solution 1:
- The test cases may end up favoring one algorithm over the other. For example, suppose the problem is linear search: one implementation scans the array from the beginning (left to right) while the other scans from the end (right to left). If the test cases happen to place the target element closer to the right end of the array, the right-to-left version will look better, even though in theory there is no difference between the two.
- Another problem with this approach is that it depends on system load: if the server/system was less busy when algorithm 1 ran and quite busy when algorithm 2 ran, algorithm 2 will take more time for reasons that have nothing to do with the algorithm itself.
Solution 2: Asymptotic Analysis
- Theoretical approach
- Measure the order of growth in terms of input size.
- Gives guarantees only for input sizes greater than some threshold.
It guarantees that if an algorithm is asymptotically better than some other algorithm, then after a certain value of the input size the asymptotically better algorithm will perform better.
For example, if one algorithm is linear and another is logarithmic, it is guaranteed that even if the linear algorithm runs on the world's fastest computer and the logarithmic algorithm runs on an average computer, after a particular input size the logarithmic algorithm will be faster.
In red: t = 10⁹·log(x); in blue: t = x.
I took 10⁹ as a multiplier because the computer running the logarithmic algorithm is assumed to be slow. As you can see, in this case our threshold is x₀ = 10¹⁰, and after this point the slower computer starts beating the fastest computer (which makes sense, since the order of growth of 10⁹·log(x) is less than that of x).
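The code example referred to below seems to have been an image that did not make it into the text; a minimal C sketch consistent with the analysis (a loop of x iterations plus a fixed block of 1000 constant-time statements, names are mine) could look like this:

#include <stdio.h>

/* Sketch: a loop that runs x times plus a fixed block of 1000
   iterations, so the running time is ≈ C1*x + 1000*C1. */
void example(int x)
{
    for (int i = 0; i < x; i++)
        printf("constant-time work C1\n");   /* runs x times */

    for (int i = 0; i < 1000; i++)
        printf("constant-time work C1\n");   /* runs 1000 times, independent of x */
}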
In the above code example:
- The input size is x; let's assume each loop iteration and each print statement takes some constant time C1.
- So the time taken by this code is ≈ C1*x + 1000*C1 ≈ C1*x + C2, which is obviously linear (y = m*x + c).
Asymptotic notation
- Θ (Theta notation)
- O (Big O notation)
- Ω (Omega notation)
Θ (Theta notation)
Θ notation is used to state the exact highest-growing term, for example:
n² + 2n + 1 = Θ(n²) (the highest-order growing term is n²)
O (Big O notation)
Big O notation says that the highest-growing term is at most the given function; for example (note that O(n²) covers all the functions in Θ(n²)):
n² + 2n + 1 = O(n²)
3n = O(n²)
1000 = O(n²)
n³ ≠ O(n²)
Ω (Omega notation)
Ω notation covers all functions with at least the given order of growth, for example:
n² + 2n + 1 = Ω(n²)
2n³ = Ω(n²)
3n + 1 ≠ Ω(n²)
Like O notation, Ω(n²) also covers all the functions covered by Θ(n²).
I will come to this topic again with a more mathematical explanation and some graphs to make this clear.
Best case, Average case, and the Worst case of an algorithm.
- Best case scenario: the element you are searching for is present at the 0th index.
- Worst case scenario: the element is not present at all, so you end up iterating over the whole array.
- Average case scenario:
this one is not that simple: the average case is the sum of the running times over all possible inputs divided by the total number of possible inputs.
- Calculating the average needs an assumption about the input distribution:
for example, one can assume that all possible permutations of the array are equally likely, or pick some other statistical distribution.
Basically, it's tough to calculate in most cases.
It should be obvious that the best case is not very useful and the average case is tough to calculate, so worst-case analysis is the one we will be interested in most of the time.
Connecting Best case, Worst case, and Average case analysis with Asymptotic notations.
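The algorithm analyzed below appears to be linear search (the code image is missing); a minimal sketch, with names assumed, could be:

/* Hypothetical linear search: returns the index of x in arr, or -1 if absent. */
int search(int arr[], int n, int x)
{
    for (int i = 0; i < n; i++)   /* best case: x at index 0 (constant time);
                                     worst case: x absent, all n elements checked */
        if (arr[i] == x)
            return i;
    return -1;
}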
In the above algorithm, for the time complexity T:
T ≠ Θ(n), because in some cases it takes only constant time (the best case, for example).
But if we say the worst-case time complexity of this algorithm is Θ(n), then that's true.
Also note that the time complexity of the above algorithm is O(n). (Make sure you understand the difference between time complexity and worst-case time complexity.)
Exercises (find the order of growth of following loops):
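The loop snippets for this exercise appear to be missing; loops consistent with the solution below (one stepping up from 0 to n and one stepping down from n to 0, both with step size c, names assumed, c ≥ 1) might look like:

#include <stdio.h>

void towards_n(int n, int c)
{
    for (int i = 0; i < n; i = i + c)   /* 0 towards n, adding c each time */
        printf("work\n");
}

void towards_zero(int n, int c)
{
    for (int i = n; i > 0; i = i - c)   /* n towards 0, subtracting c each time */
        printf("work\n");
}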
Solution:
In both problems above it's quite clear that they are essentially the same: one goes from 0 towards n, the other goes from n towards 0, and the step size is c.
Either way, it takes n/c steps.
n/c => Θ(n), so the time complexity of the above loops is Θ(n).
Exercises (find the order of growth of following loops):
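Again the loop snippets seem to be missing; loops consistent with the solution below (the index is multiplied or divided by c each step, names assumed, c ≥ 2) might look like:

#include <stdio.h>

void towards_n(int n, int c)
{
    for (int i = 1; i < n; i = i * c)   /* 1 towards n, multiplying by c each time */
        printf("work\n");
}

void towards_one(int n, int c)
{
    for (int i = n; i > 1; i = i / c)   /* n towards 0 (stops at 1), dividing by c each time */
        printf("work\n");
}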
Solution:
One thing is clear about the loops above: they are again essentially the same, but now the movement towards the destination (from 1 towards n, or from n towards 0) has sped up; the movement is not linear anymore.
Let’s go back to the previous problem and understand how we got from 0 towards n by adding c again and again:
c + c + c + … s times = n (might be more than n, but (s-1)*c is definitely less than n)
c*s = n => s = n/c
Similarly, in this example we can say something like this:
c*c*c … s times = n (from 1 towards n)
c^s = n => s = log_c(n) (log with base c)
The time complexity of the above loops is Θ(log n).
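The loop for the next part is also missing from the text; a loop matching the index sequence described below (the index is raised to the power c on each iteration, a sketch with assumed names, c ≥ 2) could be:

#include <stdio.h>
#include <math.h>

/* Sketch: the index takes the values 2, 2^c, 2^(c^2), ... until it exceeds n. */
void fast_growth(double n, int c)
{
    for (double i = 2; i <= n; i = pow(i, c))
        printf("work\n");
}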
In the above loop, let’s see how the value of the index changes with each iteration:
index => 2, 2^c, (2^c)^c, ((2^c)^c)^c …
=> index => 2, 2^c, 2^c², 2^c³, …
after the kth iteration => index = 2^(c^k), and the loop continues while 2^(c^k) ≤ n
=> c^k ≤ log₂(n) => k ≤ log_c(log₂(n)) (the outer log has base c, the inner base 2), so the time complexity is Θ(log log n).
What is the time complexity of the below algorithm:
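The code for this one also seems to be missing; a sketch consistent with the solution that follows (it returns immediately when n is a multiple of 10, otherwise loops n times; assumed code, not the original) could be:

#include <stdio.h>

void f(int n)
{
    if (n % 10 == 0)
        return;                    /* constant-time exit when n is a multiple of 10 */
    for (int i = 0; i < n; i++)
        printf("work\n");          /* otherwise the loop runs n times */
}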
The above algorithm has time complexity O(n), and not Θ(n), because sometimes (when n is a multiple of 10) it runs in Θ(1).
What is the time complexity of the below algorithm:
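The code is missing here as well; something consistent with the solution (two independent loops, one of size n and one of size m; a sketch, names assumed) could be:

#include <stdio.h>

void f(int n, int m)
{
    for (int i = 0; i < n; i++)    /* takes C1*n time */
        printf("work\n");
    for (int j = 0; j < m; j++)    /* takes C2*m time */
        printf("work\n");
}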
C1*n + C2*m = Θ(n + m)
What is the time complexity of the below algorithm:
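The code here is also missing; a nested loop whose inner loop runs n, n-1, ..., 1 times (a sketch, names assumed) could be:

#include <stdio.h>

void f(int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)   /* runs n times for i = 0, n-1 times for i = 1, ..., 1 time for i = n-1 */
            printf("work\n");
}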
let’s see how many times the inner loop will run: n, n-1, n-2 … 1
well that’s obviously the sum of first n natural numbers: n(n+1)/2 = Ѳ(n²)
Limitations of Asymptotic analysis
- Suppose we have one algorithm that is O(n²) and another that is O(n), but the constants in the O(n²) algorithm are really low compared to those in the O(n) one. After a certain value of n the O(n) algorithm will always beat the O(n²) one, but what if that value of n is not practical?
Blue curve: y = x²/10⁵, black line: y = x. The linear algorithm starts performing better only after x = 100,000. What if 100,000 is not even a practical input size for our real-life scenario!
- Another thing is that asymptotic notation ignores a lot of machine-level details. For example, quicksort has Θ(n²) worst-case time complexity while merge sort has Θ(n·log n), but quicksort is still often preferred in practice because it is a more architecture- and cache-friendly algorithm.
Analysis of Recursive Algorithms
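The recursive code analyzed below appears to be missing; a sketch matching the description (a base-case condition taking constant time C1, a loop taking C2*n, and two recursive calls on inputs of size n/2, roughly a merge-sort-shaped skeleton with assumed names) could be:

#include <stdio.h>

void solve(int n)
{
    if (n <= 1)                    /* condition: constant time C1 */
        return;
    for (int i = 0; i < n; i++)    /* loop: C2*n time */
        printf("work\n");
    solve(n / 2);                  /* two recursive calls: 2*T(n/2) */
    solve(n / 2);
}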
The condition above will take some constant time C1 and the loop will take C2*n time; if we say the whole algorithm takes T(n) time, we can write something like this:
T(n) = C1 + C2*n + 2*T(n/2) = Θ(n) + 2T(n/2)
Before computing T(n) let’s look at more examples.
Time complexity of example 1 will be T(n) = C + 2T(n/2)
Time complexity of example 2 will be T(n) = C + T(n-1)
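The snippets for examples 1 and 2 appear to be missing too; minimal sketches consistent with the stated recurrences (assumed code, not the originals) could be:

#include <stdio.h>

void example1(int n)               /* T(n) = C + 2T(n/2) */
{
    if (n <= 1) return;
    printf("constant work C\n");
    example1(n / 2);
    example1(n / 2);
}

void example2(int n)               /* T(n) = C + T(n-1) */
{
    if (n <= 1) return;
    printf("constant work C\n");
    example2(n - 1);
}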
Now we know how to write the recursive equation of T(n), let’s see how we can solve T(n) and get a value in terms of n.
There are multiple mathematical methods to solve recurrences like these:
T(n) = 2T(n/2) + Cn
such as the substitution method, master method, recursion tree method, etc.
for now, we will only discuss the recursion tree method.
Recursion tree method.
- We write the non-recursive part as the root of the tree and the recursive part as the children.
- we keep expanding children till we see a pattern.
TODO — add a picture here ask Mohit!
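Until the picture is added, here is a text sketch of the recursion tree for T(n) = 2T(n/2) + Cn, level by level (my expansion, not the original figure):
Level 0: 1 node of size n      => work C*n
Level 1: 2 nodes of size n/2   => work 2*C*(n/2) = C*n
Level 2: 4 nodes of size n/4   => work 4*C*(n/4) = C*n
…
Level h: n nodes of size 1     => work ≈ C*n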
At every level the work done is Cn, so if the height of the tree is h, the total work done is C*n*h.
The input size is decreasing at each level like this: n, n/2, n/4, …, 1
at height h, the input size is n/2^h = 1
=> 2^h = n
=> h = log₂(n)
So the total time taken by the algorithm is C*n*log n => Θ(n log n)
Let’s analyze the below recursion:
- T(n) = 2T(n-1) + C
T(1) = C
TODO — add a picture here ask Mohit!
The time spent by this algorithm at each level of the tree has this sequence: C, 2C, 4C, …
which is a GP with first term a = C and common ratio r = 2
the height of this tree is simple to calculate as n is being reduced 1 value at a time, so height is n.
so the sum of time spent at all levels = C*(2^n - 1)/(2 - 1) = C*(2^n - 1) => Θ(2^n)
so the time complexity is exponential, which is like super slow!
- T(n) = T(n/2) + C
T(1) = C
TODO — add a picture here ask Mohit!
the series here is C + C + C + … (log n times), so this is simply Θ(log n)
- T(n) = 2T(n/2) + C
T(1) = C
TODO — add a picture here ask Mohit!
the series here will be: C + 2C + 4C + … (log n terms)
so the sum of time spent at all levels = C*(2^(log n) - 1) ≈ C*n => Θ(n)
- T(n) = T(n/4) + T(n/2) + Cn
T(1) = C
TODO — add a picture here ask Mohit!
Cn + 3/4 Cn + 9/16 Cn + …
again a GP, with first term a = Cn and common ratio r = 3/4.
this recursion is not that simple: it looks like a GP now, but the T(n/4) branches will reach T(1) faster than the T(n/2) branches, so the number of nodes will differ at each level and the GP relation is lost near the bottom.
but we can treat the tree as if it were a full tree and assume the series remains a GP; this overestimates the work, so Θ notation is no longer possible, we are now talking about an upper bound, and big O notation will be used.
since r < 1, for a large number of terms the sum approaches a/(1 - r)
=> O(Cn * 1/(1 - 3/4)) = O(4*C*n) => O(n)
- T(n) = T(n-1) + T(n-2) + C
T(1) = C
Recursion equation for Fibonacci numbers.
TODO — add a picture here ask Mohit!
Again, this tree won't be a full tree, so only O (an upper bound) is easy to compute.
C*(1 + 2 + 4 + … n terms) = C*(2^n - 1) = O(2^n)
Mathematical definitions of Asymptotic notations:
- Theta Notation
Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such
that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0}
here f(n) is the green curve: x² + 3x + 10, and I have used c1 = 0.5 and c2 = 3.
- Big O notation:
O(g(n)) = { f(n): there exist positive constants c and
n0 such that 0 <= f(n) <= c*g(n) for
all n >= n0}
here I have used f(n) = 1.5x + 50, with g(n) = x and c = 2, so c*g(n) = 2x.
- Omega notation:
Ω (g(n)) = {f(n): there exist positive constants c and
n0 such that 0 <= c*g(n) <= f(n) for
all n >= n0}.
Since omega notation is not very useful in practice, I won't spend much time on it!
Exercises and Quiz:
What is the time complexity of fun?

int fun(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i; j > 0; j--)
            count = count + 1;
    return count;
}
Solution:
for each i, the inner loop runs i times, and i takes the values 0, 1, …, n-1
so the total number of times the inner statement runs is 0 + 1 + … + (n-1) = n(n-1)/2 = Θ(n²)
Let W(n) and A(n) denote, respectively, the worst-case and average-case running time of an algorithm executed on an input of size n. Which of the following is ALWAYS TRUE?
(A) A(n) = Ω(W(n))
(B) A(n) = Θ(W(n))
(C) A(n) = O(W(n))
(D) A(n) = o(W(n))
Solution:
let’s understand this one option at a time:
Option A says the average time will be at least the worst time; that clearly isn't always true (the average can be much smaller than the worst case).
Option B says the average time is Θ of the worst time, i.e. they always have the same highest-order term. That's not necessarily true: for example, quicksort has an average time complexity of O(n*log n) while its worst-case time complexity is O(n²).
But it's not necessary to remember the time complexities of particular algorithms; can we think of some kind of mathematical scenario to prove this wrong (or right?). Let's try to get the average as C1*n and the worst as C2*n²: a program that just loops from 0 to n for most inputs, but for one special input also runs a nested loop from 0 to n, can be one option … and thus option B is not always true.
Option C is true because the average case will have a highest-order term smaller than or equal to that of the worst case, and some constant can be chosen to satisfy the definition of Big O. For example:
A = n, W = n², or A = n², W = n².
Option D is not true. Seeing it as an independent option requires the little-o notation, which is not used a lot: A(n) = o(W(n)) means A grows strictly slower than W, and that fails whenever A and W have the same highest-order term (as in the second example above).
Which of the following is not O(n^2)?
- (15^10) * n + 12099
- n^1.98
- n^3 / (sqrt(n))
- (2^20) * n
- (15¹⁰)*n + 12099 is C1*n + C2, which is O(n), and everything that is O(n) is also O(n²).
- n^1.98 is O(n^1.98), and since 1.98 < 2, it is also O(n²).
- n³/√n = n^2.5, and 2.5 > 2, so this is O(n^2.5) but not O(n²) (answer).
- (2²⁰)*n is C1*n, which is O(n), and hence also O(n^x) for any x ≥ 1.
Which of the given options provides the increasing order of asymptotic complexity of the functions f1, f2, f3 and f4?
f1(n) = 2^n
f2(n) = n^(3/2)
f3(n) = n*Logn
f4(n) = n^(Logn)
- f3, f2, f4, f1
- f3, f2, f1, f4
- f2, f3, f1, f4
- f2, f3, f4, f1
No doubt the smallest here is n*log(n). Now between n^(3/2) and n^(log n): n^(log n) > n^(3/2) eventually, because log(n) keeps growing with n while 3/2 stays constant, and log n exceeds 3/2 as n increases.
The last comparison is between n^(log n) and 2^n, and this one is not that easy: if the bases were equal we could just compare exponents and 2^n would clearly win, but here the base 2 is fixed while the base n grows, so it's more like comparing 2^p1 and n^p2 with p1 >> p2. How do we know for sure which one wins?
One way is to compare logarithms: log(2^n) = n while log(n^(log n)) = (log n)², and n grows faster than (log n)², so 2^n is indeed the clear winner. Plugging in examples such as n = 10⁵ or 10¹⁰ confirms this.
So in decreasing order: 2^n, n^(log n), n^(3/2), n*log n, i.e. f1, f4, f2, f3; reading it in increasing order gives f3, f2, f4, f1, so the answer is option 1.
The following image shows the curves of both functions: 2^x (blue) and x^log(x) (red):
int unknown(int n) {
    int i, j, k = 0;
    for (i = n/2; i <= n; i++)
        for (j = 2; j <= n; j = j * 2)
            k = k + n/2;
    return k;
}

What is the returned value of the above function?
(A) Θ(n^2)
(B) Θ(n^2 * Logn)
(C) Θ(n^3)
(D) Θ(n^3 * Logn)
let’s analyze the code:
the outermost loop runs about n/2 times, and the innermost loop runs about log₂(n) times.
so the statement k = k + n/2 executes (n/2)*(log n) times, adding n/2 each time.
value of k = (n/2) * ((n/2)*log n) = (n²*log n)/4 = Θ(n² * log n), which is option B.
The recurrence equation
T(1) = 1
T(n) = 2T(n - 1) + n, n ≥ 2
evaluates to:
- 2^(n+1) - n - 2
- 2^n - n
- 2^(n+1) -2n - 2
- 2^n + n
Solution: (I copy-pasted this one :D )
T(n)  = n + 2(n-1) + 4(n-2) + 8(n-3) + … + 2^(n-1) * 1
2T(n) = 2n + 4(n-1) + 8(n-2) + 16(n-3) + … + 2^n * 1
2T(n) - T(n) = -n + 2 + 4 + 8 + … + 2^n
T(n) = -n + 2^(n+1) - 2 [applying the GP sum formula to 2 + 4 + … + 2^n] = 2^(n+1) - n - 2 (first option)
Consider the following three claims:
1. (n + k)^m = Θ(n^m), where k and m are constants
2. 2^(n + 1) = O(2^n)
3. 2^(2n + 1) = O(2^n)
Which of these claims are correct?
- 1 and 2
- 1 and 3
- 2 and 3
- 1, 2 and 3
Solution: this one is easy.
Let's go claim by claim. In claim 1, k and m are constants, so by the binomial theorem the highest-order term of (n + k)^m is n^m, i.e. it is Θ(n^m); correct. In claim 2, 2^(n+1) = 2*2^n, which is O(2^n) (just take c = 2); correct. In claim 3, 2^(2n+1) = 2*(2^n)², which cannot be bounded by c*2^n for any constant c; incorrect. So the answer is "1 and 2".
Consider the following C code segment:

int f (int x)
{
    if (x < 1) return 1;
    else return (f(x-1) + g(x));
}

int g (int x)
{
    if (x < 2) return 2;
    else return (f(x-1) + g(x/2));
}

Of the following, which best describes the growth of f(x) as a function of x?
- Linear
- Exponential
- Quadratic
- Cubic
Solution: f(x) = f(x-1) + g(x) = f(x-1) + f(x-1) + g(x/2) ≥ 2*f(x-1),
so f(n) ≥ 2*f(n-1) ≥ 4*f(n-2) ≥ … ≥ 2^(n-1)*f(1), i.e. f grows at least exponentially; thus Exponential.
What is the time complexity of fun?

void fun(int n)
{
    int i, j;
    for (i=1; i<=n; i++)
        for (j=1; j<=log(i); j++)
            printf("kherashanu.com");
}

- Θ(n)
- Θ(n log n)
- Θ(n^2)
- Θ(n^2(Logn))
the inner loop runs log(i) times for each i, so the total is log(1) + log(2) + … + log(n) = log(n!) = Θ(n*log n) (by Stirling's approximation), so the answer is Θ(n log n).