Monday, February 29, 2016

How to calculate Big O notation of algorithms

Ref: https://www.quora.com/What-is-the-big-O-notation-and-how-do-I-calculate-it

What is the big O notation and how do I calculate it?

A lot of people I see seem to know off the top of their head whether a process is O(log n) or O(1), and so on, but I don't have the slightest idea how to determine that. How do I calculate the big O notation? Can you give examples of code and their corresponding representations in O notation?
Ezequiel H. Martinez, Computer Scientist & MBA with 20 years of experience
Definition
The big O function is the expected growth of an algorithm's running time as a function of the amount of data to be processed, in its worst-case scenario.

N is always the amount of data to be processed. An algorithm of time O(N) has to check every one of the N elements once: linear.
O(N^2) means checking each of the N elements N times. HUGE.


So, if you have the following JavaScript code:

  var n = 10;

  for (var i = 1; i < n; i++) {
    console.log(i);
  }


it is O(n), because you iterate over the data n times.

Something that is O(n^2) (n squared) would be an iteration inside an iteration, like:

  var n = 10;

  for (var i = 1; i < n; i++) {
    for (var x = 1; x < n; x++) {
      console.log(i);
    }
  }


Every additional for/while/do nested inside another one adds an extra power (N^2, N^3, N^4, etc.).

If, on the other hand, you run one loop after the other:

  var n = 10;

  for (var i = 1; i < n; i++) {
    console.log(i);
  }
  for (var x = 1; x < n; x++) {
    console.log(x);
  }


That is N + N = 2N. Since the constant factor does not change the order of growth, we say this algorithm is O(n).

And what about the following?

  var n = 10;
  var m = 10;

  for (var i = 1; i < n; i++) {
    console.log(i);
  }
  for (var x = 1; x < m; x++) {
    console.log(x);
  }


Well, that should be O(n + m), right? Strictly speaking, yes, since n and m are independent inputs; but either way the growth is still linear, so it lands in the same class as O(n).

See? It is the function's GROWTH that matters.

You have, actually, three main kinds of growth, and not much more:

1. O(n): linear (all functions of the form f(x) = kx + c, and the like)
2. O(n log n): linearithmic growth (anything that entails trees, usually)
3. O(n^k): polynomial (like kn^x + ... + nm + i; too costly, avoid this at all cost). Truly exponential growth, O(k^n), is worse still.
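
A minimal sketch (my own illustration, not part of the original answer) that prints these growth rates side by side, so you can see how quickly they diverge:

  // Print sample values of common growth rates for a few input sizes.
  function growthTable(sizes) {
    sizes.forEach(function (n) {
      console.log(
        'n=' + n +
        '  n*log2(n)=' + Math.round(n * Math.log2(n)) +
        '  n^2=' + n * n +
        '  2^n=' + Math.pow(2, n)
      );
    });
  }

  growthTable([10, 20, 30]); // 2^30 is already over a billion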


Usually, intuitively, we fall into the n or n^k cases. Computer scientists have been struggling for more than 60 years with how to convert exponential problems into linear ones.

The best thing you can do is find an algorithm that transforms the problem from N^k into an N log N or logarithmic one. Usually, this sort of solution entails the use of trees, ordered lists (and divide-and-conquer search or sorting algorithms such as quicksort), or other special data structures.
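
As a concrete sketch of that divide-and-conquer idea (my example, not the answerer's), here is a merge sort, which runs in O(n log n): it halves the array, sorts each half recursively, and merges the two sorted halves.

  // Merge sort: each recursion level does O(n) merge work, and there
  // are O(log n) levels of halving, so the total is O(n log n).
  function mergeSort(arr) {
    if (arr.length <= 1) return arr;
    var mid = Math.floor(arr.length / 2);
    var left = mergeSort(arr.slice(0, mid));
    var right = mergeSort(arr.slice(mid));
    return merge(left, right);
  }

  // Merge two sorted arrays into one sorted array.
  function merge(left, right) {
    var result = [];
    var i = 0, j = 0;
    while (i < left.length && j < right.length) {
      if (left[i] <= right[j]) {
        result.push(left[i++]);
      } else {
        result.push(right[j++]);
      }
    }
    return result.concat(left.slice(i)).concat(right.slice(j));
  }

  console.log(mergeSort([5, 2, 9, 1, 7])); // [1, 2, 5, 7, 9]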

That's what Computer Science is for. :-)
Kk Aravind Bharathy, Web developer, Writer
Asymptotic analysis is used to study how the running time grows as the size of the input increases. This growth is studied in terms of the input size. The input size, usually denoted N or M, can mean anything from the number of numbers (as in sorting), to the number of nodes (as in graphs), or even the number of bits (as in multiplying two numbers).

While doing asymptotic analysis, our goal is to find out which algorithm fares better in specific cases. Realize that an algorithm can take quite varying times even for same-sized inputs. To appreciate this, imagine you are a sorting machine: you are given a set of numbers and you need to sort them. If I give you an already sorted list, you have no work to do; you are done already. If I give you a reverse-sorted list, imagine the number of operations you need to perform to make it sorted. Now that you see this, realize that we need a way of knowing what kind of input to expect. Will it be a best case? Will I get a worst-case input? To answer this, we need some knowledge of the input distribution: will it be all worst cases, mostly average cases, or mostly best cases?
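
To make the sorting-machine example concrete, here is a small sketch (mine, not the answerer's) of an insertion sort that counts comparisons; a sorted input needs about n comparisons, while a reverse-sorted one needs about n^2 / 2:

  // Insertion sort instrumented with a comparison counter, to show how
  // the running time varies with the input even when n is fixed.
  function insertionSortComparisons(arr) {
    var a = arr.slice(); // work on a copy
    var comparisons = 0;
    for (var i = 1; i < a.length; i++) {
      var key = a[i];
      var j = i - 1;
      while (j >= 0) {
        comparisons++;
        if (a[j] <= key) break;  // already in place, stop shifting
        a[j + 1] = a[j];         // shift the larger element right
        j--;
      }
      a[j + 1] = key;
    }
    return comparisons;
  }

  var sorted = [1, 2, 3, 4, 5, 6, 7, 8];
  console.log(insertionSortComparisons(sorted));                   // 7, about n
  console.log(insertionSortComparisons(sorted.slice().reverse())); // 28, about n^2 / 2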

Knowledge of the input distribution is fairly difficult to come by in most cases, which leaves us two options. Either we assume the average case and analyze our algorithm, or we get a guarantee on the running time irrespective of the input distribution. The former is called average-case analysis, and doing it requires a formal definition of what makes a case average. Sometimes this is difficult to define and requires real mathematical insight. All the trouble is worth it when you learn that an algorithm runs much faster in the average case than its worst-case running time suggests; several randomized algorithms stand testimony to this, and in such cases an average-case analysis reveals their practical applicability. The latter, worst-case analysis, is used more often since it provides a firm guarantee on the running time. In practice, coming up with the worst-case scenario is often fairly intuitive: if you are the sorting machine, the worst case is a reverse-sorted array. What is the average case? You are thinking, right? Not so intuitive.

Best-case analysis is rarely used, as one does not always get best cases; still, one can do such an analysis and find interesting behavior.

In conclusion: when we have a problem we want to solve, we come up with algorithms. Once we have an algorithm, we need to decide whether it is of any practical use to our situation. If so, we shortlist the algorithms that can be applied and compare them based on their time and space complexity. There can be more metrics for comparison (ease of implementation, for one), but those two are fundamental. Depending on the situation at hand, you then employ worst-case, average-case, or best-case analysis. For example, if you rarely face worst-case scenarios, it makes much more sense to carry out an average-case analysis; however, if the performance of the code is critical and the output must arrive within a strict time limit, it is much more prudent to look at the worst case. Thus, the analysis you make depends on the situation at hand, and with time, the intuition for which analysis to apply becomes second nature.

Big-O (in the context of this SO question: http://stackoverflow.com/questio...):

When we say f(x)=O(g(x)), we mean that for some constant c independent of input size,

f(x) <= c * g(x) for all x >= k

In other words, beyond a certain point k, the curve f(n) is always bounded above by the curve c * g(n). [Figure omitted: f(n) lying below c * g(n) to the right of k.]
Now, in the case you have considered, the operations of addition, subtraction, and multiplication are all primitive operations that take constant time (O(1)). Let's say the addition of two numbers takes time 'a' and assigning the result takes time 'b'.

So for this code:

  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      a[i][j] = b[i][j] + c[i][j];
    }
  }

Let's be sloppy and ignore the for loops' own assignment and update operations. Then the running time is T(n) = (a + b)n^2.

Notice that this is just O(n^2). Why?

As per the definition, this means we can identify some point k beyond which, for some constant c, the curve T(n) is always bounded above by c * n^2.

Realize that this is indeed true: we can always pick a sufficiently large constant c so that the curve c * n^2 bounds the given curve from above.
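
A quick numeric check (my sketch, with made-up costs a = 3 and b = 2): the ratio T(n) / n^2 stays flat, which is exactly what "bounded above by c * n^2" means here.

  // With hypothetical costs a = 3 (addition) and b = 2 (assignment),
  // T(n) = (a + b) * n^2, so T(n) / n^2 is the constant 5 and any
  // c >= 5 bounds T(n) from above for all n >= 1.
  var a = 3, b = 2;
  function T(n) { return (a + b) * n * n; }

  [10, 100, 1000].forEach(function (n) {
    console.log('n=' + n + '  T(n)/n^2=' + T(n) / (n * n)); // always 5
  });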

This is why people say: drop the constants!

Bottom line: f(n) is O(g(n)) means that, to the right of some vertical line, the curve f(n) is always bounded above by c * g(n).

Daniel R. Page, Theoretical Computer Scientist
You calculate it as follows.

1) Identify a barometer in your pseudocode: an elementary operation (e.g., an arithmetic assignment) such that no line in your code occurs more times than this one with respect to the input.
2) Count the number of times that line executes. You'll get a function of the input size.
3) Then you must determine what its asymptotics are. There are many ways to do that; here are two that are normally covered in an undergraduate course on the analysis of algorithms.

a)  Apply the Limit Theorem for Big-oh.
b) Prove membership directly using the definition of Big-oh.

Following this, you will want to simplify your asymptotic result using properties that follow from the definition of Big-oh. For example, constant factors (e.g., a 2) in your growth function can be dropped; as the input size gets large, they become irrelevant.
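
A small sketch of steps 1 and 2 (my illustration, with a made-up nested loop): instrument the barometer line and count how often it runs.

  // Step 1: the barometer is the innermost assignment (sum += i * j),
  // since no line executes more often with respect to n.
  // Step 2: count how many times it executes.
  function countBarometer(n) {
    var count = 0;
    var sum = 0;
    for (var i = 0; i < n; i++) {
      for (var j = 0; j < n; j++) {
        sum += i * j; // barometer operation
        count++;
      }
    }
    return count;
  }

  console.log(countBarometer(10));  // 100
  console.log(countBarometer(100)); // 10000 -- the count grows as n^2, so O(n^2)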
Anindya Mozumdar, Hobbyist Programmer
Let's say you have two functions on the real line, f and g, so that f takes a real number and produces another one; similarly for g.

Then f is said to be O(g) if there exist a point x_0 and a constant c such that for all x > x_0, f(x) <= c * g(x).

So for all values of x greater than x_0, f(x) is bounded by a constant times g(x).

For example, consider the polynomial f(x) = x^2 + x + 1. Clearly, for all x > 1, x + 1 < x^2, which means f(x) <= x^2 + x^2, and therefore f(x) <= 2x^2 for all x > 1. Therefore, the polynomial f(x) is O(x^2).

Now O notation is not limited to functions on the real line, but I have used it here for simple examples.

A great example of a simple algorithm with its O properties is here - Insertion sort
Robin Thomas, Programmer. All Around Nerd
Say that you need to perform a task, and that there are two methods for performing it; that is, there are two algorithms, A and B, for the same task. Let A finish the task in time T_A and B in time T_B. If we can show that T_A is less than T_B, then algorithm A is indeed faster than algorithm B.

Suppose we test both algorithms on an input of size n0 and find that T_A(n0) < T_B(n0). That means algorithm A is faster than algorithm B for input size n0. But we won't be using the same input size n0 all the time; we need to handle different input sizes depending on the application. If we could show that T_A(n) < T_B(n) for all n > 0, we could conclude that algorithm A is faster than algorithm B irrespective of the input size. But we have no idea how big or small n can get, and one function is not always less than another over the entire range of input sizes. To deal with that, we have asymptotic analysis.

Asymptotic analysis removes dependencies on hardware, OS, compiler, and various other factors, and gives us a relation purely in terms of the input size n. There are several classes of asymptotic notation: big O, Theta, big Omega, little o, and little omega. Each of these can be used to state a relation between the running time of an algorithm (for time complexity) and its input size n; all other factors are ignored.

Big O notation is used to put an upper bound on an algorithm. It means that, whatever happens, the algorithm shall not take more time than what the big O bound shows. So big O notation is used mainly for worst-case analysis. We strip off lower-order terms and constants to get a relation purely in terms of the input size n.

Say that you need to find a telephone number in an unordered listing of telephone numbers. What do you do? You start at the top and go down to the bottom, comparing each entry with the number you are searching for. This is what we call a sequential search. If you add one more telephone number to the listing, you need at most one more comparison (assuming the number you want has not been found yet); add n more numbers, and you need at most n more comparisons. The work grows in step with the size of the listing, so we call it linear time complexity, or O(n).
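
A minimal linear-search sketch (my own, with a made-up phone book) to match the description above:

  // Sequential (linear) search: compare every entry until the target is
  // found. In the worst case all n entries are examined, so it is O(n).
  function linearSearch(numbers, target) {
    for (var i = 0; i < numbers.length; i++) {
      if (numbers[i] === target) return i; // found at index i
    }
    return -1; // not in the listing
  }

  var phoneBook = ['555-0101', '555-0142', '555-0199', '555-0123'];
  console.log(linearSearch(phoneBook, '555-0199')); // 2
  console.log(linearSearch(phoneBook, '555-0000')); // -1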

Suppose you need to find a word in a dictionary, say "Good". Will you start at A and do a linear search through every single word until you find it? No, you won't, because a linear search is far too time-consuming here. So what do we do? We go to the middle of the dictionary. The word we want must be in one of the two halves, so we move to the relevant half and keep halving, until we find our word. Since we halve the remaining portion at each step, the amount left to search is divided by 2 every time: that is logarithmic complexity, or O(log n). In asymptotic analysis, we do not think about the base of the logarithm. The reason is that converting a logarithm from one base to another only multiplies it by some constant, and since we ignore constants in asymptotic analysis, the base has no importance.
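
Here is a sketch of that halving idea (my illustration), assuming the word list is already sorted, like a dictionary:

  // Binary search over a sorted array: the search range is halved at
  // each step, so at most about log2(n) steps are needed -- O(log n).
  function binarySearch(words, target) {
    var lo = 0;
    var hi = words.length - 1;
    while (lo <= hi) {
      var mid = Math.floor((lo + hi) / 2);
      if (words[mid] === target) return mid; // found it
      if (words[mid] < target) lo = mid + 1; // look in the right half
      else hi = mid - 1;                     // look in the left half
    }
    return -1; // not present
  }

  var dictionary = ['Apple', 'Banana', 'Good', 'House', 'Zebra'];
  console.log(binarySearch(dictionary, 'Good')); // 2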

Suppose you need to find the element at index 5 of an array (the value of a[5]). You do not have to scan through a[0] to a[4] to determine the value at a[5]; a simple lookup at a[5] gets you the answer. Whatever value is stored there, one lookup suffices. The time for the operation does not depend on the size of the array; it needs only constant time. This is what we call constant complexity, or O(1).
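
And the corresponding one-line sketch of a constant-time lookup:

  // Array indexing takes the same time regardless of the array's
  // length, so this lookup is O(1).
  function valueAt(a, index) {
    return a[index];
  }

  console.log(valueAt([10, 20, 30, 40, 50, 60], 5)); // 60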

When we say that the running times of two algorithms are asymptotically the same, it doesn't mean that their running times are equal. For example, 100 n lg n and 2 n lg n are asymptotically the same (because the constants are ignored), though their values are not.
Shravan Kumar, Writes code for food
You are referring to Big O notation and its relatives, the Landau symbols. They are not different symbols for the same thing but have entirely different meanings. Which one is "preferable" depends entirely on the desired statement.

f ∈(g) means that f grows at most as fast as g, asymptotically and up to a constant factor; think of it as a ≤. f ∈o(g) is the stricter form, i.e. <.

f ∈Ω(g) has the symmetric meaning: f grows at least as fast as gω is its stricter cousin. You can see that f ∈Ω(g) is equivalent to g ∈(f).

∈Θ(g) means that f grows about as fast as g; formally f ∈(g) ∩ Î©(g)fg is its stronger form. We often mean Θ when we use .

Note how (g) and its siblings are function classes.  It is important to be very aware of this and their precise definitions  -- which can differ depending on who is talking -- when doing  "arithmetics" with them.

When proving things, take care to work with your precise definition. There are many definitions of the Landau symbols around (all with the same basic intuition), some of which are equivalent on some sets of functions but not on others.

Suggested reading:
  1. Nested Big O-notation
  2. Definition of Θ for negative functions
  3. What is the meaning of O(m+n)?
  4. What are the rules for equals signs with big-O and little-o?
  5. Sums of Landau terms revisited
  6. What does big O mean as a term of an approximation ratio?
Martino Ciresa, CS master student
Saying an algorithm runs in O(f(n)) for some function f means that, no matter the input, it terminates in time at most proportional to f(n) (for large enough n). Here n is a measure of the size of the input (e.g., the number of nodes in a graph, the length of a string, etc.).
Ahmad Assaf, Knowledge Seeker
One of the best answers is the Stack Overflow answer to "Plain English explanation of Big O".
Sibivel Chinnasamy
When solving a computer science problem, there will usually be more than one solution. These solutions often take the form of different algorithms, and you will generally want to compare them to see which one is more efficient.
This is where Big O analysis helps: it gives us a basis for measuring the efficiency of an algorithm. A more detailed definition would be this: it measures the efficiency of an algorithm in terms of how the time it takes to run grows as a function of the input size.