CS13002 Programming and Data Structures

Spring semester

Performance analysis of programs

There may exist many algorithms to solve a given problem. Moreover, the same algorithm may be implemented in a variety of ways. It is now time to analyze the relative merits and demerits of different algorithms and of different implementations. We are shortly going to introduce a set of theoretical tools that make this analysis methodical. In order to solve a problem, it is not sufficient to design an algorithm and provide a correct (bug-free) implementation of the algorithm. One should also understand how good his/her program is. Here "goodness" is measured by the relative performance of a program compared to other algorithms and implementations. This is where the true "computer science" starts. Being able to program does not qualify one to be a computer scientist or engineer. Good programmers are certainly needed in every modern society, and it takes time and patience to master the art of good programming. Computer science goes beyond that by providing formal treatment of programs and algorithms. Consider that an airplane pilot is not required to know any aerospace engineering, and an aerospace engineer need not know how to fly an airplane. We need both pilots and aerospace engineers in our society.

In short, when you write (and perhaps sell) your programs or when you buy others' programs, you should exercise your ability to compare the apparent goodness of available programs. The task is not always easy. Here are some guidelines for you.

Resource usage of a program

You may feel tempted to treat a small and compact program as good. That is usually not a criterion for goodness. Most users execute precompiled binaries; it does not matter what the source code looks like. What is more important is how efficiently a program solves a problem. Efficiency, in turn, is measured by the resource usage of a program (or algorithm). The two most important general resources are the running time and the space (memory) requirement.

Here running time refers to the amount of time taken by a program for a given input. The time is expected to vary according to what input the program processes. For example, a program that does matrix multiplication should take more running time for bigger matrices. It is, therefore, customary to view the running time as a function of the input. In order to simplify matters, we assume that the size of the input is the most important thing (instead of the actual input). For instance, a standard matrix multiplication routine performs a number of operations (additions and multiplications of elements) that is dependent only on the dimensions of the input matrices and not on the actual elements of the matrices.

A standard practice is to measure the size of the input by the number of bits needed to represent the input. However, this simple convention is sometimes violated. For example, when an array of integers needs to be sorted, it is customary to take the size of the array (the number of integers in the array) as the input size. Under the assumption that each integer is represented by a word of a fixed size (e.g., 32 bits), the bit-size of the input is a constant multiple of the array size, and so this definition of input size is not much of a deviation from the standard convention. (We will soon see that constant multipliers are neglected in the theory.) However, when the integers to be sorted may be arbitrarily large (multiple-precision integers), naively neglecting the sizes of the operands is certainly not a good idea. One should in this case consider the actual size (in bits or words) of each individual element of the array.

Poorer measures of input size are also sometimes adopted. An nxn matrix contains n^2 elements, and even if each element is of fixed size (like int or float), one requires a number of bits proportional to n^2 for the complete specification of the matrix. However, when we plan to multiply two nxn matrices or invert an nxn matrix, it is seemingly more human to treat n (instead of n^2) as the input size parameter. In other words, the running time is expressed as a function of n and not as a function of the technically correct input size n^2 (though they essentially convey the same thing).
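For concreteness, here is a minimal sketch (not taken from the notes) of the standard matrix multiplication routine, written with a hypothetical maximum dimension MAXN. The triply nested loop performs the same number of arithmetic operations for every pair of nxn matrices, which is why n alone serves as the input size parameter.

   #define MAXN 100    /* hypothetical maximum dimension, for illustration only */

   /* Compute C = A x B for two nxn matrices.  The triply nested loop performs
      exactly n*n*n multiplications and as many additions, a count that depends
      only on the dimension n and not on the actual entries of A and B. */
   void matMult ( double A[][MAXN] , double B[][MAXN] , double C[][MAXN] , int n )
   {
      int i, j, k;

      for (i=0; i<n; ++i)
         for (j=0; j<n; ++j) {
            C[i][j] = 0.0;
            for (k=0; k<n; ++k)
               C[i][j] += A[i][k] * B[k][j];
         }
   }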

Space (memory) requirement of a program is the second important criterion for measuring the performance of a program. Obviously, a program requires space to store (and subsequently process) the input. What matters is the additional amount of memory (static and dynamic) used by the program. This extra storage is again expressed as a function of the input size.

It is often advisable to reduce the space requirement of a program. Earlier, semiconductor memory happened to be a costly resource, and so programs requiring a smaller amount of extra memory were preferred to programs having larger space requirements. Nowadays, the price of semiconductor memory has gone down quite dramatically. Still, a machine comes with only a finite amount of memory. This means that a program having a smaller space requirement can handle bigger inputs given any limited amount of memory. The essential argument in favor of reducing the space requirement of a program continues to make sense today and will do so in the foreseeable future.

Some other measures of performance of a program may be conceived of, like ease of use (the user interface), features, security, etc. These requirements are usually application-specific. We will not study them in a general forum like this chapter.

Palatable and logical as it sounds, it is difficult to express the running time and space usage of a program as functions of the input size. The biggest difficulty is to choose a unit of time (or space). Since machines vary widely in speed and in memory management, the same program may have markedly different performance figures on different machines. A supercomputer is expected to multiply two given matrices much faster than a PC. There is no standard machine for calibrating the relative performances of different programs. Comparison results on a particular machine need not tally with the results on a different machine. Since different machines have different CPU and memory organizations and support widely varying instruction sets, results from one machine cannot, in general, be used to predict program behavior on another machine. That is a serious limitation of machine-dependent performance figures.

A host of other factors lead to variations in the running time (and space usage) of a program, even when the program runs on the same machine. These variations are caused by off-line factors (like the compiler used to compile the program) and also by many run-time factors (like the current load of the CPU, the current memory usage, and the availability of cache and swap memory).

To sum up, neither seconds (or fractions thereof, like microseconds) nor machine cycles turn out to be a faithful measure of the running time of a program. Similarly, memory usage cannot be straightaway measured in bits or words. We must invent some kind of abstraction. The idea is to get rid of the dependence on the run-time environment (including hardware). A second benefit is that we do not have to bother about specific implementations. An argument at the algorithmic level would help us solve our problem of performance measurement.

The abstraction is based on the measurement of time by numbers. Such a number signifies the count of basic operations performed by the program (or algorithm). A definition of what is basic will lead to controversies. In order to avoid that, we will adopt our own conventions. An arithmetic operation (addition, subtraction, multiplication, division, comparison, etc. of integers or floating-point numbers) will be treated as a basic operation. So will be a logical or a bitwise operation (AND, OR, shift, etc.). We will count (in terms of the input size) how many such basic operations are performed by the program. That number will signify the abstract running time of the program. Usually, the actual running times of different programs are closely proportional to these counts. The constant of proportionality depends on the factors mentioned above and varies from machine to machine (and perhaps from time to time on the same machine). We will simply ignore these constants. This practice will bring under the same umbrella two programs doing n^2 and 300n^2 basic operations respectively for solving the same problem. Turning a blind eye to constants of proportionality is not always a healthy practice. Nonetheless, the solace is that these abstract measures have stood the test of time in numerous theoretical and practical situations.
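As a small illustration (our own, not from the notes), the following function sums the elements of an array; the comment records its count of basic operations in the sense just described.

   /* Return the sum of the n elements of A.  The loop performs n additions
      on array elements (together with n increments and n comparisons on i),
      so the count of basic operations is proportional to n, irrespective of
      the actual element values. */
   int arraySum ( int A[] , int n )
   {
      int i, s = 0;

      for (i=0; i<n; ++i) s += A[i];
      return s;
   }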

The space requirement of a program can be quantified along similar lines, namely, by the number of basic units (integers or floating-point numbers or characters) used by a program. Again we ignore proportionality constants and do not talk about the requirement of memory in terms of bits.

We will call the above abstract running time and space usage of a program its time complexity and space complexity, respectively. In the rest of this chapter, we will concentrate mostly on time complexity.

The order notation

The order notation captures the idea of time and space complexities in precise mathematical terms. Let us start with the following important definition:

Let f and g be two positive real-valued functions on the set N of natural numbers. We say that g(n) is of the order of f(n) if there exist a positive real constant c and a natural number n0 such that g(n) <= cf(n) for all n >= n0. In this case we write g(n) = O(f(n)) and also say that g(n) is big-Oh of f(n).
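As a concrete instance of the definition (our own illustration), take g(n) = 2n^2 - 3n + 1 and f(n) = n^2. Since -3n + 1 <= 0 for all n >= 1, we have g(n) <= 2n^2 = 2f(n), so g(n) = O(n^2) with c = 2 and n0 = 1. The small program below merely tabulates the two sides of the inequality for a few values of n.

   #include <stdio.h>

   /* Tabulate g(n) = 2n^2 - 3n + 1 against c*f(n) = 2n^2 for small n.
      The table only illustrates the definition; the inequality
      g(n) <= 2n^2 actually holds for every n >= 1, since -3n + 1 <= 0. */
   int main ()
   {
      long n;

      for (n=1; n<=10; ++n)
         printf("n = %2ld   g(n) = %3ld   2*f(n) = %3ld\n",
                n, 2*n*n - 3*n + 1, 2*n*n);
      return 0;
   }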

The following figure illustrates the big-Oh notation. In this example, we take c=2.

Figure: Explaining the big-Oh notation

Examples

We now explain how the order notation is employed to characterize the time and space complexities of a program. We count the number of basic operations performed by an algorithm and express that count as having the order of a simple function. For example, if an algorithm performs 2n^2 - 3n + 1 operations on an input of size n, we say that the algorithm runs in O(n^2) time, i.e., in quadratic time, or that it is a quadratic-time algorithm. Any algorithm that runs in polynomial time is said to be a polynomial-time algorithm. An algorithm that does not run in polynomial time, but in exponential time, is called an exponential-time algorithm. An exponential function (like 2^n) grows so rapidly (compared to polynomial functions) with the input size n that exponential-time algorithms are usually much slower than polynomial-time algorithms, even when the input is not too big. By an efficient solution of a problem, one typically means an algorithm for that problem that runs in polynomial time O(n^d) with d as small as possible.

Examples

We now analyze the complexities of some popular algorithms discussed earlier in the notes.

Worst-case versus average complexity

Our basic aim is to provide complexity figures (perhaps in the O notation) in terms of the input size, and not as a function of any particular input. So far we have counted the maximum possible number of basic operations that need be executed by a program or function. As an example, consider the linear search algorithm. If the element x happens to be the first element in the array, the function linSearch returns after performing only a few operations. The farther down the array x is located, the larger the number of operations. The maximum possible effort is required when x is not present in the array at all. We argued that this maximum value is O(n). We call this the worst-case complexity of linear search.
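The function linSearch is given elsewhere in the notes; the following sketch shows one possible version (the actual implementation may differ slightly), so that the operation count is easy to visualize.

   /* Search for x in A[0..n-1].  Return the index of the first occurrence
      of x, or -1 if x is absent.  In the worst case (x not present) the
      loop makes n comparisons with array elements, so the running time
      is O(n). */
   int linSearch ( int A[] , int n , int x )
   {
      int i;

      for (i=0; i<n; ++i)
         if (A[i] == x) return i;
      return -1;
   }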

There are situations where the worst-case complexity does not give a good picture of the practical situation. On average, a program may perform much better than it does in the worst case. Average complexity refers to the complexity (time or space) of a program (or function) on a random input. It turns out that the average complexities of some programs are markedly better than their worst-case complexities. There are even examples where the worst-case complexity is exponential, whereas the average complexity is a (low-degree) polynomial. Such an algorithm may take a huge amount of time in certain esoteric situations, but for most inputs we expect the program to terminate quickly.

We now provide a concrete example: the quick sort algorithm. By partition we mean a partition function for an array of n integers with respect to the first element of the array as the pivot. One may use any one of the three implementations discussed above (a sketch of one possibility is also given after the code below).

   void quickSort ( int A[] , int n )
   {
      int i;

      if (n <= 1) return;
      i = partition(A,n);        /* Partition with respect to A[0] */
      quickSort(A,i);            /* Recursively sort the left half excluding the pivot */
      quickSort(&A[i+1],n-i-1);  /* Recursively sort the right half */
   }
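
The partition implementations themselves appear elsewhere in the notes. For completeness, here is a sketch of one possible version (not necessarily identical to any of the three referred to above) that uses A[0] as the pivot and returns the final index of the pivot, as the recursive calls in quickSort expect.

   /* Partition A[0..n-1] with respect to the pivot A[0].  Elements smaller
      than the pivot are moved before it and the rest after it.  The final
      index of the pivot is returned, as expected by quickSort above. */
   int partition ( int A[] , int n )
   {
      int i = 0, j, t, pivot = A[0];

      for (j=1; j<n; ++j) {
         if (A[j] < pivot) {                  /* A[j] belongs to the left part */
            ++i;
            t = A[i]; A[i] = A[j]; A[j] = t;  /* Swap A[i] and A[j] */
         }
      }
      t = A[0]; A[0] = A[i]; A[i] = t;        /* Put the pivot in its final place */
      return i;
   }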

Let T(n) denote the running time of quickSort for an array of n integers. The running time of the partition function is O(n). It then follows that:

   T(n) <= T(i) + T(n-i-1) + cn + d

for some constants c and d and for some i depending on the input array A. The presence of i on the right side makes the analysis of the running time somewhat difficult. We cannot treat i as a constant for all recursive invocations. Still, some general assumptions lead to easily derivable closed-form formulas for T(n).

An algorithm like quick sort (or merge sort) is called a divide-and-conquer algorithm. The idea is to break the input into two or more parts, recursively solve the problem on each part and subsequently combine the solutions for the different parts. For the quick sort algorithm the first step (breaking the array into two subarrays) is the partition problem, whereas the combining stage after the return of the recursive calls involves doing nothing. For the merge sort, on the other hand, breaking the array is trivial -- just break it in two nearly equal halves. Combining the solutions involves the non-trivial merging process.
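Merge sort is discussed elsewhere in the notes; the following minimal sketch (one possible version, not necessarily the one given there) shows the divide-and-conquer structure just described: a trivial split followed by a non-trivial merge.

   #include <stdlib.h>

   /* Sort A[0..n-1] by merge sort: split into two nearly equal halves,
      sort each half recursively, and merge the two sorted halves using a
      temporary array. */
   void mergeSort ( int A[] , int n )
   {
      int m, i, j, k, *B;

      if (n <= 1) return;                   /* Nothing to sort */
      m = n / 2;
      mergeSort(A,m);                       /* Recursively sort the left half */
      mergeSort(&A[m],n-m);                 /* Recursively sort the right half */

      B = (int *)malloc(n * sizeof(int));   /* Temporary array for merging */
      i = 0; j = m; k = 0;
      while ((i < m) && (j < n)) B[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
      while (i < m) B[k++] = A[i++];
      while (j < n) B[k++] = A[j++];
      for (k=0; k<n; ++k) A[k] = B[k];      /* Copy the merged output back */
      free(B);
   }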

It follows intuitively that the smaller the size of each subproblem is, the easier it is to solve each subproblem. For any superlinear function f(n) the sum

   f(k) + f(n-k-1) + g(n)

(with g(n) a linear function) is large when the breaking of n into k, n-k-1 is very skewed, i.e., when one of the parts is very small and the other nearly equal to n. For example, take f(n) = n^2. Consider the function of a real variable x:

   y = x^2 + (n-x-1)^2 + g(n)

Differentiation shows that the minimum value of y is attained approximately at x = n/2 (setting dy/dx = 2x - 2(n-x-1) = 0 gives x = (n-1)/2). The value of y increases as we move farther and farther away from this point in either direction.

So T(n) is maximized when i = 0 or i = n-1 in all recursive calls, for example, when the input array is already sorted either in increasing or in decreasing order. This situation yields the worst-case complexity of quick sort (we take T(0) = 1 below):

   T(n) <= T(n-1) + T(0) + cn + d
        =  T(n-1) + cn + d + 1
        <= (T(n-2) + c(n-1) + d + 1) + cn + d + 1
        =  T(n-2) + c[n + (n-1)] + 2d + 2
        <= T(n-3) + c[n + (n-1) + (n-2)] + 3d + 3
        <= ...
        <= T(0) + c[n + (n-1) + (n-2) + ... + 1] + nd + n
        =  cn(n+1)/2 + nd + n + 1,

which is O(n^2), i.e., the worst-case time complexity of quick sort is quadratic.

But what about its average complexity? Or a better question is how to characterize an average case here. The basic idea of partitioning is to choose a pivot and subsequently break the array in two halves, the lesser mortals stay on one side, the greater mortals on the other. A randomly chosen pivot is expected to be somewhere near the middle of the eventual sorted sequence. If the input array A is assumed to be random, its first element A[0] is expected to be at a random location in the sorted sequence. If we assume that all the possible locations are equally likely, it is easy to check that the expected location of the pivot is near the middle of the sorted sequence. Thus the average case behavior of quick sort corresponds to

   i = n-i-1 = n/2 approximately.

We then have:

   T(n) <= 2T(n/2) + cn + d.

For simplicity let us assume that n is a power of 2, i.e., n = 2^t for some positive integer t, and that T(1) = 1. But then

   T(n) =  T(2^t)
        <= 2T(2^(t-1)) + c 2^t + d
        <= 2(2T(2^(t-2)) + c 2^(t-1) + d) + c 2^t + d
        =  2^2 T(2^(t-2)) + c(2^t + 2^t) + (2+1)d
        <= 2^3 T(2^(t-3)) + c(2^t + 2^t + 2^t) + (2^2+2+1)d
        <= ...
        <= 2^t T(2^0) + c t 2^t + (2^(t-1)+2^(t-2)+...+2+1)d
        =  2^t + c t 2^t + (2^t - 1)d
        =  cn log2 n + n(d+1) - d.

The first term in the last expression dominates over the other terms, and consequently the average complexity of quick sort is O(n log n).

Recall that bubble sort has a time complexity of O(n^2). The situation does not improve even if we assume an average scenario, since we anyway have to make O(n^2) comparisons in the nested loop. Insertion and selection sorts attain the same complexity figure. With quick sort, the worst-case complexity is equally poor, but in practice a random array tends to follow the average behavior more closely than the worst-case behavior. That is a reasonable improvement over quadratic time. The quick sort algorithm turns out to be one of the fastest general-purpose comparison-based sorting algorithms in practice.

We will soon see that even the worst-case complexity of merge sort is O(n log n). It is an interesting theoretical result that a comparison-based sorting algorithm cannot run in time faster than O(n log n). Both quick sort and merge sort achieve this lower bound, the first on average, the second always. Historically, this realization provided a massive impetus to promote and exploit recursion. Tony Hoare invented quick sort and popularized recursion. We cannot think of a modern compiler without this facility.

Also, do you see the significance of the coinage divide-and-conquer?

We illustrated above that recursion made the polynomial-time Fibonacci routine exponentially slower. That is the darker side of recursion. Quick sort and merge sort highlight the brighter side. When it is your turn to decide whether to accept or to avoid recursion, what will you do? Analyze the complexity and then decide.

How to compute the complexity of a program?

The final question, then, is how to derive the complexity of a program. So far you have seen many examples. But what is a standard procedure for deriving those divine functions next to the big-Oh? Frankly speaking, there is none. (This is similar to the situation that there is no general procedure for integrating a function.) However, some common patterns can be identified and prescription solutions can be made available for those patterns. (For integration too, we have the method of substitution, integration by parts, and some such standard rules. They work well only in the presence of definite patterns.) The theory is deep and involved and well beyond the scope of this introductory course. We will again take the help of examples to illustrate the salient points.

First consider a non-recursive function. The function is a simple top-to-bottom sequence of instructions with loops embedded at some places. One has to carefully study the behavior of the loops and add up the total overhead associated with each loop. The final complexity of the function is the sum of the complexities of the individual instructions (including loops). The counting process is not always straightforward. There is a deadly branch of mathematics, called combinatorics, that deals with counting principles.
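As a small illustration (our own example), consider the following function with two nested loops; adding up the work done by the loops over all iterations gives its complexity.

   /* Count the pairs (i,j) with i < j and A[i] > A[j].  For each i the inner
      loop makes n-i-1 comparisons, so the total number of comparisons is
      (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is O(n^2). */
   int countInversions ( int A[] , int n )
   {
      int i, j, count = 0;

      for (i=0; i<n; ++i)
         for (j=i+1; j<n; ++j)
            if (A[i] > A[j]) ++count;
      return count;
   }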

We have already deduced the time complexity of several non-recursive functions. Let us now focus our attention on recursive functions. As we have done in connection with quickSort, we denote the running time of an invocation of a recursive function by T(n), where n denotes the size of the input. If n is of a particular form (for example, if n has a small value), then no recursive calls are made. Some fixed computation is done instead and the result is returned. In this case the techniques for non-recursive functions need be employed.

Finally, assume that the function makes recursive calls on inputs of sizes n1,n2,...,nk for some k>=1. Typically each ni is smaller than n. These calls take respective times T(n1),T(n2),...,T(nk). We add these times. Furthermore, we compute the time taken by the function without the recursive calls. Let us denote this time by g(n). We then have:

   T(n) = T(n1) + T(n2) + ... + T(nk) + g(n).

Such an equation is called a recurrence relation. There are tools by which we can solve recurrence relations of some particular types. This is again part of the deadly combinatorics. We will not go into the details, but only mention that a recurrence relation for T(n), together with a set of initial conditions (e.g., T(n) for some small values of n), may determine a closed-form formula for T(n) which can be expressed in the big-Oh notation. It is often not necessary to compute an exact formula for T(n); proving a lower and an upper bound may help us determine the order of T(n). Recall how we analyzed the complexity of the recursive Fibonacci function.
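For instance, the recursive Fibonacci function may be written as the sketch below (one possible version); its recurrence relation appears as a comment.

   /* Recursive Fibonacci.  Writing T(n) for its running time, the two
      recursive calls give the recurrence T(n) = T(n-1) + T(n-2) + g(n)
      with g(n) a constant, whose solution grows exponentially in n. */
   int fib ( int n )
   {
      if (n <= 1) return n;    /* Initial conditions: fib(0) = 0, fib(1) = 1 */
      return fib(n-1) + fib(n-2);
   }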

We end this section with two other examples of complexity analysis of recursive functions.

Examples

