Data Structures and Algorithms

Lecture 2: Complexity Analysis

The vector<> template

  • #include <vector>

Description

  • Variable-size array
  • Implemented as a container template
  • Must specify the element type at compile time; however, you are not locked into the size of the vector.

Adding values

  • The .push_back() member function appends an element to the end of the vector, growing it as needed.
  • If you know the number of elements ahead of time, it is more efficient to call .resize() once and then fill in all the elements than to .push_back() them one at a time.
    • The vector doesn't have to reallocate every time.
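
A minimal sketch contrasting the two approaches (the helper functions here are illustrative, not from the lecture):

#include <vector>
using namespace std;

vector<int> make_with_push_back(int n) {
    vector<int> v;
    for (int i = 0; i < n; ++i) {
        v.push_back(i);        // may reallocate several times as the vector grows
    }
    return v;
}

vector<int> make_with_resize(int n) {
    vector<int> v;
    v.resize(n);               // allocate space for all n elements up front
    for (int i = 0; i < n; ++i) {
        v[i] = i;              // no further reallocation needed
    }
    return v;
}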

Accessing elements

  • vector<> overloads operator[], so indexing can be done like an array (with no bounds checking).
  • Alternatively, the .at() member function can be used.
  • .at() performs out-of-bounds checking and throws std::out_of_range on an invalid index.
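
A short sketch of the difference between the two accessors (the values are arbitrary):

#include <iostream>
#include <stdexcept>
#include <vector>
using namespace std;

int main() {
    vector<int> values = {10, 20, 30};
    cout << values[1] << '\n';          // 20 -- operator[] does no bounds check
    try {
        cout << values.at(5) << '\n';   // index 5 is out of range
    } catch (const out_of_range& e) {
        cout << "caught: " << e.what() << '\n';
    }
    return 0;
}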

Iteration

for (size_t i = 0; i < values.size(); ++i) {
     cout << values[i] << endl;
}

Other STL containers

Stack

  • A last-in, first-out data structure
  • #include <stack>
  • Initialization: stack<int> values;
  • Can push and pop, as well as a new operation, top(), which returns the element on the top of the stack without removing it.

Queue

  • A first-in, first-out data structure
  • #include <queue>
  • Initialization: queue<int> values;
  • Can push at the back, pop from the front, and peek at the front element by using front().

Common member functions to both

void push(elem);
void pop();
bool empty();
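
A small sketch exercising both adapters (the values pushed are arbitrary):

#include <iostream>
#include <queue>
#include <stack>
using namespace std;

int main() {
    stack<int> s;
    queue<int> q;
    for (int i = 1; i <= 3; ++i) {
        s.push(i);                  // stack holds 1 2 3, with 3 on top
        q.push(i);                  // queue holds 1 2 3, with 1 at the front
    }
    while (!s.empty()) {
        cout << s.top() << ' ';     // prints 3 2 1 (last in, first out)
        s.pop();
    }
    cout << '\n';
    while (!q.empty()) {
        cout << q.front() << ' ';   // prints 1 2 3 (first in, first out)
        q.pop();
    }
    cout << '\n';
    return 0;
}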

Deque

A deque, pronounced "deck", is a double-ended queue. Instead of being restricted to pushing and popping from a single end, you can use both ends.

  • By default, the STL stack and queue are actually implemented on top of a deque!

Member functions
void push_front(elem);
T front();
void pop_front();

void push_back(elem);
T back();
void pop_back();

bool empty();
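
A brief sketch using both ends of a deque (the values are arbitrary):

#include <deque>
#include <iostream>
using namespace std;

int main() {
    deque<int> d;
    d.push_back(2);                                 // d: 2
    d.push_back(3);                                 // d: 2 3
    d.push_front(1);                                // d: 1 2 3
    cout << d.front() << ' ' << d.back() << '\n';   // prints "1 3"
    d.pop_front();                                  // d: 2 3
    d.pop_back();                                   // d: 2
    cout << d.front() << '\n';                      // prints "2"
    return 0;
}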

Speeding Up Output

C++ cout is slow when paired with endl, because endl flushes the buffer on every use. Each flush hands the output off to the OS, which makes it slower.

Faster options:

  • Use '\n' instead of endl, or accumulate output in a stringstream and print it all at once
  • Turn off C/C++ I/O synchronization with the following command:
#ifdef NDEBUG
    ios_base::sync_with_stdio(false);
#endif

Make sure to compile with the -DNDEBUG flag, though. It helps to put this in a makefile rule for release builds only.
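
A minimal sketch putting these pieces together (the NDEBUG guard mirrors the snippet above; the cin.tie call is an extra, commonly paired tweak not mentioned in the lecture):

#include <iostream>
#include <sstream>
using namespace std;

int main() {
#ifdef NDEBUG
    ios_base::sync_with_stdio(false);   // drop C/C++ stream synchronization in release builds
    cin.tie(nullptr);                   // don't force cout to flush before every cin read
#endif
    ostringstream out;
    for (int i = 0; i < 5; ++i) {
        out << i << '\n';               // '\n' does not flush, unlike endl
    }
    cout << out.str();                  // a single write at the end
    return 0;
}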

Complexity Analysis

There are typically many different algorithms that accomplish the same task, but some are definitely better than others. Complexity analysis is a way to sift out the bad ones.

What affects runtime?

In order from most important to least important:

  • The algorithm: By relative measure, this is always on top. If one algorithm solves a problem in linear time while a competing one takes quadratic time or worse, there's a clear winner.
  • Implementation details: Even with the right algorithm, subtle inefficiencies can slow a program down. For example, when iterating through a 2D array, you should always traverse it in row-major order so that you access sequential elements in memory (see the sketch after this list).
  • CPU speed and memory: Even by investing a lot of money, you can usually only get a 2x-5x improvement in computing power. Not much.
  • Compiler options: The -g flag adds debugging symbols to the binary and should only be used for debugging. The -O3 flag turns on the highest level of optimization.
  • Parallel programs: Whatever else is running on the processor at the same time as your program; this makes virtually no difference.
  • Input size: This is the hardest thing to change, anyway.
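
A sketch of the row-major point from the implementation-details bullet above (the array size and function name are arbitrary):

// Row-major traversal: the inner loop moves through consecutive memory
// addresses, which is cache-friendly. Swapping the two loops (columns
// outer, rows inner) would jump 100 ints ahead on every step instead.
long long sum_row_major(const int grid[100][100]) {
    long long total = 0;
    for (int r = 0; r < 100; ++r) {
        for (int c = 0; c < 100; ++c) {
            total += grid[r][c];    // grid[r][c] and grid[r][c+1] are adjacent in memory
        }
    }
    return total;
}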

Input Size vs Runtime

  • Rate of growth is independent of most factors
    • CPU, compiler, etc.
  • Key question: does a fast algorithm remain fast at large \(n\)?

Measuring Input Size

  • Number of bits: Hard to measure (are you counting ints and doubles correctly? Machines vary in the amount of memory they allocate for data types).
  • Number of items: Too vague, and domain-specific

Some Terms

  • \(n\): size of the input
  • \(f(n)\): maximum number of steps taken on an input of size \(n\)
  • \(O(f(n))\): the set of functions bounded above by a constant multiple of \(f(n)\)

Input Size Ambiguity

Let's say you have a graph \(G = (V, E)\), where \(|V| = 5\) and \(|E| = 6\). What would you say \(n\) is here? It depends on whether it is more significant to talk about the number of vertices, the number of edges, or both.

Counting Program Steps

Steps in a program are considered to be single expressions. These include:

  • Variable assignment
  • Arithmetic operations (not all equally costly)
  • Comparisons
  • Array indexing
  • Function calls

And many more.

Here is a walkthrough of a basic step counting exercise:

int sum = 0;                    // init: 1
for (int i = 0; i < n; ++i) {   // init: 1, test: n, update: n, test fail: 1
    sum += i;                   // 1*n
}
return sum;                     // 1

Therefore, the total number of steps is \(4 + 3n\). Note that a loop of this form generally contributes \(2n + 2\) steps of overhead: 1 initialization, \(n + 1\) tests, and \(n\) updates.
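
As a further exercise in the same style (not from the lecture), the same bookkeeping applies to a nested loop:

int count = 0;                      // init: 1
for (int i = 0; i < n; ++i) {       // loop overhead: 2n + 2
    for (int j = 0; j < n; ++j) {   // loop overhead: 2n + 2, run n times
        ++count;                    // 1 * n * n
    }
}
return count;                       // 1

Summing gives \(3n^2 + 4n + 4\) steps in total.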

Big O Definition

\(f(n) = O(g(n))\) iff there exist constants \(c > 0\) and \(n_0 > 0\) such that:

\(f(n) \leq c \cdot g(n)\)

whenever \(n > n_0\).
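
As a quick check of the definition against the step count above (the constants chosen here are just one valid pair): \(3n + 4 \leq 3n + n = 4n\) for all \(n \geq 4\), so taking \(c = 4\) and \(n_0 = 4\) shows that \(3n + 4 = O(n)\).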

Limit Definition

This is a sufficient, but not necessary, condition:

If \(\lim_{n \to \infty} \frac{f(n)}{g(n)} = c\) for some finite constant \(c \geq 0\), then \(f(n) = O(g(n))\).

For example, is \(\log_2(n) = O(2n)\)?

Yes, since the limit of \(\frac{\log_2(n)}{2n}\) as \(n \to \infty\) is \(0\).
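
A short worked version of that limit (not from the lecture), using L'Hôpital's rule after rewriting \(\log_2(n)\) as \(\frac{\ln(n)}{\ln(2)}\):

\[
\lim_{n \to \infty} \frac{\log_2(n)}{2n}
= \lim_{n \to \infty} \frac{\ln(n)}{2n \ln(2)}
= \lim_{n \to \infty} \frac{1/n}{2 \ln(2)}
= 0.
\]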

Remember that you can drop constant factors in front of either \(f(n)\) or \(g(n)\) without changing anything. Therefore, \(3n^2 = O(5n^2) = O(15n^2) = O(n^2)\).

Order of Growth of Function Families

  • \(O\left(1\right)\): Constant
  • \(O\left(\log (n)\right)\): Logarithmic
  • \(O\left(n\right)\): Linear
  • \(O\left(n \log (n)\right)\): Loglinear
  • \(O\left(n^2\right)\): Quadratic
  • \(O\left(n^3\right), O(n^4)\): Polynomial
  • \(O(c^n)\): Exponential
  • \(O\left(n!\right)\): Factorial
  • \(O\left(2^{2^n}\right)\): Doubly exponential