UCLan, CO2412 (2021/22): Glossary
Adjacency list
Adjacency lists are one of the data structures that can be used to
implement graphs. An adjacency list for a vertex (node) contains
pointers (or references) to the adjacent vertices, i.e., to the
nodes to which it is connected by an edge.
- Adjacency lists are associated with the vertices (nodes) of a graph.
- In case of an undirected graph, where edges do not have a
direction (they go both ways), each vertex has exactly one adjacency list.
- In case of a directed graph, an implementation can choose to use one
adjacency list per vertex (pointing to vertices connected by outgoing edges only),
or two adjacency lists: One for adjacent vertices connected by outgoing edges,
one for adjacent vertices connected by incoming edges.
- Any kind of sequential data structure can be used as an adjacency list,
including linked lists or arrays.
- Adjacency lists (and incidence lists) perform
best for sparse graphs (graphs with few edges), e.g.,
when there is a constant upper bound for the
degree (number of adjacent nodes) for all nodes in the graph.
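A minimal sketch of an adjacency-list representation in Python, using a dict that maps each vertex to a list of its neighbours (the vertex names are illustrative, not taken from the text):

```python
# Undirected graph stored as adjacency lists: each vertex maps to
# the Python list of vertices it is connected to by an edge.
graph = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["C"],
}

def neighbours(g, v):
    """Return the adjacency list of vertex v."""
    return g[v]

print(neighbours(graph, "C"))  # ['A', 'D']
```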
See also: Adjacency matrix,
Graph,
Incidence list
Adjacency matrix
Adjacency matrices are a data structure typically used to
implement dense graphs, i.e., graphs with
comparably many edges per node. An adjacency matrix is a two-dimensional
data structure, such as a list of lists or a two-dimensional array.
- If the adjacency matrix for a graph is given by adj,
the value of adj[i][j] indicates whether there is an edge
connecting the node with index i to the node with index j.
- Depending on the implementation, the value of adj[i][j] indicating the presence of
an edge from i to j might be 1, True,
or the edge label (in the case of a graph with labelled edges);
the value indicating the absence of an edge might
be 0, False, None, etc.
- For an undirected graph, there is
no distinction between an edge from i to j
and an edge from j to i. Therefore, the adjacency matrix
of an undirected graph is symmetric, i.e.,
adj[i][j] = adj[j][i] for all i and j.
An efficient data structure implementation can exploit this
property in order to reduce memory requirements by 50%.
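A minimal sketch of an adjacency matrix as a list of lists, for an illustrative undirected graph with four nodes (so the matrix is symmetric):

```python
# Adjacency matrix for an undirected graph with nodes 0..3;
# 1 marks the presence of an edge, 0 its absence.
# Symmetry: adj[i][j] == adj[j][i] for all i and j.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

def has_edge(adj, i, j):
    """True if there is an edge between nodes i and j."""
    return adj[i][j] == 1

print(has_edge(adj, 0, 2))  # True
print(has_edge(adj, 1, 3))  # False
```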
See also: Adjacency list,
Graph,
Incidence list
Balanced tree
In binary search trees, being balanced
is a property that ensures that the depth of the tree scales with O(log n).
This guarantees that the overall shape of the data structure remains tree-like rather than,
e.g., degenerating into the sequential shape of a sorted linked list.
- The depth of a tree is the maximum depth of a leaf in the tree, i.e.,
the maximum distance (number of links/edges on a path) from the root node to any node in the tree.
If the implementation of a binary search tree data structure ensures that the
depth of the trees will be bounded by O(log n), this means that
search and insertion can both be done in O(log n) time.
- For a tree to be balanced, it is required that its subtrees are also balanced.
- There is no unique definition of the property that would make
a tree balanced. (Different kinds of balanced search trees are based on different criteria.)
- One possibility is to impose a constant upper bound for the ratio between
the size of the greatest branch and the smallest branch. (Such as, for a binary
search tree: The ratio between the greater and the smaller branch may not exceed 3.)
- Another possibility is to impose a constant upper bound for the difference between
the greatest and the smallest depth of a branch. (Such as, for a binary search
tree: The difference between the depth of the left and the right subtree may not exceed 2.)
The property of being balanced is not restricted to binary search trees;
it can in principle be applied to trees in general.
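A sketch of the depth-difference criterion mentioned above, for binary trees represented as nested tuples (item, left, right) with None for an empty subtree; the bound of 2 follows the example in the text:

```python
def depth(tree):
    """Depth of a binary tree; an empty subtree counts as depth -1."""
    if tree is None:
        return -1
    _, left, right = tree
    return 1 + max(depth(left), depth(right))

def is_balanced(tree, bound=2):
    """Check that left/right depths differ by at most `bound` everywhere."""
    if tree is None:
        return True
    _, left, right = tree
    return (abs(depth(left) - depth(right)) <= bound
            and is_balanced(left, bound)
            and is_balanced(right, bound))

t = (5, (3, None, None), (8, (7, None, None), None))
print(depth(t), is_balanced(t))  # 2 True
```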
See also: Binary search tree,
Tree
Binary search tree
A search tree
is a tree that is inherently sorted, making it
easy to search for an element or to insert a new element.
A binary tree is a tree in which each node has at most
two links (outgoing edges) to at most two children. These two branches or subtrees
are often called left and right; in a binary search tree,
each node stores a data item (e.g., called item) such that:
- The data items stored in the subtree left are all smaller than item.
- The data items stored in the subtree right are all greater than or equal to item.
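A minimal sketch of insertion and search in a binary search tree, with nodes as dicts (a teaching sketch, not a balanced or production implementation):

```python
def insert(node, item):
    """Insert item, keeping the search-tree ordering invariant."""
    if node is None:
        return {"item": item, "left": None, "right": None}
    if item < node["item"]:
        node["left"] = insert(node["left"], item)
    else:  # items greater than or equal to node["item"] go right
        node["right"] = insert(node["right"], item)
    return node

def contains(node, item):
    """Search for item by following one branch per level."""
    if node is None:
        return False
    if item == node["item"]:
        return True
    branch = node["left"] if item < node["item"] else node["right"]
    return contains(branch, item)

root = None
for x in [5, 3, 8, 7]:
    root = insert(root, x)
print(contains(root, 7), contains(root, 4))  # True False
```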
See also: Balanced tree,
Tree
Breadth-first search
Traversal is the process of visiting all the elements
of a data structure. Breadth-first search (BFS) is one of the standard algorithms for traversing a graph;
starting from a given initial node, it continues by always proceeding to the
node that was detected earliest.
- BFS visits nodes in the order of
the number of edges by which they are distant from the initial node.
- In an unweighted graph, BFS can be used to determine the shortest paths
from the initial node to all other nodes. The graph consisting only of these
paths is the BFS spanning tree, also called a breadth-first tree.
- Using a queue
as an iterator (i.e., auxiliary data structure used for traversal),
this traversal algorithm in first-in, first-out (FIFO) order
can be implemented by iteratively carrying out the following steps:
- Visit the present node.
- Detect all nodes that are undetected so far and can be reached
directly from the present node; push (i.e., attach) them to the tail of the queue.
- Pop (i.e., detach) the first node from the head of the queue and proceed there.
It is not strictly necessary to employ a queue;
a variety of workarounds exist that may be more
appropriate depending on the use case.
- The purpose of "visiting" the nodes
and what is done during "visiting"
depends on the use case. Anything that
might be carried out within a for loop
over all nodes is appropriate.
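The steps above can be sketched in Python using collections.deque as the FIFO queue; the example graph is illustrative, and "visiting" here simply appends the node to a list:

```python
from collections import deque

def bfs(graph, start):
    order = []                    # "visiting" = appending to this list
    detected = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()    # pop from the head of the queue
        order.append(node)        # visit the present node
        for nb in graph[node]:    # detect so-far-undetected neighbours
            if nb not in detected:
                detected.add(nb)
                queue.append(nb)  # push to the tail of the queue
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(g, "A"))  # ['A', 'B', 'C', 'D']
```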
See also: Depth-first search,
Graph,
Queue
Brute force
As an algorithm design strategy, brute force consists in
evaluating each potential solution to the problem, until the right one is found;
or, in the case of an optimization problem, until all potential solutions have been
visited so that the best one is known.
- Brute-force algorithms are easy to design and implement,
and it is also comparably easy to conduct a formal verification.
- These advantages are usually offset by comparably poor performance.
Since all potential solutions need to be considered,
time efficiency scales with the number of potential solutions.
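A brute-force sketch for an illustrative problem (not from the text): find two list entries that sum to a target by testing every pair of indices until the right one is found:

```python
def pair_with_sum(values, target):
    """Try every candidate pair (i, j) until one sums to target."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                return (i, j)
    return None  # all candidates evaluated, no solution exists

print(pair_with_sum([3, 1, 4, 1, 5], 9))  # (2, 4)
```

Since every pair is a potential solution, the running time scales with the number of pairs, i.e., O(n^2).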
See also: Divide and conquer,
Dynamic programming,
Greedy algorithms
Depth-first search
Depth-first search (DFS) is one of the standard algorithms for traversing a graph;
starting from a given initial node, it continues by always proceeding to the
node that was detected most recently.
Using a stack as an iterator,
this traversal algorithm in last-in, first-out (LIFO) order
can be implemented by iteratively carrying out the following steps:
- Visit the present node.
- Detect all nodes that are unvisited so far and can be reached
directly from the present node; push (i.e., attach) them to the top of the stack.
- Pop (i.e., detach) nodes from the top of the stack until an unvisited node is found - proceed there.
For many purposes it may be easier to implement DFS using
recursion (internally implemented via a stack of function
calls) instead of using a stack data structure to store detected nodes.
A graph consisting only of edges used for visiting nodes during DFS
is called a DFS spanning tree (or depth-first tree).
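A recursive DFS sketch, where the call stack plays the role of the explicit stack of detected nodes; the example graph is illustrative:

```python
def dfs(graph, node, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)                 # visit the present node
    for nb in graph[node]:             # proceed to unvisited neighbours
        if nb not in visited:
            dfs(graph, nb, visited, order)
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(g, "A"))  # ['A', 'B', 'D', 'C']
```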
See also: Breadth-first search,
Graph,
Stack
Divide and conquer
Divide and conquer is an algorithm design strategy,
specifically, a decomposition technique. The overall problem is
recursively decomposed into smaller and smaller subproblems (divide),
until a base case with a trivial solution is reached (conquer),
and then the partial solutions to the subproblems
are combined with each other in order to obtain the complete solution.
- Only certain problems can be dealt with by divide-and-conquer algorithms;
that is the case when they can be decomposed into disjoint (i.e., non-overlapping)
subproblems, so that each partial solution needs to be used only once.
- When divide-and-conquer is applicable as a strategy, the resulting algorithms
are often both elegant and performant. A classical example for this is
the mergesort algorithm, which sorts a list in O(n log n) time,
the best possible asymptotic efficiency.
- The approach underlying the design strategy is recursive.
That does not mean that the implementation needs to use recursive function calls.
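The mergesort example mentioned above can be sketched as follows (here with recursive function calls, matching the recursive design):

```python
def mergesort(items):
    if len(items) <= 1:            # base case: trivially sorted
        return items
    mid = len(items) // 2
    left = mergesort(items[:mid])  # divide into disjoint halves
    right = mergesort(items[mid:])
    merged = []                    # combine the partial solutions
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(mergesort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```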
See also: Brute force,
Dynamic programming,
Greedy algorithms
Dynamic programming
Dynamic programming is an algorithm design strategy, specifically,
a decomposition technique. It can be used whenever the problem can best be
decomposed into overlapping subproblems, i.e., smaller versions
of the problem among which it is beneficial computationally to reuse or
share partial solutions (rather than recompute them). Dynamic programming
stores partial solutions in an appropriate data structure,
so that they can be recalled and reused when they are needed again.
- If divide and conquer does not work
because it would lead to carrying out the same computation many times,
or because there is no clean split of the whole problem into disjoint
portions, this indicates that there are overlapping subproblems.
Instead, dynamic programming might be used, storing and recalling
partial solutions.
- If the greedy design strategy
seems to fail because the outcome is not strictly correct or optimal in
all cases, it may be reasonable to extend the approach so that it
does not limit itself to building a single candidate solution (greedy), but multiple partial solutions
to overlapping subproblems. The result is an algorithm following the
dynamic programming design strategy.
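A classic small example (not from the text): Fibonacci numbers have overlapping subproblems, since fib(n) reuses fib(n-1) and fib(n-2); storing partial solutions in a dict avoids recomputing them:

```python
def fib(n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]          # recall a stored partial solution
    result = n if n < 2 else fib(n - 1, memo) + fib(n - 2, memo)
    memo[n] = result            # store it for later reuse
    return result

print(fib(30))  # 832040
```

Without the memo dict, the same subproblems would be recomputed exponentially many times; with it, each value is computed once.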
See also: Brute force,
Divide and conquer,
Greedy algorithms
Exponentiation
Exponentiation is the "power of" operation. A term in which
exponentiation is used, such as b^k, consists of the base b
and the exponent k.
- Taking the logarithm with base b is the inverse operation
to exponentiation with the base b: If y = b^k,
then k = log_b y. By default, in theoretical
computer science, omitting the base when taking the logarithm, as in log y,
means that 2 is used as the base.
- The exponential function exp(x) = e^x
is equivalent to exponentiation with the base e = 2.718 281 828 …
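A quick numerical check of the inverse relationship, using the standard library (the values b = 2 and k = 10 are illustrative):

```python
import math

b, k = 2, 10
y = b ** k             # exponentiation: y = b^k = 1024
print(math.log(y, b))  # the logarithm recovers k (up to rounding)
```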
Graph
A graph is a structure consisting of vertices (also called nodes) and edges;
it is in use both as a data structure and as a theoretical/mathematical concept.
A vertex is understood to be point-like; an edge is a connection between two vertices.
- In a labelled graph, vertices and/or edges can carry labels, i.e., numbers, strings, or other data items that are associated with them.
- In a directed graph, each edge has a direction; there, an edge serves as a link from its source node to its target node, but not vice versa. In an undirected graph, the edges do not have a direction; there, an edge serves as a link between its two incident vertices, going both ways.
- A sequence of edges, where the target of one edge is the source of the next one, is called a path. In an unweighted graph, the length of a path is given by the number of edges; in other words, each edge has the same length, namely 1. In a weighted graph (a kind of labelled graph), the length of an edge is given by its label, and the length of a path by the sum of all the included edge labels.
- A directed graph is a tree (also: out-tree) if it has a unique root node and a unique path from the root to each node.
- A vertex is in general permitted to have an edge to itself; i.e., an edge is in general permitted to have the same node both as its source and its target. However, there are many use cases where this cannot occur.
See also: Adjacency list,
Adjacency matrix,
Breadth-first search,
Depth-first search,
Incidence matrix,
Tree
Greedy algorithms
As a design strategy, greedy algorithms develop
a single candidate solution from the beginning to the end,
making incremental local improvements step by step, until the
task is finished and the complete solution is returned.
- Greedy algorithms are usually comparably easy to design and implement,
and they typically have a good time efficiency, in particular, compared
to brute-force algorithms.
- However, they are not always correct, and even when they are, it may be
complicated to prove their correctness. Not all problems are suitable for
algorithms based on local improvements to a single candidate solution.
- If a greedy algorithm does not return the correct or optimal result,
it might return an approximation to it, which could for some purposes be sufficient.
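A greedy sketch for an illustrative problem (not from the text): making change by always taking the largest coin that still fits. For the coin set below the result happens to be optimal; for other coin sets, a greedy algorithm may only return an approximation:

```python
def make_change(amount, coins=(50, 20, 10, 5, 2, 1)):
    """Greedily build one candidate solution, largest coin first."""
    used = []
    for coin in coins:          # local improvement: biggest coin that fits
        while amount >= coin:
            used.append(coin)
            amount -= coin
    return used

print(make_change(87))  # [50, 20, 10, 5, 2]
```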
See also: Brute force,
Divide and conquer,
Dynamic programming
Incidence list
Incidence lists are one of the data structures that can be used
to implement graphs. In a graph, two nodes (vertices) are
incident to each edge, and in a directed graph, one of these nodes is the source
of the edge whereas the other one is its target. As with adjacency lists,
an incidence list belongs to a node of the graph: It is a list of edges to which that node is incident.
- In a singly-linked implementation, each node has one incidence list, for its outgoing edges; for a doubly-linked implementation, this is extended by a second incidence list with incoming edges. (The distinction between incoming and outgoing edges only applies to directed graphs.)
- Incidence lists (like adjacency lists) are typically best suited to sparse graphs, whereas adjacency matrices are typically more suitable for dense graphs.
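A minimal sketch of a singly-linked incidence-list representation for an illustrative directed graph: edges are explicit objects (here tuples), and each node stores the list of its outgoing edges:

```python
# Directed graph with edges as (source, target) tuples.
edges = [("A", "B"), ("A", "C"), ("C", "B")]

# One incidence list per node, for outgoing edges only.
incidence = {
    "A": [edges[0], edges[1]],
    "B": [],
    "C": [edges[2]],
}

def out_degree(node):
    """Number of outgoing edges incident to the node."""
    return len(incidence[node])

print(out_degree("A"))  # 2
```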
See also: Adjacency list,
Adjacency matrix,
Graph
Leading contribution
In asymptotic performance, efficiency, and complexity analysis using
Landau notation, the amount of time and space (memory) resources required
as a function of the problem size n is reduced to the leading contribution,
i.e., the one that dominates all the others for large values of n.
- The leading contribution is the term that, as n keeps increasing,
will eventually exceed all the others in magnitude; e.g., in
5n^2 + n^4/1000, the leading contribution
is n^4/1000, because as n → ∞,
it eventually becomes and remains the greater one of the two terms.
- In addition to eliminating all except the leading contribution, Landau notation
also eliminates constant coefficients, so that straightforward classes of
problems with the same asymptotic complexity or algorithms with the same
asymptotic efficiency are obtained; accordingly, using Landau notation,
we write O(n) for linear scaling, O(n^2)
for quadratic scaling, and so on. For example, the
expression 5n^2 + n^4/1000 is first
reduced to its leading contribution n^4/1000, and then
to the class O(n^4) by eliminating the constant coefficient 1/1000.
- Whenever there is an overall maximum (i.e., a constant upper bound)
to resource requirements given as a function of n, this is denoted by O(1),
which is often called constant time or constant space,
even in cases where that is not strictly correct (the expression
may still be a function of n, just a function that does
not diverge toward infinity for large n).
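The example above can be checked numerically: for small n the quadratic term is larger, but as n grows, n^4/1000 eventually exceeds and dominates 5n^2:

```python
def terms(n):
    """Return the two contributions from the example expression."""
    return 5 * n**2, n**4 / 1000

print(terms(10))    # (500, 10.0)   -> 5n^2 still dominates
print(terms(1000))  # n^4/1000 dominates by a wide margin
```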
Permutations
Permutations are the ways in which the elements of a list can be arranged,
resulting in lists with the same elements, but in all possible orders.
For a list with n elements, there are
n · (n-1) · … · 2 · 1 = n! permutations.
- The list [3, 2, 4] has 3! = 3 · 2 · 1 = 6 permutations:
[3, 2, 4], [3, 4, 2], [2, 3, 4],
[2, 4, 3], [4, 3, 2], and [4, 2, 3].
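The example above can be reproduced with the standard library, where itertools.permutations yields all n! orderings:

```python
from itertools import permutations

perms = [list(p) for p in permutations([3, 2, 4])]
print(len(perms))  # 6, i.e., 3!
print(perms)       # the six permutations listed above
```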
Queue
A queue is a sequential (list-like) dynamic data structure
that functions by the principle first in, first out (FIFO).
It has a method for appending an element, usually called push and
done at the tail of the queue, and a method for detaching an element,
usually called pop and done at the head of the queue.
- Singly and doubly linked lists are well suited to
implementing a queue, since both push and pop
can be realized in constant time. However, a singly linked
list should only be used if the data structure includes an explicit
reference to the tail node; otherwise, the whole list needs to
be traversed just to reach the tail, taking O(n) time.
- A dynamic array is less suitable for this purpose, since
insertion/deletion operations done close to the head require
shifting all the remaining elements in memory. Therefore,
one of the two operations (the one done on the head element)
would take O(n) time.
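In Python, collections.deque (a doubly linked block structure) provides a FIFO queue with constant-time operations at both ends:

```python
from collections import deque

queue = deque()
queue.append("first")   # push at the tail, O(1)
queue.append("second")
print(queue.popleft())  # pop at the head, O(1) -> 'first'
```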
See also: Breadth-first search,
Stack
Stack
A stack is a sequential (list-like) dynamic data structure
that functions by the principle last in, first out (LIFO).
It has a method for appending an element, usually called push and
done at the head of the stack, and a method for detaching an element,
usually called pop and also done at the head of the stack.
- Linked lists can be used straightforwardly as a stack,
such that all the push and pop are applied to
the head of the list in constant time.
- Dynamic arrays are well suited to implementing a stack;
in this case push and pop need to be done
at the end of the array, not at the front, since it is only
insertion/deletion at the end or close to the end
that takes O(1) time for a dynamic array. In the case of insertion,
for the push operation, O(1) time is only reached in the
(frequent) best case, when there is free capacity.
In the worst case, this still requires O(n), since
the whole list needs to be copied into a new region in memory where
there is more free space.
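A Python list is a dynamic array, so it serves directly as a stack when push and pop are done at the end:

```python
stack = []
stack.append("first")   # push at the end of the array
stack.append("second")
print(stack.pop())      # pop at the end: last in, first out -> 'second'
```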
See also: Depth-first search,
Queue
Tree
A tree is a hierarchical linked data structure,
where each node has outgoing links to a number of children, i.e., nodes at the next lower level;
at the highest level of a tree (the level with depth 0), there is exactly one node which is called the root.
- A tree is a graph with a unique root node and a unique path from the root to each node; consequently, a tree may not contain any cycles or diamond-like structures. (This definition of a tree is also referred to as a rooted tree or an out-tree.)
- Each child of the root node is itself the root of a subtree (or branch) of the tree; a node without children is called a leaf.
- The depth of a node is given by the number of links that need to be followed in order to reach that node, starting from the root. The depth of a tree is the maximum depth of a leaf in that tree or, equivalently, the length of the longest path from the root to a leaf.
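A small illustrative tree as nested dicts, with the depth of the tree computed recursively as the maximum depth of a leaf:

```python
tree = {"item": "root", "children": [
    {"item": "a", "children": [
        {"item": "leaf", "children": []},
    ]},
    {"item": "b", "children": []},
]}

def tree_depth(node):
    """Depth of the tree rooted at node; a leaf has depth 0."""
    if not node["children"]:
        return 0
    return 1 + max(tree_depth(child) for child in node["children"])

print(tree_depth(tree))  # 2
```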
See also: Balanced tree,
Binary search tree,
Graph