UCLan, CO2412 (2021/22): Glossary
Adjacency list
Adjacency lists are one of the data structures that can be used to
implement graphs. An adjacency list for a vertex (node) contains
pointers (or references) to the adjacent vertices, i.e., to the
nodes to which it is connected by an edge.
- Adjacency lists are associated with the vertices (nodes) of a graph.
- In case of an undirected graph, where edges do not have a
direction (they go both ways), each vertex has exactly one adjacency list.
- In case of a directed graph, an implementation can choose to use one
adjacency list per vertex (pointing to vertices connected by outgoing edges only),
or two adjacency lists: One for adjacent vertices connected by outgoing edges,
one for adjacent vertices connected by incoming edges.
- Any kind of sequential data structure can be used as an adjacency list,
including linked lists or arrays.
- Adjacency lists (and incidence lists) perform
best for sparse graphs (graphs with few edges), e.g.,
when there is a constant upper bound for the
degree (number of adjacent nodes) for all nodes in the graph.
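A minimal sketch of an adjacency-list representation in Python, using a dict that maps each vertex to a list of its neighbours (the vertex names are illustrative, not taken from the text):

```python
# Undirected graph stored as adjacency lists: each vertex maps to
# the Python list of vertices it is connected to by an edge.
graph = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["C"],
}

def neighbours(g, v):
    """Return the adjacency list of vertex v."""
    return g[v]

print(neighbours(graph, "C"))  # ['A', 'D']
```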
See also: Adjacency matrix,
Graph,
Incidence list
Adjacency matrix
Adjacency matrices are a data structure typically used to
implement dense graphs, i.e., graphs with
comparably many edges per node. An adjacency matrix is a two-dimensional
data structure, such as a list of lists or a two-dimensional array.
- If the adjacency matrix for a graph is given by adj,
the value of adj[i][j] indicates whether there is an edge
connecting the node with index i to the node with index j.
- Depending on the implementation, the value of adj[i][j] indicating the presence of
an edge from i to j might be 1, True,
or the edge label (in the case of a graph with labelled edges);
the value indicating the absence of an edge might
be 0, False, None, etc.
- For an undirected graph, there is
no distinction between an edge from i to j
and an edge from j to i. Therefore, the adjacency matrix
of an undirected graph is symmetric, i.e.,
adj[i][j] = adj[j][i] for all i and j.
An efficient data structure implementation can exploit this
property in order to reduce memory requirements by 50%.
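A minimal sketch of an adjacency matrix as a list of lists, for an illustrative undirected graph with four nodes (so the matrix is symmetric):

```python
# Adjacency matrix for an undirected graph with nodes 0..3;
# 1 marks the presence of an edge, 0 its absence.
# Symmetry: adj[i][j] == adj[j][i] for all i and j.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

def has_edge(adj, i, j):
    """True if there is an edge between nodes i and j."""
    return adj[i][j] == 1

print(has_edge(adj, 0, 2))  # True
print(has_edge(adj, 1, 3))  # False
```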
See also: Adjacency list,
Graph,
Incidence list
Balanced tree
In binary search trees, being balanced
is a property that ensures that the depth of the tree scales with O(log n).
This guarantees that the overall shape of the data structure remains tree-like rather than,
e.g., degenerating into the sequential shape of a sorted linked list.
- The depth of a tree is the maximum depth of a leaf in the tree, i.e.,
the maximum distance (number of links/edges on a path) from the root node to any node in the tree.
If the implementation of a binary search tree data structure ensures that the
depth of the trees will be bounded by O(log n), this means that
search and insertion can both be done in O(log n) time.
- For a tree to be balanced, it is required that its subtrees are also balanced.
- There is no unique definition of the property that would make
a tree balanced. (Different kinds of balanced search trees are based on different criteria.)
- One possibility is to impose a constant upper bound for the ratio between
the size of the greatest branch and the smallest branch. (Such as, for a binary
search tree: The ratio between the greater and the smaller branch may not exceed 3.)
- Another possibility is to impose a constant upper bound for the difference between
the greatest and the smallest depth of a branch. (Such as, for a binary search
tree: The difference between the depth of the left and the right subtree may not exceed 2.)
The property of being balanced is not restricted to binary search trees;
it can in principle be applied to trees in general.
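A sketch of the depth-difference criterion mentioned above, for binary trees represented as nested tuples (item, left, right) with None for an empty subtree; the bound of 2 follows the example in the text:

```python
def depth(tree):
    """Depth of a binary tree; an empty subtree counts as depth -1."""
    if tree is None:
        return -1
    _, left, right = tree
    return 1 + max(depth(left), depth(right))

def is_balanced(tree, bound=2):
    """Check that left/right depths differ by at most `bound` everywhere."""
    if tree is None:
        return True
    _, left, right = tree
    return (abs(depth(left) - depth(right)) <= bound
            and is_balanced(left, bound)
            and is_balanced(right, bound))

t = (5, (3, None, None), (8, (7, None, None), None))
print(depth(t), is_balanced(t))  # 2 True
```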
See also: Binary search tree,
Tree
Binary search tree
A search tree
is a tree that is inherently sorted, making it
easy to search for an element or to insert a new element.
A binary tree is a tree in which each node has at most
two links (outgoing edges) to at most two children. These two branches or subtrees
are often called left and right; in a binary search tree,
each node stores a data item (e.g., called item) such that:
- The data items stored in the subtree left are all smaller than item.
- The data items stored in the subtree right are all greater than or equal to item.
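A minimal sketch of insertion and search in a binary search tree, with nodes as dicts (a teaching sketch, not a balanced or production implementation):

```python
def insert(node, item):
    """Insert item, keeping the search-tree ordering invariant."""
    if node is None:
        return {"item": item, "left": None, "right": None}
    if item < node["item"]:
        node["left"] = insert(node["left"], item)
    else:  # items greater than or equal to node["item"] go right
        node["right"] = insert(node["right"], item)
    return node

def contains(node, item):
    """Search for item by following one branch per level."""
    if node is None:
        return False
    if item == node["item"]:
        return True
    branch = node["left"] if item < node["item"] else node["right"]
    return contains(branch, item)

root = None
for x in [5, 3, 8, 7]:
    root = insert(root, x)
print(contains(root, 7), contains(root, 4))  # True False
```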
See also: Balanced tree,
Tree
Breadth-first search
Traversal is the process of visiting all the elements
of a data structure. Breadth-first search (BFS) is one of the standard algorithms for traversing a graph;
starting from a given initial node, it continues by always proceeding to the
node that was detected earliest.
- BFS visits nodes in the order of
the number of edges by which they are distant from the initial node.
- In an unweighted graph, BFS can be used to determine the shortest paths
from the initial node to all other nodes. The graph consisting only of these
paths is the BFS spanning tree, also called a breadth-first tree.
- Using a queue
as an iterator (i.e., auxiliary data structure used for traversal),
this traversal algorithm in first-in, first-out (FIFO) order
can be implemented by iteratively carrying out the following steps:
- Visit the present node.
- Detect all nodes that are undetected so far and can be reached
directly from the present node; push (i.e., attach) them to the tail of the queue.
- Pop (i.e., detach) the first node from the head of the queue and proceed there.
It is not strictly necessary to employ a queue;
a variety of workarounds exist that may be more
appropriate depending on the use case.
- The purpose of "visiting" the nodes
and what is done during "visiting"
depends on the use case. Anything that
might be carried out within a for loop
over all nodes is appropriate.
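The steps above can be sketched in Python using collections.deque as the FIFO queue; the example graph is illustrative, and "visiting" here simply appends the node to a list:

```python
from collections import deque

def bfs(graph, start):
    order = []                    # "visiting" = appending to this list
    detected = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()    # pop from the head of the queue
        order.append(node)        # visit the present node
        for nb in graph[node]:    # detect so-far-undetected neighbours
            if nb not in detected:
                detected.add(nb)
                queue.append(nb)  # push to the tail of the queue
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(g, "A"))  # ['A', 'B', 'C', 'D']
```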
See also: Depth-first search,
Graph,
Queue
Brute force
As an algorithm design strategy, brute force consists in
evaluating each potential solution to the problem, until the right one is found;
or, in the case of an optimization problem, until all potential solutions have been
visited so that the best one is known.
- Brute-force algorithms are easy to design and implement,
and it is also comparably easy to conduct a formal verification.
- These advantages are usually offset by comparably poor performance.
Since all potential solutions need to be considered,
time efficiency scales with the number of potential solutions.
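A brute-force sketch for an illustrative problem (not from the text): find two list entries that sum to a target by testing every pair of indices until the right one is found:

```python
def pair_with_sum(values, target):
    """Try every candidate pair (i, j) until one sums to target."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                return (i, j)
    return None  # all candidates evaluated, no solution exists

print(pair_with_sum([3, 1, 4, 1, 5], 9))  # (2, 4)
```

Since every pair is a potential solution, the running time scales with the number of pairs, i.e., O(n^2).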
See also: Divide and conquer,
Dynamic programming,
Greedy algorithms
Depth-first search
Depth-first search (DFS) is one of the standard algorithms for traversing a graph;
starting from a given initial node, it continues by always proceeding to the
node that was detected most recently.
Using a stack as an iterator,
this traversal algorithm in last-in, first-out (LIFO) order
can be implemented by iteratively carrying out the following steps:
- Visit the present node.
- Detect all nodes that are unvisited so far and can be reached
directly from the present node; push (i.e., attach) them to the top of the stack.
- Pop (i.e., detach) nodes from the top of the stack until an unvisited node is found - proceed there.
For many purposes it may be easier to implement DFS using
recursion (internally implemented via a stack of function
calls) instead of using a stack data structure to store detected nodes.
A graph consisting only of edges used for visiting nodes during DFS
is called a DFS spanning tree (or depth-first tree).
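A recursive DFS sketch, where the call stack plays the role of the explicit stack of detected nodes; the example graph is illustrative:

```python
def dfs(graph, node, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)                 # visit the present node
    for nb in graph[node]:             # proceed to unvisited neighbours
        if nb not in visited:
            dfs(graph, nb, visited, order)
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(g, "A"))  # ['A', 'B', 'D', 'C']
```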
See also: Breadth-first search,
Graph,
Stack
Divide and conquer
Divide and conquer is an algorithm design strategy,
specifically, a decomposition technique. The overall problem is
recursively decomposed into smaller and smaller subproblems (divide),
until a base case with a trivial solution is reached (conquer),
and then the partial solutions to the subproblems
are combined with each other in order to obtain the complete solution.
- Only certain problems can be dealt with by divide-and-conquer algorithms;
that is the case when they can be decomposed into disjoint (i.e., non-overlapping)
subproblems, so that each partial solution needs to be used only once.
- When divide-and-conquer is applicable as a strategy, the resulting algorithms
are often both elegant and performant. A classical example for this is
the mergesort algorithm, which sorts a list in O(n log n) time,
the best possible asymptotic efficiency.
- The approach underlying the design strategy is recursive.
That does not mean that the implementation needs to use recursive function calls.
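The mergesort example mentioned above can be sketched as follows (here with recursive function calls, matching the recursive design):

```python
def mergesort(items):
    if len(items) <= 1:            # base case: trivially sorted
        return items
    mid = len(items) // 2
    left = mergesort(items[:mid])  # divide into disjoint halves
    right = mergesort(items[mid:])
    merged = []                    # combine the partial solutions
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(mergesort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```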
See also: Brute force,
Dynamic programming,
Greedy algorithms
Dynamic programming
Dynamic programming is an algorithm design strategy, specifically,
a decomposition technique. It can be used whenever the problem can best be
decomposed into overlapping subproblems, i.e., smaller versions
of the problem among which it is beneficial computationally to reuse or
share partial solutions (rather than recompute them). Dynamic programming
stores partial solutions in an appropriate data structure,
so that they can be recalled and reused when they are needed again.
- If divide and conquer does not work
because it would lead to carrying out the same computation many times,
or because there is no clean split of the whole problem into disjoint
portions, this indicates that there are overlapping subproblems.
Instead, dynamic programming might be used, storing and recalling
partial solutions.
- If the greedy design strategy
seems to fail because the outcome is not strictly correct or optimal in
all cases, it may be reasonable to extend the approach so that it
does not limit itself to building a single candidate solution (greedy), but multiple partial solutions
to overlapping subproblems. The result is an algorithm following the
dynamic programming design strategy.
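A classic small example (not from the text): Fibonacci numbers have overlapping subproblems, since fib(n) reuses fib(n-1) and fib(n-2); storing partial solutions in a dict avoids recomputing them:

```python
def fib(n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]          # recall a stored partial solution
    result = n if n < 2 else fib(n - 1, memo) + fib(n - 2, memo)
    memo[n] = result            # store it for later reuse
    return result

print(fib(30))  # 832040
```

Without the memo dict, the same subproblems would be recomputed exponentially many times; with it, each value is computed once.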
See also: Brute force,
Divide and conquer,
Greedy algorithms
Exponentiation
Exponentiation is the "power of" operation. A term in which
exponentiation is used, such as b^k, consists of the base b
and the exponent k.
- Taking the logarithm with base b is the inverse operation
to exponentiation with the base b: If y = b^k,
then k = log_b y. By default, in theoretical
computer science, omitting the base when taking the logarithm, as in log y,
means that 2 is used as the base.
- The exponential function exp(x) = e^x
is equivalent to exponentiation with the base e = 2.718 281 828 …
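A quick numerical check of the inverse relationship, using the standard library (the values b = 2 and k = 10 are illustrative):

```python
import math

b, k = 2, 10
y = b ** k             # exponentiation: y = b^k = 1024
print(math.log(y, b))  # the logarithm recovers k (up to rounding)
```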
Graph
A graph is a structure consisting of vertices (also called nodes) and edges;
it is in use both as a data structure and as a theoretical/mathematical concept.
A vertex is understood to be point-like; an edge is a connection between two vertices.
- In a labelled graph, vertices and/or edges can carry labels, i.e., numbers, strings, or other data items that are associated with them.
- In a directed graph, each edge has a direction; there, an edge serves as a link from its source node to its target node, but not vice versa. In an undirected graph, the edges do not have a direction; there, an edge serves as a link between its two incident vertices, going both ways.
- A sequence of edges, where the target of one edge is the source of the next one, is called a path. In an unweighted graph, the length of a path is given by the number of edges; in other words, each edge has the same length, namely 1. In a weighted graph (a kind of labelled graph), the length of an edge is given by its label, and the length of a path by the sum of all the included edge labels.
- A directed graph is a tree (also: out-tree) if it has a unique root node and a unique path from the root to each node.
- A vertex is in general permitted to have an edge to itself; i.e., an edge is in general permitted to have the same node both as its source and its target. However, there are many use cases where this cannot occur.
See also: Adjacency list,
Adjacency matrix,
Breadth-first search,
Depth-first search,
Incidence matrix,
Tree
Greedy algorithms
As a design strategy, greedy algorithms develop
a single candidate solution from the beginning to the end,
making incremental local improvements step by step, until the
task is finished and the complete solution is returned.
- Greedy algorithms are usually comparably easy to design and implement,
and they typically have a good time efficiency, in particular, compared
to brute-force algorithms.
- However, they are not always correct, and even when they are, it may be
complicated to prove their correctness. Not all problems are suitable for
algorithms based on local improvements to a single candidate solution.
- If a greedy algorithm does not return the correct or optimal result,
it might return an approximation to it, which could for some purposes be sufficient.
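A greedy sketch for an illustrative problem (not from the text): making change by always taking the largest coin that still fits. For the coin set below the result happens to be optimal; for other coin sets, a greedy algorithm may only return an approximation:

```python
def make_change(amount, coins=(50, 20, 10, 5, 2, 1)):
    """Greedily build one candidate solution, largest coin first."""
    used = []
    for coin in coins:          # local improvement: biggest coin that fits
        while amount >= coin:
            used.append(coin)
            amount -= coin
    return used

print(make_change(87))  # [50, 20, 10, 5, 2]
```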
See also: Brute force,
Divide and conquer,
Dynamic programming
Incidence list
Incidence lists are one of the data structures that can be used
to implement graphs. In a graph, two nodes (vertices) are
incident to each edge, and in a directed graph, one of these nodes is the source
of the edge whereas the other one is its target. As with adjacency lists,
an incidence list belongs to a node of the graph: It is a list of edges to which that node is incident.
- In a singly-linked implementation, each node has one incidence list, for its outgoing edges; for a doubly-linked implementation, this is extended by a second incidence list with incoming edges. (The distinction between incoming and outgoing edges only applies to directed graphs.)
- Incidence lists (like adjacency lists) are typically best suited to sparse graphs, whereas adjacency matrices are typically more suitable for dense graphs.
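A minimal sketch of a singly-linked incidence-list representation for an illustrative directed graph: edges are explicit objects (here tuples), and each node stores the list of its outgoing edges:

```python
# Directed graph with edges as (source, target) tuples.
edges = [("A", "B"), ("A", "C"), ("C", "B")]

# One incidence list per node, for outgoing edges only.
incidence = {
    "A": [edges[0], edges[1]],
    "B": [],
    "C": [edges[2]],
}

def out_degree(node):
    """Number of outgoing edges incident to the node."""
    return len(incidence[node])

print(out_degree("A"))  # 2
```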
See also: Adjacency list,
Adjacency matrix,
Graph
Leading contribution
In asymptotic performance, efficiency, and complexity analysis using
Landau notation, the amount of time and space (memory) resources required
as a function of the problem size n is reduced to the leading contribution,
i.e., the one that dominates all the others for large values of n.
- The leading contribution is the term that, as n keeps increasing,
will eventually exceed all the others in magnitude; e.g., in
5n^2 + n^4/1000, the leading contribution
is n^4/1000, because as n → ∞,
it eventually becomes and remains the greater one of the two terms.
- In addition to eliminating all except the leading contribution, Landau notation
also eliminates constant coefficients, so that straightforward classes of
problems with the same asymptotic complexity or algorithms with the same
asymptotic efficiency are obtained; accordingly, using Landau notation,
we write O(n) for linear scaling, O(n^2)
for quadratic scaling, and so on. For example, the
expression 5n^2 + n^4/1000 is first
reduced to its leading contribution n^4/1000, and then
to the class O(n^4) by eliminating the constant coefficient 1/1000.
- Whenever there is an overall maximum (i.e., a constant upper bound)
to resource requirements given as a function of n, this is denoted by O(1),
which is often called constant time or constant space,
even in cases where that is not strictly correct (the expression
may still be a function of n, just a function that does
not diverge toward infinity for large n).
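The example above can be checked numerically: for small n the quadratic term is larger, but as n grows, n^4/1000 eventually exceeds and dominates 5n^2:

```python
def terms(n):
    """Return the two contributions from the example expression."""
    return 5 * n**2, n**4 / 1000

print(terms(10))    # (500, 10.0)   -> 5n^2 still dominates
print(terms(1000))  # n^4/1000 dominates by a wide margin
```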
Permutations
Permutations are the ways in which the elements of a list can be arranged,
resulting in lists with the same elements, but in all possible orders.
For a list with n elements, there are
n · (n-1) · … · 2 · 1 = n! permutations.
- The list [3, 2, 4] has 3! = 3 · 2 · 1 = 6 permutations:
[3, 2, 4], [3, 4, 2], [2, 3, 4],
[2, 4, 3], [4, 3, 2], and [4, 2, 3].
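The example above can be reproduced with the standard library, where itertools.permutations yields all n! orderings:

```python
from itertools import permutations

perms = [list(p) for p in permutations([3, 2, 4])]
print(len(perms))  # 6, i.e., 3!
print(perms)       # the six permutations listed above
```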
Queue
A queue is a sequential (list-like) dynamic data structure
that functions by the principle first in, first out (FIFO).
It has a method for appending an element, usually called push and
done at the tail of the queue, and a method for detaching an element,
usually called pop and done at the head of the queue.
- Singly and doubly linked lists are well suited to
implementing a queue, since both push and pop
can be realized in constant time. However, a singly linked
list should only be used if the data structure includes an explicit
reference to the tail node; otherwise, the whole list needs to
be traversed just to reach the tail, taking O(n) time.
- A dynamic array is less suitable for this purpose, since
insertion/deletion operations done close to the head require
shifting all the remaining elements in memory. Therefore,
one of the two operations (the one done on the head element)
would take O(n) time.
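In Python, collections.deque (a doubly linked block structure) provides a FIFO queue with constant-time operations at both ends:

```python
from collections import deque

queue = deque()
queue.append("first")   # push at the tail, O(1)
queue.append("second")
print(queue.popleft())  # pop at the head, O(1) -> 'first'
```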
See also: Breadth-first search,
Stack
Stack
A stack is a sequential (list-like) dynamic data structure
that functions by the principle last in, first out (LIFO).
It has a method for appending an element, usually called push and
done at the head of the stack, and a method for detaching an element,
usually called pop and also done at the head of the stack.
- Linked lists can be used straightforwardly as a stack,
such that all the push and pop are applied to
the head of the list in constant time.
- Dynamic arrays are well suited to implementing a stack;
in this case push and pop need to be done
at the end of the array, not at the front, since it is only
insertion/deletion at the end or close to the end
that takes O(1) time for a dynamic array. In the case of insertion,
for the push operation, O(1) time is only reached in the
(frequent) best case, when there is free capacity.
In the worst case, this still requires O(n), since
the whole list needs to be copied into a new region in memory where
there is more free space.
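A Python list is a dynamic array, so it serves directly as a stack when push and pop are done at the end:

```python
stack = []
stack.append("first")   # push at the end of the array
stack.append("second")
print(stack.pop())      # pop at the end: last in, first out -> 'second'
```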
See also: Depth-first search,
Queue
Tree
A tree is a hierarchical linked data structure,
where each node has outgoing links to a number of children, i.e., nodes at the next lower level;
at the highest level of a tree (the level with depth 0), there is exactly one node which is called the root.
- A tree is a graph with a unique root node and a unique path from the root to each node; consequently, a tree may not contain any cycles or diamond-like structures. (This definition of a tree is also referred to as a rooted tree or an out-tree.)
- Each child of the root node is itself the root of a subtree (or branch) of the tree; a node without children is called a leaf.
- The depth of a node is given by the number of links that need to be followed in order to reach that node, starting from the root. The depth of a tree is the maximum depth of a leaf in that tree or, equivalently, the length of the longest path from the root to a leaf.
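A small illustrative tree as nested dicts, with the depth of the tree computed recursively as the maximum depth of a leaf:

```python
tree = {"item": "root", "children": [
    {"item": "a", "children": [
        {"item": "leaf", "children": []},
    ]},
    {"item": "b", "children": []},
]}

def tree_depth(node):
    """Depth of the tree rooted at node; a leaf has depth 0."""
    if not node["children"]:
        return 0
    return 1 + max(tree_depth(child) for child in node["children"])

print(tree_depth(tree))  # 2
```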
See also: Balanced tree,
Binary search tree,
Graph