Graph theory and patterns
Terminology
1. Node = each individual item of a graph
2. Edge = The connections between nodes
3. Directed = Edges have a direction to them. Think about one-way flights etc.
4. Undirected = Edges have no direction. You can go either way. Think about two-way
streets.
5. Weighted = Edges have a weight (often a number). The weight usually describes the
cost or length of travelling along the edge.
6. Unweighted = Edges have no weight to them
7. Path = A sequence of edges/nodes that goes from node U → node V
8. Simple path = A path from node U → node V in which every node (and therefore every
edge) appears only once
9. Multigraph = Graphs that may contain self-loops (edges which leave and enter the same
node) and multiple edges connecting the same pair of nodes.
10. Acyclic graphs = Graphs that contain no cycle
11. Cyclic graphs = Graphs that contain a cycle
12. Directed acyclic graphs (DAGs) = Graphs that are directed and have no cycles
13. Trees = Undirected graphs that are connected and acyclic. A tree with n nodes always
has n - 1 edges
14. Colouring = The task of assigning a colour to each node, usually under some constraint
15. Bipartite graphs = Graphs in which you can colour each node blue or red such that no
adjacent nodes share the same colour. A graph is bipartite exactly when it has no odd
length cycle.
16. Components = Each connected section of a graph. If there are two parts of a graph that
aren’t connected to each other, then they are two components.
17. Connectivity = Which nodes can reach which other nodes. A graph is connected if there
is a path between every pair of nodes.
18. Neighbours = All the nodes that share an edge with a given node
19. Degree = How many edges touch a node (in a graph without self-loops or repeated edges,
this is how many neighbours it has)
20. In degree = In a directed graph, the in degree is how many edges are directed
towards the current node
21. Out degree = In a directed graph, the out degree is how many edges the node has
directed out to other nodes
Graph representation
Adjacency matrix
In an adjacency matrix, you store all connections between nodes in an n * n matrix, where
entry [i][j] records whether node i and node j are connected (1) or not (0).
An adjacency matrix has a memory complexity of O(n²). It is often represented as an array of
arrays, like this:
int am[1000][1000];
Tips:
- Adjacency matrices can be stored as an array of bitmasks, one 64-bit integer per node, if
n <= 64
- Adjacency matrices usually cannot be stored if n > 5000 (a 5000 * 5000 matrix of 4-byte
ints already takes about 100 MB)
- Use adjacency matrices for fast connection lookups (O(1) per pair of nodes)
- Use adjacency matrices for dense graphs (where most nodes have a high degree)
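The bitmask tip can be sketched roughly as follows. This is only an illustration under assumptions: the type name BitMatrix and its member names are made up for this example, the graph is taken to be undirected, and n is assumed to be at most 64 so that one unsigned 64-bit integer can hold a whole row.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not from the notes): an undirected graph on
// n <= 64 nodes. Bit j of row[i] is set when nodes i and j share an edge.
struct BitMatrix {
    int n;
    std::vector<std::uint64_t> row;

    explicit BitMatrix(int nodes) : n(nodes), row(nodes, 0) {
        assert(nodes >= 0 && nodes <= 64);  // one 64-bit mask per node
    }

    // Record the edge in both directions, as in a symmetric matrix.
    void addEdge(int u, int v) {
        row[u] |= std::uint64_t{1} << v;
        row[v] |= std::uint64_t{1} << u;
    }

    // O(1) connection lookup: test a single bit.
    bool connected(int u, int v) const {
        return ((row[u] >> v) & 1u) != 0;
    }

    // The degree of u is the number of set bits in its row.
    int degree(int u) const {
        int count = 0;
        for (std::uint64_t bits = row[u]; bits != 0; bits &= bits - 1) ++count;
        return count;
    }
};
```

For example, after BitMatrix g(4); g.addEdge(0, 1); g.addEdge(1, 2); the call g.connected(0, 1) is true and g.degree(1) is 2. For larger n the same idea may be expressed with std::bitset rows.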
Adjacency List
An adjacency list stores, for every node, all neighbours of that node. It stores the
neighbouring node itself rather than a connected/not connected flag for every pair. Therefore,
adjacency lists have lower memory requirements and can be used for larger data sizes. Going
through all the neighbours of a node is also quicker than with an adjacency matrix, but
checking whether two particular nodes are connected is slower.
It has a memory complexity of O(n + m), where n is the number of nodes and m is the number of
edges. Adjacency lists are often represented as an array of vectors:
vector<int> arr[100000];
Tips:
- Very fast for BFS and DFS
- Can be stored for larger graph sizes
- Can be used with multigraphs
- Can hold other pieces of info (e.g. edge weight)
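As a small sketch of the tips above, an adjacency list can store (neighbour, weight) pairs and still be traversed with a plain BFS. The names Graph, addUndirectedEdge and bfsHops are assumptions made for this example, and a vector of vectors is used instead of a fixed-size array so the size can be chosen at run time.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs. All names here are illustrative.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Store the edge on both endpoints because the graph is undirected.
void addUndirectedEdge(Graph& adj, int u, int v, int w) {
    const int n = static_cast<int>(adj.size());
    assert(u >= 0 && u < n && v >= 0 && v < n);
    adj[u].push_back({v, w});
    adj[v].push_back({u, w});
}

// BFS from start. Returns the number of edges on a shortest route to each
// node, or -1 if the node is unreachable. Edge weights are ignored here.
std::vector<int> bfsHops(const Graph& adj, int start) {
    std::vector<int> hops(adj.size(), -1);
    std::queue<int> q;
    hops[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (const auto& edge : adj[u]) {
            int v = edge.first;
            if (hops[v] == -1) {  // first visit is the shortest in hops
                hops[v] = hops[u] + 1;
                q.push(v);
            }
        }
    }
    return hops;
}
```

Each node and each stored edge is looked at a constant number of times, which is where the O(n + m) running time of BFS on an adjacency list comes from.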
Graph question patterns
1. To answer ‘Can I reach node X from node Y’ in a directed graph, where Y can be multiple
nodes, reverse the graph and run a DFS from node X. The visited nodes are exactly the
nodes that can reach X in the original graph, because reversing every edge ‘retraces’
each path into X. Topic: graph connectivity
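2. Some problems may involve binary searching the answer + dfs (or bfs). Binary search
over a candidate value, run a dfs that only allows what that value permits, and check
whether the required condition is met. Keep the best value that passes. This only works
when the condition is monotonic (once a value passes, every larger or every smaller
value passes too).
Topic: BSTA + DFS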
3. If a graph is bipartite, there mustn’t be an odd length cycle. To check this, you can run dfs
on a node, colour it in with a colour x, and then colour all of its uncoloured neighbours
with the opposite colour. If a neighbour is already coloured and has the same colour as the
current node, the graph is not bipartite. Topic: checking bipartiteness
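4. In order to check if a node is in a cycle, run dfs from it. If any node reached by the dfs
has the original node as a neighbour, then the original node is in a cycle. Only works for
directed graphs (in an undirected graph, the edge straight back to the previous node would
wrongly count as a cycle). Topic: Cyclicity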
5. To simulate what it would be like if an edge is removed, use an adjacency matrix and
set both entries for that edge (am[u][v] and am[v][u] in an undirected graph) to 0. Then
restore them after the process is finished. Topic: removal of edges
6. If you need to ‘close’ a node so that it is never visited again, keep another array
(similar to visited) that stores whether each node is closed. Then if the dfs runs into a
closed node, it can return immediately. Topic: removal of nodes
7. To retrace the steps of a path A → B, keep a parent array that stores, for each node, the
node it was reached from. Then walk backwards from B to A by following parents, store
each node in an array, and print that array in reverse order. Topic: BFS minimum path
8. If a dfs requires some condition, say edge weight >= x, you can include that condition
in the check made before recursing, if (!visited[u] && {condition}) dfs(u), to avoid
reconstructing the adjacency matrix/list and save time. Topic: property
exclusion
9. To ‘colour’ each individual component of a graph, iterate through every node and, if it
has not been visited yet, run a dfs from it and assign every node reached by that dfs the
same ‘id’ to represent the component. Two nodes are then in the same component exactly
when their ids are equal, which you can check in O(1) time. Topic: checking component
connectivity
10. If a problem needs some per-node property to be tracked or checked, such as visited,
distance, parent, avoid, component, etc., then store an array with one entry per node and
set the values accordingly. Topic: graph properties
11. If you colour the nodes in a graph by component, then iterate through each node and
store it in an array based on its colour, you get an array storing all connected components
of the graph. Topic: storing connected components
12. If you are performing a flood fill with dynamic updates, e.g. visiting one cell makes
some other cell visitable, then, if moving to that other cell is now a valid traversal, you
should immediately DFS from it. Topic: flood fill with dynamic updates
13. IMPORTANT! Graphs (and BFS/DFS) can represent many things; most importantly, they
can represent state spaces. If the range of values is small enough (5000, 5000 maximum),
you can perform a BFS/DFS over the states, search all possible decisions/outcomes and
store meaningful data. Topic: Searching state spaces
14. In Dijkstra’s algorithm, you can figure out if a node is part of a shortest path from A →
B: if the shortest distance of B minus the weight of the edge between the node and B (or
the current last node) equals the shortest distance of the node, then it’s part of a
shortest path. Example: A → (5) → B → (2) → C. The minimum cost of C is 7 and the
minimum cost of B is 5. cost(C) - weight(B → C) = 7 - 2 = 5 = cost(B), therefore B is part
of the shortest path to C. This is useful when there are multiple shortest paths; for one
shortest path, store the previous node. Topic: Backtracking nodes in a shortest path
(dijkstra)
15. Sometimes there are X states that you need to track and they may interact with each
other (check out CSES Flight Discount). In that case, store a shortest distance for each
(node, state) pair and run Dijkstra over those pairs; whenever a move can change the
state, also push the resulting (node, state) to the pq. Topic: multi-state dijkstra
16. In an undirected graph, if you run a DFS (also passing the previously visited node, -1
for the first node) and you encounter an already visited node that is not the previous
node, then you have found a cycle. If you also store the parent of each node, once you find
a cycle you can break out of the DFS and then print out parents, starting from the current
node, until you reach the first node in the cycle. Topic: finding and printing cycles in a
graph
17. To figure out how many edges you need to remove to make a graph a forest, use the fact
that a component with k nodes needs exactly k - 1 edges to be a tree. Summed over all c
components, a graph with n nodes needs n - c edges in total, so if it has m edges you need
to remove m - (n - c) of them. Topic: edge removal to make forest
18. If you are given a component on a grid (mapped out via a flood fill), you can find its
perimeter by going through every cell of the component and counting its side-adjacent
neighbours (including OOB cells) that are not equal to the component identifier (i.e. if the
component is made of 1s we do not count neighbouring 1s). Topic: perimeter of components
in a 2D implicit grid
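Patterns 3 and 9 above (checking bipartiteness and component ids) can be sketched roughly like this. It is a minimal sketch, not a definitive implementation: the names AdjList, addEdge, labelComponents and isBipartite are assumptions for this example, the graph is taken to be undirected, and an explicit stack replaces the recursive dfs to avoid deep recursion on large inputs.

```cpp
#include <cassert>
#include <vector>

using AdjList = std::vector<std::vector<int>>;  // illustrative alias

// Add an undirected edge between u and v.
void addEdge(AdjList& adj, int u, int v) {
    const int n = static_cast<int>(adj.size());
    assert(u >= 0 && u < n && v >= 0 && v < n);
    adj[u].push_back(v);
    adj[v].push_back(u);
}

// Pattern 9: give every node the id of its connected component.
std::vector<int> labelComponents(const AdjList& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> comp(n, -1);  // -1 means "not visited yet"
    int nextId = 0;
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;
        std::vector<int> stack(1, s);
        comp[s] = nextId;
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int v : adj[u]) {
                if (comp[v] == -1) {
                    comp[v] = nextId;
                    stack.push_back(v);
                }
            }
        }
        ++nextId;
    }
    return comp;
}

// Pattern 3: colour nodes 0/1 so that neighbours differ; fail as soon as
// an edge joins two nodes of the same colour.
bool isBipartite(const AdjList& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> colour(n, -1);  // -1 means "uncoloured"
    for (int s = 0; s < n; ++s) {
        if (colour[s] != -1) continue;
        colour[s] = 0;
        std::vector<int> stack(1, s);
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int v : adj[u]) {
                if (colour[v] == -1) {
                    colour[v] = 1 - colour[u];
                    stack.push_back(v);
                } else if (colour[v] == colour[u]) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

With the ids from labelComponents, two nodes a and b are in the same component when comp[a] == comp[b]. Grouping nodes by id, as in pattern 11, is then a single pass over that array.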
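Patterns 7 and 14 above (retracing a path through a parent array, and testing whether an edge lies on a shortest path) could look something like the following. The names WeightedAdj, ShortestPaths, dijkstra, onShortestPath and restorePath are hypothetical, and the sketch assumes a directed graph with non-negative edge weights.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs. All names here are illustrative.
using WeightedAdj = std::vector<std::vector<std::pair<int, long long>>>;
const long long INF = std::numeric_limits<long long>::max();

struct ShortestPaths {
    std::vector<long long> dist;  // INF when unreachable
    std::vector<int> parent;      // -1 for the source and unreachable nodes
};

// Standard Dijkstra with a min-heap, assuming non-negative weights.
ShortestPaths dijkstra(const WeightedAdj& adj, int source) {
    const int n = static_cast<int>(adj.size());
    assert(source >= 0 && source < n);
    ShortestPaths sp{std::vector<long long>(n, INF), std::vector<int>(n, -1)};
    typedef std::pair<long long, int> Item;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    sp.dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        long long d = top.first;
        int u = top.second;
        if (d > sp.dist[u]) continue;  // stale queue entry
        for (const auto& edge : adj[u]) {
            int v = edge.first;
            long long w = edge.second;
            if (d + w < sp.dist[v]) {
                sp.dist[v] = d + w;
                sp.parent[v] = u;
                pq.push({sp.dist[v], v});
            }
        }
    }
    return sp;
}

// Pattern 14: the edge u -> v of weight w lies on some shortest path from
// the source exactly when dist[v] - w == dist[u].
bool onShortestPath(const ShortestPaths& sp, int u, int v, long long w) {
    return sp.dist[u] != INF && sp.dist[u] + w == sp.dist[v];
}

// Pattern 7: follow parents back from target, then reverse the result.
std::vector<int> restorePath(const ShortestPaths& sp, int target) {
    std::vector<int> path;
    if (sp.dist[target] == INF) return path;  // unreachable: empty path
    for (int v = target; v != -1; v = sp.parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

On the example from pattern 14 (A → (5) → B → (2) → C, numbered 0, 1, 2), onShortestPath(sp, 1, 2, 2) holds because 7 - 2 equals 5. The parent array only records one shortest path, whereas the distance test recognises every edge that belongs to any of them.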