Graph Theory: Loop Detection and Extraction in Directed Graphs

Graph Theory: Loop Detection and Cycle Extraction in Directed Graphs#

Vertex#

Edge (Directed, Undirected)#

If there is an edge pointing to a vertex, we say that the in-degree of this vertex is 1; similarly, if there is an edge departing from the vertex, we say that the out-degree of this vertex is 1.

Representation of Graphs#

Adjacency Matrix

1-1

Assuming a graph with n vertices and m edges, declare a m * n two-dimensional array, where G[i][j] = 1 indicates that i → j is connected.

However, there are several issues:

When traversing elements, both existing and non-existing edges must be checked, leading to low traversal efficiency.
Cannot store duplicate edges.
When the number of vertices is large, memory overhead can be significant.
Space utilization is not high.
When storing undirected graphs, since the matrix is symmetric, the information stored in paired elements at symmetric positions is redundant, leading to low space utilization.

Adjacency List

1-2

Use an array adj to store all vertices reachable from this point, for example: adj[i] = [a, b, c, d], where adj[i] can be a linked list or an array.

The structure of the associated field cellValue in the data table can also be seen as a type of adjacency list (starting from one id, associated with an array of ids): id → record → linkfield → cellvalue [id1, id2, id3]

Cycle Detection Methods in Graphs#

Disjoint Set#

Detailed Explanation of Disjoint Set -- Illustrated, Simple and Easy to Understand (Transfer)

Q & A

Why is path compression needed? How can it be compressed?

In the worst case, the search path may degenerate into a chain, greatly affecting search efficiency;

The pre array does not need to store the actual relationships between nodes; its only purpose is to determine whether any 2 nodes have a common ancestor;

Methods of compression:
1. Union by rank
2. Find and union simultaneously
Under what circumstances does it indicate a cycle?

When the 2 nodes to be merged already have a common ancestor, it indicates that a cycle may exist (note that it is possible).
Under what circumstances can there be a common ancestor but no cycle?

In the case of a directed graph, it is important to know that the edges of a directed graph have direction; even if 2 nodes have a common ancestor, it does not necessarily form a cycle, for example: [[1,2], [1,3], [2,3]].

However, the problem below only involves undirected graphs, so no special handling is required.

// [https://leetcode-cn.com/problems/graph-valid-tree/](https://leetcode-cn.com/problems/graph-valid-tree/)

// Input: n = 5, edges = [[0,1],[0,2],[0,3],[1,4]]
// Output: true

// Input: n = 5, edges = [[0,1],[1,2],[2,3],[1,3],[1,4]]
// Output: false

const int maxN = 2010;
int pre[maxN];

int find(int x) {
  return x == pre[x] ? x : pre[x] = find(pre[x]); // Path compression, find and compress simultaneously
}

void _union(int x, int y) {
  int fx = find(x), fy = find(y);
  if (fx != fy) {
    pre[fx] = fy;
  }
}

class Solution {
  public:
    bool validTree(int n, vector<vector<int> >& edges) {
      for (int i = 0; i < n; i++) {
        pre[i] = i;
      }
      for (int i = 0; i < edges.size(); i++) {
        int a = edges[i][0], b = edges[i][1];
        if (find(a) == find(b)) {
          return false;
        }
        _union(a, b);
      }
			// Multiple disconnected trees are also not a valid tree
      set<int> preSet;
      for (int i = 0; i < n; i++) {
        preSet.insert(find(i));
      }
      return preSet.size() == 1;
    }
};

Topological Sorting#

What is a topological relationship?

For example, a course has its prerequisite courses.

1-3

Representing it graphically:

Untitled

The mutual relationships between courses represent a topological relationship.

What is a topological sequence, and how to find a topological sequence?

Knowing the prerequisite courses for any course, if you were to create a course schedule for students over four years of college, the sequence of courses in this schedule is actually a topological sequence.

The specific approach: In a directed graph, find the vertices with an in-degree of 0, place these vertices in a queue, and repeatedly remove these vertices and their associated edges. By repeating this step, a topological sequence can be obtained.


// Topological Sorting

// input: [[1,0],[2,0],[3,1],[3,2]]
// output: [0, 1, 2, 3] or [0, 2, 1, 3]

const edges = [[1,0],[2,0],[3,1],[3,2]];

const inDegree = [];
// Calculate the in-degree of vertices
edges.forEach(edge => {
  const [target, source] = edge;
  inDegree[target]++;
});

// Find vertices with in-degree of 0
let queue = [];
inDegree.forEach((val, index) => {
  if (val === 0) {
    queue.push(index)
  };
});

const res = [];

while (queue.length) {
  const cur = queue.shift();
  res.push(cur);
  edges.forEach(edge => {
    const [target, source] = edge;
    if (source === cur) {
      inDegree[target]--;
			// New vertices with in-degree of 0 will definitely be generated here
      if (inDegree[target] == 0) {
        queue.push(target);
      }
    }
  })
}

console.log(res)

If a vertex with an in-degree of 0 cannot be found, does it necessarily indicate a cycle?

Intuitively, yes, but a proof method has not yet been found.

How to Extract Cycles While Detecting Them?#

To extract a cycle means we need to traverse the entire graph at least once. A straightforward approach is to use depth-first traversal.

Algorithm explanation: https://www.youtube.com/watch?v=rKQaZuoUR4M

Q & A

Why use 3 colors to mark nodes?

It is essentially a form of pruning; nodes marked with color 2 no longer need to be accessed.
What is the purpose of the pre array?

It records the parent nodes of the nodes. When a cycle is found, starting from the last visited node in dfs, it sequentially searches for its parent nodes until a cycle is formed or the path breaks (meaning this node has no parent), at which point this process constructs the cycle.
What is the complexity?

Assuming the number of vertices is V and the number of edges is E, since each edge must be traversed once, the time complexity is O(V+E), and the space complexity is O(V) (in fact, we need additional space to store the cycle structure, so in the worst case, the space occupied should be cv1+cv2+…+cvv＝2^v).
What are the potential pitfalls?

A graph that is too deep may cause a stack overflow.

JavaScript Version:

// [https://cses.fi/problemset/task/1678](https://cses.fi/problemset/task/1678)

// const n = 4, m = 5;
// const edges = [[1,3], [2,1], [2,4], [3,2], [3,4]];
//const n = 10, m = 20;
//const edges = [[9,8], [2,9],[7,5], [4,5],[1,5],[3,8],[4,2],[5,4],[6,5],[3,6],[8,10],[10,9],[10,7],[9,3],[7,6],[8,7],[7,3],[8,9],[7,10],[2,1]]
const n = 6, m = 7;
const edges = [[1, 2], [2, 3], [3, 1], [1, 4], [4, 6], [6, 5], [5, 4]];

const pre = Array(n + 1).fill(-1);

const color = Array(n + 1).fill(0);

// Build adjacency list
const adj = [[], [], [], [], [], [], [], [], []]; // Not sure why, initializing with Array(n + 1).fill([]) causes an error
edges.forEach(edge => {
  const [source, target] = edge;
  adj[source].push(target);
});

const cycles = [];

const buildCycle = (start, end) => {
  const cycle = [start];
  for (let cur = end; cur !== start; cur = pre[cur]) {
    cycle.push(cur);
  }
  cycle.push(start);
  cycles.push(cycle.reverse());
}

const dfs = source => {
  color[source] = 1;
  adj[source].forEach(target => {
    if (color[target] === 0) {
			pre[target] = source;
      dfs(target);
    } else if (color[target] === 1) {
      // console.log(target, source)
      buildCycle(target, source);
    }
  });
  color[source] = 2;
}

for (let v = 1; v <= n; v++) {
  if (color[v] === 0) {
    dfs(v);
  }
}

console.log(cycles)

C++ Version:

#include<iostream>
#include<vector>
using namespace std;

const int maxN = 1e5+10;
vector<int> color(maxN, 0) , pre(maxN, 0), adj[maxN];

vector<vector<int> > res;

void build_cycle(int start, int end) {
  vector<int> cycle;
  cycle.push_back(start);
  for (int cur = end; cur != start; cur = pre[cur]) {
    cycle.push_back(cur);
  }
  cycle.push_back(start);

  vector<int> reversedCycle;
  for (int i = cycle.size() - 1; i >= 0; i--) {
    reversedCycle.push_back(cycle[i]);
  }

  res.push_back(reversedCycle);
}

void dfs(int source) {
  color[source] = 1;
  for (int &target: adj[source]) {
    if (color[target] == 0) {
      pre[target] = source;
      dfs(target);
    } else if (color[target] == 1) {
      build_cycle(target, source);
    }
  }
  color[source] = 2;
}

int main() {
  int n, m;
  cin >> n >> m;
  while (m--) {
    int a, b;
    cin >> a >> b;
    adj[a].push_back(b);
  }

  for (int i = 1; i <= n; i++) {
    if (color[i] == 0) {
      dfs(i);
    }
  }

  if (res.size() != 0) {
    for (int i = 0; i < res.size(); i++) {
      cout << res[i].size() << endl;
      for (int j = 0; j < res[i].size(); j++) {
        cout << res[i][j] << " ";
      }
      cout << endl;
    }
  } else {
    cout << "IMPOSSIBLE" << endl;
  }

  return 0;
}

The disjoint set can also partially implement this. As mentioned earlier, for directed graphs, we can handle this specially. Taking this example: [[1,2], [1,3], [2,3]], we first merge [1,2], [2,3], at this point [1,2,3] is already in one set. When merging [2,3], we enter the cycle construction logic: 3 is queued, then find that the ancestor of 2 is 1, queue 1, and since 1 has no ancestor, it does not form a cycle, so we can end the traversal.

#include<iostream>
#include<vector>
#include<cmath>
#include<set>
#include<algorithm>
using namespace std;

const int maxN = 1e5+10;

int pre[maxN];

int pre2[maxN];

vector<vector<int> > res;

// TODO: The one-dimensional array pre2 cannot handle cases where vertices have an in-degree greater than 1; it can be changed to two-dimensional, ~~perform a breadth-first search~~, no, it should be depth-first
void buildCycle(int start, int end) {
  
  vector<int> cycle;
  cycle.push_back(start);
  for (int cur = end; cur != start; cur = pre2[cur]) {
		// For vertices with an in-degree greater than 1, it will also reach here, so need to check if the search process has reached the end; if so, return directly
    if (cur == -1) {
      return;
    }
    cycle.push_back(cur);
  }
  cycle.push_back(start);

  vector<int> reversedCycle;

  for (int i = cycle.size() - 1; i >= 0; i--) {
    reversedCycle.push_back(cycle[i]);
  }

  res.push_back(reversedCycle);
}

int find(int x) {
  // cout << "find" << x << endl;
  return x == pre[x] ? x : pre[x] = find(pre[x]); // Path compression, find and compress simultaneously
}

// 5 7
// 1 5
// 1 2
// 1 3
// 3 4
// 4 2
// 2 5
// 4 1

void _union(int x, int y) {
  if (pre2[y] == -1) pre2[y] = x;

  int fx = find(x), fy = find(y);
  if (fx != fy) {
    pre[fx] = fy;
  }
}

int main() {
  int n, m;
  cin >> n >> m;

  for (int i = 0; i <= n; i++) {
    pre[i] = i;
    pre2[i] = -1;
  }

  while (m--) {
    int a, b;
    cin >> a >> b;
    if (find(a) == find(b) ) {
      buildCycle(b, a);
      continue;
    }
    _union(a, b);
  }

  if (res.size() != 0) {
    for (int i = 0; i < res.size(); i++) {
      cout << res[i].size() << endl;
      for (int j = 0; j < res[i].size(); j++) {
        cout << res[i][j] << " ";
      }
      cout << endl;
    }
  } else {
    cout << "IMPOSSIBLE" << endl;
  }

  return 0;
}

Here, a set of data times out mainly because the problem only requires finding one cycle. My solution has been expanded to meet potential business needs by finding all cycles. In theory, changing it to find just one cycle and then ending should be able to pass the test.

1-5

TODO: Still cannot perfectly handle cases where vertices have an in-degree greater than 1; need to change pre2 to two-dimensional, and traverse all parent nodes of a node. ~~Currently, both breadth-first and depth-first can work.~~

~~Depth-first should be used!!~~ No, breadth-first can actually work.