In this article I’d like to tell you a little bit about graphs and how you can
search a graph using the BFS (breadth first search) algorithm.
I’ll address, and hopefully answer, the following questions:
• what is a graph?
• how can a graph be represented as an ADT?
• how can we search/walk through a graph using the BFS algorithm?
What is a graph?
A graph is a (data) structure consisting of a set of vertices and a collection
of pairs of vertices from that set. These pairs denote how the vertices from
the graph are connected to each other.
Think of Hollywood’s film industry as a graph: there are movies made which are
produced by actors (for simplicity, I left out actresses, directors, producers,
etc. which are also a part of a movie, of course!). The movies and actors are
the vertices from that graph and the relation from an actor to a movie is
called an edge.
Here’s an example of a small graph where the Roman numerals are movies and the
lower-case alphabetic characters are the actors:
Such a graph is called an unweighted, undirected graph. Undirected means there
is a relation (edge) from an actor to a movie and that same relation goes back.
Think of it as a two-way street, you can go both ways. And unweighted means that
it doesn’t "cost" anything to go from one vertex to the other.
A nice example of a weighted, bidirected (both un- and directed) graph is a
street network. The crossings (and dead-ends) are the vertices and the roads
connecting those vertices are the edges. It is bidirected because there are
two- and one-way streets, and it is weighted because there’s a distance between
the vertices.
Well that’s all very nice, I hear you thinking, but …
How can a graph be represented as an ADT?
As been stated in the previous paragraph: the vertices in a graph are unique.
And each vertex "points to" a list of vertices which it is connected to. So,
the java.util.Map interface is an ideal structure for such a thing. For those
of you that don't know what it is, a java.util.Map is an ADT from the standard
JDK which holds a unique key and maps a value to that key. For details, please
see the API documentation:
http://java.sun.com/j2se/1.5.0/docs/.../util/Map.html
And to store a collection of edges, we simply use a java.util.List. This kind
of representation of a graph is called an adjacency list.
What else do we need in our Graph class? To begin with, we will need to build
the following functionality:
• we need to be able to add and connect vertices;
• a way to parse a text file that holds the data of our graph;
• perhaps to override the toString() method to check if the graph is properly built;
and, for our BFS algorithm in the next section, we will need to be able to:
• get the edges of a specific vertex;
• check if a certain vertex is connected to another vertex.
Here’s an implementation of the Graph class so far:
Expand|Select|Wrap|Line Numbers
- import java.io.*;
- import java.util.*;
- public class Graph {
- private Map<String, List<String>> adjacencyList;
- /**
- * Instatiates the 'adjacencyList' and then parse the data file.
- */
- public Graph(String fileName) throws FileNotFoundException {
- adjacencyList = new HashMap<String, List<String>>();
- parseDataFile(fileName);
- }
- /**
- * This is an undirected graph, so we connect 'vertexA' to 'vertexB'
- * and the other way around.
- */
- public void addConnection(String vertexA, String vertexB) {
- connect(vertexA, vertexB);
- connect(vertexB, vertexA);
- }
- /**
- * A private helper-method to connect 'vertexA' to 'vertexB'.
- * If 'vertexA' alreay exists in our 'adjacencyList', get it's
- * edges-list and add 'vertexB' to it. If it doesn't exist,
- * create a new ArrayList, add 'vertexB' to it and put it all
- * in our 'adjacencyList'.
- */
- private void connect(String vertexA, String vertexB) {
- List<String> edges;
- if(adjacencyList.containsKey(vertexA)) {
- edges = adjacencyList.get(vertexA);
- edges.add(vertexB);
- } else {
- edges = new ArrayList<String>();
- edges.add(vertexB);
- this.adjacencyList.put(vertexA, edges);
- }
- }
- /**
- * Returns true iff 'vertexA' poits to to 'vertexB'.
- * Note that since this is an undirected graph, we do not
- * need to check the other way around, the case if 'vertexB'
- * is points to 'vertexA'.
- */
- public boolean isConnectedTo(String vertexA, String vertexB) {
- List<String> edges = getEdges(vertexA);
- return edges.contains(vertexB);
- }
- /**
- * Returns all the edges of a certain vertex, or throws an
- * exception if the vertex doesn't exist in this graph.
- */
- public List<String> getEdges(String vertex) {
- List<String> edges = adjacencyList.get(vertex);
- if(edges == null) {
- throw new RuntimeException(vertex+" not present in the graph.");
- }
- return edges;
- }
- /**
- * Reads a text file with the graph-data. The text file contains
- * N-blocks of lines where each block starts with the movie followed
- * by N-lines of text representing the actors and ending with an
- * empty line.
- */
- private void parseDataFile(String fileName) throws FileNotFoundException {
- Scanner file = new Scanner(new File(fileName));
- while(file.hasNextLine()) {
- String movie = file.nextLine().trim();
- while(file.hasNextLine()) {
- String actor = file.nextLine().trim();
- if(actor.length() == 0) break;
- addConnection(movie, actor);
- }
- }
- }
- /**
- * A Sting representation if this Graph.
- */
- public String toString() {
- StringBuilder builder = new StringBuilder();
- Iterator<String> vertices = adjacencyList.keySet().iterator();
- while(vertices.hasNext()) {
- String vertex = vertices.next();
- List<String> edges = adjacencyList.get(vertex);
- builder.append(vertex);
- builder.append(" --> ");
- builder.append(edges);
- builder.append('\n');
- }
- return builder.toString();
- }
- /**
- * main
- */
- public static void main(String[] args) {
- try {
- Graph graph = new Graph("data.txt");
- System.out.println(graph);
- } catch (FileNotFoundException e) {
- e.printStackTrace();
- }
- }
- }
Expand|Select|Wrap|Line Numbers
- I
- a
- b
- c
- d
- e
- II
- c
- h
- q
- r
- III
- h
- o
- p
- IV
- e
- f
- g
- V
- c
- g
- j
- VI
- k
- m
- n
- o
we're ready to do some brute-force searching on our graph.
How can we search/walk through a graph using the BFS algorithm?
First of all: what is BFS anyway? BFS is a brute, and rather simple way of
traversing your graph starting at a given vertex. It divides the graph in
several levels. The starting point of a BFS is called level 0, it's direct
edges level 1, the edges of it's edges level 2, and so on. You could compare
it when you throw a stone in a pool of water: the starting vertex is where
the stone hits the water and the ripples from the impact are "discovering"
unknown territory. So, this algorithm can be used to find a shortest path
between two vertices. By a shortest path in this case I mean the path from
one vertex to another while traversing the least possible number of edges.
Note that I said "in this case", because in the case of a weighted graph, the
shortest path is not necessarily the one with the least edges: one direct road
between two vertices of a length of 10 miles, is longer than two roads with a
length of 4 miles, of course.
Let's say would like to find the shortest path between vertices g and
n. As you can see there are two possible paths between them:
Expand|Select|Wrap|Line Numbers
- g -> IV -> e -> I -> c -> II -> h -> III -> o -> VI -> n
Expand|Select|Wrap|Line Numbers
- g -> V -> c -> II -> h -> III -> o -> VI -> n
shortest.
To keep things orderly, I made a separate class which creates a graph from a
text file and finds a shortest path between two vertices.
The first thing that happens is that during the BFS traversal a 'container' is
being created of newly discovered vertices where the first vertex (the starting
point) is on index 0 (level 0), it's edges at index 1 (level 1), and so on.
So, after discovering our end vertex, that 'container' will look like this:
Expand|Select|Wrap|Line Numbers
- level 0 = g
- level 1 = IV, V
- level 2 = e, f, c, j
- level 3 = I, II
- level 4 = a, b, d, h, q, r
- level 5 = III
- level 6 = o, p
- level 7 = VI
- level 8 = n
("g") using our areConnected(String, String) method from the Graph class
to see if two vertices are connected. That way we end up with the shortest
path from "g" to "n".
Expand|Select|Wrap|Line Numbers
- import java.io.*;
- import java.util.*;
- public class BFSAlgorithm {
- private Graph graph;
- /**
- * Constructor.
- */
- public BFSAlgorithm(Graph g) {
- graph = g;
- }
- /**
- * 1 - Create a stack to store all the vertices of our path on.
- * 2 - First push the 'end' vertex on our stack.
- * 3 - Now loop from the highest level back to the first level and
- * a. loop through each level and
- * b. check each vertex in that level if it's connected to the
- * vertex on the top of our stack, if we find a match, push that
- * match on the stack and break out of the loop.
- * 4 - Now we only need to reverse the collection (stack) before returning
- * the path in the "correct" order (from start to finish).
- *
- * Here's an example ASCII drawing of backtracking from the end vertex "n"
- * to the starting vertex "g". The arrows, <-, denote the path that was
- * found.
- *
- * level: 0 1 2 3 4 5 6 7 8
- * ---------------------------------------------------------
- * g <-+ IV e I a +- III <- o <- VI <- n
- * +- V <-+ f +- II <-+ b | p
- * +- c <-+ | d |
- * j +- h <-+
- * q
- * r
- */
- private List<String> backTrack(List<List<String>> container, String end) {
- Stack<String> path = new Stack<String>(); // 1
- path.push(end); // 2
- for(int i = container.size()-1; i >= 0; i--) { // 3
- List<String> level = container.get(i);
- String last = path.peek();
- for(String s : level) { // a
- if(graph.areConnected(last, s)) { // b
- path.push(s);
- break;
- }
- }
- }
- Collections.reverse(path); // 4
- return path;
- }
- /**
- * 1 - Get the level from the 'container' which was added last.
- * 2 - Create a new level to store (possible) unexplored verices in.
- * 3 - Loop through each of the vertices from the last added level, and
- * a. get the neighboring vertices connected to each vertex,
- * b. loop through each connecting vertex and if the connecting vertex
- * has not yet been visited,
- * c. only then add it to the newly created level-list and mark it as
- * visited in our set.
- * 4 - We don't need to search any further if we stumble upon the 'end'
- * vertex. In that case, just "return" from this method.
- * 5 - Only make the recursive call if we have found vertices which have
- * not been explored yet.
- */
- private void bfs(List<List<String>> container,
- String end, Set<String> visited) {
- List<String> lastLevel = container.get(container.size()-1); // 1
- List<String> newLevel = new ArrayList<String>(); // 2
- for(String vertex : lastLevel) { // 3
- List<String> edges = graph.getEdges(vertex); // a
- for(String edge : edges) { // b
- if(!visited.contains(edge)) { // c
- visited.add(edge);
- newLevel.add(edge);
- }
- if(edge.equals(end)) return; // 4
- }
- }
- if(newLevel.size() > 0) { // 5
- container.add(newLevel);
- bfs(container, end, visited);
- }
- }
- /**
- * 1 - Create an empty 'container' to store all the levels from the
- * 'start'-vertex in. A level is also a list with one or more vertices.
- * 2 - The first level only holds the 'start' vertex, which is added first,
- * this is the 'init' list.
- * 3 - The 'start' vertex is also stored in a Set which keeps track of all
- * the vertices we have encountered so that we don't traverse vertices
- * twice (or more).
- * 4 - Once we initialized the steps 1-3, we can call the actual BFS-
- * algorithm.
- * 5 - Once the BFS-algorithm is done, we call the backTrack(...) method
- * to find the shortest path between 'end' and 'start' between the
- * explored levels of the graph.
- */
- public List<String> getShortestPath(String start, String end) {
- List<List<String>> container = new ArrayList<List<String>>(); // 1
- List<String> init = new ArrayList<String>(); // 2
- init.add(start);
- container.add(init);
- Set<String> visited = new HashSet<String>(); // 3
- visited.add(start);
- bfs(container, end, visited); // 4
- return backTrack(container, end); // 5
- }
- /**
- * Main method:
- * 1 - Create a Graph.
- * 2 - Get a shortest path between two vertices.
- * 3 - Print the shortest path.
- */
- public static void main(String[] args) throws FileNotFoundException {
- Graph graph = new Graph("data.txt"); // 1
- BFSAlgorithm bfsAlgorithm = new BFSAlgorithm(graph);
- List<String> shortestPath =
- bfsAlgorithm.getShortestPath("g", "n"); // 2
- for(int i = 0; i < shortestPath.size(); i++) {
- System.out.print(shortestPath.get(i)); // 3
- System.out.print(i < shortestPath.size()-1 ? "\n -> " : "\n");
- }
- }
- }
you too much. For those who want to experiment some more: I made a small
selection of movies and actors (+/- 70000 vertices) from the IMDB's data files*
which you can experiment with to see how the actors Ron Jeremy and
Kevin Bacon are connected (are they even connected???). You can find that file here:
http://www.iruimte.nl/graph/imdb.txt which can be parsed in the same way as
the text file above.
Regards,
Bart (prometheuzz)
* All data files can be found here: ftp://ftp.sunet.se/pub/tv+movies/imdb/