Consider a very large undirected graph of an email network. The nodes (or vertices) represent email addresses, and an edge indicates that at least one email was sent in at least one direction between the two addresses.

Question

Which technique would you use to represent this graph in a computer program so that it can be manipulated?

Answer 1

Hints:

Properties of the given dataset:

1. Could have a large number of nodes.
2. Generally very large and sparse.
3. Requires a relatively efficient search given a node.
4. In general, contains cycles.

Since cycles are present, we can safely eliminate trees. Fixed-size lists are not useful since the number of branches per node is not constant. A matrix (each column/row is a node, and each intersection records whether an edge exists) is possible, but its size grows with the square of the number of nodes, so it is not space efficient for a sparse graph.
Hash tables should provide efficient storage as well as rapid searches.
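
As a minimal sketch of the hash-table idea, assuming Python, each address can map to the set of addresses it has exchanged mail with. The names EmailGraph, add_email, neighbors and has_edge are hypothetical and not part of the original answer.

```python
from collections import defaultdict


class EmailGraph:
    """Undirected graph stored as a hash table of neighbor sets (illustrative sketch)."""

    def __init__(self):
        # Maps each email address to the set of addresses it is connected to.
        self._adj = defaultdict(set)

    def add_email(self, a, b):
        """Record that at least one email passed between a and b, in either direction."""
        # The edge is undirected, so it is stored under both endpoints.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def neighbors(self, address):
        """Return the set of addresses connected to the given address."""
        # .get avoids creating an empty entry for an unknown address.
        return self._adj.get(address, set())

    def has_edge(self, a, b):
        """Check whether a and b are connected, using two hash lookups."""
        return b in self._adj.get(a, ())


g = EmailGraph()
g.add_email("alice@example.com", "bob@example.com")
g.add_email("bob@example.com", "alice@example.com")  # reverse direction, same edge
g.add_email("bob@example.com", "carol@example.com")
```

Because sets ignore duplicates, repeated emails between the same pair do not grow the structure, and memory use is proportional to the number of nodes plus edges rather than to the square of the number of nodes.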

Answer 2

I would use 2 flat tables.

Address (AddressID*, Email)

Connection (Address1*, Address2*)

Address1 is always the numerically lower ID of the pair, so each undirected edge is stored exactly once.
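
A rough sketch of these two tables, assuming SQLite through Python's standard sqlite3 module. The column types, the CHECK constraint, the extra index, and the helper names address_id, add_connection and neighbors are all assumptions added for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,
        Email     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Connection (
        Address1 INTEGER NOT NULL REFERENCES Address(AddressID),
        Address2 INTEGER NOT NULL REFERENCES Address(AddressID),
        PRIMARY KEY (Address1, Address2),
        CHECK (Address1 < Address2)  -- lower ID always goes first
    );
    -- Assumed extra index so lookups by the second column are also fast.
    CREATE INDEX idx_connection_address2 ON Connection (Address2);
""")


def address_id(email):
    """Return the AddressID for email, inserting the address if it is new."""
    conn.execute("INSERT OR IGNORE INTO Address (Email) VALUES (?)", (email,))
    return conn.execute(
        "SELECT AddressID FROM Address WHERE Email = ?", (email,)
    ).fetchone()[0]


def add_connection(email_a, email_b):
    """Store the undirected edge once, with the lower ID in Address1."""
    a, b = sorted((address_id(email_a), address_id(email_b)))
    conn.execute("INSERT OR IGNORE INTO Connection VALUES (?, ?)", (a, b))


def neighbors(email):
    """Return all addresses connected to email, looking in both columns."""
    rows = conn.execute(
        """
        SELECT a.Email FROM Connection c JOIN Address a ON a.AddressID = c.Address2
        WHERE c.Address1 = (SELECT AddressID FROM Address WHERE Email = :e)
        UNION
        SELECT a.Email FROM Connection c JOIN Address a ON a.AddressID = c.Address1
        WHERE c.Address2 = (SELECT AddressID FROM Address WHERE Email = :e)
        """,
        {"e": email},
    ).fetchall()
    return sorted(row[0] for row in rows)


add_connection("bob@example.com", "alice@example.com")
add_connection("alice@example.com", "bob@example.com")  # same edge, ignored
add_connection("carol@example.com", "alice@example.com")
```

Since an address can appear in either column, a neighbor query has to look at both Address1 and Address2, which is the cost of storing each edge only once.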

Answer 3

Several techniques can represent the email network graph in a computer program. The most commonly used are:

1. Adjacency Matrix: This is a two-dimensional array where the rows and columns represent the email addresses, and each value indicates whether an edge exists between two addresses. This representation is efficient for dense graphs, but its size grows with the square of the number of addresses, which makes it memory-intensive for sparse graphs.

2. Adjacency List: This representation uses a dictionary or an array where each email address is associated with a list of its neighboring addresses. This approach is memory-efficient, especially for sparse graphs, but checking whether a particular edge exists means scanning one address's neighbor list.

3. Edge List: This approach creates a list or an array that contains all the edges in the graph. Each entry represents one edge and holds the two email addresses it connects. This representation is simple, but operations such as finding all neighbors of an address require scanning the whole list.
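
The three representations above can be sketched side by side in Python for one small graph. The sample addresses and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data: each pair is one undirected edge.
addresses = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
edge_list = [
    ("a@example.com", "b@example.com"),
    ("b@example.com", "c@example.com"),
    ("a@example.com", "c@example.com"),
]

# Adjacency matrix: n x n booleans, symmetric because the graph is undirected.
index = {addr: i for i, addr in enumerate(addresses)}
n = len(addresses)
matrix = [[False] * n for _ in range(n)]
for u, v in edge_list:
    matrix[index[u]][index[v]] = True
    matrix[index[v]][index[u]] = True

# Adjacency list: each address maps to a list of its neighbors.
adjacency = {addr: [] for addr in addresses}
for u, v in edge_list:
    adjacency[u].append(v)
    adjacency[v].append(u)


def edge_in_matrix(u, v):
    # One array lookup, independent of graph size.
    return matrix[index[u]][index[v]]


def edge_in_adjacency(u, v):
    # Scans only the neighbors of u.
    return v in adjacency[u]


def edge_in_edge_list(u, v):
    # Scans every edge, checking both orientations.
    return (u, v) in edge_list or (v, u) in edge_list
```

All three answer the same question, but the work per lookup differs: a single cell for the matrix, one neighbor list for the adjacency list, and the entire list of edges for the edge list.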

The right choice depends on what your program needs. If memory is a concern and the graph is sparse, as a very large email network typically is, prefer the adjacency list. If you need constant-time checks for whether an edge exists and the graph is small or dense enough for the matrix to fit in memory, the adjacency matrix could be a better choice. The edge list is suitable if you primarily need to iterate over all edges.
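
To make the memory trade-off concrete, here is a back-of-the-envelope sketch in Python. The figures (one million addresses, ten contacts each on average) are invented for illustration and do not come from any real dataset. The functions count stored entries, not bytes.

```python
def matrix_cells(num_nodes):
    # An adjacency matrix stores one cell per ordered pair of nodes.
    return num_nodes * num_nodes


def adjacency_entries(num_edges):
    # An undirected adjacency list stores each edge under both endpoints.
    return 2 * num_edges


def edge_list_entries(num_edges):
    # An edge list stores each edge once.
    return num_edges


# Hypothetical sizes, assumed only for this estimate.
num_nodes = 1_000_000
avg_degree = 10
num_edges = num_nodes * avg_degree // 2  # each edge contributes to two degrees

print("matrix cells:     ", matrix_cells(num_nodes))
print("adjacency entries:", adjacency_entries(num_edges))
print("edge list entries:", edge_list_entries(num_edges))
```

Under these assumed numbers the matrix needs 10^12 cells against 10^7 adjacency-list entries, a factor of 100,000, which is why the matrix is rarely practical for a large sparse graph.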

Consider the characteristics of your email network graph, such as size, density, and the tasks you need to perform, to determine which technique will suit your program the best.