A novel metric for community analysis
Despite the prevalence of community detection algorithms, relatively less work has been done on understanding whether a network is indeed modular and how resilient the community structure is under perturbations. To address this issue, "Permanence", a new vertex-based metric, is proposed that can quantitatively give an estimate of the community-like structure of the network.
The central idea of permanence is based on the observation that the strength of membership of a vertex to a community depends upon the following two factors: (i) the distribution of external connectivity of the vertex to individual communities and not the total external connectivity, and (ii) the strength of its internal connectivity and not just the total internal edges.
A new greedy approach to detect communities from large newtorks is proposed on maximizing permanence. For a modular network structure, the results of permanence maximization outperforms other popular community detection algorithms.
Permanence mazimization for community detection
We develop a community detection algorithm called "Max Permanence" that identifies communities by maximizing permanence. Our algorithm is a heuristic, that strives to obtain a high value of permanence. In this algorithm, we begin with initializing every vertex to a singleton community. A vertex is moved to a community only if this movement results in a net increase in the number of internal connections and/or a net decrease in the number of external connections to any of the neighboring communities. If such a move is not possible, then either the vertex remains as a singleton (such as in the case where moving to any one of the candidate communities will give equal permanence) or moves to the community where it is more tightly connected with its neighbors (this causes the vertex to have positive permanence). This process is repeated for each vertex and the entire relocation of all vertices is repeated over several iterations until the permanence value converges. However, convergence is not theoretically guaranteed, but we observed that the algorithm converges with high probability.
The code is publicly available. Please send a mail to: Tanmoy Chakraborty (its_tanmoy@yahoo.co.in)
For specific comments concerning the code or if you find any bug, contact Tanmoy Chakraborty
Datasets
Football Network
Football network was proposed by Girvan and Newman (PNAS, 2002) which contains the network of American football games between Division IA colleges during regular season Fall of 2000. The vertices in the graph represent teams (identified by their college names), and edges represent regular-season games between the two teams they connect. Properties: #nodes: 115, #edges: 613, avg. degree=10.57, max degree=12, #communities=12.
Football Network; Football Community; Documentation
Indian Railway Network
Indian Railway network proposed by Ghosh et al. (Acta Physica
Polonica B Proceedings Supplement, 2011) consists of nodes representing stations, where two stations a and b are connected by an edge if there exists at least
one train-route such that both a and b are scheduled halts on that route. The weight of the edge between a and b is the number of train-routes on which both these stations are scheduled halts. We filter out the low-weight edges and then make the resultant network unweighted. We tag each station based on the state in India to which that station belongs. The states act as communities since the number of trains within each state is much higher than the number of trains between two states.
Properties: #nodes: 301, #edges: 1224, avg. degree=16.34, max degree=48, #communities=21.
Railway Network; Railway Community; Documentation
(Please don't forget to cite our paper (Chakraborty et al., SIGKDD, 2014) after using the dataset)
Coauthorship Network
Co-authorship network is derived from the citation dataset developed by Chakraborty et al. (ASONAM, 2013). This dataset contains the metadata (title, author(s), related field(s) of the paper, publication venue, year of publication, references and abstract) of all the papers of computer science published between 1960 to 2009 and archived in DBLP repository. An aggregated undirected coauthorship network is built using this dataset where each node represents an author, and an undirected edge between a pair of authors is drawn if they were co-authors at least once. Since each paper is marked by its related field, this field is assumed as the research area of the author(s) writing
that paper. Therefore, an author may possess more than one area of research interests. This is resolved by tagging each author by the major field on which she has written most of the papers. These fields act as the ground-truth communities.
Properties: #nodes: 103677, #edges: 352183, avg. degree=5.33, max degree=1230, #communities=24.
Coauthorship Network; Coauthorship Community; Documentation
(Please don't forget to cite our paper (Chakraborty et al., SIGKDD, 2014) after using the dataset)
Related Publications
Tanmoy Chakraborty, Sriram Srinivasan, Niloy Ganguly, Animesh Mukherjee, Sanjukta Bhowmick. On the permanence of vertices in network communities, 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York city, August 24 - 27, 2014. (Accepted) (Acceptance rate: 14.6%) ( Paper) ( Supporting Information )
Tanmoy Chakraborty, Sriram Srinivasan, Niloy Ganguly, Sanjukta Bhowmick, Animesh Mukherjee. Constant Communities in Complex Networks, Nature Scientific Reports 3, 1825, 2013. ( Arxiv version) ( Abstract and paper, open access)
Sriram Srinivasan, Tanmoy Chakraborty, Sanjukta Bhowmick. Identifying Base Clusters And Their Application To Maximizing Modularity, Contemporary Mathematics. Graph partitioning and Graph Clustering. (D. A. Bader, H. Meyerhenke, P. Sanders and D. Wagner eds.), AMS-DIMACS, pp. 141-156, 2012.
Stay in touch
- Tanmoy Chakraborty
- Dept. of Computer Science & Engineering
- Indian Institute of Technology Kharagpur, India - 721302
- Website: http://cse.iitkgp.ac.in/~tanmoyc
- Email:its_tanmoy@yahoo.co.in / its_tanmoy@cse.iitkgp.ernet.in
- Sriram Srinivasan
- Dept. of Computer Science
- University of Nebraska, Omaha, Nebraska 68182
- Email: sriramsrinivas@unomaha.edu
- Niloy Ganguly
- Dept. of Computer Science & Engineering
- Indian Institute of Technology Kharagpur, India - 721302
- Website: http://www.facweb.iitkgp.ernet.in/~niloy/
- Email:niloy@cse.iitkgp.ernet.in
- Animesh Mukherjee
- Dept. of Computer Science & Engineering
- Indian Institute of Technology Kharagpur, India - 721302
- Website: http://cse.iitkgp.ac.in/~animeshm
- Email:animeshm@cse.iitkgp.ernet.in
- Sanjukta Bhowmick
- Dept. of Computer Science
- University of Nebraska, Omaha, Nebraska 68182
- Website: http://cs.unomaha.edu/~bhowmick/
- Email: sbhowmick@unomaha.edu