A simulated annealing based circle detection approach
The availability of an overwhelmingly large amount of bibliographic information including citation and co-authorship data makes it imperative to have a systematic approach that will enable an author to organize her own personal academic network profitably. An effective method could be to have one’s co-authorship network arranged into a set of “circles”, which has been a recent practice for organizing relationships (e.g., friendship) in many online social networks.
We propose an unsupervised approach that automatically detects circles in an ego network, such that each circle represents a densely knit community of researchers. The model combines a variety of node features and node similarity measures.
The model is built from a rich co-authorship network dataset of more than 800,000 authors. Experimental results show that including the circle information detected by our model improves the link prediction performance of a number of state-of-the-art link prediction mechanisms, including Supervised Random Walks.
On the Formation of Circles in Co-Authorship Networks
We develop a circle detection algorithm that uses simulated annealing to detect circles in ego-centric co-authorship networks. Our algorithm uses a rich set of node-based as well as edge-based features to identify circles as densely knit communities of similar author nodes. The algorithm begins by initializing each node to a singleton circle and then, in each iteration, slightly disturbs the circle membership of each node. The disturbance involves removing a node from a few circles and introducing it to some new circles. After each iteration, each circle is judged on its homogeneity, and circles with homogeneity values below a threshold are discarded. The resulting circle set is then evaluated using a maximum likelihood estimate. The algorithm terminates when a local maximum is reached, that is, when the likelihood value does not increase even after sufficiently many iterations.
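The loop described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the released code. The homogeneity measure (mean pairwise similarity), the toy edge likelihood with fixed probabilities p_in and p_out, the Metropolis acceptance rule, the geometric cooling schedule, and all function and parameter names are assumptions made for this example. The paper's actual features and likelihood are richer.

```python
import math
import random
from itertools import combinations


def homogeneity(circle, sim):
    """Assumed measure: mean pairwise similarity; singletons count as fully homogeneous."""
    if len(circle) < 2:
        return 1.0
    pairs = list(combinations(sorted(circle), 2))
    return sum(sim(u, v) for u, v in pairs) / len(pairs)


def log_likelihood(circles, nodes, edges, p_in=0.9, p_out=0.1):
    """Toy stand-in for the likelihood: a pair sharing a circle is linked
    with probability p_in, any other pair with probability p_out."""
    total = 0.0
    for u, v in combinations(nodes, 2):
        together = any(u in c and v in c for c in circles)
        p = p_in if together else p_out
        linked = frozenset((u, v)) in edges
        total += math.log(p if linked else 1.0 - p)
    return total


def perturb(circles, nodes, rng):
    """Slightly disturb membership: each node may leave one circle and join another."""
    new = [set(c) for c in circles]
    for node in nodes:
        member_of = [c for c in new if node in c]
        others = [c for c in new if node not in c]
        if member_of and rng.random() < 0.5:
            rng.choice(member_of).discard(node)
        if others and rng.random() < 0.5:
            rng.choice(others).add(node)
    return new


def prune(circles, sim, threshold):
    """Drop empty, duplicate, and insufficiently homogeneous circles."""
    kept = {frozenset(c) for c in circles
            if c and homogeneity(c, sim) >= threshold}
    return [set(c) for c in sorted(kept, key=sorted)]


def detect_circles(nodes, edges, sim, threshold=0.5,
                   t0=1.0, cooling=0.95, patience=200, seed=0):
    rng = random.Random(seed)
    nodes = sorted(nodes)
    edges = {frozenset(e) for e in edges}
    current = [{n} for n in nodes]              # every node starts as a singleton circle
    current_ll = log_likelihood(current, nodes, edges)
    best, best_ll = current, current_ll
    temp, stale = t0, 0
    while stale < patience:                     # stop after `patience` iterations without gain
        cand = prune(perturb(current, nodes, rng), sim, threshold)
        cand_ll = log_likelihood(cand, nodes, edges)
        delta = cand_ll - current_ll
        # Metropolis rule (assumed): always accept improvements,
        # accept a worse circle set with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_ll = cand, cand_ll
        if current_ll > best_ll:
            best, best_ll, stale = current, current_ll, 0
        else:
            stale += 1
        temp = max(temp * cooling, 1e-6)
    return best, best_ll


if __name__ == "__main__":
    # Hypothetical ego network: two triangles of co-authors.
    group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    sim = lambda u, v: 1.0 if group[u] == group[v] else 0.0
    circles, ll = detect_circles(group, edges, sim)
    print(circles, ll)
```

Pruning before scoring mirrors the order given in the description: low-homogeneity circles are discarded first, and only the surviving circle set is evaluated. The `patience` counter plays the role of "sufficiently many iterations".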
The code is publicly available here.
For specific comments concerning the code, or to report a bug, contact Tanmoy Chakraborty / Sikhar Patranabis.
Dataset
Coauthorship Network
We have crawled one of the largest publicly available datasets from Microsoft Academic Search (MAS), which houses over 4.1 million publications and 2.7 million authors. We collected all the papers published in the computer science domain and indexed by MAS. The crawled dataset contains more than 2 million distinct papers by more than 800,000 authors, distributed over 24 fields of computer science. The co-authorship network constructed from this dataset has authors as nodes, with an edge between every pair of authors who have written at least one paper together.
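As a rough illustration of the construction described above, the following Python sketch builds a co-authorship graph from per-paper author lists and extracts one author's ego network. The input format (one list of author names per paper), the function names, and the sample authors are assumptions made for this example. They do not reflect the format of the released dataset.

```python
from itertools import combinations


def build_coauthorship_network(papers):
    """Map each author to the set of authors they share at least one paper with.

    `papers` is assumed to be an iterable of author-name lists, one per paper.
    """
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())   # single-author papers still add a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def ego_network(graph, ego):
    """Return the ego's co-authors (alters) and the edges among them, excluding the ego."""
    alters = graph[ego]
    return {a: graph[a] & alters for a in alters}


if __name__ == "__main__":
    # Placeholder author names, not taken from the dataset.
    papers = [["A", "B", "C"], ["A", "D"], ["E"]]
    graph = build_coauthorship_network(papers)
    print(graph["A"])
    print(ego_network(graph, "A"))
```

An adjacency-set representation keeps the "at least one paper together" rule simple: repeated collaborations between the same two authors collapse into a single edge.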
Coauthorship Network
(Please don't forget to cite our paper (Chakraborty et al., SIGKDD, 2015) after using the dataset)
Related Publications
Tanmoy Chakraborty, Sikhar Patranabis, Animesh Mukherjee, Pawan Goyal. On the Formation of Circles in Co-authorship Networks, 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Sydney, August 10-13, 2015. (Accepted) (Acceptance rate: 19.41%) (Paper)
Stay in touch
- Tanmoy Chakraborty
  - Dept. of Computer Science & Engineering
  - Indian Institute of Technology Kharagpur, India - 721302
  - Website: http://cse.iitkgp.ac.in/~tanmoyc
  - Email: its_tanmoy@yahoo.co.in / its_tanmoy@cse.iitkgp.ernet.in
- Sikhar Patranabis
  - Dept. of Computer Science & Engineering
  - Indian Institute of Technology Kharagpur, India - 721302
  - Email: sikharpatranabis@gmail.com
- Animesh Mukherjee
  - Dept. of Computer Science & Engineering
  - Indian Institute of Technology Kharagpur, India - 721302
  - Website: http://cse.iitkgp.ac.in/~animeshm
  - Email: animeshm@cse.iitkgp.ernet.in
- Pawan Goyal
  - Dept. of Computer Science & Engineering
  - Indian Institute of Technology Kharagpur, India - 721302
  - Website: http://cse.iitkgp.ac.in/~pawang
  - Email: pawang@cse.iitkgp.ernet.in