Short-term Course and Workshop on
Abstract:The emergence of online social networks and, with them, the phenomenon of "virality" wherein certain themes or pieces of content (or memes as they are known) spread explosively through the network, has attracted a lot of attention from computer science researchers in the last few years. The central problem in this area can be posed as a simple question: Can we predict which meme will go viral before it has actually gone viral? We began studying this problem by trying to identify metrics that discriminate between viral and nonviral topics and followed up by trying to actually solve the prediction problem. In this talk we will present some of the challenges that we encountered along the way and some of the results we obtained.
Bio:Amitabha Bagchi's research interests include online social networks, network algorithms, wireless networks and data structures. Amitabha earned his PhD (2002) and an MSE (1999) in Computer Science from Johns Hopkins University after having completed his B. Tech in Computer Science and Engineering from IIT Delhi in 1996. He joined IIT Delhi as a member of the faculty of the Department of Computer Science and Engineering in 2005. Amitabha was awarded Yahoo!’s prestigious Faculty Research Engagement Program Award in 2013. In 2012 he received IIT Delhi’s Award for Excellence in Teaching.
Abstract:Large networks are ubiquitous, and yet, estimating the key parameters of large networks is not an easy problem due to various constraints--- either because of their size, or because of the fact that they are not available in their entirety and can only be accessed through some restricted interface (e.g. a rate-limited API). In this talk, we outline sampling based algorithms to estimate a couple of key network properties e.g. the average degree and the size of a sub-population. This is a joint work with Ravi Kumar, Tamas Sarlos and D. Sivakumar.
Bio:Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar and a DST Ramanujan Fellow. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University.
Abstract:In this talk I shall present an overview of our four years long research initiative in citation networks. The investigation is in the context of the computer science domain further sub-divided into its constituent fields (e.g., AI, ALGO, NETW, ML, IR etc.). Out of a number of things, I shall mainly focus on three issues -- (a) the evolutionary landscape of the CS fields and the recent signals of interdisciplinary research; (b) long-time characteristics of the scientific citation profiles and accurate prediction of future citation counts and (c) diversity of research career of CS scientists.
Relevant publications : 1. Chakraborty, T., Kumar, S., Goyal, P., Ganguly, N. and Mukherjee, A. (2015). On the categorization of scientific citation profiles in computer sciences. Communications of the ACM (in press). 2. Chakraborty, T., Tammana, V., Ganguly, N. and Mukherjee, A. (2015). Understanding and Modeling Diverse Scientific Careers of Researchers. Journal of Infometrics , 9 (1),69--78. 3. Chakraborty, T., Kumar, S., Goyal, P., Ganguly, N. and Mukherjee, A. (2014). Towards a Stratified Learning Approach to Predict Future Citation Counts. In JCDL(14) , UK. 4. Chakraborty, T., Sikdar, S., Tammana, V., Ganguly, N. and Mukherjee, A. (2013). Computer Science Fields as Ground-truth Communities: Their Impact, Rise and Fall, In IEEE/ACM ASONAM(13) , Niagra Falls, Canada. 5. Chakraborty, T., Kumar, S., Reddy, M. D., Kumar, S., Ganguly, N. and Mukherjee, A. (2013). Automatic Classification and Analysis of Interdisciplinary Fields in Computer Sciences, In ASE/IEEE Socialcom(13) , Washington DC, USA.
Bio:Animesh Mukherjee is an assistant professor at the Department of Computer Science and Engineering, IIT Kharagpur and a Simons Associate, ICTP, Trieste, Italy. He completed his doctoral degree from the same department and then moved to ISI Foundation, Torino, Italy as a post-doctoral researcher. His areas of interest center around studying various complex sociocultural phenomena (e.g., linguistic ability, scientific productivity) under the lens of statistical physics as well computer science. Dr. Mukherjee received the prestigious Young Scientist award from the Indian Science Congress Association in 2006, the Young Engineer Award from the Indian National Academy of Engineering in 2012 and the Young Scientist Medal from the Indian National Science Academy in 2013. He has authored more than 50 research articles (journals and refereed conferences) and co-edited two books from Birkhauser and a special issue of the Computer Speech and Language Journal. He actively serves as member technical programme committee for various top-notch conferences and as a referee/member editorial board for various journals of repute.
Abstract: Existing research suggest that embedding graphs on a unit sphere can be beneficial in learning labels on the vertices of a graph. However the choice of optimal embedding remains an open issue. Orthonormal representation of graphs, a class of embeddings over the unit sphere, was introduced by Lov\'asz (1979) and was used to derive the famous $\vartheta$ function, an important construct in Algorithmic Graph theory. Recently we discovered an interesting connection between Kernel methods and $\vartheta$ function. In this talk we will review this connection and discuss new results which show that there exist orthonormal representation based classifiers which are statistically consistent over a large class of graphs. Time permitting we will discuss a linear time algorithm for solving the planted clique problem by using SVMs. This is a joint work with R. Shivanna
Bio:Chiranjib Bhattacharyya completed his PhD in 2000 from CSA Department, IISc. Following a postdoctoral stint at UC Berkeley he joined the same department in 2002 as an Assistant professor. Since then he has been working on various aspects of Machine Learning. His current research interests are Kernel methods, Probabilistic modelling using Bayesian Non-parametrics, and Robust optimization.
Abstract: The architecture of both technological and natural complex systems often exhibits hierarchical modularity: simpler modules are used in the construction of increasingly more complex modules. Such architectures can be modelled as a “layered design network'', i.e., a directed, acyclic and layered graph in which the inputs to the system appear at the bottom layer and the ultimate outcomes of the system are produced at the highest layer. Layered design networks are often subject to an evolutionary process in which the structure of the network changes over time. An interesting observation in some of these networks is that the evolutionary process leads to an hourglass-like shape: starting from the bottom layer and moving up the hierarchy, the number of modules at each layer decreases until the “waist'' of the hourglass, followed by an increase until reaching the top layer. The famous Internet protocol stack is an example of such an hourglass structure occurring in a technological context. We propose EvoArch, an abstract model to explain the emergence of the hourglass structure in the Internet protocol stack and to study its evolution in a general framework. The model is based on a few principles about layered network architectures and their evolutions in a competitive environment where protocols acquire value based on their higher layer applications and compete with other protocols at the same layer. An instance of the hourglass effect in a natural system is the genomic hourglass. This property of embryonic development predicts that divergent morphological patterns early and late in embryonic development are separated by the phylotypic stage, i.e., a stage in mid-embryonic development that shows the most morphologically similar patterns across different species. It has also been observed that the phylotypic stage expresses the oldest and most conserved transcriptome across different species. We propose DevoNets, an abstract ``evo-devo'' model to explain the emergence of the hourglass structure considering the underlying hierarchy of gene regulatory interactions as they unfold over time and space. The hourglass effect also exists in other systems such as supply-chain networks, metabolic networks, or the innate immune system. In the final part of this talk, we will also discuss the similarities and differences between the processes that lead to the emergence of the hourglass effect in these very different systems.
Bio:Dr. Constantine Dovrolis is a Professor at the College of Computing of the Georgia Institute of Technology. He received the Computer Engineering degree from the Technical University of Crete in 1995, the M.S. degree from the University of Rochester in 1996, and the Ph.D. degree from the University of Wisconsin-Madison in 2001. He has held visiting positions at Thomson Research in Paris, Simula Research in Oslo, FORTH in Crete, and University of Thessaly in Volos. His current research focuses on cross-disciplinary applications of network science in biology, climate science and neuroscience. He has also worked on the evolution of the Internet, Internet economics, and on applications of network measurement. He received the National Science Foundation CAREER Award in 2003.
Abstract: Localization is an interesting property that arises in many graph mining tasks of large natural networks, such as social networks and technological networks. This property causes the solution to many problems stated on these networks to have a size that is roughly constant regardless of the size of the input network. Localizing arises formally in those graph mining tasks that involve matrix computations, such as PageRank, semi-supervised learning, diffusions, and the like. For instance, one of the most successful community detection methods is based on personalized PageRank vectors. This localization property means that we can design algorithms to exploit this sparsity that run in sublinear or even constant size with respect to the size of the network. We'll survey some of the successes in localized graph mining with a focus on localized methods for graph diffusion. The papers associated with this talk are: 1. Kloster and Gleich, http://arxiv.org/abs/1310.3423 2. Kloster and Gleich, http://arxiv.org/abs/1403.3148 3. Gleich and Mahoney, http://jmlr.org/proceedings/ papers/ v32/gleich14.html
Bio: David Gleich is an assistant professor in the Computer Science Department at Purdue University. His research is on high performance and large scale matrix computations for analyzing data from social networks and scientific simulations. He held the John von Neumann post-doctoral fellowship at Sandia National Laboratories in Livermore CA before joining Purdue in Fall 2011. Since then, he received an NSF CAREER Award in 2011 for work on localized methods for matrix problems, such as the graph diffusions he'll discuss here.
Abstract:Today's social computing systems like Facebook and Twitter have an achilles heel: users of these systems operate behind "weak identities", i.e., identities that can be forged without too much effort. Attackers can easily create forge multiple fake identities and manipulate the functionality of the system. There is mounting evidence that such fake identities are being used to introduce / promote spam content or to manipulate the real popularity of existing users and content on these systems. In this talk, I will discuss approaches to reason about the "strength" or "trustworthiness" of weak identities. Specifically, I will first outline the scenarios where existing defences are fundamentally inadequate and then I will propose new approaches to reason about strength of identities in those scenarios.
Bio: : Krishna Gummadi is a tenured faculty member and head of the Networked Systems research group at the Max Planck Institute for Software Systems (MPI-SWS) in Germany. He received his Ph.D. (2005) and M.S. (2002) degrees in Computer Science and Engineering from the University of Washington. He also holds a B. Tech (2000) degree in Computer Science and Engineering from the Indian Institute of Technology, Madras. Krishna's research interests are in the measurement, analysis, design, and evaluation of complex Internet-scale systems. His current projects focus on understanding and building social Web systems. Specifically, they tackle the challenges associated with protecting the privacy of users sharing personal data, understanding and leveraging word-of-mouth exchanges to spread information virally, and finding relevant and trustworthy sources of information in crowds. Krishna's work on online social networks, Internet access networks, and peer-to-peer systems has led to a number of widely cited papers and award (best) papers at ACM/Usenix's SOUPS, AAAI's ICWSM, Usenix's OSDI, ACM's SIGCOMM IMW, and SPIE's MMCN conferences.
Abstract:Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbours within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.
Bio:Manuel Gomez Rodriguez is a tenure-track research group leader at Max Planck Institute for Software Systems. Manuel develops machine learning and large-scale data mining methods for the analysis and modelling of large real-world networks and processes that take place over them. He is particularly interested in problems arising in the Web and social media and has received several recognitions for his research, including an Outstanding Paper Award at NIPS'13 and a Best Research Paper Honourable Mention at KDD'10. Manuel holds a PhD in Electrical Engineering from Stanford University and a BS in Electrical Engineering from Carlos III University in Madrid (Spain). You can find more about him at http://www.mpi-sws.org/~manuelgr/
Abstract: In the study of complex systems – such as the brain, cells or our economies – we face conceptual issues of a novel type, because the systems studied involve many variables, many of which are unknown. In addition, their behaviour is not constrained by well-established laws, as in physics. In such high dimensional inference problems one is hardly ever sampling correctly an underlying probability distribution, even with huge data sets. In order to evade the deep under-sampling domain, we implicitly or explicitly resort to dimensionality reduction schemes, where the data is projected into a low-dimensional space where statistics can provide accurate conclusions. Yet, in this process, the data processing inequality tells us that we inevitably lose relevant information on what the system is doing. So understanding which the relevant variables are is crucial in order to limit information losses. We show that the information that a sample contains on the behaviour of the system is quantified by the entropy of the frequency with which different states occur. This allows us to characterize the properties of maximally informative samples: in the under-sampling regime, the most informative frequency size distributions have power law behavior and Zipf's law emerges at the crossover between the under sampled regime and the regime where the sample contains enough statistics to make inference on the behavior of the system. These ideas are illustrated in some applications, showing that they can be used to identify relevant variables or to select most informative representations of data, e.g. in data clustering.
Bio:Matteo Marsili is currently a senior research scientist at the Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy, where he coordinates the activities of the section in Quantitative Life Sciences. He received his Ph.D. in Physics at SISSA, Trieste. He has authored more than 100 scientific publications and two books. He has made important contributions to non-equilibrium statistical physics and to interdisciplinary applications of statistical mechanics to complex systems, and to economics and finance in particular. Coordinator of the programme on Environmental and Ecological Economics (EEE) at ICTP and of several research grants. He is the Scientific Director of Journal of Statistical Mechanics and editor of Journal of Economic Interaction and Coordination and European Journal of Physics B.
Abstract:A common consensus in the analysis of community structure (groups of nodes having dense internal connection ) in large networks is that all the constituent members of a group belong to the community with homogeneous intensity. In other words, the task of community detection is a node-centric binary decision of whether a node is a part of a community or not. However, in practical scenario, the community membership is mostly heterogeneous where few nodes might have high intensity of belongingness in a community, others do not have such intensity. Contrary to the earlier experiments where communities are detected based on the global property of the network, we posit that the belongingness of a node in a community should be a vertex-property, and we should look into how intensely it belongs to that community. To address this issue, we propose a new vertex-based metric called "permanence", that apart from giving the quantitative definition of the membership of a vertex in a community, can estimate how community-like the network structure is. We show that permanence as compared to other standard measures, provides (i) a more accurate estimate of a derived community structure to the ground-truth community and (ii) is more sensitive to perturbations in the network. We further demonstrate that the process of maximizing permanence produces meaningful communities that concur with the ground-truth structure of the networks more accurately. We further deal with a semantic methodology to identify topical groups in Twitter on a large number of topics, each consisting of users who are experts on or interested in a specific topic. This methodology stands in stark contrast to graph based methodologies of identifying communities and is suitable for capturing identity based communities which lack network density. Using our method we show that Twitter has a rich set of topical groups discussing specialized topics ranging from geology, neurology,to astrophysics and karate. We present a detailed characterization of these topical groups based on their network structures and tweeting behaviours. Analyzing these groups on the backdrop of the common identity and bond theory in social sciences shows that these groups exhibit characteristics of topical-identity based groups, rather than social-bond based ones.
Bio:Niloy Ganguly is a professor in the department of computer science and engineering, Indian Institute of Technology Kharagpur. He received his PhD from Bengal Engineering and Science University, Calcutta, India and his Bachelors in Computer Science and Engineering from IIT Kharagpur. He has been a postdoctoral fellow in Technical University of Dresden, Germany. He focuses on investigating several different aspects on online-social networks. He has worked on designing recommendation system based on community structures on various web-social networks like Twitter and Delicious. He has also simultaneously worked on various theoretical issues related to dynamical large networks often termed as complex networks. Specifically he has looked into problems related to percolation, evolution of networks as well as flow of information over these networks. He has been collaborating with various national and international universities and research lab including Duke University, TU Dresden, Germany, MPI PKS and MPI SWS, Germany, Microsoft Lab, India etc. He currently publishes in various top ranking international journals and conferences including CCS, PODC, ICDM, ACL, WWW, INFOCOM, SIGIR, SIGKDD, SIGCHI, Europhysics Letters, Physical Review E, ACM and IEEE Transactions, etc.
Abstract: Harvesting knowledge from web-scale text datasets has emerged as an interesting and promising area of research over the last few years, resulting in the construction of several large knowledge bases (KBs). Multi-relational graphs provide a natural way to organize such harvested knowledge. Moreover, graph-based learning and inference methods have proved to be effective in the construction of such KBs. In this talk, I shall present an overview of my research in this area, with examples and applications of techniques ranging from graph-based semi-supervised learning to random walk inference over knowledge graphs.
Bio: Partha Pratim Talukdar is an Assistant Professor at the Indian Institute of Science (IISc), Bangalore. Before that, he was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the NELL project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, Data Integration, and Cognitive Neuroscience, with particular interest in large-scale learning and inference over graphs. His past industrial research affiliations include HP Labs, Google Research, and Microsoft Research. He is a co-author of the book on Graph-based Semi-Supervised Learning published by Morgan Claypool Publishers. More details can be found at http://www.talukdar.net
Abstract: In online shopping, size conventions vary across brands causing a size in one brand to correspond to a different size in another brand. This can lead to a large number of returns in product categories such as Shoes and Apparel. In this talk, I will present a matrix factorization model to recommend size fits for different brands. The model exploits product and user features to handle cold start scenarios involving new products and users. It also employs additional latent variables to capture multiple user personas. We show that leveraging product and user features and incorporating user personas enables our model to provide more accurate fit recommendations.
Bio: Rajeev Ramnarain Rastogi is an Indian computer scientist who graduated from the Indian Institute of Technology Bombay, where he got his Bachelor of Science degree in 1988. He received his Master's and Doctoral degrees from the University of Texas in 1990 and 1993 respectively. He became member of technical staff at its Information Sciences Research Center. Five years later he held the Distinguished Member of Technical Staff position and by 1999 became a director of the Internet Management Research Department and became a Bell Labs fellow in 2003. He was Vice President of Yahoo! Labs in Bangalore and is currently serving as a director of Machine Learning on Amazon.com. He has over 200 peer-reviewed articles with the CURE: An Efficient Clustering Algorithm for Large Databases which received over 2,200 citations since 1998, bringing him an h-index of 56. In 2012, he became a fellow of the Association for Computing Machinery "for contributions to the analysis and management of large data sets.”
Abstract: A fundamental problem in behavioral analysis of human interactions is to understand how communications unfold. In this talk, we will analyze this problem by mining Communication motifs from dynamic interaction networks. A communication motif is a recurring subgraph that has a similar sequence of information flow. Mining communication motifs requires us to explore the exponential subgraph search space where existing techniques fail to scale. To tackle this scalability bottleneck, we will discuss a technique called COMMIT. COMMIT converts a dynamic graph into a database of sequences. Through careful analysis in the sequence space, only a small portion of the exponential search space is accessed to identify regions embedding communication motifs. Extensive experiments on three different social networks show COMMIT to be up to two orders of magnitude faster than baseline techniques. Furthermore, qualitative analysis demonstrate communication motifs to be effective in characterizing the recurring patterns of interactions while also revealing the role that the underlying social network plays in shaping human behavior.
Bio: Sayan Ranu is an Assistant Professor at IIT Madras since January, 2014. His research interests include spatio-temporal data analytics, graph indexing and mining, and bioinformatics. Prior to joining IIT Madras, he was a researcher at IBM Research. He obtained his PhD from University of California, Santa Barbara (UCSB) in March 2012. He was a recipient of the "Distinguished Graduate Research Fellowship" at UCSB. He obtained his Bachelor of Science from Iowa State University, where he received the “President’s top 2% of the class” award.
Abstract:Many social networks are characterized by actors (nodes) holding quantitative opinions about movies, songs, sports, people, colleges, politicians, and so on. These opinions are influenced by network neighbors. Many models have been proposed for such opinion dynamics, but they have some limitations. Most consider the strength of edge influence as fixed. Some model a discrete decision or action on part of each actor, and an edge as causing an “infection” (that is often permanent or self-resolving). Others model edge influence as a stochastic matrix to reuse the mathematics of eigensystems. Actors’ opinions are usually observed globally and synchronously. Analysis usually skirts transient effects and focuses on steady-state behavior. There is very little direct experimental validation of estimated influence models. Here we initiate an investigation into new models that seek to remove these limitations. Our main goal is to estimate, not assume, edge influence strengths from an observed series of opinion values at nodes. We adopt a linear (but not stochastic) influence model. We make no assumptions about system stability or convergence. Further, actors’ opinions may be observed in an asynchronous and incomplete fashion, after missing several time steps when an actor changed its opinion based on neighbors’ influence. We present novel algorithms to estimate edge influence strengths while tackling these aggressively realistic assumptions. Experiments with Reddit, Twitter, and three social games we con- ducted on volunteers establish the promise of our algorithms. Our opinion estimation errors are dramatically smaller than strong base- lines like the DeGroot, flocking, voter, and biased voter models. Our experiments also lend qualitative insights into asynchronous opinion updates and aggregation.
Bio: Sourangshu Bhattacharya completed his B.Tech. in Civil Engineering from I.I.T. Roorkee in 2001 and M.Tech. in Computer Science from I.S.I. Kolkata in 2003. He did his PhD in Computer Science, from Dept. of Computer Science & Automation , Indian Institute of Science, Bangalore in 2008. He was visiting scholar at Helsinki University of Technology from Jan - May 2008. He was a Scientist at Yahoo! Labs from 2008 to 2013. He has published in top tier conferences on various problems in Bioinformatics, Computer Vision and Text Mining. My current interests are Distributed Machine learning, Application of Machine Learning to Influence propagation on Networks.
Abstract:Life and language are discrete combinatorial systems(DCSs) in which the basic building blocks are finite sets of elementary units: nucleotides or codons in a DNA sequence and letters or words in a language.Different combinations of these finite units give rise to potentially infinite numbers of genes or sentences.This type of DCS can be represented as an alphabetic Bipartite Network(αBiN)where there are two kinds of nodes;one type represents the elementary units while the other type represents their combinations.There is an edge between a node corresponding to an elementary unit u and a node corresponding to a particular combination v if u is present in v naturally,the partition consisting of the nodes representing elementary units is fixed, while the other partition is allowed to grow unboundedly. The evolution equations for α BiN under different growth rules are derived and the corresponding degree distributions computed.It is shown that asymptotically the degree distribution of BiN can be described as a family of beta distributions.The one mode projections of the αBiN and their asymptotic is also studied theoretically and further supported through simulation.
Bio:Tyll is a professor at Wroclaw University of Technology. He was Scientific Researcher in the Department for Innovative Methods of Computing, Technical University Dresden,Germany.He has received his Ph.D. in Physics from the University at Bielefeld.He has worked as a researcher at several institutions,including the Technical University of Berlin,PICB Shanghai and MaxPlanck Institute for Mathematics.His research interests include complex networks and random graphs,stochastic processes on networks,and mathematical models in biology and medicine.