I am broadly interested in incorporating human factors into security and privacy, and consequently in designing usable online services. My recent research focuses on developing systems that provide usable privacy and security mechanisms to online users while minimizing system abuse.
I am always looking for students who are interested in the human aspects of privacy/security and like to tinker with systems. If you are already a student at IIT Kgp and feel strongly about making the digital world private and secure for its users, just drop me a mail. It will also be beneficial for both of us if you have taken the Usable Security and Privacy course.
Latest updates
May 2024
Our work on understanding how we can leverage reflective-learning-based videos to educate users about targeted advertising on Facebook got an honourable mention at CHI 2024! Super excited. Congratulations Garett, Sarah, Rhea, Stephanie, Yun-Chieh, Rachel, Josh, Trevor, Brian, Norman, Bart and Xinru!
April 2024
Our work on improving medical document summarization using a novel method to handle out-of-vocabulary words is accepted at IJCAI 2024! Congratulations Gunjan, Soumyadeep and Niloy!
March 2024
I am honored to receive the inaugural IITB Trust Lab Early Career Award!
March 2024
Our work on measuring the asymmetry between the online resources available to victims and abusers of intimate partner surveillance is accepted at Euro S&P 2024! Congratulations Majed, Mazhar, Saptarshi and Rahul!
Feb 2024
Our work investigating why end users switch between VPN services (and what features they want) is accepted at USENIX Security 2024! Congratulations Rohit!
Feb 2024
Our preliminary work on gender bias in journalist-politician interactions on Indian Twitter is accepted at WebSci 2024! Congratulations Brisha!
Feb 2024
Our work on understanding how we can leverage reflective-learning-based videos to educate users about privacy on Facebook is accepted at CHI 2024! Congratulations Garett, Sarah, Rhea, Stephanie, Yun-Chieh, Rachel, Josh, Trevor, Brian, Norman, Bart and Xinru!
Jan 2024
I am serving as PC co-chair for SOUPS 2024 with Dr. Katharina Krombholz! Please consider submitting your research on human-centered security and privacy.
Dec 2023
Our preliminary work on understanding whether defenses against backdoor attacks work on recent transformer-based models is accepted at SPACE 2023! Congratulations Bipab, Rohit and Abir!
Nov 2023
Our work on understanding users' mental models of different cryptocurrencies is accepted at CCS 2023! Congratulations Easwar, Udit, Mohsen and Aniket! In this work we answered why users still do not use threshold crypto wallets even though they are more secure against certain attacks.
Nov 2023
Our work on understanding differences in cultural privacy norms when social media users disclose information about people they know is accepted at CSCW 2023! Congratulations Anju, Scott, Kenneth, Chaz, Nathan, Noah, Nathaniel, Josh, Jaden, Isha, Yao, Nancy and Xinru!
August 2023
Our system MDAP, which leverages module dependencies for predicting anomalies, is accepted in Computer Communications! Congratulations Harsh and team!
Research Interests
I design, implement, and analyze usable, private, and secure online systems. My work integrates security and privacy, human-computer interaction, and systems research.
Specifically, I often start from prominent privacy, security, or anti-abuse norms (via examining laws or performing user studies grounded in contextual integrity or other privacy theories), audit via automation whether online systems follow those norms, and finally aim to build end-to-end systems that align well with the identified norms. Some of my prominent ongoing research projects are below:
Assisting Users to Afford Protection of Data Privacy and Security Regulations
Today, data privacy regulations are being deployed in multiple jurisdictions: GDPR is enforced in the EU, CCPA is already in effect in the US, and other countries have enacted regulations like LGPD (Brazil) and PIPEDA (Canada). However, it is not clear if existing systems are helping end users afford the protections offered by these laws. Our body of (ongoing) work aims to design and build automated, smart assistive mechanisms for privacy and security management of user data as identified by legal regulations. The regulations we focus on span from temporal privacy to third-party tracking protection. Specific foci of our work include:
Improving usability of retrospective access management in online data archives (enabling "Right to be forgotten") [SEC'22][ICWSM'21][NDSS'21][CCS'19][PoPETS'19][SOUPS'18]
We are investigating the effectiveness of tools (e.g., data privacy dashboards, privacy settings) that enable users to retrospectively modify their past content in online archives (like social media or cloud storage), i.e., delete or edit old content or change its audience. Our final goal is to design new mechanisms and systems that will let online users better manage the security and privacy of their old content.
Bringing transparency and control to third-party behavior tracking (enabling "informed consent" in cookie-based tracking) [EuroUSEC'22][WebSci'21][CCS'19]
Laws like GDPR in the EU mandate all websites operating in their jurisdiction to obtain users' informed consent before tracking those users and collecting their data. Today, this is achieved by showing users cookie consent notices and sometimes the names of the cookies used by a website. However, cookie consent notices come in multiple designs, the cookie names are often unintelligible to users, and end users might not know what data is aggregated about them by correlating data from multiple websites. In this line of research we aim to give control back to users by identifying principles for designing good cookie consent management interfaces, as well as by informing users about the purpose of cookies and the data they enable companies to gather about them.
Improving Indian Unified Payment Interface (UPI) apps to enable users to fight financial fraud [SOUPS'22 Poster]
Online payment methods have gained enormous traction in India due to the launch of the Unified Payment Interface (UPI), an API developed by the government-backed entity National Payments Corporation of India (NPCI) to facilitate free and instant money transfers between users' bank accounts. Multiple financial apps use this API and often enable money transfer directly from Indian bank accounts with just a click. However, this functionality also gives rise to a flurry of fraudulent transactions, often via social engineering attacks. We are investigating whether UPI app interfaces help deter, or even facilitate, financial fraud.
We investigate user behavior on online platforms using large-scale data and user studies. We identified that privacy and anonymity are a blessing to most users, since they enable users to uphold free speech. However, a few users abuse the system, sometimes under the veil of anonymity, and take advantage of the platform by posting abusive content like hate speech and vaccine-related misinformation, or by using ad-hominem fallacies to silence opinions. To that end, we work on developing techniques to detect and investigate hate speech on online platforms.
Managing online data privacy and security in Online Social Media Platforms [IJAESAM'17] [IC'17] [SOUPS'16] [USEC'14] [SOUPS'14] [CoNEXT'12] [EuroSys'12]
We developed the model of exposure control (controlling who actually views a piece of online content), an extension of existing access control (controlling who has access to the online content), for building more secure/private systems. We applied this theory of exposure control in multiple real-world scenarios and built systems for end users. These exposure-control-based systems enabled us to better capture user intention and design more private and usable systems compared to the state of the art.
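To make the distinction concrete, here is a minimal, hypothetical sketch in Python (all names such as Post, may_access, and should_expose are illustrative, not from our systems): access control answers a static "who may open this content?", while exposure control additionally reasons about the audience that actually ends up viewing it.

    # Sketch: access control vs. exposure control (illustrative only).
    from dataclasses import dataclass, field

    @dataclass
    class Post:
        owner: str
        allowed: set            # access control: who MAY view the post
        exposure_cap: int = 5   # exposure control: how many SHOULD actually view it
        views: set = field(default_factory=set)

    def may_access(post: Post, viewer: str) -> bool:
        # Classic access control: a static yes/no per viewer.
        return viewer in post.allowed

    def should_expose(post: Post, viewer: str) -> bool:
        # Exposure control: also reason about the realized audience, e.g.,
        # stop surfacing the post once the intended exposure is reached.
        if not may_access(post, viewer):
            return False
        if viewer in post.views:
            return True
        return len(post.views) < post.exposure_cap

    post = Post(owner="alice", allowed={"bob", "carol", "dave"}, exposure_cap=2)
    for viewer in ["bob", "carol", "dave"]:
        if should_expose(post, viewer):
            post.views.add(viewer)
    print(post.views)  # only 2 of the 3 allowed viewers are actually exposed

A real system would, of course, derive the exposure predicate from predicted user intention rather than a fixed cap.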
Publications
Refereed publications (updated till July 2023; please see Google Scholar for the latest publications)
MASCARA: Systematically Generating Memorable And Secure Passphrases
Avirup Mukherjee, Kousshik Murali, Shivam Kumar Jha, Niloy Ganguly, Rahul Chatterjee, Mainack Mondal.
In Proceedings of the 18th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2023). [PDF] [BibTeX]
Abstract: Passwords are the most common mechanism for authenticating users online. However, studies have shown that users find it difficult to create and manage secure passwords. To that end, passphrases are often recommended as a usable alternative to passwords, which would potentially be easy to remember and hard to guess. However, as we show, user-chosen passphrases fall short of being secure, while state-of-the-art machine-generated passphrases are difficult to remember.
In this work, we aim to tackle the drawbacks of the systems that generate passphrases for practical use. In particular, we address the problem of generating secure and memorable passphrases and compare them against user-chosen passphrases in use. We identify and characterize 72,999 unique user-chosen, in-use English passphrases from prior leaked password databases. Then we leverage this understanding to create a novel framework for measuring memorability and guessability of passphrases. Utilizing our framework, we design MASCARA, which follows a constrained Markov generation process to create passphrases that optimize for both memorability and guessability. Our evaluation of passphrases shows that MASCARA-generated passphrases are harder to guess than in-use user-generated passphrases, while being easier to remember compared to state-of-the-art machine-generated passphrases. We conduct a two-part user study with the crowdsourcing platform Prolific to demonstrate that users have the highest memory recall (and lowest error rate) while using MASCARA passphrases. Moreover, for passphrases of the length desired by users, the recall rate is 60-100% higher for MASCARA-generated passphrases compared to current system-generated ones.
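To illustrate the flavor of constrained Markov generation (a toy sketch, not the MASCARA implementation; the corpus, constraint, and function names are all hypothetical): each next word is sampled from a bigram language model, but candidates violating a simple memorability constraint are rejected.

    # Toy sketch of constrained Markov passphrase generation (NOT MASCARA itself).
    import random
    from collections import defaultdict

    corpus = "the quick brown fox jumps over the lazy dog and the small cat".split()

    # Build a bigram model: word -> list of observed next words.
    bigrams = defaultdict(list)
    for w1, w2 in zip(corpus, corpus[1:]):
        bigrams[w1].append(w2)

    def generate(length=4, max_word_len=6, seed_word="the"):
        phrase, word = [seed_word], seed_word
        while len(phrase) < length:
            # Constrained Markov step: drop candidates that hurt memorability
            # (here, a crude proxy: overly long words).
            candidates = [w for w in bigrams.get(word, []) if len(w) <= max_word_len]
            if not candidates:                 # dead end: restart the chain
                word = random.choice(list(bigrams))
                continue
            word = random.choice(candidates)
            phrase.append(word)
        return " ".join(phrase)

    print(generate())  # e.g. "the quick brown fox"

MASCARA's actual generation additionally scores candidates against explicit guessability and memorability objectives rather than a word-length proxy.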
Understanding the Impact of Awards on Award Winners and the Community on Reddit
Avinash Tulasi, Mainack Mondal, Arun Balaji Buduru and Ponnurangam Kumaraguru.
In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). [PDF] [BibTeX] Short paper.
Abstract: Non-financial incentives in the form of awards often act as a driver of positive reinforcement and elevation of social status in the offline world, as shown in previous studies. The elevated social status results in people becoming more active, aligning with a change in the community's expectations. However, the longevity of the social influence and community acceptance that such awards confer is not well understood in the online world. To that end, our work aims to shed light on the impact of these awards on the awardee and the community, with Reddit as an experimental testbed. We specifically focus on three large subreddits with a snapshot of 219K posts and 5.8 million comments contributed by 88K Reddit users (Redditors) who received 14,146 awards. Our work establishes that the behaviour of awardees changes statistically significantly for a short time after getting an award; however, the change is ephemeral, since the awardees return to their pre-award behaviour within days. Additionally, via a user survey, we identified an extremely interesting long-lasting impact of awards---we found that the community's stance softened towards awardees. In fact, the comments written by the same users on awardees' posts (months) before and after receiving an award are different enough that in 75% of the cases the commenters are perceived as different users. We conclude by discussing the implications of our work.
What Cookie Consent Notices Do Users Prefer: A Study In The Wild
Ashutosh Kumar Singh, Nisarg Upadhyaya , Arka Seth, Xuehui Hu, Nishanth Sastry, Mainack Mondal.
In Proceedings of the European Symposium on Usable Security (EuroUSEC). [PDF] [BibTeX]
Abstract: Laws like GDPR mandated all websites operating in their jurisdiction to obtain users' informed consent before tracking those users and collecting their data. Today, this is achieved by showing users cookie consent notices. These notices are ubiquitous (often permeating the geographical boundaries of GDPR enforcement), even though their exact user interface (UI) designs vary. These designs are provided by Consent Management Platforms (CMPs) to different websites, effectively resulting in a handful of cookie consent notice designs being shown to a majority of internet users. Naturally, not all designs are uniformly liked by users. Thus the first step towards improving cookie consent notice UI design and moving to a better consent mechanism is to understand whether users prefer one design over another in the wild, and why. To answer these questions, in this work we conduct an in-the-wild comparative survey with 98 participants from 30 countries, including 16 countries outside the EU. In this within-subjects study, our participants ranked different popular cookie consent UI designs and gave rationales for their choices. Our analysis found that the slider design is ranked statistically significantly better than all other designs. Surprisingly, these UI rankings did not have any correlation with user location, but have a weak correlation with factors like experience with the internet. Our further qualitative analysis identifies and unpacks five key design factors which impacted our participants' ranking of consent notice UI designs: ease of use, amount of information, customisability, decision-making time, and clarity/transparency. We conclude this work by discussing the implications of our findings on future cookie consent notice UI designs.
"Dummy Grandpa, do you know anything?": Identifying and Characterizing Ad hominem Fallacy Usage in the Wild
Utkarsh Patel, Animesh Mukherjee, Mainack Mondal.
In Proceedings of the 17th International AAAI Conference on Web and Social Media (ICWSM'23). [PDF] [BibTeX] [Model]
Abstract: Today, participating in discussions on online forums is extremely commonplace and these discussions have started rendering a strong influence on the overall opinion of online users. Naturally, twisting the flow of the argument can have a strong impact on the minds of naïve users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in the offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector based on a transformer architecture with high accuracy (F1 more than 83%, showing a significant improvement over prior work), even for datasets for which annotated instances constitute a very small fraction. We then used our detector on 265k arguments collected from the online debate forum CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on CreateDebate data (94% match with manual annotation). Our analysis revealed that a surprising 31.23% of CreateDebate content contains ad hominem fallacy, and a cohort of highly active users post significantly more ad hominem arguments to suppress opposing views. Then, our temporal analysis revealed that ad hominem argument usage increased significantly since the 2016 US Presidential election, not only for topics like Politics, but also for Science and Law. We conclude by discussing important implications of our work to detect and defend against ad hominem fallacies.
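For readers curious how such a detector is typically used downstream, here is a generic, hypothetical sketch (the checkpoint path is a placeholder, not our released artifact; see the [Model] link above for the actual model):

    # Generic sketch: running a fine-tuned transformer classifier over comments.
    # "path/to/adhominem-model" is a placeholder checkpoint, not a real artifact.
    from transformers import pipeline

    clf = pipeline("text-classification", model="path/to/adhominem-model")

    comments = [
        "Your numbers ignore the 2019 census data.",
        "Dummy grandpa, do you know anything?",
    ]
    for c in comments:
        result = clf(c)[0]  # e.g. {'label': 'AD_HOMINEM', 'score': 0.97}
        print(f"{result['label']:>12} ({result['score']:.2f}): {c}")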
A Privacy Paradox? Impact of Privacy Concerns on Willingness to Disclose COVID-19 Health Status in the United States
Kirsten Chapman, Melanie Klimes, Braden Wellman, Garrett Smith, Mainack Mondal, Staci Smith, Yunan Chen, Haijing Hao, Xinru Page.
In Proceedings of the 25th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'22), Virtual Venue, November 2022. [PDF, forthcoming] [BibTeX, forthcoming] [Poster]
Abstract:
Privacy concerns around sharing personal health information are frequently cited as hindering COVID-19 contact tracing app adoption. We conducted a nationally representative survey of 304 adults in the United States to investigate their attitudes towards sharing two types of COVID-19 health status (COVID-19 Diagnosis, Exposure to COVID-19) with three different audiences (Anyone, Frequent Contacts, Occasional Contacts). Using the Internet User's Information Privacy Concern (IUIPC) scale, we were able to identify the effect of different types of privacy concerns on sharing this information with various audiences. We found that privacy concerns around data Collection predicted lower willingness to share either type of health status with any of these audiences. However, desire for Control and for Awareness of data practices only affected willingness to share certain health information with certain audiences. We discuss the implications of our findings.
A Platform for Uncovering Indian Users' Decision-Making Process in United Payment Interface (UPI) Apps
Kshitiz Sharma, Nandini Bajaj, Xinru Page, Mainack Mondal.
In Proceedings of the 18th Symposium on Usable Privacy and Security (SOUPS'22), Boston, US, August 2022. [PDF] [BibTeX] [Poster]
Abstract:
Online payment methods have gained enormous traction in India due to the launch of the Unified Payment Interface (UPI), an API developed by the government-backed entity National Payments Corporation of India (NPCI) to facilitate free and instant money transfers between users' bank accounts. Multiple financial apps use this API. However, fraud has also increased, leading to efforts to make the UPI protocol more secure. Nonetheless, social phishing is still a threat. Our goal is to develop UPI app interface elements that can help users avoid falling prey to social engineering attacks. In order to do so, we developed a UPI app simulator which provides a way to test user interaction with various interface elements, but in an ethical way where we do not collect or expose their personal financial data. In this paper we demonstrate how our simulator can be used to elicit user feedback and help us understand the decision-making process of UPI users. This tool will aid us in devising human-centered phishing prevention strategies.
"Others Have the Right to Know": Determinants of Willingness to Share COVID-19-Related Health Symptoms
Kirsten Chapman, Melanie Klimes, Braden Wellman, Garrett Smith, Madeline Bonham, Yunan Chen, Staci Smith, Mainack Mondal, Xinru Page.
In Proceedings of the 18th Symposium on Usable Privacy and Security (SOUPS'22), Boston, US, August 2022. [PDF] [BibTeX] [Poster]
Abstract:
Consideration for data privacy is a potentially significant factor behind user apprehension about sharing personal medical information (e.g., disease symptoms), even during the COVID-19 pandemic. To that end, in this study, we set out to unpack the extent to which privacy (and related) factors influence people's perceptions of data sharing. Specifically, we designed and deployed a 304-participant survey with both qualitative and quantitative questions concerning willingness to share medical information with others. Our findings indicate that although an individual might generally feel strongly about maintaining their privacy, in the scope of a global pandemic, they value altruism more, especially when they are in frequent contact with others. Thus a sense of societal duty potentially plays a larger role than privacy in determining disclosure of medical information in times of COVID-19.
Designing to Fight Pandemics: A Review of Literature and Identifying Design Patterns for COVID-19 Tracing Apps
Isaac Criddle, Amanda Hardy, Garrett Smith, Thomas Ranck, Mainack Mondal, Xinru Page.
In Proceedings of the 24th International Conference on Human-Computer Interaction, HCI International 2022 (HCII'22). [PDF] [BibTeX]
Winds of Change: Impact of COVID-19 on Vaccine-related Opinions of Twitter users
Soham Poddar, Mainack Mondal, Janardan Misra, Niloy Ganguly, Saptarshi Ghosh.
In Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM'22). [PDF] [BibTeX] [Data+Code] [Preprint]
Abstract: Administering COVID-19 vaccines at a societal scale has been deemed the most appropriate way to defend against the COVID-19 pandemic. This global vaccination drive naturally fueled the possibility of Pro-Vaxxers and Anti-Vaxxers strongly expressing their support of and concerns about the vaccines on social media platforms. Understanding this online discourse is crucial for policy makers. This understanding is likely to impact the success of vaccination drives and might even impact the final outcome of our fight against the pandemic. The goal of this work is to improve this understanding using the lens of Twitter-discourse data. We first develop a classifier that categorizes users according to their vaccine-related stance with high precision (97%). Using this method we detect and investigate specific user groups who posted about vaccines in pre-COVID and COVID times. Specifically, we identify distinct topics that these users talk about, and investigate how vaccine-related discourse has changed between pre-COVID times and COVID times. Finally, for the first time, we investigate the change of vaccine-related stances in Twitter users and shed light on potential reasons for such changes in stance.
Understanding and Improving Usability of Data Dashboards for Simplified Privacy Control of Voice Assistant Data
Vandit Sharma, Mainack Mondal.
In Proceedings of the 31st USENIX Security Symposium (USENIX Security'22), Boston, MA, US, August 2022. [PDF] [Extended version] [BibTeX]
Abstract:
Today, intelligent voice assistant (VA) software like Amazon's Alexa, Google's Voice Assistant (GVA) and Apple's Siri have millions of users. These VAs often collect and analyze huge amounts of user data to improve their functionality. However, this collected data may contain sensitive information (e.g., personal voice recordings) that users might not feel comfortable sharing with others and that might cause significant privacy concerns. To counter such concerns, service providers like Google present their users with a personal data dashboard (called 'My Activity Dashboard'), allowing them to manage all voice assistant collected data. However, a real-world GVA-data-driven understanding of user perceptions and preferences regarding this data (and data dashboards) remained relatively unexplored in prior research.
To that end, in this work we focused on Google Voice Assistant (GVA) users and investigated the perceptions and preferences of GVA users regarding data and dashboard while grounding them in real GVA-collected user data. Specifically, we conducted an 80-participant survey-based user study to collect both generic perceptions regarding GVA usage as well as desired privacy preferences for a stratified sample of their GVA data. We show that most participants had superficial knowledge about the type of data collected by GVA. Worryingly, we found that participants felt uncomfortable sharing a non-trivial 17.7% of GVA-collected data elements with Google. The current My Activity dashboard, although useful, did not help long-time GVA users effectively manage their data privacy. Our real-data-driven study found that showing users even one sensitive data element can significantly improve the usability of data dashboards. To that end, we built a classifier that can detect sensitive data for data dashboard recommendations with a 95% F1-score and shows 76% improvement over baseline models.
Empirical Understanding of Deletion Privacy: Experiences, Expectations, and Measures
Mohsen Minaei, Mainack Mondal, Aniket Kate.
In Proceedings of the 31st USENIX Security Symposium (USENIX Security'22), Boston, MA, US, August 2022. [PDF] [Extended version] [BibTeX]
Abstract:
In recent years, social platforms have been heavily used by individuals to share their thoughts and personal information. However, due to regret over time about posting inappropriate social content, embarrassment, or even life or relationship changes, some past posts might also pose serious privacy concerns for them. To cope with these privacy concerns, social platforms offer deletion mechanisms that allow users to remove their content. Quite naturally, these deletion mechanisms are really useful for removing past posts as and when needed. However, these same mechanisms also leave users potentially vulnerable to attacks by adversaries who specifically seek users' damaging content and exploit the act of deletion as a strong signal for identifying such content. Unfortunately, today user experiences and contextual expectations regarding such attacks on deletion privacy, and deletion privacy in general, are not well understood.
To that end, in this paper, we conduct a user survey-based exploration involving 191 participants to unpack their prior deletion experiences, their expectations of deletion privacy, and how effective they find the current deletion mechanisms. We find that more than 80% of the users have deleted at least a social media post, and users self-reported that, on average, around 35% of their deletions happened after a week of posting. While the participants identified the irrelevancy (due to time passing) as the main reason for content removal, most of them believed that deletions indicate that the deleted content includes some damaging information to the owner. Importantly, the participants are significantly more concerned about their deletions being noticed by large-scale data collectors (e.g., a third-party data collecting company or the government) than individuals from their social circle. Finally, the participants felt that popular deletion mechanisms, although very useful to help remove the content in multiple scenarios, are not very effective in protecting the privacy of those deletions. Consequently, they identify design guidelines for improving future deletion mechanisms.
Prioritizing Minimalistic Design: The Negative Impact on Users’ Control over Privacy in Facebook’s Ad Preferences
Rhea Vengurlekar, Sarah Benson, Garrett Smith, Brian Smith, Mainack Mondal, Norman Makoto Su, Xinru Page.
In Proceedings of the 17th Symposium on Usable Privacy and Security (SOUPS'21), Virtual venue, August 2021. [PDF] [BibTeX] [Poster]
Abstract:
In 2020, Facebook initiated an overhaul of their user interface. Users saw nearly all sections of their Facebook profile "upgraded" to a modern, minimalistic design. The more intrepid user would have also detected drastic changes to the interface for ad preferences, an interface specifically designed to give users control over how Facebook categorizes them for showing targeted advertisements. In this work, we take a first step to understand the impact of these changes in the ad preferences interface on users by conducting a heuristic evaluation. Our analysis reveals that while there were some improvements in usability, overall these changes had a negative impact on the usability of the ad preferences interface. This has implications on the extent to which Facebook users can control their privacy by limiting exposure of their data to third party advertisers.
CCCC: Corralling Cookies into Categories with CookieMonster
Xuehui Hu, Nishanth Sastry, Mainack Mondal.
In Proceedings of the 13th ACM Web Science Conference (WebSci'21). [PDF] [Preprint] [BibTeX]
Abstract: Browser cookies are ubiquitous in the web ecosystem today. Although these cookies were initially introduced to preserve user-specific state in browsers, they are now used for numerous other purposes, including user profiling and tracking across multiple websites. This paper sets out to understand and quantify the different uses for cookies, and in particular, the extent to which targeting and advertising, performance analytics and other uses which only serve the website and not the user add to overall cookie volumes. We start with 31 million cookies collected in Cookiepedia, which is currently the most comprehensive database of cookies on the Web. Cookiepedia provides a useful four-part categorisation of cookies into strictly necessary, performance, functionality and targeting/advertising cookies, as suggested by the UK International Chamber of Commerce. Unfortunately, we found that Cookiepedia data can categorise less than 22% of the cookies used by Alexa Top20K websites and less than 15% of the cookies set in the browsers of a set of real users. These results point to an acute problem with the coverage of current cookie categorisation techniques.
Consequently, we developed CookieMonster, a novel machine-learning-driven framework which can categorise a cookie into one of the aforementioned four categories with more than 94% F1 score and less than 1.5 ms latency. We demonstrate the utility of our framework by classifying cookies in the wild. Our investigation revealed that in Alexa Top20K websites necessary and functional cookies constitute only 13.05% and 9.52% of all cookies respectively. We also apply our framework to quantify the effectiveness of tracking countermeasures such as privacy legislation and ad blockers. Our results identify a way to significantly improve the coverage of cookie classification today, as well as new patterns in the usage of cookies in the wild.
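In the same spirit (though not the released CookieMonster pipeline; the tiny training set and feature choice below are purely illustrative), ML-based cookie categorisation can be sketched as text classification over cookie names and domains:

    # Illustrative sketch of ML-driven cookie categorisation (NOT CookieMonster).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy labelled data: "cookie-name domain" -> ICC category.
    # (A real system would train on a large labelled corpus such as Cookiepedia.)
    cookies = ["PHPSESSID example.org", "_ga analytics.com", "_gid analytics.com",
               "csrftoken shop.net", "IDE doubleclick.net", "fr facebook.com",
               "lang example.org", "test_cookie doubleclick.net"]
    labels  = ["strictly_necessary", "performance", "performance",
               "strictly_necessary", "targeting", "targeting",
               "functionality", "targeting"]

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams
        LogisticRegression(max_iter=1000),
    )
    model.fit(cookies, labels)
    print(model.predict(["_gat analytics.com", "sessionid example.org"]))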
Perceptions of Retrospective Edits, Changes, and Deletion on Social Media
Günce Su Yılmaz, Fiona Gasaway, Blase Ur, Mainack Mondal.
In Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM'21). [PDF] [BibTeX]
Abstract: Many social media sites permit users to delete, edit, anonymize, or otherwise modify past posts. These mechanisms enable users to protect their privacy, but also to essentially change the past. We investigate perceptions of the necessity and acceptability of these mechanisms. Drawing on boundary-regulation theories of privacy, we first identify how users who reshared or responded to a post could be impacted by its retrospective modification. These mechanisms can cause boundary turbulence by recontextualizing past content and limiting accountability. In contrast, not permitting modification can lessen privacy and perpetuate harms of regrettable content. To understand how users perceive these mechanisms, we conducted 15 semi-structured interviews. Participants deemed retrospective modification crucial for fixing past mistakes. Nonetheless, they worried about the potential for deception through selective changes or removal. Participants were aware retrospective modification impacts others, yet felt these impacts could be minimized through context-aware usage of markers and proactive notifications.
Cloaking Large-Scale Damaging Deletions on Social Platforms
Mohsen Minaei, S Chandra Mouli, Mainack Mondal, Bruno Ribeiro, Aniket Kate.
In Proceedings of the Network and Distributed System Security Symposium (NDSS'21). [PDF] [Preprint] [BibTeX] [Video]
Abstract:
Over-sharing poorly-worded thoughts and personal information is prevalent on online social platforms. In many of these cases, users regret posting such content. To retrospectively rectify these errors in users' sharing decisions, most platforms offer (deletion) mechanisms to withdraw the content, and social media users often utilize them. Ironically and perhaps unfortunately, these deletions make users more susceptible to privacy violations by malicious actors who specifically hunt post deletions at large scale. The reason for such hunting is simple: deleting a post acts as a powerful signal that the post might be damaging to its owner. Today, multiple archival services are already scanning social media for these deleted posts. Moreover, as we demonstrate in this work, powerful machine learning models can detect damaging deletions at scale.
Towards restraining such a global adversary against users' right to be forgotten, we introduce Deceptive Deletion, a decoy mechanism that minimizes the adversarial advantage. Our mechanism injects decoy deletions, hence creating a two-player minmax game between an adversary that seeks to classify damaging content among the deleted posts and a challenger that employs decoy deletions to masquerade real damaging deletions. We formalize the Deceptive Game between the two players, determine conditions under which either the adversary or the challenger provably wins the game, and discuss the scenarios in-between these two extremes. We apply the Deceptive Deletion mechanism to a real-world task on Twitter: hiding damaging tweet deletions. We show that a powerful global adversary can be beaten by a powerful challenger, raising the bar significantly and giving a glimmer of hope in the ability to be really forgotten on social platforms.
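The core intuition of the minmax game can be seen in a toy simulation (purely illustrative; not the paper's adversary or challenger models): injecting decoy deletions of benign posts dilutes the precision of an adversary hunting damaging deletions.

    # Toy simulation of the decoy-deletion intuition (NOT the paper's system).
    def adversary_precision(damaging_deleted, decoy_deleted):
        # A naive adversary that flags every observed deletion as damaging.
        flagged = damaging_deleted + decoy_deleted
        hits = sum(1 for post in flagged if post == "damaging")
        return hits / max(len(flagged), 1)

    damaging = ["damaging"] * 100             # real, sensitive deletions
    for n_decoys in [0, 100, 400, 900]:
        decoys = ["benign"] * n_decoys        # challenger-injected decoys
        print(n_decoys, round(adversary_precision(damaging, decoys), 2))
    # precision falls from 1.0 toward 0.1 as more decoys are injected

The paper's actual game is adversarial in both directions: the adversary trains a classifier to separate damaging deletions from decoys, and the challenger picks decoys that fool that classifier.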
Cultural Norms and Interpersonal Relationships: Comparing Disclosure Behaviors on Twitter
Anju Punuru, Tyng-Wen Scott Cheng, Isha Ghosh, Xinru Page, Mainack Mondal.
In Proceedings of the 23rd ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'20), Virtual Venue, October 2020. [PDF] [BibTeX] [Poster]
Abstract:
This study performs an initial exploration of cultural differences in social media disclosure behaviors. We focus on disclosures about interpersonal relationships on Twitter from the U.S. and India; Twitter is a popular social networking platform that has gained enormous traction outside the U.S. We developed a taxonomy of words representing interpersonal relationships and then collected all tweets containing these words (~4.5 million tweets) uploaded from India and the U.S. over a one-month period. We found that Indian tweets about others tend to be more positive, and we uncover differences in how Indian users tweet about various relationships (family, friends, others) in comparison to U.S. users. Drawing on theories of collectivism and individualism, we discuss how different cultural attitudes may explain these behaviors. We present implications for research and for designing to support cultural norms.
Anonymity Effects: A Large-Scale Dataset from an Anonymous Social Media Platform
Mainack Mondal, Denzil Correa, Fabrício Benevenuto.
In Proceedings of the 31st ACM Conference on Hypertext and Social Media (HT'20), Virtual Event, USA, July 2020. [PDF] [BibTeX]
Abstract: Today online social media sites function as the medium of expression for billions of users. As a result, aside from conventional social media sites like Facebook and Twitter, platform designers introduced many alternative social media platforms (e.g., 4chan, Whisper, Snapchat, Mastodon) to serve specific userbases. Among these platforms, anonymous social media sites like Whisper and 4chan hold a special place for researchers. Unlike conventional social media sites, posts on anonymous social media sites are not associated with persistent user identities or profiles. Thus, these anonymous social media sites can provide an extremely interesting data-driven lens into the effects of anonymity on online user behavior. However, to the best of our knowledge, currently there are no publicly available datasets to facilitate research efforts on these anonymity effects. To that end, in this paper, we aim to publicly release the first-ever large-scale dataset from Whisper, a large anonymous online social media platform. Specifically, our dataset contains 89.8 million Whisper posts (called "whispers") published over the 2-year period from June 6, 2014 to June 6, 2016 (when Whisper was quite popular). Each of these whispers contains both post text and associated metadata. The metadata contains information like the coarse-grained location of upload and the categories of whispers. We also present preliminary descriptive statistics to demonstrate a significant language and categorical diversity in our dataset. We leverage previous work as well as novel analysis to demonstrate that the whispers contain personal emotions and opinions (likely facilitated by a disinhibition complex due to anonymity). Consequently, we envision that our dataset will facilitate novel research ranging from understanding online aggression to detecting depression within the online populace.
Oh, the Places You've Been! User Reactions to Longitudinal Transparency About Third-Party Web Tracking and Inferencing
Ben Weinshel, Miranda Wei, Mainack Mondal, Euirim Choi, Shawn Shan, Claire Dolin, Michelle L. Mazurek, Blase Ur.
In Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS), London, UK, November 2019. [PDF] [BibTeX]
Abstract:
Internet companies track users' online activity to make inferences about their interests, which are then used to target ads and personalize their web experience. Prior work has shown that existing privacy-protective tools give users only a limited understanding and incomplete picture of online tracking. We present Tracking Transparency, a privacy-preserving browser extension that visualizes examples of long-term, longitudinal information that third-party trackers could have inferred from users' browsing. The extension uses a client-side topic modeling algorithm to categorize pages that users visit and combines this with data about the web trackers encountered over time to create these visualizations. We conduct a longitudinal field study in which 425 participants use one of six variants of our extension for a week. We find that, after using the extension, participants have more accurate perceptions of the extent of tracking and also intend to take privacy-protecting actions.
Moving Beyond Set-It-And-Forget-It Privacy Settings on Social Media
Mainack Mondal, Günce Su Yılmaz, Noah Hirsch, Mohammad Taha Khan, Michael Tang, Christopher Tran, Chris Kanich, Blase Ur, Elena Zheleva.
In Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS), London, UK, November 2019. [PDF] [BibTeX]
Abstract:
When users post on social media, they protect their privacy by choosing an access control setting that is rarely revisited. Changes in users' lives and relationships, as well as social media platforms themselves, can cause mismatches between a post's active privacy setting and the desired setting. The importance of managing this setting, combined with the high volume of potential friend-post pairs needing evaluation, necessitates a semi-automated approach. We attack this problem through a combination of a user study and the development of automated inference of potentially mismatched privacy settings. A total of 78 Facebook users reevaluated the privacy settings for five of their Facebook posts, also indicating whether a selection of friends should be able to access each post. They also explained their decision. With this user data, we designed a classifier to identify posts with currently incorrect sharing settings. This classifier shows a 317% improvement over a baseline classifier based on friend interaction. We also find that many of the most useful features can be collected without user intervention, and we identify directions for improving the classifier's accuracy.
Lethe: Conceal Content Deletion from Persistent Observers
Mohsen Minaei, Mainack Mondal, Patrick Loiseau, Krishna Gummadi, and Aniket Kate.
In Proceedings of the Privacy Enhancing Technologies Symposium (PoPETS), Stockholm, Sweden, July 2019. [PDF] [BibTeX] [arXiv preliminary version] [Journal]
Abstract:
Most social platforms offer mechanisms allowing users to delete their posts, and a significant fraction of users exercise this right to be forgotten. However, ironically, users' attempts to reduce attention to sensitive posts via deletion, in practice, attract unwanted attention from stalkers specifically to those (deleted) posts. Thus, deletions may leave users more vulnerable to attacks on their privacy in general. Users hoping to make their posts forgotten face a "damned if I do, damned if I don't" dilemma. Many are shifting towards ephemeral social platforms like Snapchat, which will deprive us of important user-data archives. We present Lethe, a novel solution to this problem of (really) forgetting the forgotten, in the form of intermittent withdrawals. If the next-generation social platforms are willing to give up the uninterrupted availability of non-deleted posts by a very small fraction, Lethe provides privacy to the deleted posts over long durations. In the presence of Lethe, an adversarial observer becomes unsure whether some posts are permanently deleted or just temporarily withdrawn by Lethe; at the same time, the adversarial observer is overwhelmed by a large number of falsely flagged undeleted posts. To demonstrate the feasibility and performance of Lethe, we analyze large-scale real data about users' deletions on Twitter and thoroughly investigate how to choose time duration distributions for alternating between temporary withdrawals and resurrections of non-deleted posts. We find a favorable trade-off between privacy, availability and adversarial overhead in different settings for users exercising their right to delete. We show that, even against an ultimate adversary with uninterrupted access to the entire platform, Lethe offers deletion privacy for up to 3 months from the time of deletion, while maintaining content availability as high as 95% and keeping the adversarial precision to 20%.
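A toy snapshot model conveys why intermittent withdrawals help (illustrative only; the paper derives the actual withdrawal-duration distributions and guarantees): at any point in time an observer sees deleted posts and temporarily withdrawn posts alike as "missing", so a missing post is only a weak deletion signal.

    # Toy snapshot model of Lethe-style intermittent withdrawals (NOT Lethe itself).
    import random

    N_POSTS, N_DELETED, WITHDRAW_PROB = 10_000, 50, 0.05

    posts = ["deleted"] * N_DELETED + ["live"] * (N_POSTS - N_DELETED)
    # In one snapshot, every live post is temporarily withdrawn with prob. 5%.
    missing = [p for p in posts
               if p == "deleted" or random.random() < WITHDRAW_PROB]

    precision = missing.count("deleted") / len(missing)
    availability = 1 - (len(missing) - N_DELETED) / (N_POSTS - N_DELETED)
    print(f"adversary precision ~{precision:.2f}, availability ~{availability:.2%}")
    # With ~5% of live posts withdrawn, flagging every missing post as deleted
    # yields precision of only ~0.09, while ~95% of live posts stay available.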
Enforcing Contextual Integrity With Exposure Control
Mainack Mondal and Blase Ur.
In the Symposium on Applications of Contextual Integrity, Princeton, NJ, USA, September 2018. [PDF] [BibTeX]
Abstract:
The normative model of contextual integrity (CI) equips individuals to reason about privacy requirements and violations in online systems. However, a subsequent step is the enforcement of CI in online systems via privacy-management mechanisms. In this work, we first investigate the suitability of access control, the dominant privacy management model in online platforms, in filling this role. We argue that access control is insufficient for enforcing CI because it does not consider the set of expected recipients for a piece of content. To that end, we identify the privacy model of exposure control as an extension of access control to better enforce CI. We discuss the effectiveness of exposure control in better enforcing CI and describe a generic prediction-based framework for controlling exposure in online systems.
Making Retrospective Data Management Usable
Noah Hirsch, Chris Kanich, Mohammad Taha Khan, Xuefeng Liu, Mainack Mondal, Michael Tang, Christopher Tran, Blase Ur, William Wang, Günce Su Yılmaz, Elena Zheleva.
In Proceedings of the 14th Symposium on Usable Privacy and Security (SOUPS'18), Baltimore, MD, USA, August 2018. [PDF] [BibTeX] [Poster]
Abstract:
Today, online archives like social media or cloud storage systems store personal data shared by billions of users. For many accounts, these archives accumulate data over multiple years. Recent work suggested that users feel the need to retrospectively manage the security and privacy of this huge volume of content. However, there is also a scarcity of mechanisms and systems to help these users retrospectively manage their data. To that end, in this work we point out the need for creating usable retrospective data management mechanisms and outline our vision for a possible architecture to address this challenge.
Characterizing Usage of Explicit Hate Expressions in Social Media
Mainack Mondal, Leandro Araújo Silva, Denzil Correa and Fabrício Benevenuto.
In New Review of Hypermedia and Multimedia (THAM), vol. 24, no. 2, pp. 110-130, June 2018. [Preprint] [BibTeX] [Journal]
Abstract: Social media platforms provide an inexpensive communication medium that allows anyone to publish content, and anyone interested in the content can obtain it. However, this same potential of social media provides space for discourses that are harmful to certain groups of people. Examples of these discourses include bullying, offensive content, and hate speech. Among these discourses, hate speech is rapidly being recognized as a serious problem by authorities of many countries. In this paper, we provide a first-of-its-kind systematic large-scale measurement and analysis study of explicit expressions of hate speech in online social media. We aim to understand the abundance of hate speech in online social media, the most common hate expressions, the effect of anonymity on hate speech, the sensitivity of hate speech and the most hated groups across regions. In order to achieve our objectives, we gather traces from two social media systems: Whisper and Twitter. We then develop and validate a methodology to identify hate speech on both of these systems. Our results identify hate speech forms and unveil a set of important patterns, providing not only a broader understanding of online hate speech, but also offering directions for detection and prevention approaches.
Draining the Data Swamp: A Similarity-based Approach
Will Brackenbury, Rui Liu, Mainack Mondal, Aaron Elmore, Blase Ur, Kyle Chard, Michael J. Franklin.
In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA), Houston, TX, June 2018. [PDF] [BibTeX]
Abstract: While hierarchical namespaces such as filesystems and repositories have long been used to organize data, the rapid increase in data production places increasing strain on users who wish to make use of the data. So called "data lakes" embrace the storage of data in its natural form, integrating and organizing in a pay-as-you-go fashion. While this model defers the upfront cost of integration, the result is that data is unusable for discovery or analysis until it is processed. Thus, data scientists are forced to spend significant time and energy on mundane tasks such as data discovery, cleaning, integration, and management – when this is neglected, "data lakes" become "data swamps". Prior work suggests that pure computational methods for resolving issues with the data discovery and management components are insufficient. Here, we provide evidence to confirm this hypothesis, showing that methods such as automated file clustering are unable to extract the necessary features from repositories to provide useful information to end-user data scientists, or to make effective data management decisions on their behalf. We argue that the combination of frameworks for specifying file similarity and human-in-the-loop interaction is needed to aid automated organization. We propose an initial step here, classifying several dimensions by which items may be considered similar: the data, its origin, and its current characteristics. We initially consider this model in the context of identifying data that can be integrated or managed collectively. We additionally explore how current methods can be used to automate decision making using real-world data repositories and file systems, and suggest how an online user study could be developed to further validate this hypothesis.
Managing Longitudinal Exposure of Socially Shared Data on the Twitter Social Media
Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P Gummadi, Aniket Kate.
In International Journal of Advances in Engineering Sciences and Applied Mathematics (IJAESAM), vol. 9, no. 4, pp. 238-257, December 2017. [Preprint] [BibTeX] [Journal]
Abstract: On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focused on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of old publicly shared data---more than 28% of six-year-old public posts (tweets) on Twitter are not accessible today. The inaccessible tweets are either selectively deleted by users or withdrawn by users when they delete or make their accounts private. We also found a significant problem with the current exposure control mechanisms -- even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets from other users sent as replies to those deleted tweets or accounts still remain accessible. We show that using this residual information one can recover significant information about the deleted tweets or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose two exposure control mechanisms that eliminate information leakage via residual activities. One of our mechanisms optimizes for allowing meaningful social interactions with user posts, and the other mechanism aims to control longitudinal exposure via anonymization. We discuss the merits and drawbacks of our proposed mechanisms compared to existing mechanisms.
A Measurement Study of Hate Speech in Social Media
Mainack Mondal, Leandro Araújo Silva, Fabrício Benevenuto.
In Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT'17), Prague, Czech Republic, July 2017. [PDF] [BibTeX] Ted Nelson Award nominee.
Abstract: Social media platforms provide an inexpensive communication medium that allows anyone to quickly reach millions of users. Consequently, on these platforms anyone can publish content and anyone interested in the content can obtain it, representing a transformative revolution in our society. However, this same potential of social media systems brings along an important challenge---these systems provide space for discourses that are harmful to certain groups of people. This challenge manifests itself with a number of variations, including bullying, offensive content, and hate speech. Specifically, authorities of many countries today are rapidly recognizing hate speech as a serious problem, especially because it is hard to create barriers on the Internet to prevent the dissemination of hate across countries or minorities. In this paper, we provide a first-of-its-kind systematic large-scale measurement and analysis study of hate speech in online social media. We aim to understand the abundance of hate speech in online social media, the most common hate expressions, the effect of anonymity on hate speech and the most hated groups across regions. In order to achieve our objectives, we gather traces from two social media systems: Whisper and Twitter. We then develop and validate a methodology to identify hate speech on both of these systems. Our results identify hate speech forms and unveil a set of important patterns, providing not only a broader understanding of online hate speech, but also offering directions for detection and prevention approaches.
Longitudinal Privacy Management in Social Media: The Need for Better Controls
Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P. Gummadi and Aniket Kate.
In IEEE Internet Computing, vol. 21, no. 3, pp. 48-55, May-June 2017. [Preprint] [BibTeX] [Journal]
Abstract: This large-scale measurement study of Twitter focuses on understanding how users control the longitudinal exposure of their publicly shared social data — that is, their tweets — and the limitations of currently used control mechanisms. Our study finds that, while Twitter users widely employ longitudinal exposure control mechanisms, they face two fundamental problems. First, even when users delete their data or account, the current mechanisms leave significant traces of residual activity. Second, these mechanisms single out withdrawn tweets or accounts, attracting undesirable attention to them. To address both problems, an inactivity-based withdrawal scheme for improved longitudinal exposure control is explored.
Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data
Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P. Gummadi and Aniket Kate.
In Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS'16), Denver, CO, USA, June 2016. [PDF] [BibTeX]
Abstract: On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focused on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of old publicly shared data -- more than 28% of six-year-old public posts (tweets) on Twitter are not accessible today. The inaccessible tweets are either selectively deleted by users or withdrawn by users when they delete or make their accounts private. We also found a significant problem with the current exposure control mechanisms – even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets from other users sent as replies to those deleted tweets or accounts still remain accessible. We show that using this residual information one can recover significant information about the deleted tweets or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose an exposure control mechanism that eliminates information leakage via residual activities, while still allowing meaningful social interactions with user posts. We discuss its merits and drawbacks compared to existing mechanisms.
Analyzing the Targets of Hate in Online Social Media
Leandro Araújo Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto and Ingmar Weber.
In the poster session of the 10th International AAAI Conference on Weblogs and Social Media (ICWSM'16), Cologne, Germany, May 2016. [PDF] [Poster]
Abstract: Social media systems allow Internet users a congenial platform to freely express their thoughts and opinions. Although this property represents incredible and unique communication opportunities, it also brings along important challenges. Online hate speech is an archetypal example of such challenges. Despite its magnitude and scale, there is a significant gap in understanding the nature of hate speech on social media. In this paper, we provide a first-of-its-kind systematic large-scale measurement study of the main targets of hate speech in online social media. To do that, we gather traces from two social media systems: Whisper and Twitter. We then develop and validate a methodology to identify hate speech on both these systems. Our results identify online hate speech forms and offer a broader understanding of the phenomenon, providing directions for prevention and detection approaches.
The Many Shades of Anonymity: Characterizing Anonymous Social Media Content
Denzil Correa, Leandro Araújo Silva, Mainack Mondal, Fabrício Benevenuto and Krishna P. Gummadi.
In Proceedings of the 9th International AAAI Conference on Weblogs and Social Media (ICWSM'15), Oxford, UK, May 2015. ABSTRACT PDF BIBTEX
Abstract: Recently, there has been a significant increase in the popularity of anonymous social media sites like Whisper and Secret. Unlike traditional social media sites like Facebook and Twitter, posts on anonymous social media sites are not associated with well-defined user identities or profiles. In this study, our goals are two-fold: (i) to understand the nature (sensitivity, types) of content posted on anonymous social media sites and (ii) to investigate the differences between content posted on anonymous and non-anonymous social media sites like Twitter. To this end, we gather and analyze extensive content traces from Whisper (anonymous) and Twitter (non-anonymous). We introduce the notion of anonymity sensitivity of a social media post, which captures the extent to which users think the post should be anonymous. We also propose a human-annotator-based methodology to measure it for Whisper and Twitter posts. Our analysis reveals that the anonymity sensitivity of most whispers (unlike tweets) is not binary. Instead, most whispers exhibit many shades, or different levels, of anonymity. We also find that the linguistic differences between whispers and tweets are so significant that we could train automated classifiers to distinguish between them with reasonable accuracy. Our findings shed light on human behavior in anonymous media systems that lack the notion of an identity, and they have important implications for the future designs of such systems.
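The abstract reports that whispers and tweets are linguistically separable by automated classifiers; below is a minimal sketch of one such classifier (the paper does not commit to these particular features or this model, so treat it as illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_source_classifier(whispers, tweets):
    # Label each post by the platform it came from and fit a simple
    # n-gram model; this merely illustrates the kind of classifier
    # the abstract says can reach reasonable accuracy.
    texts = list(whispers) + list(tweets)
    labels = ["whisper"] * len(whispers) + ["tweet"] * len(tweets)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    return model.fit(texts, labels)
```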
Understanding and Specifying Social Access Control Lists
Mainack Mondal, Yabing Liu, Bimal Viswanath, Krishna P. Gummadi and Alan Mislove.
In Proceedings of the 10th Symposium on Usable Privacy and Security (SOUPS'14), Menlo Park, CA, USA, July 2014. ABSTRACT PDF BIBTEX DISTINGUISHED PAPER AWARD
Abstract: Online social network (OSN) users upload millions of pieces of content to share with others every day. While a significant portion of this content is benign (and is typically shared with all friends or all OSN users), there are certain pieces of content that are highly privacy sensitive. Sharing such sensitive content raises significant privacy concerns for users, and it becomes important for the user to protect this content from being exposed to the wrong audience. Today, most OSN services provide fine-grained mechanisms for specifying social access control lists (social ACLs, or SACLs), allowing users to restrict their sensitive content to a select subset of their friends. However, it remains unclear how these SACL mechanisms are used today. To design better privacy management tools for users, we need to first understand the usage and complexity of SACLs specified by users. In this paper, we present the first large-scale study of fine-grained privacy preferences of over 1,000 users on Facebook, providing us with the first ground-truth information on how users specify SACLs on a social networking service. Overall, we find that a surprisingly large fraction (17.6%) of content is shared with SACLs. However, we also find that SACL membership shows little correlation with either profile information or social network links; as a result, it is difficult to predict the subset of a user's friends likely to appear in a SACL. On the flip side, we find that SACLs are often re-used, suggesting that simply making recent SACLs available to users is likely to significantly reduce the burden of privacy management on users.
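The re-use finding suggests a simple aid: surface a user's most recently used SACLs at sharing time. A hedged sketch of that idea (the interface and the cache size are my assumptions, not a system from the paper):

```python
from collections import OrderedDict

class SaclSuggester:
    """Keep the k most recently used SACLs and offer them on new shares."""

    def __init__(self, k=5):
        self.k = k
        self.recent = OrderedDict()  # frozenset of friends -> None

    def record_share(self, sacl):
        key = frozenset(sacl)
        self.recent.pop(key, None)   # re-sharing moves a SACL to the front
        self.recent[key] = None
        while len(self.recent) > self.k:
            self.recent.popitem(last=False)  # evict the oldest SACL

    def suggest(self):
        # Most recently used first, per the paper's observation that
        # recent SACLs are likely to be re-used.
        return [set(s) for s in reversed(self.recent)]
```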
Beyond Access Control: Managing Online Privacy via Exposure
Mainack Mondal, Peter Druschel, Krishna P. Gummadi and Alan Mislove.
In Proceedings of the Workshop on Usable Security (USEC'14), San Diego, CA, USA, February 2014. ABSTRACT PDF BIBTEX
Abstract: We posit that access control, the dominant model for managing privacy in today's online world, is fundamentally inadequate. First, with access control, users must specify in advance precisely who can or cannot access information by enumerating users, groups, or roles, a task that is difficult to get right. Second, access control fails to separate who can access information from who actually does, because it ignores the difficulty of finding information. Third, access control does not capture if and how a person who has access to some information redistributes that information. Fourth, access control fails to account for information that can be inferred from other, public information.
We present exposure as an alternate model for information privacy; exposure captures the set of people expected to learn an item of information eventually. We believe the model takes an important step towards enabling users to model and control their privacy effectively.
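As an illustrative formalization of this reading (not the paper's exact model), exposure can be treated as the set of users whose probability of eventually learning an item crosses a threshold, combining access with the difficulty of discovery:

```python
def exposure(users, has_access, p_discover, threshold=0.5):
    # has_access: user -> bool (the access-control dimension).
    # p_discover: user -> probability in [0, 1] that the user ever finds
    #             the item (the discovery dimension access control ignores).
    # Both functions and the threshold are assumed inputs for illustration.
    return {u for u in users if has_access(u) and p_discover(u) >= threshold}
```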
Deep Twitter Diving: Exploring Topical Groups in Microblogs at Scale
Parantapa Bhattacharya, Saptarshi Ghosh, Juhi Kulshrestha, Mainack Mondal, Muhammad Bilal Zafar, Niloy Ganguly, and Krishna P. Gummadi.
In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14), Baltimore, MD, USA, February 2014. ABSTRACT PDF BIBTEX
Abstract: We present a semantic methodology to identify topical groups in Twitter on a large number of topics, each consisting of users who are experts on or interested in a specific topic. Early studies investigating the nature of Twitter suggest that it is a social media platform consisting of a relatively small section of elite users, producing information on a few popular topics such as media, politics, and music, and the general population consuming it. We show that this characterization ignores a rich set of highly specialized topics, ranging from geology and neurology to astrophysics and karate, each discussed by its own topical group. We present a detailed characterization of these topical groups based on their network structures and tweeting behaviors. Analyzing these groups against the backdrop of the common identity and bond theory in social sciences shows that these groups exhibit characteristics of topical-identity-based groups, rather than social-bond-based ones.
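One of the network-structure characterizations mentioned above can be sketched directly: compare a topical group's internal link density and reciprocity against network-wide baselines (a sketch using networkx; the paper's exact metrics may differ):

```python
import networkx as nx

def group_stats(G: nx.DiGraph, members):
    # Induced subgraph over the group's members only.
    sub = G.subgraph(members)
    return {
        "density": nx.density(sub),
        # Reciprocity is undefined on edgeless graphs, hence the guard.
        "reciprocity": nx.reciprocity(sub) if sub.number_of_edges() else 0.0,
    }

# Identity-based groups would be expected to show different patterns in
# such statistics than social-bond-based ones.
```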
Defending against large-scale crawls in online social networks
Mainack Mondal, Bimal Viswanath, Allen Clement, Peter Druschel, Krishna P. Gummadi, Alan Mislove and Ansley Post.
In Proceedings of the 8th International Conference on emerging Networking EXperiments and Technologies (CoNEXT'12), Nice, France, December 2012. ABSTRACT PDF BIBTEX SLIDES
Abstract: Thwarting large-scale crawls of user profiles in online social networks (OSNs) like Facebook and Renren is in the interest of both the users and the operators of these sites. OSN users wish to maintain control over their personal information, and OSN operators wish to protect their business assets and reputation. Existing rate-limiting techniques are ineffective against crawlers with many accounts, be they fake accounts (also known as Sybils) or compromised accounts of real users obtained on the black market.
We propose Genie, a system that can be deployed by OSN operators to defend against crawlers in large-scale OSNs. Genie exploits the fact that the browsing patterns of honest users and crawlers are very different: even a crawler with access to many accounts needs to make many more profile views per account than an honest user, and view profiles of users that are more distant in the social network. Experiments using real-world data gathered from a popular OSN show that Genie frustrates large-scale crawling while rarely impacting honest users; the few honest users who are affected can recover easily by adding a few friend links.
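Genie's core signal, as described above, combines per-account profile-view volume with the social-network distance of the viewed profiles. A minimal sketch of that check (the thresholds are made-up placeholders, not Genie's calibrated values, and the real system uses a credit-network formulation rather than a simple threshold test):

```python
import networkx as nx

def is_suspected_crawler(G, viewer, viewed_profiles,
                         max_views=50, max_avg_distance=2.5):
    # Crawlers make many more profile views per account than honest users...
    if len(viewed_profiles) > max_views:
        return True
    # ...and view profiles that are more distant in the social graph.
    distances = []
    for target in viewed_profiles:
        try:
            distances.append(nx.shortest_path_length(G, viewer, target))
        except nx.NetworkXNoPath:
            distances.append(G.number_of_nodes())  # unreachable: maximal
    return bool(distances) and sum(distances) / len(distances) > max_avg_distance
```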
Simplifying Friendlist Management (Demo Paper)
Yabing Liu, Bimal Viswanath, Mainack Mondal, Krishna P. Gummadi, and Alan Mislove.
In Proceedings of the 21st International World Wide Web Conference (WWW'12), Lyon, France, April 2012. ABSTRACT PDF BIBTEX
Abstract: Online social networks like Facebook allow users to connect, communicate, and share content. The popularity of these services has led to an information overload for their users; the task of simply keeping track of different interactions has become daunting. To reduce this burden, sites like Facebook allow the user to group friends into specific lists, known as friendlists, aggregating the interactions and content from all friends in each friendlist. While this approach greatly reduces the burden on the user, it still forces the user to create and populate the friendlists themselves and, worse, makes the user responsible for maintaining the membership of their friendlists over time.
We show that friendlists often have a strong correspondence to the structure of the social network, implying that friendlists may be automatically inferred by leveraging the social network structure. We present a demonstration of Friendlist Manager, a Facebook application that proposes friendlists to the user based on the structure of their local social network, allows the user to tweak the proposed friendlists, and then automatically creates the friendlists for the user.
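The demo abstract does not name the inference algorithm; modularity-based community detection over the ego network is one plausible way to realize it, sketched below:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def propose_friendlists(ego_graph: nx.Graph, ego):
    # Drop the ego, who links to every friend and would blur the
    # community structure, then cluster the remaining friend graph.
    friends_only = ego_graph.subgraph(n for n in ego_graph if n != ego)
    return [sorted(c) for c in greedy_modularity_communities(friends_only)]
```

Each returned community is a candidate friendlist that the user could then tweak before it is created.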
Canal: Scaling social network-based Sybil tolerance schemes
Bimal Viswanath, Mainack Mondal, Krishna P. Gummadi, Alan Mislove and Ansley Post.
In Proceedings of the 7th European Conference on Computer Systems (EuroSys'12), Bern, Switzerland, April 2012. ABSTRACT PDF BIBTEX
Abstract: There has been a flurry of research on leveraging social networks to defend against multiple-identity, or Sybil, attacks. A series of recent works does not try to explicitly identify Sybil identities and, instead, bounds the impact that Sybil identities can have. We call these approaches Sybil tolerance; they have been shown to be effective in applications including reputation systems, spam protection, online auctions, and content rating systems. All of these approaches use a social network as a credit network, rendering multiple identities ineffective to an attacker without a commensurate increase in social links to honest users (which are assumed to be hard to obtain). Unfortunately, a hurdle to practical adoption is that Sybil tolerance relies on computationally expensive network analysis, thereby limiting widespread deployment.
To address this problem, we first demonstrate that despite their differences, all proposed Sybil tolerance systems work by conducting payments over credit networks. These payments require max flow computations on a social network graph, and lead to poor scalability. We then present Canal, a system that uses landmark routing-based techniques to efficiently approximate credit payments over large networks. Through an evaluation on real-world data, we show that Canal provides up to a three-order-of-magnitude speedup while maintaining safety and accuracy, even when applied to social networks with millions of nodes and hundreds of millions of edges. Finally, we demonstrate that Canal can be easily plugged into existing Sybil tolerance schemes, enabling them to be deployed in an online fashion in real-world systems.
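A simplified sketch of the landmark-routing idea (assuming unweighted shortest paths and ignoring the per-edge credit bookkeeping that the real system must maintain):

```python
import networkx as nx

def build_landmark_trees(G, landmarks):
    # Precompute a shortest-path tree from each landmark once;
    # queries then avoid expensive per-payment max-flow computations.
    return {l: nx.single_source_shortest_path(G, l) for l in landmarks}

def approx_payment_path(trees, u, v):
    # Approximate a u -> v path by stitching u -> landmark -> v through
    # whichever landmark yields the shortest combined route.
    best = None
    for paths in trees.values():
        if u in paths and v in paths:
            candidate = list(reversed(paths[u])) + paths[v][1:]
            if best is None or len(candidate) < len(best):
                best = candidate
    return best  # None if no landmark reaches both endpoints
```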
Limiting Large-scale Crawls of Social Networking Sites
Mainack Mondal, Bimal Viswanath, Allen Clement, Peter Druschel, Krishna P. Gummadi, Alan Mislove and Ansley Post.
In Poster session, Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM'11), Toronto, Canada, August 2011. ABSTRACT PDF POSTER SIGCOMM'11 STUDENT RESEARCH COMPETITION FINALIST
Abstract: Online social networking sites (OSNs) like Facebook and Orkut contain personal data of millions of users. Many OSNs view this data as a valuable asset that is at the core of their business model. Both OSN users and OSNs have strong incentives to restrict large-scale crawls of this data: OSN users want to protect their privacy, and OSNs their business interest. Traditional defenses against crawlers involve rate-limiting browsing activity per user account. These defense schemes, however, are vulnerable to Sybil attacks, where a crawler creates a large number of fake user accounts. In this paper, we propose Genie, a system that can be deployed by OSN operators to defend against Sybil crawlers. Genie is based on a simple yet powerful insight: the social network itself can be leveraged to defend against Sybil crawlers. We first present Genie's design and then discuss how Genie can limit crawlers while allowing browsing of user profiles by normal users.
TweLEX: A tweaked version of the LEX stream cipher
Mainack Mondal, Avik Chakraborty, Nilanjan Dutta, Debdeep Mukhopadhyay.
In 5th Benelux Workshop on Information and System Security (WISSec'10), Nijmegen, the Netherlands, November 2010. ABSTRACT PDF SLIDES
Abstract: LEX is a stream cipher proposed by Alex Biryukov. It was selected for phase 3 of the eSTREAM competition. LEX is based on the Advanced Encryption Standard (AES) block cipher and uses a methodology called leak extraction, proposed by Biryukov himself. However, Dunkelman and Keller showed that a key recovery attack exists against LEX. Their attack requires 2^36.3 bytes of keystream produced by the same key and works with a time complexity of 2^112 operations. In this work we explore LEX further and show that, under a related-key model, we can obtain 24 secret state bytes with a time complexity of 2^96 and a data complexity of 2^54.3. Subsequently, we introduce a tweaked version of LEX, called TweLEX, which is shown to resist all known attacks against LEX. Though the throughput of TweLEX is half that of LEX, it is still 1.25 times faster than AES, the underlying block cipher. This work attempts to revive the principle of leak extraction as a simple and elegant method to design stream ciphers.
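As a toy illustration of the leak-extraction principle only (emphatically not the LEX or TweLEX specification: the round function below is a stand-in, whereas real LEX leaks bytes from AES round states):

```python
import hashlib

def toy_round(state: bytes) -> bytes:
    # Stand-in 16-byte transformation; LEX uses actual AES rounds here.
    return hashlib.sha256(state).digest()[:16]

def leak_extraction_keystream(key: bytes, iv: bytes, nrounds: int) -> bytes:
    # Leak extraction: iterate a block-cipher-like round on an internal
    # state and output ("leak") a few fixed state bytes per round.
    state = hashlib.sha256(key + iv).digest()[:16]  # toy initialization
    out = bytearray()
    for _ in range(nrounds):
        state = toy_round(state)
        out += state[0:2] + state[8:10]  # leak 4 of 16 state bytes
    return bytes(out)
```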
Pinpointing Cache Timing Attacks on AES
Chester Rebeiro, Mainack Mondal, Debdeep Mukhopadhyay.
In 23rd International Conference on VLSI Design and 9th International Conference on Embedded Systems (VLSID'10), Bangalore, India, January 2010. ABSTRACT PDF
Abstract: The paper analyzes cache-based timing attacks on optimized codes for the Advanced Encryption Standard (AES). The work shows that timing-based cache attacks induce cache hits in the first and second rounds of AES in a manner that makes the timing variations leak information about the key. To the best of our knowledge, the paper shows for the first time that these attacks are unable to force hits in the third round, and concludes that an analogous third-round cache timing attack does not work. The paper experimentally verifies that protecting only the first two AES rounds thwarts cache-based timing attacks.
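A hedged sketch of the measurement setup behind such first-round attacks (illustrative only: it assumes a table-based AES implementation and 16-entry cache-line granularity, and hardened or AES-NI implementations will show no signal):

```python
import os
import time
from Crypto.Cipher import AES  # pycryptodome

KEY = os.urandom(16)
cipher = AES.new(KEY, AES.MODE_ECB)

def timed_encrypt(pt: bytes) -> int:
    start = time.perf_counter_ns()
    cipher.encrypt(pt)
    return time.perf_counter_ns() - start

# First-round attacks exploit that plaintext bytes i and j index the same
# lookup table: cache collisions are likelier when p_i ^ k_i and p_j ^ k_j
# fall in the same cache line, i.e. when p_i ^ p_j approximates k_i ^ k_j.
i, j = 0, 4  # two state bytes feeding the same T-table in round one
buckets = {}  # high nibble of p_i ^ p_j -> list of timings
for _ in range(100_000):
    pt = os.urandom(16)
    delta = (pt[i] ^ pt[j]) & 0xF0  # assumed cache-line granularity
    buckets.setdefault(delta, []).append(timed_encrypt(pt))

averages = {d: sum(ts) / len(ts) for d, ts in buckets.items()}
guess = min(averages, key=averages.get)  # fastest bucket hints a collision
print(f"candidate high nibble of k_{i} ^ k_{j}: {guess:#04x}")
```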
Non-refereed publications
Double-edged Swords: The Good and the Bad of Privacy and Anonymity in Social Media (Invited talk abstract)
Mainack Mondal
In Proceedings of the 3rd International Workshop on Social Media World Sensors (SIDEWAYS'17), Prague, Czech Republic, July 2017. PDF BIBTEX
Exploring the design space of social network-based Sybil defenses (Invited paper)
Bimal Viswanath, Mainack Mondal, Allen Clement, Peter Druschel, Krishna P. Gummadi, Alan Mislove and Ansley Post.
In Proceedings of the 4th International Conference on Communication Systems and Networks (COMSNETS'12), Bangalore, India, January 2012. ABSTRACT PDF BIBTEX
Abstract: Recently, there has been significant research interest in leveraging social networks to defend against Sybil attacks. While much of this work may appear similar at first glance, existing social network-based Sybil defense schemes can be divided into two categories: Sybil detection and Sybil tolerance. These two categories of systems both leverage global properties of the underlying social graph, but they rely on different assumptions and provide different guarantees: Sybil detection schemes are application-independent and rely only on the graph structure to identify Sybil identities, while Sybil tolerance schemes rely on application-specific information and leverage the graph structure and transaction history to bound the leverage an attacker can gain from using multiple identities. In this paper, we take a closer look at the design goals, models, assumptions, guarantees, and limitations of both categories of social network-based Sybil defense systems.
Defending against large-scale crawls in online social networks
Mainack Mondal, Bimal Viswanath, Allen Clement, Peter Druschel, Krishna P. Gummadi, Alan Mislove and Ansley Post.
MPI-SWS Technical Report 2011-006, MPI-SWS, November 2011. ABSTRACT PDF BIBTEX
Abstract: Thwarting large-scale crawls of user profiles in online social networks (OSNs) like Facebook and Renren is in the interest of both the users and the operators of these sites. OSN users wish to maintain control over their personal information, and OSN operators wish to protect their business assets and reputation. Existing rate-limiting techniques are ineffective against crawlers with many accounts, be they fake accounts (also known as Sybils) or compromised accounts of real users obtained on the black market.
We propose Genie, a system that can be deployed by OSN operators to defend against crawlers in large-scale OSNs. Genie exploits the fact that the browsing patterns of honest users and crawlers are very different: even a crawler with access to many accounts needs to make many more profile views per account than an honest user, and view profiles of users that are more distant in the social network. Experiments using real-world data gathered from a popular OSN show that Genie frustrates large-scale crawling while rarely impacting honest users; the few honest users who are affected can recover easily by adding a few friend links.
Our Systems/Datasets
A common theme of our work is to collect real-world data from deployed systems and analyze it to identify and address privacy, security, or accountability issues in those systems. Consequently, we created some online systems as part of our research to help social network users better understand and manage their data privacy. Below is a list of such systems and datasets from our work:
Check Your Secondary Digital Footprint on Twitter: On Twitter, people may converse with you by mentioning your name in their tweets. These conversations constitute your secondary digital footprint. Secondary digital footprints are not created or controlled by you; however, they can still leak your personal information. Our Twitter application aims to help you check what information others leak about you on Twitter (you will need a Twitter account to use it). A hedged sketch of how such mentions could be collected appears after this list.
Friendlist Manager: Friendlists in Facebook are a great way to share your content with exactly the people you intend to, but they are a huge pain to create and update. Our Facebook application was designed to facilitate and simplify the management of your friendlists. Unfortunately, the new version of the Facebook API does not allow developers to fetch the data the app needed, so the app is no longer live. You can check the functions of this (now discontinued) app here.
Privacy IQ: Privacy IQ is a quiz that measures both your understanding of how privacy works on Facebook and your knowledge of your own privacy settings. However, due to the same Facebook API changes, this app is no longer live either.
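Since the Twitter application's internals are not public, the following is only an illustrative sketch of how such mentions could be collected and scanned, using tweepy's v2 client (the bearer token and the keyword list are placeholders):

```python
import tweepy

BEARER_TOKEN = "..."  # placeholder credential; supply your own
KEYWORDS = {"birthday", "phone", "address", "email"}  # assumed indicators

def secondary_footprint(handle: str):
    # Fetch recent tweets that mention the handle, then flag those
    # containing simple personal-information keywords.
    client = tweepy.Client(bearer_token=BEARER_TOKEN)
    response = client.search_recent_tweets(query=f"@{handle}", max_results=100)
    flagged = []
    for tweet in response.data or []:
        if any(word in tweet.text.lower() for word in KEYWORDS):
            flagged.append(tweet.text)
    return flagged
```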