Pawan Goyal

Associate Professor

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur, India -- 721302

Phone: +91-3222282370 (Office)

Email: pawang AT cse DOT iitkgp DOT ac DOT in

My Google Scholar page

My CV

Brief Bio

I joined the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur as an Assistant Professor on July 30th, 2013. Prior to that, I was working at INRIA Paris-Rocquencourt as a postdoctoral fellow with Prof. Gérard Huet on The Sanskrit Heritage Site.

I did my B. Tech. in Electrical Engineering from Indian Institute of Technology, Kanpur. I received my Ph. D. from Intelligent Systems Research Centre, Faculty of Computing and Engineering, University of Ulster, UK. My PhD advisors were Prof. Laxmidhar Behera and Prof. T. M. McGinnity. The topic of my PhD dissertation was "Analytic Knowledge Discovery Techniques for Ad-Hoc information Retrieval and Text Summarization".

My main research interests include Text Mining, Natural Language Processing, Information Retrieval and Sanskrit Computational Linguistics.

Professional Activities

Senior Area Chair: EMNLP 2025, AACL-IJCNLP 2025

Academic Co-ordinator for the ACM India Summer School on Generative AI for Text at IIT Gandhinagar from June 24th to July 5th, 2024.

Senior Area Chair: EMNLP 2024

PC Chair: BDA 2023, IndoML 2023

We organized the Sixth International Conference on Sanskrit Computational Linguistics from October 23-25, 2019 at IIT Kharagpur. For more details, please visit this link.

Senior PC / Meta-Reviewer / Area Chair: ACL ARR 2024, EMNLP 2023, AAAI 2023, AAAI 2021

News

May 16th, 2025: Our paper, "Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification" got accepted in ACL Findings'25.

Jan 23rd, 2025: Our paper, "Periodic Materials Generation using Text-Guided Joint Diffusion Model" got accepted in ICLR'25.

Dec 29th, 2024: I will be offering a new course on NPTEL, "Deep Learning for Natural Language Processing". Course Website

Sep 20th, 2024: 3 main papers and a Findings paper got accepted in EMNLP'24.

June 11th, 2024: Abhilash Nandy received the Microsoft Research India PhD Award.

June 11th, 2024: Our paper, "FastDoc: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy" got accepted in TMLR.

May 16th, 2024: Our papers, "IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning" (long paper, main) and "On The Persona-based Summarization of Domain-Specific Documents" (short paper, Findings) got accepted in ACL'24.

March 26th, 2024: Our papers, "Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts" (short paper) and "Legal Statute Identification: A Case Study using State-of-the-Art Datasets and Methods" (Resource and Reproducibility track) got accepted in SIGIR'24.

March 13th, 2024: Our papers, "Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling" and "Order-Based Pre-training Strategies for Procedural Text Understanding" got accepted in NAACL'24 (main conference).

February 20th, 2024: Our paper, "How Robust are the QA Models for Hybrid Scientific Tabular Data? A Study using Customized Dataset" got accepted in LREC-Coling'24 (short).

October 8th, 2023: 4 long papers accepted in EMNLP'23 (Findings).

September 25th, 2023: Our proposal, "A Human-Aligned Automated Evaluation Framework for Natural Language Generation via Large Language Models" got selected for the Microsoft Accelerate Foundation Models Research Program.

June 2023: I will be serving as an area chair for EMNLP'23.

June 2023: I will be serving as PC co-chair for IndoML'23 in IIT Bombay along with Prof. Abir De and Prof. Sunita Sarawagi.

May 2023: I will be serving as PC co-chair for Big Data Analytics (BDA)'23 in IIIT Delhi along with Prof. Sourav Bhowmick, NTU.

May 8th, 2023: Our paper, "SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes" got accepted in ACL'23 (System Demonstrations).

May 8th, 2023: Our paper, "CrysMMNet: Multimodal Representation for Crystal Property Prediction" got accepted in UAI'23.

May 2nd, 2023: Our paper, "Financial Numeric Extreme Labelling: A dataset and benchmarking" got accepted in Findings of ACL'23 (short).

January 22nd, 2023: Our papers, "Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages" (Long paper, findings) and "Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing" (Short paper, main) got accepted in EACL'23.

November 19th, 2022: Our paper, "CrysGNN : Distilling pre-trained knowledge to enhance property prediction for crystalline materials" got accepted in AAAI'23.

October 6th, 2022: Our papers, "TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer" (Long paper, findings) and "ECTSum: A New Benchmark Dataset for Bullet Point Summarization of Long Earnings Call Transcripts" (Long paper, main) got accepted in EMNLP'22.

September 21st, 2022: Our papers, "Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation" (Long paper, main) and "ArgGen: Prompting Text Generation Models for Document-Level Event-Argument Aggregation" (short paper, findings) got accepted in AACL'22.

August 17th, 2022: Our papers, "A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit" (Long paper, Oral presentation) and "Does Meta-learning Help mBERT for Few-shot Question Generation in a Cross-lingual Transfer Setting for Indic Languages?" (Short paper, poster presentation) got accepted in Coling'22.

August 16th, 2022: Received Faculty Excellence Award (Associate Level) from IIT Kharagpur.

June 15th, 2022: Our paper, "Linguistically Informed Post-processing for ASR Error correction in Sanskrit" got accepted in Interspeech.

May 30th, 2022: Our proposal, "Adapting Dialog Systems to New Domains through Natural Language Interactions" got selected as part of Microsoft Academic Partnership Grant (MAPG 2022) program.

April 8th, 2022: Our papers, "Representation Learning for Conversational Data using Discourse Mutual Information Maximization" and "A Framework to Generate High-quality Datapoints for Multiple Novel Intent Detection" got accepted in NAACL'22 (main and findings, respectively).

March 4th, 2022: Our paper, "Using Data Augmentation to Identify Relevant Reviews for Product Question Answering" got accepted in the Web Conference as a poster.

January 27th, 2022: Our paper, "CrysXPP: An Explainable Property Predictor for Crystalline Materials" got accepted in npj Computational Materials.

December 1st, 2021: Our paper, "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents" got accepted in AAAI'22.

October 12th, 2021: Our paper, "MTLVS: A Multi-Task Framework to Verify and Summarize Crisis-Related Microblogs" got accepted in WSDM'22.

September 3rd, 2021: Our paper, "Network Embeddings from Distributional Thesauri for Improving Static Word Representations" got accepted in Expert Systems with Applications.

September 2nd, 2021: Our paper received the Hypertext Ted Nelson Best Newcomer Paper Award.

August 26th, 2021: Our papers, "PASTE: A Tagging-free Decoding Framework using Pointer Networks for Aspect Sentiment Triplet Extraction" and "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA" got accepted in EMNLP'21 (main and findings, respectively).

June 23rd, 2021: Our proposal, "Multilingual Dialogue as a Novel Framework for AutoSuggest" got selected as part of Microsoft Academic Partnership Grant 2021 program.

May 6th, 2021: Our paper, "Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights" got accepted in Findings of ACL'21.

March 11th, 2021: Our paper, "Hierarchical Transformer for Task Oriented Dialog Systems" got accepted in NAACL-HLT'21.

January 19th, 2021: Our paper, "MatScIE: An automated tool for the generation of databases of methods and parameters used in the computational materials science literature" got accepted in Computational Materials Science, Elsevier.

December 31st, 2020: Abhilash Nandy and Ankan Mullick received the Prime Minister's Research Fellowship (PMRF).

December 15th, 2020: One Full paper and One Reproducibility Track paper got accepted in ECIR'21.

December 2nd, 2020: Our paper, "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection" got accepted in AAAI'21.

October 3rd, 2020: Our paper, "A Graph Based Framework for Structured Prediction Tasks in Sanskrit" got accepted in Computational Linguistics Journal.

September 30th, 2020: Our paper, "Automatic Crime Identification from Facts: A Few Sentence-Level Crime Annotations is All You Need" got accepted in COLING.

September 15th, 2020: Our paper, "Keep it Surprisingly Simple: A Simple First Order Graph Based Parsing Model for Joint Morphosyntactic Parsing in Sanskrit" got accepted in EMNLP as a short paper.

September 10th, 2020: Received INAE Young Engineer Awards 2020.

July 24th, 2020: Our paper, "Hate begets Hate: A Temporal Study of Hate Speech" got accepted in ACM CSCW.

June 10th, 2020: Received Google India AI/ML Research Awards 2020.

April 22nd, 2020: Our paper, "Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews" got accepted in SIGIR 2020 as a short paper.

April 20th, 2020: Our paper, "Logic Constrained Pointer Networks for Interpretable Textual Similarity" got accepted in IJCAI 2020.

March 7th, 2020: Our paper, "Aspect-based Sentiment Analysis of Scientific Reviews" got accepted in JCDL 2020.

February 11th, 2020: Two papers got accepted in LREC 2020.

August 13th, 2019: Our paper, "Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs" got accepted in EMNLP 2019 as a short paper.

July 3rd, 2019: Our paper, "Spread of hate speech in online social media" got the best paper award (honorable mention) at WebSci 2019.

May 14th, 2019: Our long paper, "On the Compositionality Prediction of Noun Phrases using Poincaré embeddings" and a short paper, "Poetry to Prose Conversion in Sanskrit as a Linearisation Task: A case for Low-Resource Languages" got accepted in ACL 2019.

April 14th, 2019: Our paper, "Addressing Vocabulary Gap in E-commerce Search" got accepted in SIGIR 2019 as a short paper.

April 6th, 2019: Our paper, "Spread of hate speech in online social media" got accepted in WebSci 2019.

March 16th, 2019: Our paper, "Thou shalt not hate: Countering online hate speech" got accepted in ICWSM 2019.

December 5th, 2018: One long paper, "Automated Early Leaderboard Generation From Comparative Tables" and three short papers got accepted in ECIR 2019.

August 11th, 2018: Our paper, "Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit" got accepted in EMNLP.

August 9th, 2018: Our paper, "Opinion Conflicts: An Effective Route to Detect Incivility in Twitter" got accepted in CSCW.

July 27th, 2018: Our paper, "Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit" got accepted in CoNLL.

May 16th, 2018: Our paper, "WikiRef: Wikilinks as a route to recommending appropriate references for scientific Wikipedia pages" got accepted in Coling.

April 12th, 2018: Our paper, "Identifying Sub-events and Summarizing Information during Disasters" got accepted in SIGIR.

February 15th, 2018: Our paper, "Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?" got accepted in NAACL-HLT.

December 30th, 2017: Our paper, "Extracting and Summarizing Situational Information from the Twitter Social Media during Disasters" got accepted in ACM Transactions on the Web.

December 13th, 2017: Two full papers, "Building a Word Segmenter for Sanskrit Overnight" and "Network Features Based Co-hyponymy Detection" got accepted in LREC 2018 for oral presentations.

December 11th, 2017: Our paper, "Automated Assistance in E-commerce: An Approach based on Category-Sensitive Retrieval" got accepted in ECIR 2018 as a short paper.

August 21st, 2017: I will be chairing the Young Researchers' Symposium at CODS-COMAD 2018 along with Dr. Amit Awekar from IIT Guwahati. Please consider submitting. You can find more details here.

August 5th, 2017: Our paper, "Extracting Entities of Interest from Comparative Product Reviews" got accepted in CIKM 2017 as a short paper.

May 29th, 2017: We are organizing an ACM summer school on NLP and Machine Learning from June 1st to June 21st, 2017. More details can be found here.

May 16th, 2017: Our paper, "Relay-Linking Models for Prominence and Obsolescence in Evolving Networks" got accepted in KDD 2017 for a poster presentation.

March 21st, 2017: Two full papers, "Understanding the Impact of Early Citers on Long-Term Scientific Impact" and "WikiM: Metapaths based Wikification of Scientific Abstracts", and one short paper, "Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science" got accepted in JCDL 2017.

February 25th, 2017: We are organizing a workshop on Complex and Social Networks on March 15th, 2017 in Gargi Auditorium. Prof. Frank Schweitzer (ETH, Zurich), Prof. Laxmidhar Behera (IIT Kanpur) and Dr. Manish Gupta (Microsoft Bing) are the speakers. There will also be a panel discussion on "How to sell your thesis to industry". For more details, visit the website here.

February 23rd, 2017: OCR++ got selected for the Gandhian Young Technological Innovation (GYTI) Award/Appreciation 2017.

February 11th, 2017: Our paper, "A Generic Opinion-Fact Classifier with Application in Understanding Opinionatedness in Various News Section" got accepted as a poster in WWW 2017.

September 21st, 2016: Our papers, "Word Segmentation in Sanskrit Using Path Constrained Random Walks" and "OCR++: A Robust Framework For Information Extraction from Scholarly Articles" got accepted in Coling 2016.

July 19th, 2016: Our paper, "peq : An explainable, specification-based, aspect-oriented product comparator for e-commerce" got accepted in ACM CIKM, 2016 as a short paper.

April 1st, 2016: Our paper, "Summarizing Situational Tweets in Crisis Scenario" got accepted in ACM HyperText, 2016.

December 11th, 2015: Our paper, "FeRoSA: A Faceted Recommendation System for Scientific Articles" got accepted in PAKDD, 2016.

July 4th, 2015: Our papers, "Extracting Situational Information from Microblogs during Disaster Events: A Classification-Summarization Approach" and "The role of citation context in predicting long-term citation profiles: an experimental study based on a massive bibliographic text dataset" got accepted in ACM CIKM, 2015.

May 13th, 2015: Our paper, "On the formation of circles in co-authorship networks" got accepted in ACM SIGKDD, 2015.

January 12th, 2015: Our paper, "An automatic approach to identify word sense changes in text media across timescales" got accepted in JNLE special issue on Graph Methods for NLP.

December 21st, 2014: Our paper, "On the categorization of scientific citation profiles in computer sciences" got accepted in Communications of the ACM.

October 1st, 2014: Our proposal, "IndicView: because language is no more a barrier" has been accepted as part of the Google - IIT Pilot program.

September 8th, 2014: Received a grant of USD 1000 from Yahoo! Labs towards encouraging student participation in the SNLP course projects this semester.