Abir Das

Research Works

My research interests span the areas of computer vision, machine learning and pattern recognition. Recently, I have started exploring the area of explainable AI: the science of explaining decisions made by artificial intelligence (AI) models and using those explanations to make the models better. Interpreting AI models is especially significant for safety-critical applications such as healthcare, autonomous driving and criminal justice.
During my PhD I focussed on the theory and mechanism of re-identifying a large number of people over a network of cameras with non-overlapping fields of view (FOVs). In particular, I explored the roles that feature transformation, consistency of re-identification and efficient involvement of humans in the process play in scalable and improved person re-identification.
I have also explored the task of summarizing videos in the form of short key-frame sequences or video skims. Video summarization is of practical importance in several video analysis applications such as content-based search, interactive browsing and retrieval. In particular, my research aims to address a relatively less explored area of video summarization: multi-view video summarization, where a concise summary is generated from a set of input videos captured by different cameras. The large amount of inter-view and intra-view content correlation, along with differences in illumination, pose and synchronization among the videos from different views, adds to the challenge compared with traditional single-view video summarization.
During my post-doc years, I became interested in producing natural language descriptions of in-the-wild videos. The ability to automatically describe videos in natural language enables many important applications, such as generating textual summaries of video clips, content-based video retrieval, video segmentation and descriptive video service (DVS) for the visually impaired, among others. My research built on the latest techniques in deep neural-network approaches to machine learning and natural language processing. I'm also excited about the field of detecting activities in untrimmed videos, especially end-to-end systems capable of recognizing activities and localizing them in time.
For a more detailed description of the individual works, please visit the publications page.