Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati
Guwahati-781039, India

Multimedia Analytics Laboratory

The Multimedia Analytics Laboratory was set up in the Department of Electronics and Electrical Engineering (EEE), Indian Institute of Technology (IIT) Guwahati in July 2013. The laboratory focuses on research and development activities related to video, speech and text analytics, and on applications of computer vision in graphics and robotics.

Sponsored Projects
    Project Title: Multi-Modal Broadcast Analytics – Structured Evidence Visualization for Events of Security Concern
    Funding Agency: DIT, New Delhi
    PI: Dr. Prithwijit Guha
    Co-PI: Dr. Sanasam Ranbir Singh
    CI: Prof. S. R. M. Prasanna, Prof. S. Nandi
    Duration: 2013 to 2016
    Status: Ongoing

Multi-Modal Broadcast Analytics

Multi-modal content from different broadcasting authorities presents news events and related opinions from varying perspectives. The information content of events, their varying levels of sensitization and their capacity to trigger further events are of paramount importance to national security strategists, content monitoring agencies and media analysts. For example, the "Mumbai Attack" was widely reported across all news channels and websites. This event also triggered other news events such as "Honorary Awards and Recognition for Martyrs' Families", "Terror Tourism", "Home Ministry Reshuffle" and the "Kasab Trial Case". These events are linked through common keywords or key phrases such as "Terrorist", "Kasab", "Martyr", "Hemant Karkare" and "ATS". Audio-visual descriptions, interviews and debates were also broadcast on television channels. Video sequences of news reporting or debates also contained commercial breaks and logo animations. Given such a huge amount of news on this topic (along with other news), how can we automatically obtain a consolidated report on the "Mumbai Attack" (while rejecting others) and on the development of related events, with chronologically ordered news articles and videos? Or, can we be alerted to news items containing keywords like "Terrorist", "Attack" or "Bomb Blast"? The need for such functionalities motivates us to propose a multi-modal analytics platform for indexing and querying broadcast media contents.

We propose to develop a prototype system capable of analyzing, indexing and querying multi-modal information, cross-linking keywords with meaningful audio-video segments (e.g. by removing commercials, logo animations etc.) and relating news events along a timeline. This will provide a structured organization of the multi-modal broadcast data along with their inter-relations. The availability of such a structured knowledge base will enable us to generate reports in response to queries on specific news events or keywords. Thus, the scope of the work involves the development of an analytical framework to extract structured information from unstructured multi-modal data through video analytics, text mining and speech processing.
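The keyword-based indexing and querying described above can be illustrated with a minimal inverted-index sketch. All event names, dates and keywords below are illustrative placeholders, not data from the actual system:

```python
from collections import defaultdict

# Hypothetical news segment records: (event tag, broadcast date, keywords).
segments = [
    ("Mumbai Attack", "2008-11-26", ["terrorist", "kasab", "ats"]),
    ("Kasab Trial Case", "2009-05-08", ["kasab", "trial"]),
    ("Home Ministry Reshuffle", "2008-11-30", ["home ministry", "reshuffle"]),
]

# Build an inverted index: keyword -> chronologically ordered event list.
index = defaultdict(list)
for event, date, keywords in segments:
    for kw in keywords:
        index[kw].append((date, event))
for kw in index:
    index[kw].sort()

def query(keyword):
    """Return events mentioning the keyword, in chronological order."""
    return [event for _, event in index[keyword.lower()]]

print(query("Kasab"))  # -> ['Mumbai Attack', 'Kasab Trial Case']
```

In the envisioned system the index entries would point to time-stamped audio-video segments and text articles rather than bare event names, enabling consolidated, chronologically ordered reports in response to a keyword query.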




Speech/Audio Analytics

The task of speech/audio analytics involves identifying different segments in broadcast audio, such as pure speech, speech with background music and pure music, and classifying them into their respective classes. Pure speech generally contains a lot of information regarding a particular event. Hence, speaker-independent speech-to-text transcription is a necessary first step to extract keywords or event tags. The presence of a particular keyword in a speech segment relating to a particular event may be detected by performing keyword spotting on the continuous speech segment. The textual information so extracted will be a useful source for text mining and video analytics in a multi-modal analytical framework.
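As a text-side illustration of keyword spotting, the sketch below scans a hypothetical time-aligned transcript (pairs of start time in seconds and word, as a speech-to-text front end might emit) for alert phrases. In the actual system, spotting would operate on the acoustic signal itself; the transcript, phrase list and helper name here are assumptions for illustration only:

```python
# Hypothetical time-aligned speech-to-text output: (start_time_s, word) pairs.
transcript = [
    (0.0, "breaking"), (0.4, "news"), (0.9, "a"), (1.1, "bomb"),
    (1.5, "blast"), (2.0, "was"), (2.3, "reported"),
]

# Alert keyword phrases, each stored as a tuple of words.
alert_keywords = {("bomb", "blast"), ("attack",)}

def spot_keywords(words, keywords):
    """Return (start_time, phrase) hits for each alert keyword phrase."""
    hits = []
    tokens = [w for _, w in words]
    for phrase in keywords:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                hits.append((words[i][0], " ".join(phrase)))
    return sorted(hits)

print(spot_keywords(transcript, alert_keywords))  # -> [(1.1, 'bomb blast')]
```

Each hit carries a timestamp, so a spotted keyword can be cross-linked back to the corresponding audio-video segment.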




Multiple Object Tracking

Tracking objects of interest continues to be a challenging problem in the domain of video analytics. Most existing systems track multiple targets in images acquired with a static camera. For example, the adjacent videos show a few results from our previous work, where multiple objects are tracked in a video acquired with a static camera. Our method does not assume any prior object model; it uses a second-order motion model together with single/multiple-patch mean-shift tracking. The algorithm detects object occlusion states, based on which it localizes the object and updates the target features. Our present focus lies in tracking multiple objects in images acquired from a moving camera or a Kinect. Such image sequences may be acquired by a person with a video camera or by an imaging sensor mounted on a mobile robot. We propose to combine traditional tracking techniques with recent TLD (Tracking-Learning-Detection) based methods to achieve higher tracking accuracy and fewer track switches.
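The second-order motion model mentioned above can be sketched as a constant-acceleration prediction: given three successive object centroids, predict where the object will be in the next frame. The positions and helper name below are illustrative; in a tracker, such a prediction would typically seed the mean-shift search window:

```python
def predict_next(p0, p1, p2):
    """Constant-acceleration prediction from three successive 2-D positions.

    velocity     v  = p2 - p1
    acceleration a  = (p2 - p1) - (p1 - p0)
    prediction   p3 = p2 + v + a
    """
    return tuple(
        c2 + (c2 - c1) + ((c2 - c1) - (c1 - c0))
        for c0, c1, c2 in zip(p0, p1, p2)
    )

# Object accelerating along x: centroids at x = 0, 1, 3 -> predicted x = 6.
print(predict_next((0, 0), (1, 0), (3, 0)))  # -> (6, 0)
```

A first-order (constant-velocity) model would predict x = 5 here; keeping the second-order term lets the tracker anticipate speeding-up or slowing-down targets.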



Faculty Incharge: Dr. Prithwijit Guha

Associate Faculty Incharge: Dr. Suresh Sundaram

Staff Incharge: -