We propose to develop a multi-modal broadcast analytics system that detects and tracks events of security concern in continuous multi-modal data streams originating from heterogeneous sources. These sources include TV news channels and news websites. Because the data streams are multi-modal, the system consists of three sub-systems: a video analysis sub-system, an audio analysis and speech recognition sub-system, and a text analytics sub-system.
Input to the System: The input to the broadcast analytics system is the multi-modal data available from heterogeneous sources such as TV news channels and web news articles. This data consists of video, audio, images and text. The broadcast news videos are processed beforehand to segment the video streams into news stories and to extract the overlay text, speech transcript and metadata.
Output of the System: The multi-modal broadcast analytics system has the following two types of outputs.

Speech synthesis is the artificial production of human speech. Text-to-speech (TTS) conversion transforms linguistic information stored as text into speech. The objective of Manipuri TTS synthesis is to convert arbitrary Manipuri text into the corresponding spoken waveform. Text processing and speech generation are the two main components of a text-to-speech synthesis system. In our work we focus mainly on concatenative unit-selection and parametric synthesis systems.
For concatenative unit-selection synthesis we have used the openly available Festival framework. Concatenative synthesis is based on the concatenation of units selected from a database, and unit-selection synthesis uses large databases of recorded speech. The basic speech segments of our system are phones (consonants, short vowels and long vowels), bi-phones (CV clusters: speech segments with a consonant followed by a vowel) and syllables. Indian languages are mostly syllabic in nature, so we have taken syllables as the main segmented units. Segmentation is done using hybrid segmentation (HMM combined with the group delay method). An index of the units in the speech database is then created from the segmentation and from acoustic parameters such as the fundamental frequency (pitch), the duration, and the position of the syllable in a word (since syllables at the beginning, middle and end of a word have different energy levels). In addition, to obtain more linguistically balanced clusters, we have also considered textual information such as the neighboring syllables of a particular syllable. During synthesis, the desired target utterance is created by determining the best chain of candidate units from the database. Units are selected using a classification and regression tree (CART).
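The idea of "determining the best chain of candidate units" can be illustrated with a small dynamic-programming search. This is only a sketch under stated assumptions, not the Festival implementation and not the CART-based selection described above: the `Unit` class, the `select_units` function, the two cost functions and all data values are hypothetical and chosen purely for illustration. The target cost here penalizes a unit taken from the wrong word position, and the join cost penalizes a pitch jump between adjacent units.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """One recorded syllable in a (hypothetical) speech database."""
    syllable: str
    pitch_hz: float
    position: str  # "begin", "middle" or "end" of the source word


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Return the lowest-cost chain with one unit per target position."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")
    # best[i][j] = (cheapest cumulative cost ending in candidates[i][j],
    #               index of the predecessor in candidates[i - 1])
    best = [[(target_cost(0, u), -1) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)
    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chain: List[Unit] = []
    for i in range(len(candidates) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    chain.reverse()
    return chain, total


# Toy data: invented values, for illustration only.
DATABASE = [
    Unit("ma", 120.0, "begin"), Unit("ma", 180.0, "end"),
    Unit("ni", 125.0, "middle"), Unit("ni", 200.0, "middle"),
    Unit("pur", 130.0, "end"), Unit("pur", 128.0, "begin"),
]
TARGETS = [("ma", "begin"), ("ni", "middle"), ("pur", "end")]
candidates = [[u for u in DATABASE if u.syllable == syl] for syl, _ in TARGETS]
chain, total = select_units(
    candidates,
    target_cost=lambda i, u: 0.0 if u.position == TARGETS[i][1] else 1.0,
    join_cost=lambda a, b: abs(a.pitch_hz - b.pitch_hz) / 100.0,
)
```

In a real system the costs would be derived from the indexed acoustic and textual features (pitch, duration, word position, neighboring syllables) rather than from the two toy functions used here.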
Flite is designed as an alternative synthesis engine to Festival. It is primarily designed for small embedded machines and large servers.
For parametric synthesis, we have considered Hidden Markov Model based synthesis (HTS). In the training phase of HTS, spectral parameters and excitation parameters are extracted from the speech data. Using these features and the time-aligned phonetic transcriptions, context-independent monophone HMMs are trained. In the synthesis phase, context-dependent label files are generated for the given text and the required context-dependent HMMs are concatenated. The speech waveform is synthesized directly from the generated spectral and excitation parameters using a mel log spectrum approximation (MLSA) filter.
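To make the notion of a context-dependent label concrete, the following sketch turns a phone sequence into simplified triphone labels of the form `left-center+right`. This is an assumption-laden simplification: full-context labels used in HMM-based synthesis carry far more context than the immediate neighbors, and the function name `triphone_labels` and the `sil` boundary symbol are illustrative choices, not taken from the project.

```python
from typing import List, Sequence


def triphone_labels(phones: Sequence[str], boundary: str = "sil") -> List[str]:
    """Label each phone with its left and right neighbor (simplified context)."""
    # Pad with a boundary symbol so the first and last phones also have context.
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


labels = triphone_labels(["m", "a", "n"])
```

Each label names one context-dependent model; concatenating the models in label order gives the utterance-level model from which parameters are generated.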
The Aakash project at IIT Guwahati was dedicated to developing useful applications and content for use with Aakash. It attempts to empower teachers through a unique blend of technology, e-content and innovative pedagogy.
Projects developed under the Aakash Application Development Project

We often see a political party splitting, a government becoming unstable, or a politician changing party. In our study, we aim to predict the integrity of candidates within a political party from the Twitter posts (tweets) of Indian politicians, which in turn would help us predict the above-mentioned phenomena. We identify important topics and issues on which the politicians have commented and classify each comment as expressing a Positive, Negative or Neutral opinion using sentiment analysis. We then apply statistical inference to the results to find how integrated the candidates are. We aim to extend the study to different parties and to journalists.
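One simple way to summarize per-topic sentiment labels into a single number is a pairwise agreement score. The sketch below is a descriptive measure only, not the statistical inference used in the study, which is not specified here. The function names, the data layout (`{topic: {politician: label}}`) and the example values are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence


def topic_agreement(labels: Sequence[str]) -> Optional[float]:
    """Fraction of politician pairs that share the same label on one topic."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return None  # fewer than two politicians commented on the topic
    return sum(a == b for a, b in pairs) / len(pairs)


def party_agreement(stances: Dict[str, Dict[str, str]]) -> Optional[float]:
    """Mean pairwise agreement over all topics with at least two commenters."""
    scores = [
        score
        for by_politician in stances.values()
        if (score := topic_agreement(list(by_politician.values()))) is not None
    ]
    return sum(scores) / len(scores) if scores else None


# Invented example: labels as produced by a sentiment classifier.
stances = {
    "budget": {"A": "Positive", "B": "Positive", "C": "Negative"},
    "farming": {"A": "Neutral", "B": "Neutral"},
}
score = party_agreement(stances)
```

A score near 1 means party members mostly express the same opinion on shared topics, while a low score indicates disagreement that could then be examined with a proper statistical test.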
Online demos: