DS&AI | IIT G

Seventh Semester BTech Minor Course Syllabus

Course Code: DA421M	Multi-modal Data Processing and Learning	Credits: 3-0-0-6
Pre-requisite: None
Syllabus: Text: Natural Language Processing – Text normalization: subword tokenization, lemmatization, morphology; Language models and smoothing techniques; Vector space models. Text: Information Retrieval – Introduction: Text processing and statistics, Document parsing, Inverted index; Retrieval and Ranking: TF-IDF, BM25, Binary Independence Model, PageRank, HITS, Query Expansion; Evaluation methods. Speech Processing – Speech production and perception, Acoustic and articulatory phonetics; Audio and speech signal processing. Digital Image and Video Processing – Image/video acquisition and perception; Basic image processing operations; Image and video features; Motion estimation; Applications of image and video processing. Learning with multi-modal data: VQA (visual question answering), emotion recognition, etc.
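As an illustration of two of the Information Retrieval topics listed above (inverted index, BM25 ranking), here is a minimal sketch. It assumes naive lowercase whitespace tokenization with no stemming or stop-word removal, the commonly used defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N − df + 0.5)/(df + 0.5)), which is one of several BM25 formulations in use. The function names and the toy documents are hypothetical and are not taken from the course material or the textbooks.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}.

    Assumption: naive lowercase whitespace tokenization, no stemming.
    """
    index = defaultdict(dict)
    doc_lengths = []
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()
        doc_lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_lengths


def bm25_scores(query, index, doc_lengths, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    n_docs = len(doc_lengths)
    avgdl = sum(doc_lengths) / n_docs
    scores = [0.0] * n_docs
    for term in query.lower().split():
        postings = index.get(term)  # .get does not create empty entries
        if not postings:
            continue  # a term absent from the collection contributes nothing
        df = len(postings)
        # IDF variant with +1 inside the log so it never goes negative
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = tf + k1 * (1 - b + b * doc_lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return scores


# Usage on a hypothetical three-document collection
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats living together",
]
index, lengths = build_inverted_index(docs)
scores = bm25_scores("cat mat", index, lengths)
ranking = sorted(range(len(docs)), key=lambda d: scores[d], reverse=True)
print(ranking)  # → [0, 1, 2]
```

Document 0 ranks first because it matches both query terms, and the rarer term "mat" carries the larger IDF weight; document 2 scores zero because, without stemming, "cats" does not match "cat".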
Textbooks: W. B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Pearson, 2015. C. J. Chen, Elements of Human Voice, World Scientific Publishing, 2016. M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, 4th Ed., Cengage, 2017.
References: D. Jurafsky, J. H. Martin, Speech and Language Processing, 3rd Ed., 2022. Dive into Deep Learning. C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Pearson Education, 2005. L. R. Rabiner, R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, 2004. D. O’Shaughnessy, Speech Communications: Human and Machine, 2nd Ed., University Press, 2005. R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2022. M. K. Bhuyan, Computer Vision and Image Processing: Fundamentals and Applications, CRC Press, USA, 2019. R. C. Gonzalez, R. E. Woods, Digital Image Processing, 3rd Ed., Prentice Hall, 2008.