Seventh Semester BTech Minor Course Syllabus

Course Code: DA421M
Course Title: Multi-modal Data Processing and Learning
Credits: 3-0-0-6
Pre-requisite: None
Syllabus: Text: Natural Language Processing - Text normalization: subword tokenization, lemmatization, morphology; Language models and smoothing techniques; Vector space models. Text: Information Retrieval - Introduction: text processing and statistics, document parsing, inverted index; Retrieval and ranking: TF-IDF, BM25, binary independence model, PageRank, HITS, query expansion; Evaluation methods. Speech Processing - Speech production and perception, acoustic and articulatory phonetics; Audio and speech signal processing. Digital Image and Video Processing - Image/video acquisition and perception; Basic image processing operations; Image and video features; Motion estimation; Applications of image and video processing. Learning with Multi-modal Data - Visual question answering (VQA), emotion recognition, etc.
Textbooks:
  • W. B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Pearson, 2015.
  • C. J. Chen, Elements of Human Voice, World Scientific Publishing, 2016.
  • M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, 4th Ed., Cengage, 2017.
References:
  • D. Jurafsky, J. H. Martin, Speech and Language Processing, 3rd Ed., 2022.
  • C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
  • T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Pearson Education, 2005.
  • L. R. Rabiner, R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, 2004.
  • D. O’Shaughnessy, Speech Communications: Human and Machine, 2nd Ed., University Press, 2005.
  • R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2022.
  • M. K. Bhuyan, Computer Vision and Image Processing – Fundamentals and Applications, CRC Press, USA, 2019.
  • R. C. Gonzalez, R. E. Woods, Digital Image Processing, 3rd Ed., Prentice Hall, 2008.