Course Code: DA421M
Multi-modal Data Processing and Learning
Credits: 3-0-0-6
Pre-requisite: None
Syllabus:
- Text: Natural Language Processing – Text normalization: subword tokenization, lemmatization, morphology; Language models and smoothing techniques; Vector space models.
- Text: Information Retrieval – Introduction: text processing and statistics, document parsing, inverted index; Retrieval and ranking: TF-IDF, BM25, Binary Independence Model, PageRank, HITS, query expansion; Evaluation methods.
- Speech Processing – Speech production and perception; Acoustic and articulatory phonetics; Audio and speech signal processing.
- Digital Image and Video Processing – Image/video acquisition and perception; Basic image processing operations; Image and video features; Motion estimation; Applications of image and video processing.
- Learning with multi-modal data: Visual Question Answering (VQA), emotion recognition, etc.
Textbooks:
- W. B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Pearson, 2015.
- C. J. Chen, Elements of Human Voice, World Scientific Publishing, 2016.
- M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, 4th Ed., Cengage, 2017.
References:
- D. Jurafsky, J. H. Martin, Speech and Language Processing, 3rd Ed., 2022.
- A. Zhang, Z. C. Lipton, M. Li, A. J. Smola, Dive into Deep Learning.
- C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Pearson Education, 2005.
- L. R. Rabiner, R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, 2004.
- D. O’Shaughnessy, Speech Communications: Human and Machine, 2nd Ed., University Press, 2005.
- R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2022.
- M. K. Bhuyan, Computer Vision and Image Processing: Fundamentals and Applications, CRC Press, USA, 2019.
- R. C. Gonzalez, R. E. Woods, Digital Image Processing, 3rd Ed., Prentice Hall, 2008.