DA321: Multi-modal Data Processing & Learning - I
L-T-P-C: 3-0-0-6
Pre-Requisite:
Course Content/Syllabus:
Introduction: Introduction to multimodal data and applications, challenges of multimodal data, data collection and cleaning.
Text Processing: Text normalization, lemmatization, morphology, subword tokenization; text processing and statistics: TF-IDF, BM25, Zipf's law, Heaps' law; language models and smoothing techniques; vector space models.
Speech Processing: Speech production and perception, acoustic and articulatory phonetics; short-term analysis: need and windowing, energy, zero-crossing rate, autocorrelation function, Fourier transform, spectrogram; short-term synthesis: overlap-add method; cepstrum analysis: basis and development, mel-cepstrum.
Digital Image and Video Processing: Point processing, neighborhood processing, enhancement, edge detection, segmentation, feature descriptors, restoration, morphological operations, image transforms, spatial and temporal data handling.
Other Modalities: Biomedical signals and conventional multi-modal learning.
Textbooks:
1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Prentice-Hall, 2008.
2. R. Klette, Concise Computer Vision: An Introduction into Theory and Algorithms, Springer, 2014.
3. L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Now Publishers Inc., 2007.
References:
1. D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd Edition draft, Jan 2022 (available online at https://web.stanford.edu/~jurafsky/slp3/).
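The TF-IDF weighting listed under text statistics can be sketched in a few lines. This is a minimal illustration using the plain log(N/df) idf variant with raw term frequency; real retrieval systems (e.g. BM25, also on the syllabus) add smoothing and document-length normalization:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weight for every (term, document) pair.

    docs: list of token lists. Uses raw term frequency tf and
    idf = log(N / df_t), one of several common variants.
    """
    n = len(docs)
    # document frequency: in how many documents does each term occur?
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
# "the" appears in every document, so idf = log(1) = 0 and its weight vanishes
```

The vanishing weight for ubiquitous terms is the whole point of the idf factor, and it is also why TF-IDF pairs naturally with Zipf's law: the few very frequent terms carry little discriminative information.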
L-T-P-C: 3-0-0-6
Pre-Requisite:
Course Content/Syllabus:
Functional units of a computer: CPU, memory, I/O; data representation; Processor design: instruction set architecture, pipelining; Memory: concept of hierarchical memory organization, cache memory, mapping functions and replacement algorithms, main memory organization; Input-Output: I/O transfers - program-controlled, interrupt-driven and DMA.
Processes and threads and their scheduling, synchronization, deadlocks in concurrent processes; memory management basics, demand paging and virtual memory implementation; file system design and implementation.
OSI and TCP/IP models; local area networks: multiple access techniques - wired and wireless; concepts of switched networks, Internet addressing and routing algorithms; transport protocols: UDP, TCP, flow control, congestion control; Application layer: client-server and P2P architectures, APIs; application-layer protocols such as DNS, SSL, WWW, HTTP.
Textbooks:
1. D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 5th Edition, Morgan Kaufmann, 2013.
2. A. Silberschatz, P. B. Galvin and G. Gagne, Operating System Concepts, 8th Edition, Wiley India, 2009.
3. J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach, 8th Edition, Pearson, 2021.
References:
1. W. Stallings, Computer Organization and Architecture: Designing for Performance, 10th Edition, Pearson, 2015.
2. W. Stallings, Operating Systems: Internals and Design Principles, 9th Edition, Pearson, 2018.
3. A. S. Tanenbaum, Computer Networks, 5th Edition, Pearson India, 2013.
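The cache "mapping functions" in the memory unit can be made concrete with the simplest case, a direct-mapped cache, which splits a byte address into tag, line index, and block offset. A small sketch (the 16-byte block size and 8-line cache are arbitrary example values):

```python
def direct_mapped_fields(addr, block_size, num_lines):
    """Split a byte address into (tag, index, offset) for a
    direct-mapped cache. Sizes are assumed to be powers of two,
    so each division/modulo is really a bit-field extraction."""
    offset = addr % block_size          # byte within the block
    block_number = addr // block_size   # which memory block
    index = block_number % num_lines    # cache line it must use
    tag = block_number // num_lines     # disambiguates blocks sharing a line
    return tag, index, offset

# 16-byte blocks, 8 cache lines: address 0x1A4 (= 420)
tag, index, offset = direct_mapped_fields(0x1A4, 16, 8)
```

Because every block maps to exactly one line, direct mapping needs no replacement algorithm; set-associative and fully associative caches reintroduce one (LRU, FIFO, random), which is where the replacement-algorithm topic of the syllabus picks up.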
L-T-P-C: 3-0-0-6
Pre-Requisite:
Course Content/Syllabus:
Introduction to learning: supervised and unsupervised learning, generative and discriminative models, classification and regression problems, performance measures, design of experiments; Feature space and dimensionality reduction: feature selection, PCA, exploratory factor analysis, LDA, ICA; Unsupervised learning: K-means clustering, hierarchical agglomerative clustering, DBSCAN, MLE, MAP, Bayesian learning, Gaussian mixture models; Supervised learning: Bayesian decision theory, logistic regression, data balancing, simple perceptron and multi-layer perceptron, Parzen windows, k-nearest neighbor, decision trees, support vector machines; ensemble methods: bagging and boosting; applications and case studies.
Textbooks:
1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, Prentice Hall (India), 2015.
2. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Edition, Wiley India, 2007.
References:
1. C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2nd Edition, Springer, 2011.
2. S. O. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Education (India), 2016.
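Of the dimensionality-reduction techniques listed, PCA is the most mechanical: center the data, diagonalize the feature covariance, and keep the top eigenvectors. A minimal NumPy sketch (the covariance-eigendecomposition route; an SVD of the centered data is the numerically preferred equivalent):

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples, n_features) onto its top-k
    principal components; returns (projected data, components)."""
    Xc = X - X.mean(axis=0)            # center each feature
    cov = np.cov(Xc, rowvar=False)     # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k] # pick the k largest directions
    components = vecs[:, order]
    return Xc @ components, components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, W = pca(X, 2)   # Z: (100, 2) scores, W: (5, 2) orthonormal loadings
```

Note the contrast the syllabus sets up: PCA finds directions of maximum variance without using labels, while LDA uses class labels to find directions that separate the classes.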
DA311: Machine Learning Laboratory
L-T-P-C: 0-0-3-3
Pre-Requisite:
Course Content/Syllabus:
Review of scikit-learn, NumPy and Matplotlib; PCA and LDA; K-means clustering, hierarchical agglomerative clustering and DBSCAN; MLE and Bayesian learning; linear and logistic regression; perceptron; data balancing and imbalanced learning; multi-layer perceptron; k-nearest neighbor, classification and regression trees; support vector machines; random forest, AdaBoost.
Textbooks:
1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, Prentice Hall (India), 2015.
2. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Edition, Wiley India, 2007.
References:
1. C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2nd Edition, Springer, 2011.
2. S. O. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Education (India), 2016.
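As a flavor of the lab exercises, the K-means item can be implemented from scratch in NumPy before reaching for scikit-learn's `KMeans`. This sketch is plain Lloyd's algorithm (random initialization from the data; library versions add k-means++ seeding and multiple restarts):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the centers stop moving (or iters is exhausted)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# two well-separated blobs -> k-means recovers the grouping
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, centers = kmeans(X, 2)
```

Comparing this against `sklearn.cluster.KMeans` on the same data is a natural first lab exercise.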
DA341: Applied Time Series Analysis
L-T-P-C: 3-0-0-6
Pre-Requisite:
Course Content/Syllabus:
Fundamental components of time series; preliminary tests: randomness, trend, seasonality; estimation/elimination of trend and seasonality; mathematical formulation of time series; stationarity concepts; autocovariance and autocorrelation functions of stationary time series and their properties; linear stationary processes and their time-domain properties: AR, MA, ARMA, seasonal, non-seasonal and mixed models; ARIMA models; multivariate time series processes and their properties: VAR, VMA and VARMA; parameter estimation of AR, MA and ARMA models: LS approach, ML approach, asymptotic distribution of the MLE; best linear predictor and partial autocorrelation function; model identification with ACF and PACF; model order estimation techniques; frequency-domain analysis: spectral density, its properties and its estimation, periodogram analysis.
Textbooks:
1. P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd Edition, Springer, 2002.
2. T. W. Anderson, The Statistical Analysis of Time Series, Vol. 19, 1st Edition, John Wiley & Sons, 2011.
References:
1. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer Science & Business Media, 2009.
2. J. D. Hamilton, Time Series Analysis, 1st Edition, Princeton University Press, 2020.
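The sample ACF used for model identification can be sketched directly from its definition. This uses the biased 1/n estimator (the standard choice, since it guarantees a non-negative-definite autocovariance sequence), checked against an AR(1) process whose theoretical ACF is rho(h) = phi**h:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag): lagged
    autocovariances normalized by the lag-0 autocovariance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:n - h], xc[h:]) / denom
                     for h in range(max_lag + 1)])

# simulate AR(1): X_t = 0.7 X_{t-1} + e_t, so rho(1) should be near 0.7
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + rng.normal()
acf = sample_acf(x, 5)
```

For an AR(p) process the ACF decays gradually while the PACF cuts off after lag p; for an MA(q) process the roles are reversed, which is exactly the ACF/PACF identification recipe in the syllabus.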
DA331: Big Data Analytics: Tools & Techniques
L-T-P-C: 2-0-2-6
Pre-Requisite:
Course Content/Syllabus:
Fundamentals of big data: understanding big data, datasets, data analysis, data analytics, big data characteristics, types of data, case studies; Big data adoption and planning considerations: data procurement, the big data analytics lifecycle, case study examples; Big data storage concepts: cluster computing, file systems, distributed file systems, relational and non-relational databases, scaling up and scaling out storage; NoSQL: data types; creating, updating and deleting documents; querying; an example NoSQL database; Distributed computing frameworks: introduction, file system, the MapReduce programming model, examples of distributed computing frameworks; Stream data processing: tools such as Apache Spark and Apache Storm; Analytics with a distributed computing framework: supervised and unsupervised learning examples.
The lectures will focus on the well-established algorithms in these topics, and the laboratory exercises will supplement the lectures with programming assignments and mini projects.
Textbooks:
1. Thomas Erl, Wajid Khattak and Paul Buhler, Big Data Fundamentals: Concepts, Drivers & Techniques, 1st Edition, Pearson, 2016.
2. Pramod J. Sadalage and Martin Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, 1st Edition, Addison-Wesley, 2012.
3. Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills, Advanced Analytics with Spark: Patterns for Learning from Data at Scale, 1st Edition, O'Reilly, 2017.
References:
1. Sean T. Allen, Matthew Jankowski and Peter Pathirana, Storm Applied: Strategies for Real-Time Event Processing, 1st Edition, Manning Publications, 2015.
2. Shannon Bradshaw, Eoin Brazil and Kristina Chodorow, MongoDB: The Definitive Guide, 3rd Edition, O'Reilly, 2019.
3. Bill Chambers and Matei Zaharia, Spark: The Definitive Guide, 1st Edition, O'Reilly, 2018.
4. Tom White, Hadoop: The Definitive Guide, 4th Edition, Shroff/O'Reilly, 2015.
5. Balamurugan Balusamy, Nandhini Abirami R, Seifedine Kadry and Amir H., Big Data: Concepts, Technology, and Architecture, 1st Edition, Pearson, 2015.
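The MapReduce programming model from the distributed-computing unit can be illustrated on a single machine with the canonical word-count example. The shuffle step that a framework like Hadoop or Spark performs across the cluster is simulated here with an in-memory dictionary:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit one (word, 1) pair per token in the line."""
    return [(w.lower(), 1) for w in line.split()]

def reduce_phase(word, counts):
    """Reducer: combine all values that share one key."""
    return word, sum(counts)

def mapreduce_wordcount(lines):
    # shuffle: group intermediate pairs by key; in a real cluster
    # the framework hashes keys to partitions across machines
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_phase, lines)):
        groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

counts = mapreduce_wordcount(["big data big ideas", "data at scale"])
# counts: {"big": 2, "data": 2, "ideas": 1, "at": 1, "scale": 1}
```

The same map/reduce decomposition is what lets the computation scale out: mappers run independently per input split, and reducers run independently per key group.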