DA321: Multi-modal Data Processing & Learning - I
L-T-P-C: 3-0-0-6

Pre-Requisite:
Course Content/ Syllabus:
Introduction: Introduction to Multimodal data and applications, Challenges of multimodal data, Data collection & cleaning.
Text Processing: Text normalization, Lemmatization, Morphology, Subword tokenization; Text processing and statistics: TF-IDF, BM25, Zipf's law, Heaps' law; Language models and smoothing techniques; Vector space models.
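To make the text-statistics unit concrete, a minimal TF-IDF computation might look like the following plain-Python sketch (the toy corpus and the log(N/df) variant of IDF are illustrative assumptions, not prescribed by the course):

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count in the document; IDF uses the plain
    log(N / df) form. Returns one {term: weight} dict per document.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
```

With this weighting, a term that occurs in every document (here "the") gets weight zero, which is exactly the stop-word-suppressing behavior TF-IDF is designed for.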
Speech Processing: Speech production and perception, Acoustic and articulatory phonetics; Short-term analysis: Need and windowing, Energy, Zero-crossing rate, Autocorrelation function, Fourier transform, Spectrogram; Short-term synthesis: Overlap-add method; Cepstrum analysis: Basis and development, mel-cepstrum.
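The short-term analysis topics lend themselves to a small sketch: below, frame-wise energy and zero-crossing rate are computed with a rectangular window (the frame and hop sizes are illustrative, and a tapered window such as Hamming is the usual refinement covered under windowing):

```python
import numpy as np

def short_term_features(x, frame_len, hop):
    """Frame-wise short-term energy and zero-crossing rate.

    x: 1-D signal; frame_len and hop are in samples. A rectangular
    window is assumed here for simplicity.
    """
    energies, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(np.sum(frame ** 2))                # short-term energy
        signs = np.sign(frame)
        zcrs.append(np.mean(np.abs(np.diff(signs)) > 0))   # zero-crossing rate
    return np.array(energies), np.array(zcrs)

# A 100 Hz sine sampled at 8 kHz: low ZCR, steady energy per frame
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)
e, z = short_term_features(x, frame_len=240, hop=120)
```

Energy and ZCR computed this way are the classic low-cost cues for voiced/unvoiced and speech/silence decisions.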
Digital Image and Video Processing: Point processing, Neighborhood processing, Enhancement, Edge detection, Segmentation, Feature descriptors, Restoration, Morphological operations, Image transforms, Spatial and temporal data handling.
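As a sketch of the point-processing topic, the classic linear contrast stretch maps pixel intensities onto the full display range (the 2×2 image below is a toy stand-in for real image data):

```python
import numpy as np

def contrast_stretch(img, out_min=0, out_max=255):
    """Linear point operation: map [img.min(), img.max()] onto
    [out_min, out_max]. Every output pixel depends only on the
    corresponding input pixel, the defining property of point processing."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:                       # flat image: nothing to stretch
        return np.full_like(img, out_min, dtype=np.uint8)
    scaled = (img.astype(float) - lo) / (hi - lo)
    return (out_min + scaled * (out_max - out_min)).astype(np.uint8)

img = np.array([[50, 100], [150, 200]], dtype=np.uint8)
stretched = contrast_stretch(img)
```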
Other Modalities: Biomedical signals and conventional multi-modal learning.
|
Textbooks:

1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Prentice Hall, 2008.
2. R. Klette, Concise Computer Vision: An Introduction into Theory and Algorithms, Springer, 2014.
3. L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Now Publishers Inc., 2007.
References:

1. D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed. draft, Jan 2022 (available online at https://web.stanford.edu/~jurafsky/slp3/).
L-T-P-C: 3-0-0-6

Pre-Requisite:
Course Content/ Syllabus:
Functional units of a computer: CPU, memory, I/O; Data representation; Processor design: Instruction set architecture, pipelining; Memory: Concept of hierarchical memory organization, cache memory, mapping functions and replacement algorithms, main memory organization; Input-Output: I/O transfers - program-controlled, interrupt-driven and DMA.
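As a small worked example for the cache-mapping topic, a byte address can be split into tag, line index, and block offset for a direct-mapped cache (the block size and line count below are illustrative powers of two):

```python
def direct_mapped_fields(addr, block_size=64, num_lines=128):
    """Split a byte address into (tag, line index, block offset) for a
    direct-mapped cache with the given geometry."""
    offset = addr % block_size          # position within the block
    block_number = addr // block_size
    index = block_number % num_lines    # which cache line the block maps to
    tag = block_number // num_lines     # disambiguates blocks sharing a line
    return tag, index, offset

# Two addresses exactly block_size * num_lines bytes apart map to the
# same line with the same offset, differing only in tag -- they conflict.
a = 0x1234
b = 0x1234 + 64 * 128
```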
Processes and threads and their scheduling, synchronization, deadlocks in concurrent processes; Memory management basics, demand paging and virtual memory implementation; File system design and implementation.
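A scheduling policy from the process-management unit can be sketched in a few lines; here is round-robin with a fixed quantum, under the simplifying assumptions that all processes arrive at time zero and context-switch cost is ignored:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin CPU scheduling.

    bursts maps pid -> total CPU burst time. Returns the completion
    time of each process."""
    remaining = dict(bursts)
    ready = deque(bursts)            # ready queue in arrival (insertion) order
    clock, completion = 0, {}
    while ready:
        pid = ready.popleft()
        run = min(quantum, remaining[pid])   # run for one quantum at most
        clock += run
        remaining[pid] -= run
        if remaining[pid] == 0:
            completion[pid] = clock
        else:
            ready.append(pid)        # preempted: back to the end of the queue
    return completion

done = round_robin({"P1": 5, "P2": 3, "P3": 1}, quantum=2)
```

Short jobs (P3 here) finish early without ever waiting behind a long burst, which is the fairness property that motivates round-robin.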
OSI and TCP/IP Model; Local area networks: Multiple access techniques – wired and wireless; Concepts of switched networks, Internet addressing and routing algorithms; Transport protocols, UDP, TCP, flow control, congestion control; Application Layer: Client-Server and P2P architecture, API; Application layer protocols such as DNS, SSL, WWW, HTTP.
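The link-state approach to the routing-algorithms topic reduces to Dijkstra's shortest-path computation, sketched here on a toy graph (node names and link costs are illustrative):

```python
import heapq

def dijkstra(graph, src):
    """Link-state shortest paths: the computation at the core of
    OSPF-style routing. graph: {node: {neighbor: cost}}."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                 # stale queue entry, already improved
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd         # relax edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist

net = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 6}, "C": {"D": 3}, "D": {}}
dist = dijkstra(net, "A")
```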

Texts:

1. D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 5th Edition, Morgan Kaufmann, 2013.
2. A. Silberschatz, P. B. Galvin and G. Gagne, Operating System Concepts, 8th Edition, Wiley India, 2009.
3. J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach, 8th Edition, Pearson, 2021.
References:

1. W. Stallings, Computer Organization and Architecture: Designing for Performance, 10th Edition, Pearson, 2015.
2. W. Stallings, Operating Systems: Internals and Design Principles, 9th Edition, Pearson, 2018.
3. A. S. Tanenbaum, Computer Networks, 5th Edition, Pearson India, 2013.
L-T-P-C: 3-0-0-6

Pre-Requisite:
Course Content/ Syllabus:
Introduction to learning: Supervised and unsupervised learning, generative and discriminative models, classification and regression problems, performance measures, design of experiments.
Feature space and dimensionality reduction: Feature selection, PCA, exploratory factor analysis, LDA, ICA.
Unsupervised learning: K-means clustering, hierarchical agglomerative clustering, DBSCAN, MLE, MAP, Bayesian learning, Gaussian mixture models.
Supervised learning: Bayesian decision theory, logistic regression, data balancing, simple perceptron and multi-layer perceptron, Parzen windows, k-nearest neighbor, decision trees, support vector machines; ensemble methods, bagging and boosting.
Applications and case studies.
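As an illustration of the unsupervised-learning unit, Lloyd's algorithm for K-means can be sketched in NumPy (the random-sample initialization and the toy blobs are illustrative simplifications; k-means++ initialization is the common refinement):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then recompute each centroid as its cluster mean."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance of every point to every centroid: (n, k)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(10, 0.3, (10, 2))])
labels, centroids = kmeans(X, k=2)
```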
|
Texts:

1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, Prentice Hall (India), 2015.
2. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Edition, Wiley India, 2007.
References:

1. C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2nd Edition, Springer, 2011.
2. S. O. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Education (India), 2016.
DA311: Machine Learning Laboratory
L-T-P-C: 0-0-3-3

Pre-Requisite:
Course Content/ Syllabus: Review: scikit-learn, NumPy and Matplotlib; PCA and LDA; K-means clustering, hierarchical agglomerative clustering and DBSCAN; MLE and Bayesian learning; Linear and logistic regression; Perceptron; Data balancing and imbalanced learning; Multi-layer perceptron; k-nearest neighbor; Classification and regression trees; Support vector machines; Random Forest and AdaBoost.
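Before reaching for scikit-learn's KNeighborsClassifier in the lab, it can help to see the k-nearest-neighbor rule written out in plain NumPy; the sketch below (with an illustrative toy dataset) shows the idea, not the library's API:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority-vote k-nearest-neighbor classification with
    Euclidean distance."""
    # Squared distances of each test point to each training point: (m, n)
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest
    votes = y_train[nearest]                   # their labels
    # Majority vote per test point
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.05], [1.0, 0.95]])
pred = knn_predict(X_train, y_train, X_test, k=3)
```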
|
Textbooks:

1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, Prentice Hall (India), 2015.
2. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Edition, Wiley India, 2007.
References:

1. C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2nd Edition, Springer, 2011.
2. S. O. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Education (India), 2016.
DA341: Applied Time Series Analysis
L-T-P-C: 3-0-0-6

Pre-Requisite:
Course Content/ Syllabus:
Fundamental components of time series; Preliminary tests: randomness, trend, seasonality; Estimation/elimination of trend and seasonality; Mathematical formulation of time series; Stationarity concepts; Autocovariance and autocorrelation functions of stationary time series and their properties; Linear stationary processes and their time-domain properties: AR, MA, ARMA, seasonal, non-seasonal and mixed models; ARIMA models; Multivariate time series processes and their properties: VAR, VMA and VARMA; Parameter estimation of AR, MA and ARMA models: LS approach, ML approach, asymptotic distribution of the MLE; Best linear predictor and partial autocorrelation function; Model identification with ACF and PACF; Model order estimation techniques; Frequency-domain analysis: spectral density, its properties and estimation, periodogram analysis.
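The sample autocorrelation function at the heart of model identification can be sketched directly from its definition; the AR(1) simulation below is an illustrative check that rho(1) estimates the AR coefficient phi (the coefficient, sample size and seed are arbitrary choices):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho(h) = gamma(h) / gamma(0), using the
    standard 1/n (rather than 1/(n-h)) autocovariance estimator."""
    x = np.asarray(x, dtype=float)
    n, mean = len(x), x.mean()
    gamma0 = ((x - mean) ** 2).sum() / n
    acf = [1.0]
    for h in range(1, max_lag + 1):
        gamma_h = ((x[h:] - mean) * (x[:-h] - mean)).sum() / n
        acf.append(gamma_h / gamma0)
    return np.array(acf)

# Simulate AR(1): X_t = phi * X_{t-1} + Z_t; theory gives rho(h) = phi^h
rng = np.random.default_rng(0)
phi, n = 0.8, 5000
z = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]
rho = sample_acf(x, max_lag=2)
```

The geometric decay of the ACF (and a PACF that cuts off after lag 1) is exactly the signature used to identify an AR(1) model.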
|
|
Texts:

1. P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd Edition, Springer, 2002.
2. T. W. Anderson, The Statistical Analysis of Time Series, Vol. 19, 1st Edition, John Wiley & Sons, 2011.
References:

1. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer Science & Business Media, 2009.
2. J. D. Hamilton, Time Series Analysis, 1st Edition, Princeton University Press, 2020.
DA331: Big Data Analytics: Tools & Techniques
L-T-P-C: 2-0-2-6

Pre-Requisite:

Course Content/ Syllabus:
Fundamentals of Big Data: Understanding big data, datasets, data analysis, data analytics, big data characteristics, types of data, case studies; Big data adoption and planning considerations: data procurement, big data analytics lifecycle, case study examples; Big data storage concepts: cluster computing, file systems, distributed file systems, relational and non-relational databases, scaling up and scaling out storage; NoSQL: data types; creating, updating and deleting documents; querying; an example NoSQL database; Distributed computing frameworks: introduction, file system, MapReduce programming model, examples of distributed computing frameworks; Stream data processing: tools such as Apache Spark and Apache Storm; Analytics with a distributed computing framework: supervised and unsupervised learning examples.
The lectures will focus on well-established algorithms in these topics, and the laboratory exercises will supplement the lectures with programming assignments and mini projects.
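The MapReduce programming model from the distributed-computing unit can be illustrated single-process in plain Python; Hadoop and Spark distribute exactly these map, shuffle and reduce phases across a cluster (word count is the conventional toy example):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every token in the document."""
    return [(w, 1) for w in doc.split()]

def reduce_phase(pairs):
    """Shuffle then reduce: sort pairs by key to group them, then sum
    the counts within each group."""
    pairs = sorted(pairs, key=itemgetter(0))      # stand-in for the shuffle
    return {key: sum(c for _, c in grp)
            for key, grp in groupby(pairs, key=itemgetter(0))}

docs = ["big data big compute", "data pipelines"]
all_pairs = [p for d in docs for p in map_phase(d)]   # map over all "splits"
counts = reduce_phase(all_pairs)
```

In a real cluster each mapper processes one input split and each reducer receives one key range; the program above is the same logic with the parallelism removed.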
|
Texts:

1. Thomas Erl, Wajid Khattak and Paul Buhler, Big Data Fundamentals: Concepts, Drivers & Techniques, 1st Edition, Pearson, 2016.
2. Pramod J. Sadalage and Martin Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, 1st Edition, Addison-Wesley, 2012.
3. Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills, Advanced Analytics with Spark: Patterns for Learning from Data at Scale, 2nd Edition, O'Reilly, 2017.
References:

1. Sean T. Allen, Matthew Jankowski and Peter Pathirana, Storm Applied: Strategies for Real-Time Event Processing, 1st Edition, Manning Publications, 2015.
2. Shannon Bradshaw, Eoin Brazil and Kristina Chodorow, MongoDB: The Definitive Guide, 3rd Edition, O'Reilly, 2019.
3. Bill Chambers and Matei Zaharia, Spark: The Definitive Guide, 1st Edition, O'Reilly, 2018.
4. Tom White, Hadoop: The Definitive Guide, 4th Edition, Shroff/O'Reilly, 2015.
5. Balamurugan Balusamy, Nandhini Abirami R, Seifedine Kadry and Amir H., Big Data: Concepts, Technology, and Architecture, 1st Edition, Wiley, 2021.