CS523 : Advanced Computer Architecture
Instructor: Dr. A. Sahu
The focus of this course is on the concepts behind designing industry-standard (Intel/AMD/NVIDIA/Google/IBM/Cisco) high-performance computer systems.
Pre-Requisites: CS222 (Computer Architecture and Organisation) http://jatinga.iitg.ernet.in/~asahu/cs222/
The list of topics allocated for lecture note scribing is available at AllocatedListHere.
The motivation behind lecture note scribing is to create course material (a book) for the next ACA batch, as no good book is available that covers all the topics of the current ACA course. This book/course material will be openly accessible to all, and the name of the scribe will be mentioned in each chapter and in the book.
- 24-29 Jul 2012 MON: Course Structure, Introduction, Motivation, Reference, Timing and Venue PDF Slides
- 31 Jul 2012 TUE: Advanced Architecture: Top down Approach (Classifications) PDF Slides
[[Sima Book, Preface, Page 4]]
- 01 Aug 2012 WED: Advanced Architecture: Top down Approach (Pipeline and ILP) PDF Slides
[[Sima Book, Chapter 4]]
- 06 Aug 2012 MON: ACA: Data Parallel and Function Parallel, and Understanding a Given Processor Architecture (8085) PDF Slides
[[Sima Book, Introduction and Preface, 8085 Ramesh S Gaonkar Book]]
Pthread Thread Affinity (Mapping User Thread to Hardware Thread)
[[Understanding a given Processor Architecture (8085), 8085 Ramesh S Gaonkar Book]]
- 07 Aug 2012 TUE: Designing a processor using components (Single cycle and with only 9 MIPS instructions) PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5]]
- 08 Aug 2012 WED: Extending to Multi Cycle Design, Pipeline Design PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5]]
- 13 Aug 2012 MON: Pipeline Design, Clock Skew and Stage Division PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5, Flynn Book, Chapter 2]]
- 14 Aug 2012 TUE: Wave/Self Timed Pipeline [[Non Uniform Line]] PDF Slides
[[Flynn Book, Chapter 2, Wave Pipeline Tutorial and Survey, IEEE Trans VLSI, 1998]]
- 21 Aug 2012 TUE: Pipeline Hazards: Data Forwarding and Pipeline Scheduling PDF Slides
[[6.4/6.5 of 3rd Ed Hennessy Book (Ebook given), Chapter 6 Hwang Book]]
- 22 Aug 2012 WED: Branch Performance (Predication and Speedup) PDF Slides
[[Flynn Book, Chapter 4.5, Sima Book Chapter 8]]
- 23 Aug 2012 THU/MON: Branch Performance (Prediction and Target Capture) PDF Slides
[[Flynn Book, Chapter 4.5, Sima Book Chapter 8]]
- 24 Aug 2012 FRI (extra class in 4th slot): Design Space of Super-scalar Processor PDF Slides
[[Sima Book,Chapter 7]]
Sima Paper I on Superscalar, Sima Paper II on Superscalar and Sima Paper II on Superscalar
- 27 Aug 2012 MON: Super-scalar Design Space: Shelving, Renaming, Operand Fetch PDF Slides
[[Sima Book,Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Flynn Book Section 7.6.5]]
- 28 Aug 2012 TUE: Super-scalar: Instruction Scheduling (Scoreboard, Tomasulo's Approach) PDF Slides
[[Sima Book,Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Appendix A7 (scoreboard scheduling)]]
[[Tomasulo's Approach Demo1, Demo2 and Demo3 (Java Applet; requires a JRE to be installed)]]
- 29 Aug 2012 WED: Super-scalar: Speculation, Reordering, ILP Limitation PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10]]
- 03 SEP 2012 (MON) : ILP Limitation and Simultaneous Multithreading PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10-3.12]]
- 04 SEP 2012 (TUE) : SMT, Processor Case Study Intel P4-HT, Intel Atom, Comparison of Processors (AMD Athlon, PowerPC 5, Intel Itanium and Intel P4HTExtEd), Performance/Energy Efficiency of Intel Core-i7 and Intel Atom PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10-3.12]]
- 05 Sep 2012 WED: Memory Hierarchy, Memory Wall, and Cache: Set/Index, Associativity, Line size/offset PDF Slides
[[Flynn Book, Chapter 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]
- 10 Sep 2012 Monday: Program Cache Behavior, Miss Classification, AMAT and Local/Global Miss PDF Slides
[[Flynn Book, Chapter 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]
- 11 Sep 2012 Tuesday: Cache Policies (access: seq/cun/fwd, load: blk/warp/forward, replacement: lru/lfu/mfu/fifo, fetch: demand/pre/swpre, write: wt/wb), Performance Optimization PDF Slides
[[ Hennessy CA-QA Book 4th Ed Chapter 5, Section 2.2 of Cragon Book]]
- 12 Sep 2012, Wednesday: Cache Performance Optimization PDF Slides
[[Hennessy CA-QA Book 4th Ed Chapter 5 ]]
MID SEMESTER EXAM [[MidSemQuestion]], [[Solution will be uploaded soon]]
Part II of ACA: Multicore Computing
- 24-Sep-2012 (MON): Why Multicore: Power/Cost Efficiency, Speedup, Issues in Multiprocessing (Sharing, Mapping/Scheduling, Parallelising) PDF Slides
[[Flynn Book, Hennessy Book Introductions]]
- 25-Sep-2012 : Multiprocessing, Amdahl's Law, Gustafson's Law, Equal Work Hypothesis, Efficiency of Parallel Processing, Parallelising Programs, Shared Memory vs Distributed Memory, Shared Memory Architecture PDF Slides
[[Flynn Book 8.1, 8.5 and 8.6, Intel TBB Book (Reinders) Chapter 2 and Parhami Book]]
- 26-Sep-2012: BUS Protocols: Comparison with ALOHA and CSMA, Queueing/Prob Analysis of Multiprocessor BUS PDF Slides
[[Flynn Book, Chapter 6.8]]
- 01-Oct-2012: Static Interconnection Network (array, ring, tree, mesh, hypercube), Embedding, 2.5D/3D Mesh, Denser/Sparser Mesh/Torus PDF Slides
[[Kai Hwang Book Chapter:1 and 2, Parhami Book Chapter 12]]
- 03-Oct-2012 : Dynamic Network/Switching (Bayes, Benes), Routing: Static/Dynamic, Store-and-Forward/Wormhole/Cut-Through, Mesh NOC: Routing, Router Architecture, Routing Algorithms PDF Slides
[[Flynn Book, Interconnection Network Book, Parhami Book Chapter 12]]
- 04-Oct-2012 THURSDAY: Mesh NOC: Routing Algorithms (XY, West First, North Last, Negative First and Even Odd) PDF Slides
[[Flynn Book, Interconnection Network Book ]]
- 08-OCT-2012 MONDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part I PDF Slides
[[ Hennessy CA-QA Book, 4th Ed, Chapter 4.2 (Sec 5.5 of 5th Ed), Culler Book, Sec 5.5, Memory Consistency Coherence Book ]]
- 09-OCT-2012 TUESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part II PDF Slides
[[Ref Prev Lects]]
- 10-OCT-2012 WEDNESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part III PDF Slides
[[Ref Prev Lects]]
- 16/17 Oct 2012 : Data Parallel Architecture (Vector Architecture and SIMD ) part I PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
- 26 Oct 2012 (FRI with Tuesday Time Table): Data Parallel Architecture (Vector Architecture and SIMD ) part II PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
- 29 Oct 2012: GPU Architecture, Cuda Programming PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
- 30 Oct 2012: Cuda Programming, Compiler Transformation for Parallelism PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4, David Kirk Book, Compiler Book (Sethi, Aho, Ullman and Lam) Pages 801, 810, 819 and 848 ]]
- 31 Oct 2012: Data Placement Model in Multicore: bipartite matching formulation, ILP formulation, MST formulation for Dual Port Memory, Cache Data Placement in Multicore : Multicore Caching PDF Slide
[[Extra work:Read algorithms for stable matching, weighted bipartite matching by augmented path iteration, max Span Tree and Integer Linear Program]],[[ 2008 R-NUCA paper ]]
- 02 Nov 2012 (Makeup Class): Cache Partitioning and BW Partitioning PDF Slides
[[ Paper: Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance, Solihin HPCA 2010]], [[Paper: Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, Qureshi and Patt MICRO 2006]]
- 05 Nov 2012 : Cache Prefetching: Helper based, Hardware, Pinning and Locking, Off-chip Bandwidth Scheduling of Multicore in Presence of Prefetching PDF Slides
[[(a) Paper: Helper Thread Prefetching for Loosely-Coupled Multiprocessor Systems, IPDPS 06]], [[(b) Paper: Adaptive Prefetching for Shared Cache Based Chip Multiprocessors, Kandemir DATE 09]], and [[(c) Paper: Prefetch-Aware Shared-Resource Management for Multi-Core Systems, Ebrahimi ISCA 11]]
- 06 Nov 2012: Multiprocessor Scheduling (Theory), Multiprocessor (Approximation), List Scheduling PDF Slides
[[ Peter Brucker Book ]], [[Paper: A Survey of Hard Real-Time Scheduling for Multiprocessor Systems ]]
- 07-Nov-2012: Real Time Scheduling (Schedulability test, RMS and EDF), Distributed Scheduling, Cilk, Work Stealing, 2D/3D Mesh multicore scheduling PDF Slides
[[ Web Material: A Literature Study on Scheduling in Distributed Systems]], [[Paper: A Taxonomy of Scheduling in General-Purpose Distributed Computing System]], [[Book: Advanced OS by Singhal]], [[ Chapter 17, Algorithm CLR Book 3rd Ed ]], [[ Loh Paper: 3D-Stacked Memory Architectures for Multi-Core Processors]], [[ Kandemir Paper: Design and Management of 3D Chip Multiprocessors Using Network-in-Memory]] and [[ Chou Paper: Run-Time Task Allocation Considering User Behavior in Embedded Multiprocessor Networks-on-Chip]]
- 12 Nov 2012 : Tiled Manycore Architecture, Re-configurable Mesh Architecture, Warehouse Scale Computer PDF Slides
[[ Anant Agarwal Paper: On-Chip Interconnection Architecture of the Tile Processor]], [[ Ramachandran Book and Link IEEE.TC.MillerPaper93 ]], and [[ Hennessy CA-QA Book, 5th Ed, Chapter 6 ]]
- Self Reading: Pthread, Cilk, OpenMP and CUDA
Research tools: Simulators (Multi2Sim, SESC, SIMICS/GEMS) and Benchmarks (SPEC OMP, PARSEC, SPLASH, etc.)
End Semester Timing: Nov 20, 1PM-4PM, Room:3101
Question uploaded EndSemQuestion
- Venue: 2001
- Timing : Monday (4PM-5PM), Tuesday (3PM-4PM) and Wednesday (2PM-3PM)
- Rules :
- 75% attendance mandatory
- 30% Lecture note scribing (in LaTeX+Xfig) + 30% mid sem exam + 40% end sem exam
Text:
- Hennessy, J.L., and Patterson, D.A., "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 5th Edition, 2011
- Dezso Sima, Peter Kacsuk, Terence Fountain, " Advanced Computer Architectures : A Design Space Approach", Pearson Education India, 1997
- Michael J Flynn, " Computer Architecture: Pipelined and Parallel Processor Design ", Narosa Publishing India, 2003
References:
- David Culler, J.P. Singh and Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann, first edition, 1998.
- Harvey G Cragon, " Memory Systems and Pipelined Processors", Narosa Book Distributors, India, 1998
- Patterson, D.A., and Hennessy, J.L., "Computer Organization and Design: The Hardware/Software Interface", Morgan Kaufmann Publishers, 4th Edition, 2005
- Kai Hwang, " Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw-Hill, first edition, 1992.
- Ramachandran Vaidyanathan and J L Trahan, "Dynamic Reconfiguration: Architectures and Algorithms", Kluwer Academic Publisher, New York, 2003
- David Kirk and Wen-mei Hwu " Programming Massively Parallel Processors: A Hands-on Approach", Morgan Kaufmann Publishers, 2010 EBookCopy From NVidia Website
- P Pacheco " An Introduction to Parallel Programming", Morgan Kaufmann Publishers, 2011
- J Reinders, "Intel Threading Building Blocks", O'Reilly, SPD Books India, 2007
- Behrooz Parhami, "Introduction to Parallel Processing: Algorithms and Architectures", Plenum Press, New York, 1999 ECopyPDF From External Weblink
- Duato J, Yalamanchili S, Ni L, "Interconnection Networks", Morgan Kaufmann, 2002
- Peter Brucker, "Scheduling Algorithms", Springer, 2007 OnlineCopy