Haibin Lai
12211612@mail.sustech.edu.cn

After this step, you can connect to a supercomputer and start your journey!
The Missing Semester of Your CS Education:
Strongly recommended: work through all the videos on 超算习堂.
Programming Basics:
After this step, you will know how programs run on a computer!
Digital logic:
Computer Organization:
Suggested: Computer Organization and Design by Patterson & Hennessy (book or online lectures).
Parallel Computing Fundamentals:
Suggested: Introduction to Parallel Computing (online, University of Illinois via Coursera).
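As a first hands-on taste of data parallelism (independent of any course above), here is a minimal Python sketch that splits a sum across worker processes; real HPC codes would use MPI, OpenMP, or GPU kernels, but the split-compute-combine pattern is the same:

```python
# Minimal data-parallel sketch: split a big sum across worker processes.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))  # scatter work, gather results
    print(total)
```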
Computer Systems:
Database Principles:
Suggested: CMU 15-445/645 (Intro to Database Systems) and CMU 15-721 (Advanced Database Systems).
Key topics: Relational models, storage architectures (heaps, log-structured), indexing (B+ trees, hash tables), transaction processing (ACID, concurrency control), recovery (logging, checkpoints), and parallel/distributed query processing.
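To make two of these topics concrete (indexing and ACID transactions), here is a tiny sketch using Python's built-in sqlite3 module; SQLite is only a stand-in here, not part of the CMU courses:

```python
# Sketch of indexing and transactional behavior with Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE INDEX idx_balance ON accounts(balance)")  # secondary index (a B-tree in SQLite)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    with conn:  # one transaction: both updates commit, or neither does (atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on error the whole transaction is rolled back

print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())
```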
Machine Learning and Deep Learning Fundamentals:
Understand ML and DL basics, focusing on models and architectures relevant to HPC, where parallel computing (e.g., GPU clusters) accelerates training and inference.
Basic ML Models:
Key topics: Supervised learning (regression, classification), unsupervised learning (clustering), model evaluation (accuracy, precision, recall), and optimization (gradient descent).
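As a concrete example of the last topic, here is a minimal NumPy sketch of batch gradient descent fitting a least-squares linear model (an illustrative toy, not taken from a specific course):

```python
# Batch gradient descent for least-squares linear regression (minimal sketch).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2.0 / len(y) * X.T @ (X @ w - y)  # gradient of the mean squared error
    w -= lr * grad

print(w)  # should be close to [2.0, -1.0, 0.5]
```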
Deep Learning Papers:
AlexNet (2012): Introduces deep CNNs with ReLU, dropout, and GPU acceleration for image classification. Paper.
VGG (2014): Deepens CNNs with small (3x3) filters for improved performance. Paper.
ResNet (2015): Introduces residual connections to train very deep networks, reducing vanishing gradients. Paper.
Transformer (2017): Proposes an attention-based architecture for NLP, scalable for HPC with parallel processing. Attention Is All You Need.
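The core operation behind the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d)) V; here is a minimal single-head NumPy sketch with no masking or learned projections:

```python
# Scaled dot-product attention, the core of the Transformer (single head, no mask).
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```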
Numerical Computing:
Grid Methods (Mesh-Based Techniques):
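To illustrate what a grid method looks like in practice, here is a minimal sketch of explicit finite differences for the 1-D heat equation u_t = alpha * u_xx (a standard toy problem, not tied to a specific reference above):

```python
# Explicit finite-difference solution of the 1-D heat equation u_t = alpha * u_xx.
import numpy as np

nx, alpha = 101, 0.01
dx = 1.0 / (nx - 1)
dt = 0.4 * dx * dx / alpha      # respects the explicit-scheme stability limit (<= 0.5)
u = np.zeros(nx)
u[nx // 2] = 1.0                # initial heat spike in the middle of the rod

for _ in range(1000):
    # Update interior grid points from their left/right neighbors.
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    u[0] = u[-1] = 0.0          # fixed (Dirichlet) boundary conditions

print(u.max())                  # the spike spreads out and its peak decays
```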
[Optional] Ordinary and Partial Differential Equations (ODE/PDE):
[Optional] Advanced Algorithms (MIT 18.337/18.338):
HPC Middleware and Frameworks:
Distributed Systems (MIT 6.824/6.5840):
Big data ideas: GFS, BigTable, MapReduce.
Key topics: MapReduce, Raft consensus algorithm, key-value stores, sharding, and distributed file systems (e.g., GFS, HDFS).
Suggested: MIT 6.824: Distributed Systems (course website, includes lectures, labs, and papers) or YouTube Lectures. Labs involve implementing MapReduce, Raft, and a fault-tolerant key-value store in Go.
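Before implementing the Go labs, it can help to see the MapReduce programming model in miniature; here is a toy single-process word count (the real system adds partitioning, fault tolerance, and a distributed file system):

```python
# Toy single-process MapReduce word count; the real system distributes map/reduce
# tasks across machines and handles partitioning and fault tolerance.
from collections import defaultdict

def map_phase(doc):
    return [(word, 1) for word in doc.split()]        # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)                        # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)                           # combine all values per key

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)   # {'the': 3, 'quick': 1, ...}
```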
[Optional] Advanced Parallel Algorithms:
Performance tracing: bpftrace and BCC, and monitoring system calls and network performance.
PyTorch for Distributed Training:
Key topics: Data parallelism (torch.distributed), model parallelism, pipeline parallelism, and integration with HPC frameworks like MPI and NCCL.
Suggested: PyTorch tutorials on DistributedDataParallel and RPC, or Distributed Machine Learning with PyTorch (PyTorch Lightning tutorial on YouTube). Practice with Horovod for PyTorch-based distributed training on clusters; a minimal DDP/NCCL sketch follows below.
NCCL (NVIDIA Collective Communications Library):
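Since torch.distributed typically uses NCCL as its GPU communication backend, one minimal sketch can illustrate both items above. It assumes a single multi-GPU node launched with torchrun; the script name, model, and hyperparameters are placeholders:

```python
# Minimal DistributedDataParallel sketch; assumes launch via:
#   torchrun --nproc_per_node=<gpus> train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL handles the GPU-to-GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    model = torch.nn.Linear(10, 1).to(device)      # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 10, device=device)     # each rank trains on its own batch
        y = torch.randn(32, 1, device=device)
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        opt.zero_grad()
        loss.backward()                             # gradients are all-reduced across ranks via NCCL
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```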
Large-Scale GPU Training:
RDMA (Remote Direct Memory Access):
Advanced GPU Computing (bank conflicts, warp divergence):
Key topics:
Common Methods: Parallel thread execution, kernel launches, memory coalescing, and stream processing.
Optimization Tricks: Minimizing warp divergence (avoiding branch divergence within warps), resolving bank conflicts in shared memory, optimizing memory access patterns, and using asynchronous data transfers.
Principles: GPU architecture (SMs, warps, threads), memory hierarchy (global, shared, register), and latency hiding.
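One way to experiment with these ideas from Python is Numba's CUDA JIT (not mentioned above, just a convenient option); the sketch below shows shared-memory tiling, coalesced global loads, and block-level synchronization in a matrix multiply:

```python
# Shared-memory tiled matrix multiply with Numba's CUDA JIT (illustrative sketch).
import numpy as np
from numba import cuda, float32

TPB = 16  # tile width = threads per block in each dimension

@cuda.jit
def matmul_tiled(A, B, C):
    sA = cuda.shared.array(shape=(TPB, TPB), dtype=float32)  # tiles staged in shared memory
    sB = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    x, y = cuda.grid(2)
    tx, ty = cuda.threadIdx.x, cuda.threadIdx.y

    acc = float32(0.0)
    for i in range((A.shape[1] + TPB - 1) // TPB):
        # Coalesced loads: threads with consecutive tx read consecutive global addresses.
        sA[ty, tx] = A[y, i * TPB + tx] if y < A.shape[0] and i * TPB + tx < A.shape[1] else 0.0
        sB[ty, tx] = B[i * TPB + ty, x] if x < B.shape[1] and i * TPB + ty < B.shape[0] else 0.0
        cuda.syncthreads()           # whole block must finish loading the tile
        for j in range(TPB):
            acc += sA[ty, j] * sB[j, tx]
        cuda.syncthreads()           # don't overwrite tiles still being read
    if y < C.shape[0] and x < C.shape[1]:
        C[y, x] = acc

n = 256
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)
C = np.zeros((n, n), dtype=np.float32)
blocks = ((n + TPB - 1) // TPB, (n + TPB - 1) // TPB)
matmul_tiled[blocks, (TPB, TPB)](A, B, C)
print(np.allclose(C, A @ B, atol=1e-3))
```

In this layout, reads of sB[j, tx] hit consecutive shared-memory banks across a warp and reads of sA[ty, j] broadcast within a row, so the inner loop avoids bank conflicts; when a layout does conflict, padding the tile width by one element is the standard fix.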