This course aims to familiarize participants with the complete development workflow of end-to-end, software-based ultrasound systems running on GPUs. Participants will be guided through several realistic use cases that demonstrate the full data flow, from raw channel data to reconstructed outputs, together with the corresponding GPU implementations.
This course will explore advanced aspects of GPU programming, including:
Optimal utilization of GPU cores and memory hierarchies
Dynamic parallelism and kernel orchestration
Use of CUDA-optimized libraries, such as:
cuBLAS/cuSPARSE for dense and sparse matrix operations
cuFFT for frequency-domain processing
TensorRT/cuDNN for machine-learning-based ultrasound pipelines
Performance modeling, occupancy analysis, and memory-bandwidth optimization
Mixed-precision computing and considerations for signal-processing accuracy
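
As a small illustration of the last topic, the sketch below emulates on the CPU, with NumPy, what can happen when many samples are accumulated in half precision. It is not course material: the function name `running_sum` and the choice of 4096 unit-amplitude samples are assumptions made purely for illustration, and real GPU kernels often avoid this problem by accumulating in a wider type than the one used for storage.

```python
import numpy as np


def running_sum(samples, dtype):
    """Accumulate samples one at a time, keeping the accumulator in `dtype`.

    This mimics a naive reduction whose accumulator has the same
    precision as the stored data (hypothetical helper, for illustration).
    """
    acc = dtype(0)
    for s in samples:
        # Both operands are cast to `dtype`, so every addition is
        # rounded to that precision before the next one starts.
        acc = dtype(acc + dtype(s))
    return float(acc)


# Assumed test signal: 4096 samples of amplitude 1.0.
samples = np.ones(4096)

half_result = running_sum(samples, np.float16)
single_result = running_sum(samples, np.float32)

print("float16 accumulator:", half_result)
print("float32 accumulator:", single_result)
```

In half precision the spacing between representable values is 2 once the accumulator reaches 2048, so adding 1.0 no longer changes it and the sum stalls at 2048, while the single-precision accumulator reaches the exact value 4096. Storing samples in half precision while accumulating in single precision is one common way to keep the bandwidth savings without this loss of accuracy.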