Python developers building machine learning models often interact with GPUs “indirectly” through frameworks like PyTorch, TensorFlow, and JAX. Because these frameworks abstract away low-level behavior, it’s easy to assume that GPU execution works like CPU execution—deterministic, sequential, and predictable. In practice, the GPU is running an entirely different execution model, with its own scheduling rules, memory hierarchies, precision trade-offs, and parallelism strategies. These differences can lead to surprising behaviors: sudden slowdowns, unexpected memory spikes, kernel launch bottlenecks, and accuracy drift when switching hardware.
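One of those accuracy-drift effects can be seen without any GPU at all: floating-point addition is not associative, so a parallel reduction that combines partial sums in a different order than a sequential CPU loop can produce a slightly different answer. A minimal pure-Python sketch of the underlying effect (the GPU-specific reduction order is not shown here, only the non-associativity it exposes):

```python
import math

# Floating-point addition is not associative: the order in which partial
# sums are combined changes the rounding, which is why parallel GPU
# reductions can drift from a sequential CPU sum.
xs = [0.1] * 10

naive = sum(xs)        # left-to-right accumulation
exact = math.fsum(xs)  # correctly rounded sum of the same values

print(naive)  # 0.9999999999999999
print(exact)  # 1.0
```

The same values, summed two different ways, round differently; on a GPU the summation order additionally depends on how the reduction is parallelized.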
This talk offers a practical, developer-friendly introduction to the hardware concepts that directly impact Python-based AI workloads. You don’t need a background in computer architecture; instead, the session focuses on the small set of hardware ideas that make the biggest difference when building or deploying machine learning models.
We will walk through how GPU memory is structured, why kernel launches behave differently from Python function calls, how floating-point math on GPUs differs from CPUs, and why the same Python code behaves differently across GPU generations or SDK versions. Along the way, we’ll connect these concepts to real examples in PyTorch and TensorFlow, demonstrating how awareness of hardware behavior can improve performance, stability, and debugging outcomes.
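The kernel-launch point can be illustrated by analogy using only the standard library: launching a GPU kernel enqueues work and returns immediately, much like submitting a task to an executor, so naively timing the launch call measures almost nothing. This sketch uses a thread pool as a stand-in for the GPU queue (in real PyTorch code, `torch.cuda.synchronize()` plays the role that `future.result()` plays here):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_kernel():
    # Stand-in for GPU work; a real kernel would run on the device.
    time.sleep(0.2)
    return 42

with ThreadPoolExecutor() as pool:
    t0 = time.perf_counter()
    fut = pool.submit(slow_kernel)       # "launch": returns immediately
    launch_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    result = fut.result()                # "synchronize": wait for completion
    sync_time = time.perf_counter() - t0

print(f"launch returned in {launch_time * 1000:.3f} ms")  # tiny
print(f"waiting took      {sync_time * 1000:.1f} ms")     # ~200 ms
```

This is an analogy, not GPU code: the point is that the cheap, immediately-returning call is easy to mistake for the cost of the work itself, which is exactly the trap in naive GPU benchmarking.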
The aim of this session is not to turn Python developers into hardware experts, but to give them the mental models they need to reason about GPU behavior in a practical, Python-centered way.