
Why Every AI Engineer Needs to Understand Hardware

Sep 22, 2025 · 8 min read

Every semester, I start my High-Performance Machine Learning course at Columbia with a provocation: "Your model is only as good as the hardware it runs on." Most students arrive thinking AI is purely a software problem. By the end of the course, they understand that the most impactful breakthroughs in AI will come from engineers who can think across the full stack — from silicon to software.

The Growing Hardware Literacy Gap

The AI community has an asymmetry problem. We produce thousands of papers on new architectures, training techniques, and fine-tuning methods. But far fewer researchers understand why a particular GPU kernel is 10x faster than another, or how memory bandwidth constrains transformer inference, or why quantization works differently on different hardware targets.
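To make the memory-bandwidth point concrete, here is a back-of-the-envelope sketch of why single-stream LLM decoding is bandwidth-bound rather than compute-bound. The model size, byte widths, and bandwidth figure are illustrative assumptions, not measurements of any particular GPU or model.

```python
# Upper bound on decode throughput when every weight must be streamed
# from memory once per generated token (batch size 1, weights dominate
# traffic). All numbers below are illustrative assumptions.

def decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    """Bandwidth-bound tokens/sec: bandwidth divided by bytes read per token."""
    bytes_per_token = n_params * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

# A hypothetical 7B-parameter model on ~1 TB/s of memory bandwidth:
fp16 = decode_tokens_per_sec(7e9, 2.0, 1000)   # FP16: 2 bytes/param
int4 = decode_tokens_per_sec(7e9, 0.5, 1000)   # INT4: 0.5 bytes/param

print(f"FP16 bound: ~{fp16:.0f} tok/s")   # ~71 tok/s
print(f"INT4 bound: ~{int4:.0f} tok/s")   # 4x less data moved -> ~4x the bound
```

Note that the compute side barely enters: the speedup from quantization here comes entirely from moving fewer bytes, which is exactly the kind of reasoning that requires knowing the hardware.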

This gap isn't academic — it's practical. When companies try to deploy LLMs at scale, the bottleneck is rarely the model itself. It's the serving infrastructure, the memory management, the communication overhead in distributed systems. These are fundamentally hardware-software co-design problems.

What I Teach at Columbia

In COMS E6998: High-Performance Machine Learning, we start with the basics — PyTorch profiling, CUDA programming, memory hierarchies — and build up to the frontier: FlashAttention, vLLM's PagedAttention, speculative decoding, and LoRA/QLoRA fine-tuning.

The key pedagogical principle is measure everything. Students don't just implement distributed training — they profile it, identify bottlenecks, and optimize. They don't just read about quantization — they quantize a model, measure the accuracy-latency tradeoff on real hardware, and reason about when INT4 is sufficient and when it isn't.
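In that spirit, the "quantize it and measure" exercise can be sketched in a few lines. This is a minimal, illustrative symmetric per-tensor INT8 round trip on synthetic weights; real toolchains (e.g. PyTorch's quantization workflows) add calibration, per-channel scales, and hardware-specific kernels.

```python
import numpy as np

# Symmetric per-tensor INT8 quantization round trip on fake weights.
# Illustrative sketch only: shapes, distribution, and scale scheme are
# assumptions, not a production quantization recipe.

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(256, 256)).astype(np.float32)  # synthetic weights

def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0                    # one scale for the tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale                   # dequantize

err = np.abs(w - w_hat).max()
print(f"max abs round-trip error: {err:.6f} (scale={scale:.6f})")
# Rounding to the nearest step means the error is at most half a step:
assert err <= scale / 2 + 1e-8
```

Measuring the error (and then the accuracy and latency on the actual target) is the habit the course tries to build: the tradeoff is an empirical question about the hardware, not a property of the algorithm alone.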

From CUDA to Analog: The Full Spectrum

What makes this moment in AI hardware so exciting is the breadth of the design space. We're not just optimizing GPUs anymore. At IBM Research, my team works on analog in-memory computing — a fundamentally different computing paradigm where matrix multiplications happen in the physics of the device itself, rather than through digital logic.
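The shift in mindset can be illustrated with a toy simulation: in an analog crossbar, the matrix-vector product is computed by device physics, so the precision limit is analog noise rather than digital rounding. The noise level below is an arbitrary illustrative assumption, not a measured device characteristic.

```python
import numpy as np

# Toy model of analog in-memory matrix-vector multiplication: we mimic
# "compute in the physics" by perturbing the weight matrix with device
# noise before the multiply. Purely illustrative; real analog hardware
# has richer non-idealities (drift, IR drop, nonlinearity).

rng = np.random.default_rng(1)
W = rng.normal(0, 1, (64, 64))
x = rng.normal(0, 1, 64)

ideal = W @ x                                   # exact digital reference
noisy_W = W + rng.normal(0, 0.02, W.shape)      # assumed conductance noise
analog = noisy_W @ x                            # what the analog array returns

rel_err = np.linalg.norm(analog - ideal) / np.linalg.norm(ideal)
print(f"relative error from device noise: {rel_err:.3%}")
```

The design question becomes statistical rather than bitwise: how much of this error can a network absorb, and what do you gain in energy and throughput for tolerating it?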

Students who understand both digital and analog paradigms, who can reason about the tradeoffs between precision, energy, and throughput across different hardware targets, will be the ones who define the next era of AI.

The Co-Design Mindset

The most important thing I try to instill isn't any specific technical skill — it's a mindset. The co-design mindset means asking: "How does my algorithmic choice interact with the hardware it will run on?" It means understanding that a pruned model isn't actually faster unless the hardware can exploit sparsity. It means knowing that batch size isn't just a hyperparameter — it's a hardware utilization decision.
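The batch-size point can be made with one formula. For a linear layer whose weights dominate memory traffic, FLOPs scale with batch size while weight bytes do not, so arithmetic intensity (FLOPs per byte) grows linearly with batch. A small sketch, assuming FP16 weights and ignoring activation traffic:

```python
# Arithmetic intensity of a dense layer as a function of batch size,
# under the simplifying assumption that weight reads dominate memory
# traffic (FP16 weights, activations ignored). Illustrative only.

def arithmetic_intensity(batch: int, d_in: int, d_out: int,
                         bytes_per_param: int = 2) -> float:
    flops = 2 * batch * d_in * d_out           # one multiply-accumulate = 2 FLOPs
    weight_bytes = d_in * d_out * bytes_per_param
    return flops / weight_bytes

for b in (1, 8, 64):
    print(f"batch={b:3d}: {arithmetic_intensity(b, 4096, 4096):.0f} FLOPs/byte")
# With these assumptions intensity equals the batch size: 1, 8, 64.
```

Below the hardware's compute-to-bandwidth ratio, the layer is memory-bound and larger batches are nearly free; above it, compute becomes the limit. That is why batch size is a utilization decision, not just a hyperparameter.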

This is the thinking that will separate the next generation of AI leaders from the crowd. And it's why every AI engineer needs to understand hardware.

Dr. Kaoutar El Maghraoui


Principal Research Scientist at IBM Research · Adjunct Professor at Columbia University