
From Scaling Laws to Agentic AI: What I Teach Columbia Students About the Future of LLMs

Aug 15, 2025 · 10 min read

When I designed the seminar "Scaling LLMs: Systems, Optimization, and Emerging Paradigms" for Columbia, I wanted to create something different from a typical graduate course. Not a lecture series where students passively absorb information, but a research apprenticeship where they learn to read, critique, and extend the frontier of AI systems research.

The Arc of the Course

The seminar follows a deliberate progression that mirrors the evolution of the field itself.

We begin with foundations — scaling laws, distributed training systems like Megatron-LM and ZeRO++, and the architectural innovations (attention mechanisms, KV caching, state-space models like Mamba and Bamba) that make modern LLMs possible. Students need this vocabulary before they can engage with the frontier.
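To make the scaling-law vocabulary concrete, here is a minimal sketch of a Chinchilla-style loss predictor. The functional form follows Hoffmann et al. (2022); the constants are illustrative approximations of their reported fits, not authoritative values.

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss under a Chinchilla-style scaling law:

        L(N, D) = E + A / N**alpha + B / D**beta

    where N is parameter count and D is training tokens. Constants
    roughly follow the fits in Hoffmann et al. (2022); treat them as
    illustrative assumptions.
    """
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Growing the model at fixed data lowers predicted loss,
# but with visibly diminishing returns.
for n in (1e9, 1e10, 1e11):
    print(f"N={n:.0e}, D=2e11 -> loss {chinchilla_loss(n, 2e11):.3f}")
```

The additive form is the point: once either term's contribution nears zero, further scaling along that axis buys almost nothing, which is why compute-optimal training balances N and D.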

Then we move to inference scaling — the engineering challenge of actually serving these models. This is where theory meets production: paged attention, throughput-latency tradeoffs, memory hierarchies, and the design of serving systems like vLLM and TGI. Students are often surprised by how much of the "AI problem" is really a systems problem.
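The core idea behind paged attention can be shown in a few lines. This is a toy block table in the spirit of vLLM's PagedAttention, not its actual implementation; the class and method names are mine, and the real system manages GPU memory, copy-on-write sharing, and eviction.

```python
BLOCK_SIZE = 16  # tokens per physical KV-cache block (granularity is an assumption)

class PagedKVCache:
    """Toy block table: each sequence maps logical token positions to
    fixed-size physical blocks drawn from a shared pool, so KV memory is
    allocated on demand rather than reserved for the maximum context."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))     # pool of physical block ids
        self.tables: dict[int, list[int]] = {}  # seq_id -> its block table
        self.lengths: dict[int, int] = {}       # seq_id -> tokens written

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Return (physical_block, offset) where the new token's KV lands."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:          # current block is full: grab a new one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[-1], n % BLOCK_SIZE

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

# 20 generated tokens occupy exactly two 16-token blocks; nothing more is reserved.
cache = PagedKVCache(num_blocks=64)
for _ in range(20):
    block, offset = cache.append_token(seq_id=0)
print(len(cache.tables[0]))  # -> 2
```

Because blocks return to a shared pool the moment a request finishes, many concurrent sequences can share the same fixed memory budget, which is exactly the throughput-latency lever serving systems tune.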

The third module is where things get genuinely exciting: Agentic AI and multimodal models. Here, LLMs stop being prediction engines and become agents that can plan, use tools, retrieve information, and reason across modalities. This is the frontier, and the papers are often only months old.
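The control flow of a tool-using agent is simpler than it sounds. Below is a minimal sketch: the tools are toy stand-ins I invented, and the "plan" is pre-scripted so the loop is visible without a model in it. In a real ReAct-style agent, the LLM produces each step (think, act, observe) from the previous observations.

```python
# Hypothetical toy tools; a real agent binds these to search APIs,
# code interpreters, retrievers, and so on.
TOOLS = {
    "search": lambda q: f"top result for {q!r}",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(plan: list[dict]) -> list[str]:
    """Execute a scripted plan of tool calls and collect observations.

    In an actual agentic system, each step would be chosen by the LLM
    conditioned on the observations so far, rather than fixed up front.
    """
    observations = []
    for step in plan:
        tool = TOOLS[step["tool"]]           # dispatch on the named tool
        observations.append(tool(step["input"]))
    return observations

obs = run_agent([
    {"tool": "calculator", "input": "6 * 7"},
    {"tool": "search", "input": "state-space models"},
])
print(obs)  # -> ['42', "top result for 'state-space models'"]
```

The interesting research questions live in what this sketch omits: how the model decides which tool to call, how it recovers from a bad observation, and how planning scales across modalities.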

Finally, we look at hardware futures — analog accelerators, neuromorphic systems like IBM's NorthPole, TPU architectures, and storage-offload approaches. This module connects back to my research at IBM, where we're building the physical infrastructure for the next generation of AI.

The Paper Critique Method

Every week, students present and critique papers from top venues — NeurIPS, ICML, ICLR, ISCA, ACL. But this isn't a book report. I teach them to ask: What is the key insight? What are the hidden assumptions? What would break if you changed the hardware target? What experiment is missing?

This adversarial reading skill is, I believe, the most valuable thing a graduate student can develop. It's the difference between consuming research and contributing to it.

The Survey Paper Project

The capstone is a group survey paper with experimental evaluation. Students pick a topic from the course, synthesize the literature, identify gaps, and run experiments. The best projects earn bonus credit and, more importantly, produce work of publishable quality.

Several student projects have gone on to become workshop papers or contributed to larger research efforts. This is the goal — not just teaching students about the field, but making them contributors to it.

Why This Matters Now

We're at an inflection point. The easy gains from scaling — just make the model bigger, train on more data — are hitting diminishing returns. The next wave of progress will come from systems-level innovation: better inference, smarter hardware, more efficient architectures, and the emergence of agentic capabilities.

The students in this seminar are learning to think about all of these dimensions simultaneously. That's what the field needs.

Dr. Kaoutar El Maghraoui

Principal Research Scientist at IBM Research · Adjunct Professor at Columbia University