Why Quantization Won, and Where Pruning Survived: A Practitioner's View of LLM Compression
If both quantization and pruning compress models and speed up inference, why does only one ship in production? Quantization, almost always. A practitioner's view of model compression for LLM inference: why quantization dominates, the four niches where pruning survives, and the layered recipe that real systems actually use.