Side · 14:20–14:50

Batch AI Pipelines: How to Go Fast Without Losing Work or Money

// ABOUT THIS SESSION

Running batch AI workloads — inference, fine-tuning, data preprocessing — on demand sounds simple. In practice, the moment you mix CPU and GPU tasks, add parallelism, and introduce preemptible instances to cut costs, pipelines break in expensive ways. Marouane and Mikhail share architecture patterns and hard lessons from running mixed serverless batch pipelines in production: how to structure jobs across CPU and GPU, when preemptibles are safe and when they're not, and how to build pipelines that recover gracefully when instances are killed mid-run. The surprises: cold start was never the bottleneck — slow image pulls and data loading were. Three takeaways: put CPU and short jobs on preemptibles and checkpoint everything else; make every job resumable by design so a restart costs nothing; and skip the orchestrator until you have 10+ pipeline steps.
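The "resumable by design" takeaway can be sketched as follows. This is a minimal illustration, not the speakers' code. It assumes each work item has a stable ID and writes its result to its own output file. The names `run_resumable` and `write_atomically`, and the `fail_after` parameter (used only to simulate a preempted instance), are hypothetical. A restarted job skips every item whose output already exists, so a kill mid-run costs at most the item that was in flight.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Optional


def write_atomically(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place.

    os.replace is atomic, so a preempted write never leaves a half-written
    output that a later run would mistake for a finished item.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Interrupted before the rename: remove the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_resumable(
    items: Iterable[str],
    process: Callable[[str], dict],
    out_dir: str,
    fail_after: Optional[int] = None,  # hypothetical: simulates preemption
) -> int:
    """Process each item at most once; return how many were computed this run."""
    os.makedirs(out_dir, exist_ok=True)
    computed = 0
    for item_id in items:
        out_path = os.path.join(out_dir, f"{item_id}.json")
        if os.path.exists(out_path):
            continue  # finished in an earlier run, so skipping costs nothing
        if fail_after is not None and computed == fail_after:
            raise RuntimeError("simulated preemption")
        write_atomically(out_path, process(item_id))
        computed += 1
    return computed


if __name__ == "__main__":
    def describe(item_id: str) -> dict:
        return {"id": item_id, "length": len(item_id)}

    out = tempfile.mkdtemp()
    items = [f"item{i}" for i in range(5)]
    try:
        run_resumable(items, describe, out, fail_after=2)  # killed after 2 items
    except RuntimeError:
        pass
    print(run_resumable(items, describe, out))  # prints 3: only the rest are computed
```

Keeping one output per item, rather than a single progress counter stored separately from the results, means the job's recorded progress and its actual outputs cannot drift apart after a kill.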

// SPEAKERS · 2

Marouane Khoukh

Developer Advocate · Nebius

Serverless AI infrastructure for batch and inference workloads

AI Infrastructure

Mikhail Rozhkov

Technical Product Manager · Nebius

Serverless AI infrastructure for batch and inference workloads

AI Infrastructure