side 14:55–15:25

Inference Without the Wait: A Live Demo of Instant-On Model Deployment

// ABOUT THIS SESSION

Deploying AI models for inference has traditionally meant long setup times, Docker complexity, and cold start delays that slow down production. In this live demo, we will show how Runpod Flash changes that entirely. Flash lets you write a Python function, decorate it, and run it on serverless GPU infrastructure instantly, with no Docker required. We will walk through a real deployment from a simple Python script to a live inference endpoint, showcasing how Flash handles dependency management, dramatically reduces cold starts, and scales automatically with demand. Whether you are running LLMs, computer vision models, or custom pipelines, Flash gets you from code to production in minutes.

// SPEAKER

Emmett Fear

Director of Demand Gen · Runpod

GPU cloud for AI workloads

AI Infrastructure