side · 15:45–16:05

From Caching to Batching to Flex: How to Optimize AI Systems for Production

// ABOUT THIS SESSION

Modern AI applications aren't just about model quality — they're about running at scale without burning your budget. This talk shares the systems-level thinking and product decisions behind the Gemini API's inference optimization stack: context caching, asynchronous batch processing, and flexible inference tiers. These aren't just features; they're a strategy to make large-scale, production-grade AI economically viable.

// SPEAKERS · 2

Lucia Loher

Product Manager · Google DeepMind

Patrick Löber

Member of Technical Staff · Google DeepMind