
Lucia Loher
Product Manager · Google DeepMind
// ABOUT THIS SESSION
Modern AI applications aren't just about model quality; they're also about running at scale without burning your budget. This talk shares the systems-level thinking and product decisions behind the Gemini API's inference optimization stack: context caching, asynchronous batch processing, and flexible inference tiers. These aren't just features; together they form a strategy for making large-scale, production-grade AI economically viable.
// SPEAKERS · 2