Applied AI Conf · Conf day agenda
main, 15:00 to 15:20

Prompt Learning: Distilling Expensive Reasoning Into Fast Production Prompts

// ABOUT THIS SESSION

The gap between heavy and light models is a resource. A capable model can evaluate output that a fast model produces, and that feedback loop, run offline, can deposit the heavy model's judgment directly into a prompt — no fine-tuning, no labeled datasets. Oğuz walks through Peec AI's offline optimization loop: a small model generates, parallel reward models evaluate (each returning a score plus a natural-language failure explanation), and a heavy model rewrites the system prompt. Everything runs offline, so there's zero latency overhead in production. One surprising finding: high reward scores don't always predict useful outputs — the optimizer once converged on highly specific outputs that scored well but missed the common patterns clients cared about, and the fix was a reward signal that penalized excessive specificity. Three takeaways: reward design is the real engineering work; heavy models earn their cost as offline evaluators; and natural-language failure explanations beat aggregate scores alone.
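The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Peec AI's implementation: every name here (`Feedback`, `evaluate`, `optimize_prompt`, and the `toy_*` and `*_reward` functions) is hypothetical, the three models are replaced by deterministic stand-in functions so the example runs without any API, and the rule of keeping a rewritten prompt only when its mean score improves is an assumption the abstract does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feedback:
    score: float       # 0.0 (bad) to 1.0 (good)
    explanation: str   # natural-language failure explanation, "" when none


def evaluate(prompt, inputs, generate, rewards) -> Tuple[float, List[str]]:
    """Run the small model on every input and collect reward feedback."""
    scores, explanations = [], []
    with ThreadPoolExecutor() as pool:
        for user_input in inputs:
            output = generate(prompt, user_input)
            # Reward models are independent of each other, so score in parallel.
            for fb in pool.map(lambda r: r(user_input, output), rewards):
                scores.append(fb.score)
                if fb.explanation:
                    explanations.append(fb.explanation)
    return sum(scores) / len(scores), explanations


def optimize_prompt(prompt, inputs, generate, rewards, rewrite, rounds=3):
    """Offline loop: generate, evaluate, rewrite the prompt, keep improvements."""
    best_prompt = prompt
    best_score, explanations = evaluate(prompt, inputs, generate, rewards)
    for _ in range(rounds):
        if not explanations:
            break  # no failures left to explain, nothing to rewrite
        candidate = rewrite(best_prompt, explanations)
        score, candidate_explanations = evaluate(candidate, inputs, generate, rewards)
        if score > best_score:  # assumption: accept only strict improvements
            best_prompt, best_score = candidate, score
            explanations = candidate_explanations
    return best_prompt, best_score


# --- Toy stand-ins for the three models (hypothetical, for illustration) ---

def toy_generate(system_prompt: str, user_input: str) -> str:
    """Stand-in for the small, fast model."""
    if "common patterns" in system_prompt:
        return f"Common patterns for {user_input}: ..."
    return f"Very specific detail about {user_input}"


def specificity_reward(user_input: str, output: str) -> Feedback:
    """Stand-in reward model that penalizes excessive specificity."""
    if "Common patterns" in output:
        return Feedback(1.0, "")
    return Feedback(0.2, "Output is too specific; it misses common patterns.")


def relevance_reward(user_input: str, output: str) -> Feedback:
    """Stand-in reward model that checks the output stays on topic."""
    if user_input in output:
        return Feedback(1.0, "")
    return Feedback(0.0, "Output does not mention the input topic.")


def toy_rewrite(system_prompt: str, explanations: List[str]) -> str:
    """Stand-in for the heavy model that rewrites the system prompt."""
    if any("common patterns" in e for e in explanations):
        return system_prompt + " Focus on common patterns."
    return system_prompt


best_prompt, best_score = optimize_prompt(
    "You are a helpful analyst.",
    ["pricing", "onboarding"],
    toy_generate,
    [specificity_reward, relevance_reward],
    toy_rewrite,
)
print(best_prompt)
print(best_score)
```

In this toy run, the rewriter sees the failure explanation rather than only the mean score, which is what lets it make a targeted change to the prompt. Only the final `best_prompt` would ship to production, so the evaluators and the rewriter add no serving latency.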

// SPEAKER