
Jacek Golebiowski
Co-Founder & CTO · distil labs
Custom small language models that replace expensive LLM API calls at 1/100th the cost
AI Infrastructure
// ABOUT THIS SESSION
Most production AI workloads are structured, repeatable tasks: classification, function calling, extraction. You don't need a trillion-parameter model for those. A fine-tuned small language model can match the accuracy at 1/100th the cost, run on your own infrastructure, and keep customer data private. Knowunity, which serves 23M students, replaced its Gemini pipeline with a fine-tuned SLM: accuracy rose from 81% to 93% and costs dropped 50%, starting from 50 labeled examples and a few hours of training. This talk covers how to spot the LLM calls in your stack worth replacing, the workflow to get there using your existing production traces, and where you should keep using frontier APIs instead.
// SPEAKER