SmartDuke Technologies

Evals.

How we design eval suites that catch real regressions before users do: unit evals on tool calls, frozen LLM-as-judge regression sets, and continuous production sampling.