— Live room · llm-engineering
LLM Engineering
Prompting, eval, fine-tuning, deployment.
Structured outputs wherever the provider supports it. Hand-rolled parsers are fine until the model ships a quirk and you spend a week chasing a trailing comma.
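The trailing-comma quirk mentioned above can be sketched with plain `json.loads`, which rejects it outright. This is an illustrative stand-in for a hand-rolled parser, not code from the thread; the helper name `parse_strict` and the schema dict are made up for the example.

```python
import json

def parse_strict(raw: str, required: dict) -> dict:
    """Parse model output and check required keys and types.

    Raises ValueError on malformed JSON or a schema mismatch -
    the kind of failure a provider-side structured-output mode
    would prevent at generation time instead.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"malformed JSON: {e}") from e
    for key, typ in required.items():
        if key not in data or not isinstance(data[key], typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

schema = {"label": str, "confidence": float}

# Well-formed output parses cleanly.
ok = parse_strict('{"label": "spam", "confidence": 0.93}', schema)

# The classic quirk: one trailing comma and json.loads refuses it.
try:
    parse_strict('{"label": "spam", "confidence": 0.93,}', schema)
    quirk_caught = False
except ValueError:
    quirk_caught = True
```

With structured outputs, the schema check moves from your parser into the decoding step, which is exactly why the migration touches so much code.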
Any strong opinions on structured outputs vs prompt-and-parse for production? We are migrating and it is surprisingly invasive.
That is the whole game. Pin your judge or version your eval.
Judge drift is real. We caught it last quarter by accident when a pass rate jumped 12% overnight - turned out the judge had been updated, not our model.
We run a small golden set per product, plus a judge model to catch drift. Judge model itself gets re-evaluated monthly.
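The golden-set-plus-judge setup described above can be sketched as a drift check over a frozen set: re-run the judge, compare its pass rate to a recorded baseline, and flag any jump like the 12%-overnight one. Everything here is an assumption for illustration - the `judge` callable stands in for a real LLM judge, and the exact-match stub is just a placeholder.

```python
def pass_rate(judge, golden_set):
    """Fraction of golden examples the judge marks as passing."""
    return sum(judge(ex["output"], ex["reference"]) for ex in golden_set) / len(golden_set)

def check_drift(judge, golden_set, baseline_rate, tolerance=0.05):
    """Flag a judge whose pass rate on a frozen golden set moved
    more than `tolerance` from the recorded baseline."""
    rate = pass_rate(judge, golden_set)
    return abs(rate - baseline_rate) <= tolerance, rate

# Stub judge: exact match stands in for a real judge-model call.
exact = lambda out, ref: out.strip() == ref.strip()

golden = [
    {"output": "4", "reference": "4"},
    {"output": "Paris", "reference": "Paris"},
    {"output": "blue", "reference": "green"},
]

within_tolerance, rate = check_drift(exact, golden, baseline_rate=2 / 3)
```

Because the golden set is frozen, any pass-rate movement on it implicates the judge, not the product model - which is the signal the monthly re-evaluation is after.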
Opening question: what eval harness are you actually using, not just the one you say you use in your deck?
Space for anyone shipping LLMs in production. Evals, prompts, fine-tunes, cost, latency - bring your scars.
About this room
Engineering-focused conversations about shipping LLM systems in production.