Applied AI Engineering Community

Practical discussions for teams building with AI.

A focused board for model evaluation, retrieval systems, agent reliability, inference operations, and field notes from production AI systems.

18,420 members
42,108 posts archived
314 active this week
27 open study groups

Active Threads

Latest Unanswered Deep Dives

How are you weighting human review against synthetic evals?

Looking for production examples where qualitative review and automated scoring disagree, especially on multi-turn support agents.

evaluation started by Mira Chen 12 min ago
38 replies
1.9k views

Incident notes: vector store recall dropped after source cleanup

A postmortem on chunk boundaries, metadata filters, and why the staging replay set missed the issue.

retrieval started by Owen Park 34 min ago
21 replies
812 views

Reliable tool-use loops without hiding intermediate state

Patterns for exposing plans, intermediate artifacts, and audit trails while keeping the user interface calm.

agents started by Sofia Alvarez 1 hr ago
47 replies
2.4k views

Benchmarking local inference on bursty workloads

Comparing queue depth, warm pools, cache pressure, and batch sizing across common deployment setups.

inference started by Daniel Ito 3 hrs ago
16 replies
640 views

May reading group: scalable oversight and eval governance

Shared notes, paper links, and discussion prompts for this month's applied AI safety reading group.

governance started by Priya Raman yesterday
29 replies
1.1k views