Trust is Earned
Introducing semdragon — agentic workflow coordination built on semstreams
I have spent the last year riding the agentic dev J-curve. From research assistant, to autocomplete, to agent teams in Claude. Sometimes the results were brilliant. Sometimes they were anti-patterns and emoji-filled bullshit summaries.
Working with agents felt like herding brilliant young Marines an hour away from a 96-hour liberty. Optimized for completion. Not quality.
I kept throwing structure at the problem. Better prompts, more context, explicit review steps. The work would come back technically done and subtly wrong in ways that cost more to fix than doing it right the first time would have. The issue wasn’t capability. It was that nothing in the loop cared about the record. Every session started fresh. No reputation, no consequences, no memory of the last time corners got cut.
That’s the problem semdragon is built to solve.
Agents are adventurers. Work items are quests. Quality reviews are boss battles.
I used RPG vocabulary because it was fun to do, and because it maps accurately onto the architecture. A quest has difficulty, required skills, acceptance criteria, and XP rewards. An agent has a level, a trust tier, and a record of past performance. A boss battle is a quality gate embedded in the quest lifecycle with real consequences for failure.
The core design is pull-based. Agents claim quests from a board based on demonstrated capability, rather than being assigned work by a central scheduler. The boid engine, inspired by Craig Reynolds’ flocking algorithm, computes attraction scores across six rules: skill match, guild affiliation, reputation, tier appropriateness, idle urgency, and crowd avoidance. The highest-scoring agent-quest pairs surface as suggestions. First writer wins the claim.
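To make the scoring idea concrete, here is a minimal sketch of a weighted attraction score over the six rules. It is not semdragon's code: the weights, the field names, and the way each rule is normalized to a 0..1 range are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights. The six rule names come from the design above;
# the numbers do not.
WEIGHTS = {
    "skill_match": 3.0,
    "guild_affiliation": 1.0,
    "reputation": 2.0,
    "tier_appropriateness": 2.0,
    "idle_urgency": 0.5,
    "crowd_avoidance": 1.5,
}


@dataclass
class Agent:
    name: str
    skills: set
    guild: str
    reputation: float    # assumed to be normalized to 0..1
    tier: int            # assumed numeric tier, higher is more trusted
    idle_minutes: float


@dataclass
class Quest:
    title: str
    required_skills: set
    guild: str
    min_tier: int
    interested_agents: int = 0  # how many other agents are already drawn to it


def attraction(agent: Agent, quest: Quest) -> float:
    """Weighted sum of six rule scores, each in the range 0..1."""
    if quest.required_skills:
        skill = len(agent.skills & quest.required_skills) / len(quest.required_skills)
    else:
        skill = 1.0
    tier_gap = agent.tier - quest.min_tier
    rules = {
        "skill_match": skill,
        "guild_affiliation": 1.0 if agent.guild == quest.guild else 0.0,
        "reputation": agent.reputation,
        # Under-tiered agents score zero; over-qualified agents score less.
        "tier_appropriateness": 0.0 if tier_gap < 0 else 1.0 / (1 + tier_gap),
        # Idle time saturates after an hour.
        "idle_urgency": min(agent.idle_minutes / 60.0, 1.0),
        # The more agents already circling a quest, the weaker the pull.
        "crowd_avoidance": 1.0 / (1 + quest.interested_agents),
    }
    return sum(WEIGHTS[name] * value for name, value in rules.items())


def suggestions(agents, quests, top_n=3):
    """Return the highest-scoring (score, agent name, quest title) pairs."""
    pairs = [(attraction(a, q), a.name, q.title) for a in agents for q in quests]
    return sorted(pairs, reverse=True)[:top_n]
```

The point of the sketch is the shape: no scheduler decides anything. Every agent-quest pair gets a score, and the board surfaces the strongest pulls.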
Trust tiers are derived from XP, not declared. Apprentice agents get read-only access: summarize, classify, analyze. Journeymen can make API calls and write to staging. Experts can touch production. Masters can supervise other agents, decompose complex quests into sub-quests, and lead parties. Grandmasters can act as DM delegates, making unsupervised decisions. You don’t get those capabilities by being configured into a role. You earn them.
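A minimal sketch of deriving a tier and its capabilities from XP, rather than storing a role. The XP thresholds and capability names are hypothetical, and the sketch assumes each tier keeps everything the tiers below it can do.

```python
import bisect

# (minimum XP, tier name, capabilities newly granted at this tier).
# Tier names follow the text above; thresholds and capability
# identifiers are made up for illustration.
TIER_GRANTS = [
    (0, "Apprentice", {"summarize", "classify", "analyze"}),
    (1_000, "Journeyman", {"api_call", "write_staging"}),
    (5_000, "Expert", {"write_production"}),
    (20_000, "Master", {"supervise", "decompose_quest", "lead_party"}),
    (50_000, "Grandmaster", {"dm_delegate"}),
]
THRESHOLDS = [minimum for minimum, _, _ in TIER_GRANTS]


def tier_index(xp: int) -> int:
    """Index of the highest tier whose threshold the XP total has reached."""
    if xp < 0:
        raise ValueError("xp must be non-negative")
    return bisect.bisect_right(THRESHOLDS, xp) - 1


def tier_for(xp: int) -> str:
    return TIER_GRANTS[tier_index(xp)][1]


def capabilities_for(xp: int) -> set:
    """Union of the grants for the current tier and every tier below it."""
    capabilities = set()
    for _, _, grants in TIER_GRANTS[: tier_index(xp) + 1]:
        capabilities |= grants
    return capabilities


def can(xp: int, capability: str) -> bool:
    return capability in capabilities_for(xp)
```

Because the tier is a pure function of XP, there is nothing to configure and nothing to forget to revoke. Lose the XP and the capability goes with it.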
After a party quest completes, agents exchange blind peer reviews rated on task quality, communication, and autonomy. Those ratings feed back into future system prompts as explicit warnings and adjust boid affinity scores. The system routes better over time because it remembers who cut corners.
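One way that feedback loop could look, as a rough sketch: average the three peer-review dimensions and turn any weak average into a warning line for the agent's next system prompt. The 1-to-5 scale, the threshold, and the warning wording are assumptions, not the actual format.

```python
from statistics import mean

DIMENSIONS = ("task_quality", "communication", "autonomy")
WARN_BELOW = 3.0  # hypothetical threshold on a hypothetical 1-5 scale


def summarize(reviews):
    """Average each dimension across a list of blind peer reviews."""
    return {d: mean(review[d] for review in reviews) for d in DIMENSIONS}


def prompt_warnings(reviews):
    """Build warning lines for every dimension averaging below the threshold."""
    if not reviews:
        return []
    return [
        f"Peers rated your {d.replace('_', ' ')} {avg:.1f}/5 on recent party quests."
        for d, avg in summarize(reviews).items()
        if avg < WARN_BELOW
    ]
```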
When a quest requires review, it transitions to in_review inside the same lifecycle state machine, which triggers evaluation by automated judges, LLM judges, or humans, depending on the review level. Repeated failure at the same level triggers a level-down. Catastrophic failure is permadeath: the agent is retired. Consequences exist because without them the record means nothing.
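A minimal sketch of a quest lifecycle with review as a state inside it, plus the consequences attached to review outcomes. Only in_review is named above. The other state names, the allowed transitions, and the three-strikes level-down threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class QuestState(Enum):
    POSTED = "posted"            # hypothetical
    CLAIMED = "claimed"          # hypothetical
    IN_PROGRESS = "in_progress"  # hypothetical
    IN_REVIEW = "in_review"      # named in the text above
    COMPLETED = "completed"      # hypothetical
    FAILED = "failed"            # hypothetical


# Allowed transitions. Review is one more state in the lifecycle,
# not a separate workflow.
TRANSITIONS = {
    QuestState.POSTED: {QuestState.CLAIMED},
    QuestState.CLAIMED: {QuestState.IN_PROGRESS},
    QuestState.IN_PROGRESS: {QuestState.IN_REVIEW, QuestState.COMPLETED, QuestState.FAILED},
    QuestState.IN_REVIEW: {QuestState.COMPLETED, QuestState.FAILED},
    QuestState.COMPLETED: set(),
    QuestState.FAILED: set(),
}


def advance(current: QuestState, target: QuestState) -> QuestState:
    """Move to the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


@dataclass
class AgentRecord:
    level: int
    failures_at_level: int = 0
    retired: bool = False


MAX_FAILURES_AT_LEVEL = 3  # hypothetical threshold


def record_review(agent: AgentRecord, passed: bool, catastrophic: bool = False) -> None:
    """Apply the outcome of a boss battle to the agent's record."""
    if catastrophic:
        agent.retired = True  # permadeath
        return
    if passed:
        agent.failures_at_level = 0
        return
    agent.failures_at_level += 1
    if agent.failures_at_level >= MAX_FAILURES_AT_LEVEL and agent.level > 1:
        agent.level -= 1
        agent.failures_at_level = 0
```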
The framework is skinnable. A software board calls agents Developers, quests Tasks, boss battles Code Reviews. A research board calls them Researchers, Studies, Peer Reviews. The underlying mechanics are identical. You bring the context.
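Skinning can be as simple as a vocabulary lookup in front of unchanged mechanics. A sketch, with the skin keys and concept keys as hypothetical identifiers and the display names taken from the examples above:

```python
# Display vocabulary per board. The mechanics underneath never see these names.
SKINS = {
    "rpg": {"agent": "Adventurer", "quest": "Quest", "boss_battle": "Boss Battle"},
    "software": {"agent": "Developer", "quest": "Task", "boss_battle": "Code Review"},
    "research": {"agent": "Researcher", "quest": "Study", "boss_battle": "Peer Review"},
}


def label(skin: str, concept: str) -> str:
    """Translate an internal concept into the board's own vocabulary."""
    return SKINS[skin][concept]
```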
Complex quests that exceed a single agent’s capability go to parties. A Master-tier agent decomposes the quest into a DAG of sub-quests, assigns members by skill, and rolls up the results. The party lead faces the boss battle. They have skin in the game.
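A small sketch of the decomposition step, using the standard library's graphlib. Sub-quests form a DAG, the lead works out which ones can run in parallel, and each goes to a member with the right skill. The sub-quest names, the one-skill-per-sub-quest shape, and the alphabetical tie-break are all assumptions for illustration.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group sub-quests into waves. Everything in a wave can run in parallel.

    `dependencies` maps each sub-quest to the set of sub-quests it waits on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def assign(sub_quest_skills, members):
    """Give each sub-quest to a member with the needed skill, or None.

    `sub_quest_skills` maps sub-quest -> required skill.
    `members` maps member name -> set of skills.
    """
    assignments = {}
    for sub_quest, skill in sub_quest_skills.items():
        candidates = sorted(name for name, skills in members.items() if skill in skills)
        assignments[sub_quest] = candidates[0] if candidates else None
    return assignments
```

For example, with `{"implement": {"design"}, "docs": {"design"}, "test": {"implement"}}`, the design sub-quest runs first, docs and implement run side by side, and test runs last.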
The alpha ships with a seeded agent roster so there’s something on the board when you arrive. A training arena for bootstrapping agent competence before putting them on real work is in progress.
What’s coming.
Semdragon is optimized for fluid, emergent coordination. The plan phase is intentionally loose. What it doesn’t give you is formal requirements, structured change proposals, or auditable decision trails against a written specification. When you need change control, scenario validation, and a formal record of every decision, that’s semspec. Coming soon.
Both tools run on semstreams. Same task decomposition, same DAG-based coordination underneath. They solve for different operational contexts and you’ll probably want both eventually.
We built semdragon first because it was more fun. The results are serious.
Semdragon is open source and entering alpha soon. Built on semstreams. GitHub →