Concepts
dgov is a local governor for AI coding work. It behaves more like a
compiler than like an autonomous assistant: planning, ordering, retries, file
claims, settlement, and merges are explicit state transitions over an event
log. The worker can be probabilistic; the core lifecycle state machine is not.
Philosophy
Determinism over vibes
Planning, task ordering, retry policy, file claims, settlement, and merge decisions are explicit state transitions. Orchestration is inspectable even when the model output itself is not reproducible.
Isolation through git worktrees
Each task runs in its own git worktree rooted at a specific commit. This is
the main safety boundary, not an implementation detail. Concurrent tasks do
not share the same checkout, every attempt has a concrete filesystem
snapshot, and rejected work does not merge to main through dgov.
Validation over trust
A worker is not trusted because it sounds confident. Worker candidates pass
through scope checks, ruff auto-fix, and configured validation gates before
merge. Those gates can include lint, type-check, targeted tests, sentrux
comparison, and coverage when the corresponding project config and baselines
are present. The sentrux baseline at .sentrux/baseline.json is
governor-owned state; worker edits to .sentrux/baseline.json and
.sentrux/dgov-baseline.json are rejected during review. A clean, complete
full-plan run refreshes the accepted sentrux baseline metadata, and only
after the post-run sentrux comparison passes.
Explicit claims over implicit intent
Worker tasks declare the files they will create, edit, delete, or read.
Without file claims, parallel execution is guesswork. With claims, dgov
rejects invalid plans at compile time and can explain why two tasks cannot
run in parallel.
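A minimal sketch of how declared claims can decide whether two tasks may run side by side. The `Task` shape and the conflict rule (a path conflicts when one task writes it and the other reads or writes it) are assumptions for illustration, not dgov's actual plan schema or scheduling rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task shape: a slug plus declared file claims."""
    slug: str
    writes: frozenset = frozenset()  # paths the task creates, edits, or deletes
    reads: frozenset = frozenset()   # paths the task only reads


def claim_conflicts(a: Task, b: Task) -> set:
    """Return the paths that prevent a and b from running in parallel.

    Assumed rule: a path conflicts when one task writes it and the
    other task reads or writes it. Two pure readers never conflict.
    """
    return set((a.writes & (b.writes | b.reads)) | (b.writes & a.reads))


def explain(a: Task, b: Task) -> str:
    """Produce a human-readable reason, so a rejection is never a bare 'no'."""
    paths = sorted(claim_conflicts(a, b))
    if not paths:
        return f"{a.slug} and {b.slug} can run in parallel"
    return (
        f"{a.slug} and {b.slug} cannot run in parallel: "
        f"conflicting claims on {', '.join(paths)}"
    )
```

With this rule, a task that edits `src/models.py` and a task that reads it are serialized, while two tasks that only read it are not.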
What dgov is
- a DAG execution engine for AI coding tasks
- a local git-worktree orchestrator
- a settlement layer around model output
- a tool for inspectable, event-backed automation in real repos
What dgov is not
- a replacement for git
- a remote task queue
- a multi-provider abstraction layer
- a chat frontend
- a system that hides repo state from you
For "just do something smart in this checkout," simpler tools exist. dgov
is for cases where structure and auditability matter.
Architecture
| Module | Role |
|---|---|
| kernel.py | Pure (state, event) → (new_state, actions); no I/O |
| runner.py | Async DAG executor that bridges kernel actions to real I/O |
| worker.py, workers/ | OpenAI-compatible worker subprocess and its tools |
| planner.py | Auto-plan generator that powers dgov plan create |
| researcher.py | Read-only research role driver |
| worktree.py | Snapshot isolation through git worktrees |
| settlement.py, settlement_flow.py | ruff auto-fix, lint, type-check, tests, sentrux, coverage, integration candidates |
| semantic_settlement.py | Deterministic Python semantic checks on integration candidates |
| tool_policy.py, tool_audit.py | Worker tool allow/deny policy and telemetry audit |
| policy_drift.py | Detects drift between canonical policy sources and packaged mirrors |
| plan.py, plan_tree.py, dag_parser.py | TOML plan parsing, tree walk, DAG compilation |
| plan_review.py | Post-hoc debrief that powers dgov plan review |
| sop_bundler.py, prompt_builder.py | SOP loading and final prompt assembly |
| bootstrap_policy.py, bootstrap_policy_data/ | Default SOPs and governor templates for dgov init |
| agent_skills.py, agent_skill_data/ | Shipped machine-agent skills for dgov agents sync |
| deploy_log.py | Append-only JSONL deploy history |
| archive.py | Plan archival on success |
| config.py | ProjectConfig and load_project_config() |
| persistence/ | SQLite event store, runtime artifact rows, slug history, ledger |
| cli/ | Click interface |
Kernel model
The kernel does not know about subprocesses, git, HTTP, or the OpenAI client. It takes events and current state, returns new state and outgoing actions, and nothing else. The runner does the messy part.
That separation buys:
- easier unit testing of lifecycle transitions
- crash recovery through event replay
- explicit failure boundaries
- fewer hidden side effects in the kernel path
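The shape of such a kernel can be sketched as a pure function plus a replay loop. Everything named here (`TaskState`, `Event`, the event kinds, the action strings, and `MAX_ATTEMPTS`) is hypothetical and chosen for illustration; the real kernel.py may model state and actions quite differently.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class TaskState:
    status: Status = Status.PENDING
    attempts: int = 0


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical kinds: "dispatched", "succeeded", "failed"


MAX_ATTEMPTS = 2  # assumed retry budget for this sketch


def step(state: TaskState, event: Event):
    """Pure transition: no I/O, and the same inputs always give the same outputs.

    Returns (new_state, actions). Actions are plain strings that a runner
    would translate into real side effects.
    """
    if event.kind == "dispatched" and state.status is Status.PENDING:
        new = replace(state, status=Status.RUNNING, attempts=state.attempts + 1)
        return new, ["start_worker"]
    if event.kind == "succeeded" and state.status is Status.RUNNING:
        return replace(state, status=Status.DONE), ["settle"]
    if event.kind == "failed" and state.status is Status.RUNNING:
        if state.attempts < MAX_ATTEMPTS:
            return replace(state, status=Status.PENDING), ["redispatch"]
        return replace(state, status=Status.FAILED), []
    return state, []  # events that do not apply leave the state unchanged


def replay(events):
    """Crash recovery sketch: rebuild state by folding step over the event log."""
    state = TaskState()
    for event in events:
        state, _ = step(state, event)  # actions are ignored during replay
    return state
```

Because `step` touches nothing outside its arguments, a unit test is just a list of events and an expected state, and replaying the same log twice yields the same result.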
Authorities
Three stores carry different jobs:
- Event log — lifecycle authority. Dispatch, retry, status, watch, and live cleanup decisions derive from events.
- Deploy log — landed-output authority. The deploy log records what actually made it onto main. Its append order is the canonical order for selecting the newest upstream base ref for dependent worktrees, avoiding timestamp-tie ambiguity.
- Runtime artifact rows — bookkeeping. These rows cache worktree paths, branch names, and similar operational crumbs for debugging and cleanup. They do not define lifecycle truth.
If a row in runtime_artifacts disagrees with the event log, the event log
wins.
Snapshot isolation
Every dispatched task gets its own branch and worktree rooted at a specific
commit. Root tasks use HEAD, while dependent tasks base from the latest
upstream deploy record by deploy-log append order. That gives dgov a
concrete notion of "attempt." The bootstrap path is careful here because
branching requires a real snapshot: dgov does not need GitHub, but it does
need local git state.
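Selecting a base by append order rather than by timestamp might look like the sketch below. The JSONL field names (`task`, `commit`, `ts`) are assumptions made for the example; the actual deploy-log schema is not specified here.

```python
import json
from typing import Iterable, Optional


def latest_upstream_base(deploy_log_lines: Iterable[str], upstream: set) -> Optional[str]:
    """Pick the base commit for a dependent task.

    Scans the log in append order and keeps the last matching record, so
    two records with identical timestamps are still ordered unambiguously.
    Returns None when no upstream task has landed yet.
    """
    base = None
    for line in deploy_log_lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        if record["task"] in upstream:  # hypothetical field name
            base = record["commit"]     # hypothetical field name
    return base
```

In the example records used to exercise this sketch, tasks `a` and `b` share a timestamp; the later line still wins because position in the file, not `ts`, decides.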
Settlement
The worker implements. Settlement judges. The sequence:
- fast review checks (scope, claim integrity, transient tool activity)
- mechanical cleanup (ruff auto-fix, format)
- isolated validation gates (lint/format plus configured type-check, tests, sentrux, coverage)
- shadow integration candidate creation
- deterministic Python semantic checks on the candidate
- integrated-candidate validation with the same settlement gates
- merge or reject
Collapsing these roles produces a system that is harder to reason about and easier to fool. Keeping them split lets the kernel record exactly which gate rejected a candidate.
Gate map
| Gate | When it runs | What it sees | Mutates? | Failure shape |
|---|---|---|---|---|
| Structural review | Before worker commit | Worker git status, file claims, tool activity log | No | scope, reserved path, empty diff, review hook |
| Autofix | Before worker commit | Worker-changed Python files | Yes | autofix command failure |
| Isolated validation | After worker commit | Worker branch alone | No | lint/format failure; configured type, test, coverage, or sentrux failure |
| Integration candidate | After isolated validation | Task commit replayed onto target HEAD | Temporary workspace only | text conflict |
| Python semantic gate | On integration candidate | Candidate Python files plus base/task/target symbol tables | No | same-symbol edit, duplicate definition, signature drift, syntax conflict |
| Candidate validation | After semantic gate | Integrated candidate | No | behavioral mismatch |
| Final merge | After all gates pass | Target worktree and task branch | Yes | git merge failure |
Semantic settlement
A clean git merge is necessary but not sufficient. The semantic layer is
deterministic and Python-scoped: it checks the integrated candidate with AST
and symbol-table evidence, not an LLM verdict. It catches a subset of
Python-level integration conflicts where valid isolated task commits combine
poorly.
Failure taxonomy
When semantic settlement rejects a candidate, it classifies the failure:
| Failure Class | Meaning |
|---|---|
| TEXT_CONFLICT | Git cannot replay the task commit cleanly on target HEAD |
| SYNTAX_CONFLICT | The integrated file no longer parses |
| SAME_SYMBOL_EDIT | Both sides changed the same Python symbol |
| DUPLICATE_DEFINITION | The integrated code defines the same symbol in multiple files |
| SIGNATURE_DRIFT | A public callable changed its signature relative to base or target |
| BEHAVIORAL_MISMATCH | Parse-level checks pass but settlement gates fail |
The class drives remediation: SAME_SYMBOL_EDIT points at coordination,
SIGNATURE_DRIFT points at a stale task base. ORDERING_CONFLICT exists in
the taxonomy for future gates but is not emitted by the current Python
semantic gate.
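To make the AST and symbol-table idea concrete, here is a deliberately small sketch that detects two of the classes using only the standard `ast` module. It considers top-level functions and classes in a single file; the function names and the comparison strategy are assumptions for illustration and cover far less than the real gate.

```python
import ast
from typing import Dict, Optional, Set


def top_level_symbols(source: str) -> Dict[str, str]:
    """Map each top-level function or class name to a dump of its AST.

    ast.dump omits line and column attributes by default, so moving a
    definition without changing it does not count as an edit.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def changed_symbols(base: str, other: str) -> Set[str]:
    """Symbols present in both versions whose definitions differ."""
    before, after = top_level_symbols(base), top_level_symbols(other)
    return {name for name in set(before) & set(after) if before[name] != after[name]}


def classify(base: str, task: str, target: str, integrated: str) -> Optional[str]:
    """Return a failure class name, or None when these two checks pass."""
    try:
        ast.parse(integrated)
    except SyntaxError:
        return "SYNTAX_CONFLICT"
    if changed_symbols(base, task) & changed_symbols(base, target):
        return "SAME_SYMBOL_EDIT"
    return None
```

The verdict depends only on the four source strings, so running the check again on the same inputs gives the same class.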
Risk levels
Before attempting integration, dgov scores risk:
| Level | Meaning | Action |
|---|---|---|
| NONE / LOW | No or minimal detected risk | Continue to integration candidate |
| MEDIUM | Elevated detected risk | Continue to integration candidate |
| HIGH | Significant detected risk | Continue to integration candidate |
| CRITICAL | Near-certain conflict signal | Reject before candidate creation |
Risk scoring currently considers deterministic Python overlap evidence
collected from the task commit, task base, and target HEAD. Non-critical
risk is telemetry; CRITICAL risk rejects before candidate creation.
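The table reduces to a single branch, sketched below. The `Risk` enum and the returned step names are hypothetical labels for this example.

```python
from enum import IntEnum


class Risk(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def next_step(risk: Risk) -> str:
    """Only CRITICAL blocks; every lower level is recorded as telemetry."""
    if risk is Risk.CRITICAL:
        return "reject_before_candidate"
    return "create_integration_candidate"
```

Note that HIGH still proceeds: in this model the score informs the operator, and the later gates make the actual decision.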
Symbol overlap as evidence
When two tasks touch the same code entity, dgov captures the overlap as
structured evidence: symbol name, type, file path, and the line ranges each
side modified. Instead of "merge failed," the system reports: "Task A and
Task B both edited process_order() in checkout.py — Task A at lines
45–52, Task B at lines 48–55."
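A record carrying that evidence could be modeled as below. The `SymbolOverlap` class and its field names are assumptions; only the kinds of data (symbol name, type, file path, per-side line ranges) come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SymbolOverlap:
    symbol: str
    symbol_type: str            # e.g. "function" or "class"
    path: str
    a_lines: Tuple[int, int]    # inclusive line range edited by the first task
    b_lines: Tuple[int, int]    # inclusive line range edited by the second task

    def lines_intersect(self) -> bool:
        """True when the two inclusive ranges share at least one line."""
        return self.a_lines[0] <= self.b_lines[1] and self.b_lines[0] <= self.a_lines[1]

    def describe(self, a: str = "Task A", b: str = "Task B") -> str:
        """Render the evidence as a sentence an operator can act on."""
        call = "()" if self.symbol_type == "function" else ""
        return (
            f"{a} and {b} both edited {self.symbol}{call} in {self.path}: "
            f"{a} at lines {self.a_lines[0]}-{self.a_lines[1]}, "
            f"{b} at lines {self.b_lines[0]}-{self.b_lines[1]}"
        )
```

Keeping the evidence structured means the same record can feed a log line, a rejection reason, or a later report without re-parsing text.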
Integration candidates
The semantic layer builds ephemeral integration candidates before the real merge: a temporary workspace rooted at target HEAD, the task commit replayed onto it, and the settlement pipeline run against the result. Worker commits that pass this shadow integration proceed to the recorded merge.
Plans as control surfaces
Plans are not just prompts in a file. They are the primary control surface for execution:
- dependencies define legal order
- file claims define legal concurrency
- prompts define local intent
- commit messages define merge outcomes
dgov compile exists to produce a normalized form of the plan: it turns a
human-edited tree into something the runner can execute without
reinterpreting structure on the fly.
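The "dependencies define legal order" part of compilation can be sketched with the standard library's `graphlib`. The `compile_plan` function and its input shape (task slug mapped to the slugs it depends on) are assumptions for illustration; dgov's real compile step parses TOML and validates much more.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def compile_plan(deps: Dict[str, Set[str]]) -> List[str]:
    """Validate dependency references and return one legal execution order.

    deps maps each task slug to the slugs it depends on. Unknown
    references and cycles are rejected here, before anything runs.
    """
    unknown = {d for needs in deps.values() for d in needs} - set(deps)
    if unknown:
        raise ValueError(f"unknown task references: {sorted(unknown)}")
    try:
        # static_order yields every task after all of its dependencies
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # the second element of args holds the nodes that form the cycle
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

Failing at this stage is the point: a plan with a dangling reference or a cycle never reaches the runner.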
Why git worktrees
Git worktrees are the execution primitive because they give dgov three
distinct properties: cheap branch-isolated sandboxes, ordinary git merge
semantics, and native inspectability with standard git tools. You debug a
broken task with git log, git diff, and git show, not with an internal
state format.
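For readers unfamiliar with the primitive, this is roughly what creating one sandbox per attempt looks like when driven from Python. `git worktree add -b <branch> <path> <commit>` is standard git; the helper names and the example branch name are hypothetical and do not reflect how worktree.py is written.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_argv(repo: Path, path: Path, branch: str, base_commit: str) -> List[str]:
    """Build the git command that creates a new branch and worktree at base_commit."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base_commit]


def create_worktree(repo: Path, path: Path, branch: str, base_commit: str) -> None:
    """Run the command; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_add_argv(repo, path, branch, base_commit), check=True)
```

Once the worktree exists it is an ordinary checkout, so `git -C <path> log`, `git -C <path> diff`, and `git -C <path> show` work on it without any dgov-specific tooling.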
Tradeoffs
dgov makes some tradeoffs on purpose.
Stronger setup requirements. You need a real local repo, a working toolchain, and an OpenAI-compatible endpoint. That is more than a browser chat box. The payoff is that the system operates on your real repo with real validation.
More structure up front. Plans, file claims, and a compile step add ceremony. The ceremony buys explicit concurrency, event-derived execution state, and fewer silent footguns.
Narrower provider model. dgov supports OpenAI-compatible endpoints
rather than every provider-native API shape. This is deliberate: it keeps
the runtime surface small while still covering Fireworks, OpenAI,
OpenRouter, and similar APIs.
Failure model
dgov treats these as hard failures:
- invalid plan references stop at compile time
- a missing API key fails before dispatch
- out-of-scope edits fail review
- configured test failures fail settlement
- rejected work does not merge
Post-run sentrux degradation is reported separately as a degraded run
status so the operator can remediate landed work. Degraded, partial, failed,
or --only runs do not refresh the accepted baseline.
Mental model
A deterministic kernel around a probabilistic worker.