Concepts

dgov is a local governor for AI coding work. It behaves more like a compiler than like an autonomous assistant: planning, ordering, retries, file claims, settlement, and merges are explicit state transitions over an event log. The worker can be probabilistic; the core lifecycle state machine is not.

Philosophy

Determinism over vibes

Planning, task ordering, retry policy, file claims, settlement, and merge decisions are explicit state transitions. Orchestration is inspectable even when the model output itself is not reproducible.

Isolation through git worktrees

Each task runs in its own git worktree rooted at a specific commit. This is the main safety boundary, not an implementation detail. Concurrent tasks do not share the same checkout, every attempt has a concrete filesystem snapshot, and rejected work does not merge to main through dgov.

Validation over trust

A worker is not trusted because it sounds confident. Worker candidates pass through scope checks, ruff auto-fix, and configured validation gates before merge. Those gates can include lint, type-check, targeted tests, sentrux comparison, and coverage when the corresponding project config and baselines are present. The sentrux baseline at .sentrux/baseline.json is governor-owned state; worker edits to .sentrux/baseline.json and .sentrux/dgov-baseline.json are rejected during review. A clean, complete full-plan run refreshes the accepted sentrux baseline metadata, and only after the post-run sentrux comparison passes.

Explicit claims over implicit intent

Worker tasks declare the files they will create, edit, delete, or read. Without file claims, parallel execution is guesswork. With claims, dgov rejects invalid plans at compile time and can explain why two tasks cannot run in parallel.
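
As a rough illustration of why claims make concurrency decidable, the sketch below checks two tasks for conflicting claims. The `TaskClaims` shape and the conflict rule (a write collides with any other read or write of the same path) are assumptions made for this example, not dgov's actual schema or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskClaims:
    """Hypothetical shape for one task's declared file claims."""

    name: str
    writes: frozenset[str] = frozenset()  # files created, edited, or deleted
    reads: frozenset[str] = frozenset()


def claim_conflicts(a: TaskClaims, b: TaskClaims) -> set[str]:
    """Return the paths that would stop a and b from running in parallel.

    Assumed rule: a path conflicts when one task writes it and the other
    reads or writes it. Read/read overlap is treated as safe.
    """
    return set((a.writes & b.writes) | (a.writes & b.reads) | (a.reads & b.writes))


api = TaskClaims("api", writes=frozenset({"src/api.py"}), reads=frozenset({"src/models.py"}))
models = TaskClaims("models", writes=frozenset({"src/models.py"}))
docs = TaskClaims("docs", writes=frozenset({"README.md"}), reads=frozenset({"src/models.py"}))

print(claim_conflicts(api, models))  # {'src/models.py'}
print(claim_conflicts(api, docs))    # set()
```

Because the result names the offending paths, the same check can also explain the refusal instead of just reporting it.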

What dgov is

a DAG execution engine for AI coding tasks
a local git-worktree orchestrator
a settlement layer around model output
a tool for inspectable, event-backed automation in real repos

What dgov is not

a replacement for git
a remote task queue
a multi-provider abstraction layer
a chat frontend
a system that hides repo state from you

For "just do something smart in this checkout," simpler tools exist. dgov is for cases where structure and auditability matter.

Architecture

Module	Role
`kernel.py`	Pure `(state, event) → (new_state, actions)` — no I/O
`runner.py`	Async DAG executor that bridges kernel actions to real I/O
`worker.py`, `workers/`	OpenAI-compatible worker subprocess and its tools
`planner.py`	Auto-plan generator that powers `dgov plan create`
`researcher.py`	Read-only research role driver
`worktree.py`	Snapshot isolation through git worktrees
`settlement.py`, `settlement_flow.py`	ruff auto-fix, lint, type-check, tests, sentrux, coverage, integration candidates
`semantic_settlement.py`	Deterministic Python semantic checks on integration candidates
`tool_policy.py`, `tool_audit.py`	Worker tool allow/deny policy and telemetry audit
`policy_drift.py`	Detects drift between canonical policy sources and packaged mirrors
`plan.py`, `plan_tree.py`, `dag_parser.py`	TOML plan parsing, tree walk, DAG compilation
`plan_review.py`	Post-hoc debrief that powers `dgov plan review`
`sop_bundler.py`, `prompt_builder.py`	SOP loading and final prompt assembly
`bootstrap_policy.py`, `bootstrap_policy_data/`	Default SOPs and governor templates for `dgov init`
`agent_skills.py`, `agent_skill_data/`	Shipped machine-agent skills for `dgov agents sync`
`deploy_log.py`	Append-only JSONL deploy history
`archive.py`	Plan archival on success
`config.py`	ProjectConfig and `load_project_config()`
`persistence/`	SQLite event store, runtime artifact rows, slug history, ledger
`cli/`	Click interface

Kernel model

The kernel does not know about subprocesses, git, HTTP, or the OpenAI client. It takes events and current state, returns new state and outgoing actions, and nothing else. The runner does the messy part.

That separation buys:

easier unit testing of lifecycle transitions
crash recovery through event replay
explicit failure boundaries
fewer hidden side effects in the kernel path
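
A minimal sketch of the `(state, event) → (new_state, actions)` shape, assuming a single task with a retry limit. The state fields, event names, action strings, and `MAX_ATTEMPTS` are hypothetical and chosen only to show the pattern. They are not dgov's real types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskState:
    status: str = "pending"
    attempts: int = 0


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical event names: dispatched, worker_failed, settled


MAX_ATTEMPTS = 2  # assumed retry limit for this sketch


def step(state: TaskState, event: Event) -> tuple[TaskState, list[str]]:
    """Pure transition: no I/O, only new state plus actions for a runner."""
    if event.kind == "dispatched" and state.status in ("pending", "retrying"):
        return replace(state, status="running", attempts=state.attempts + 1), []
    if event.kind == "worker_failed" and state.status == "running":
        if state.attempts < MAX_ATTEMPTS:
            return replace(state, status="retrying"), ["dispatch"]
        return replace(state, status="failed"), []
    if event.kind == "settled" and state.status == "running":
        return replace(state, status="merged"), ["cleanup_worktree"]
    return state, []  # events that do not apply leave the state unchanged


def replay(events: list[Event]) -> TaskState:
    """Rebuild state from the log alone; actions are ignored during replay."""
    state = TaskState()
    for event in events:
        state, _ = step(state, event)
    return state


log = [Event("dispatched"), Event("worker_failed"), Event("dispatched"), Event("settled")]
print(replay(log))  # TaskState(status='merged', attempts=2)
```

Since `step` touches nothing outside its arguments, a unit test is a plain equality check, and crash recovery is the same loop run over the persisted events.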

Authorities

Three stores carry different jobs:

Event log — lifecycle authority. Dispatch, retry, status, watch, and live cleanup decisions derive from events.
Deploy log — landed-output authority. The deploy log records what actually made it onto main. Its append order is the canonical order for selecting the newest upstream base ref for dependent worktrees, avoiding timestamp-tie ambiguity.
Runtime artifact rows — bookkeeping. These rows cache worktree paths, branch names, and similar operational crumbs for debugging and cleanup. They do not define lifecycle truth.

If a row in runtime_artifacts disagrees with the event log, the event log wins.

Snapshot isolation

Every dispatched task gets its own branch and worktree rooted at a specific commit. Root tasks use HEAD, while dependent tasks base from the latest upstream deploy record by deploy-log append order. That gives dgov a concrete notion of "attempt." The bootstrap path is careful because branching requires a real snapshot — dgov does not need GitHub, but it does need local git state.
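
The base-selection rule can be sketched as follows, assuming the deploy log is JSONL with one record per landed task. The `task` and `commit` field names, the sample records, and the error raised when no upstream record exists are all assumptions made for illustration.

```python
import json


def pick_base_ref(deploy_log_lines, upstream_tasks, head="HEAD"):
    """Choose the commit a new worktree should be rooted at.

    Root tasks (no upstream) use head. Dependent tasks use the commit of
    the last record, in append order, that belongs to an upstream task.
    """
    if not upstream_tasks:
        return head
    base = None
    for line in deploy_log_lines:  # iteration order is append order
        record = json.loads(line)
        if record["task"] in upstream_tasks:
            base = record["commit"]  # later lines override earlier ones
    if base is None:
        # Assumed behavior for this sketch: refuse to guess a base.
        raise LookupError("no deploy record found for upstream tasks")
    return base


# Made-up records with identical timestamps: append order still decides.
lines = [
    '{"task": "schema", "commit": "aaa111", "ts": "2024-01-01T00:00:00Z"}',
    '{"task": "api", "commit": "bbb222", "ts": "2024-01-01T00:00:00Z"}',
]

print(pick_base_ref(lines, {"schema", "api"}))  # bbb222
print(pick_base_ref(lines, set()))              # HEAD
```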

Settlement

The worker implements. Settlement judges. The sequence:

fast review checks (scope, claim integrity, transient tool activity)
mechanical cleanup (ruff auto-fix, format)
isolated validation gates (lint/format plus configured type-check, tests, sentrux, coverage)
shadow integration candidate creation
deterministic Python semantic checks on the candidate
integrated-candidate validation with the same settlement gates
merge or reject

Collapsing these roles produces a system that is harder to reason about and easier to fool. Keeping them split lets the kernel record exactly which gate rejected a candidate.
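
One way to picture the sequence is as an ordered list of named checks that stops at the first failure. The gate names below loosely follow the steps above, but the `Verdict` type, the candidate dictionary keys, and the check functions are hypothetical stand-ins, not dgov's settlement API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    merged: bool
    rejected_by: Optional[str] = None  # name of the first failing gate


Gate = tuple[str, Callable[[dict], bool]]


def settle(candidate: dict, gates: list[Gate]) -> Verdict:
    """Run gates in order and stop at the first failure."""
    for name, check in gates:
        if not check(candidate):
            return Verdict(merged=False, rejected_by=name)
    return Verdict(merged=True)


# Hypothetical checks over a hypothetical candidate record.
gates: list[Gate] = [
    ("structural_review", lambda c: set(c["changed"]) <= set(c["claimed"])),
    ("isolated_validation", lambda c: c["tests_pass"]),
    ("candidate_validation", lambda c: c["integrated_tests_pass"]),
]

in_scope = {
    "changed": ["src/api.py"],
    "claimed": ["src/api.py"],
    "tests_pass": True,
    "integrated_tests_pass": True,
}
out_of_scope = {**in_scope, "changed": ["src/api.py", "setup.py"]}

print(settle(in_scope, gates))      # Verdict(merged=True, rejected_by=None)
print(settle(out_of_scope, gates))  # Verdict(merged=False, rejected_by='structural_review')
```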

Gate map

Gate	When it runs	What it sees	Mutates?	Failure shape
Structural review	Before worker commit	Worker git status, file claims, tool activity log	No	scope, reserved path, empty diff, review hook
Autofix	Before worker commit	Worker-changed Python files	Yes	autofix command failure
Isolated validation	After worker commit	Worker branch alone	No	lint/format failure; configured type, test, coverage, or sentrux failure
Integration candidate	After isolated validation	Task commit replayed onto target `HEAD`	Temporary workspace only	text conflict
Python semantic gate	On integration candidate	Candidate Python files plus base/task/target symbol tables	No	same-symbol edit, duplicate definition, signature drift, syntax conflict
Candidate validation	After semantic gate	Integrated candidate	No	behavioral mismatch
Final merge	After all gates pass	Target worktree and task branch	Yes	git merge failure

Semantic settlement

A clean git merge is necessary but not sufficient. The semantic layer is deterministic and Python-scoped: it checks the integrated candidate with AST and symbol-table evidence, not an LLM verdict. It catches a subset of Python-level integration conflicts where valid isolated task commits combine poorly.

Failure taxonomy

When semantic settlement rejects a candidate, it classifies the failure:

Failure Class	Meaning
`TEXT_CONFLICT`	Git cannot replay the task commit cleanly on target HEAD
`SYNTAX_CONFLICT`	The integrated file no longer parses
`SAME_SYMBOL_EDIT`	Both sides changed the same Python symbol
`DUPLICATE_DEFINITION`	The integrated code defines the same symbol in multiple files
`SIGNATURE_DRIFT`	A public callable changed its signature relative to base or target
`BEHAVIORAL_MISMATCH`	Parse-level checks pass but settlement gates fail

The class drives remediation: SAME_SYMBOL_EDIT points at coordination, SIGNATURE_DRIFT points at a stale task base. ORDERING_CONFLICT exists in the taxonomy for future gates but is not emitted by the current Python semantic gate.
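
To make the "AST evidence, not an LLM verdict" idea concrete, here is a small sketch that derives two of the classes above from Python's standard `ast` module. It applies the table's wording literally to top-level functions and classes. How the real gate scopes symbols is not described here, so treat the `classify` function and its rules as assumptions.

```python
import ast
from collections import defaultdict


def classify(files: dict[str, str]) -> list[tuple[str, str]]:
    """Map integrated sources (path -> text) to (failure_class, detail) pairs."""
    failures = []
    defined_in = defaultdict(list)
    for path, source in sorted(files.items()):
        try:
            tree = ast.parse(source, filename=path)
        except SyntaxError:
            failures.append(("SYNTAX_CONFLICT", path))
            continue
        for node in tree.body:  # top-level definitions only
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                defined_in[node.name].append(path)
    for name, paths in sorted(defined_in.items()):
        if len(set(paths)) > 1:
            failures.append(("DUPLICATE_DEFINITION", f"{name} in {', '.join(paths)}"))
    return failures


files = {
    "a.py": "def load():\n    return 1\n",
    "b.py": "def load():\n    return 2\n",
    "c.py": "def broken(:\n",
}

for failure in classify(files):
    print(failure)
# ('SYNTAX_CONFLICT', 'c.py')
# ('DUPLICATE_DEFINITION', 'load in a.py, b.py')
```

The same inputs always produce the same classification, which is the property that matters for a settlement gate.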

Risk levels

Before attempting integration, dgov scores risk:

Level	Meaning	Action
`NONE` / `LOW`	No or minimal detected risk	Continue to integration candidate
`MEDIUM`	Elevated detected risk	Continue to integration candidate
`HIGH`	Significant detected risk	Continue to integration candidate
`CRITICAL`	Near-certain conflict signal	Reject before candidate creation

Risk scoring currently considers deterministic Python overlap evidence collected from the task commit, task base, and target HEAD. Non-critical risk is telemetry; CRITICAL risk rejects before candidate creation.
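
The decision rule in the table is small enough to state as code. The enum mirrors the level names above. The numeric ordering and the returned step names are assumptions for this sketch.

```python
from enum import IntEnum


class Risk(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def next_step(risk: Risk) -> str:
    """Only CRITICAL blocks candidate creation; other levels are telemetry."""
    if risk is Risk.CRITICAL:
        return "reject_before_candidate"
    return "build_integration_candidate"


for level in Risk:
    print(level.name, "->", next_step(level))
```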

Symbol overlap as evidence

When two tasks touch the same code entity, dgov captures the overlap as structured evidence: symbol name, type, file path, and the line ranges each side modified. Instead of "merge failed," the system reports: "Task A and Task B both edited process_order() in checkout.py — Task A at lines 45–52, Task B at lines 48–55."
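
A sketch of what such structured evidence might look like, using the example from the paragraph above. The `SymbolEdit` record, its field names, and the message format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolEdit:
    """Hypothetical record of one task's edit to one symbol."""

    task: str
    symbol: str
    kind: str   # e.g. "function" or "class"
    path: str
    start: int  # first modified line, inclusive
    end: int    # last modified line, inclusive


def shared_lines(a: SymbolEdit, b: SymbolEdit) -> Optional[tuple[int, int]]:
    """Return the inclusive line range both edits touch, if any."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return (lo, hi) if lo <= hi else None


def overlap_report(a: SymbolEdit, b: SymbolEdit) -> Optional[str]:
    """Describe the overlap when both edits target the same symbol."""
    if (a.path, a.symbol) != (b.path, b.symbol):
        return None
    return (
        f"{a.task} and {b.task} both edited {a.symbol}() in {a.path}: "
        f"{a.task} at lines {a.start}-{a.end}, {b.task} at lines {b.start}-{b.end}"
    )


edit_a = SymbolEdit("Task A", "process_order", "function", "checkout.py", 45, 52)
edit_b = SymbolEdit("Task B", "process_order", "function", "checkout.py", 48, 55)

print(overlap_report(edit_a, edit_b))
print(shared_lines(edit_a, edit_b))  # (48, 52)
```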

Integration candidates

The semantic layer builds ephemeral integration candidates before the real merge: a temporary workspace rooted at target HEAD, the task commit replayed onto it, and the settlement pipeline run against the result. Worker commits that pass this shadow integration proceed to the recorded merge.

Plans as control surfaces

Plans are not just prompts in a file. They are the primary control surface for execution:

dependencies define legal order
file claims define legal concurrency
prompts define local intent
commit messages define merge outcomes

dgov compile exists to turn the plan into a normalized artifact: it converts a human-edited tree into something the runner can execute without reinterpreting structure on the fly.
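
The "dependencies define legal order" rule can be illustrated with the standard library's `graphlib`. This is a sketch only: the task names are made up, the plain dictionary stands in for a parsed plan, and file claims, which would further restrict what runs together, are not modeled.

```python
from graphlib import TopologicalSorter


def compile_order(tasks: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves whose dependencies are all satisfied.

    tasks maps each task name to the names it depends on. Unknown
    references and cycles are rejected before anything would run.
    """
    for name, deps in tasks.items():
        for dep in deps:
            if dep not in tasks:
                raise ValueError(f"task {name!r} depends on unknown task {dep!r}")
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = {
    "schema": [],
    "api": ["schema"],
    "cli": ["schema"],
    "docs": ["api", "cli"],
}

print(compile_order(plan))  # [['schema'], ['api', 'cli'], ['docs']]
```

Doing this once, ahead of execution, is what lets an invalid reference fail at compile time rather than partway through a run.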

Why git worktrees

Git worktrees are the execution primitive because they give dgov three distinct properties: cheap branch-isolated sandboxes, ordinary git merge semantics, and native inspectability with standard git tools. You debug a broken task with git log, git diff, and git show, not with an internal state format.

Tradeoffs

dgov makes some tradeoffs on purpose.

Stronger setup requirements. You need a real local repo, a working toolchain, and an OpenAI-compatible endpoint. That is more than a browser chat box. The payoff is that the system operates on your real repo with real validation.

More structure up front. Plans, file claims, and a compile step add ceremony. The ceremony buys explicit concurrency, event-derived execution state, and fewer silent footguns.

Narrower provider model. dgov supports OpenAI-compatible endpoints rather than every provider-native API shape. This is deliberate: it keeps the runtime surface small while still covering Fireworks, OpenAI, OpenRouter, and similar APIs.

Failure model

dgov treats these as hard failures:

invalid plan references stop at compile time
a missing API key fails before dispatch
out-of-scope edits fail review
configured test failures fail settlement
rejected work does not merge

Post-run sentrux degradation is reported separately as a degraded run status so the operator can remediate landed work. Degraded, partial, failed, or --only runs do not refresh the accepted baseline.

Mental model

A deterministic kernel around a probabilistic worker.